Experimental Design and Statistical Parametric Mapping Ch3
Contents
3.1 Introduction
3.2 Method
3.2.1 A Maximum A Posteriori Solution
3.2.2 Affine Registration
3.2.3 Nonlinear Registration
3.2.4 Linear Regularization for Nonlinear Registration
3.2.5 Templates and Intensity Transformations
3.3 Discussion
Abstract
This chapter describes the steps involved in registering images of different subjects
into roughly the same co-ordinate system, where the co-ordinate system is defined
by a template image (or series of images). The method only uses up to a few hun-
dred parameters, so can only model global brain shape. It works by estimating the
optimum coefficients for a set of bases, by minimizing the sum of squared differ-
ences between the template and source image, while simultaneously maximizing the
smoothness of the transformation using a maximum a posteriori (MAP) approach.
CHAPTER 3. SPATIAL NORMALIZATION USING BASIS FUNCTIONS
3.1 Introduction
Sometimes it is desirable to warp images from a number of individuals into roughly the same
standard space to allow signal averaging across subjects. This procedure is known as spatial
normalization. In functional imaging studies, spatial normalization of the images is useful for de-
termining what happens generically over individuals. A further advantage of using spatially nor-
malized images is that activation sites can be reported according to their Euclidean co-ordinates
within a standard space [21]. The most commonly adopted co-ordinate system within the brain
imaging community is that described by [32], although new standards are now emerging that are
based on digital atlases [18, 19, 27].
Methods of registering images can be broadly divided into label based and intensity based.
Label based techniques identify homologous features (labels) in the source and reference images
and find the transformations that best superpose them. The labels can be points, lines or sur-
faces. Homologous features are often identified manually, but this process is time consuming and
subjective. Another disadvantage of using points as landmarks is that there are very few readily
identifiable discrete points in the brain. Lines and surfaces are more readily identified, and in
many instances they can be extracted automatically (or at least semi-automatically). Once they
are identified, the spatial transformation is effected by bringing the homologies together. If the
labels are points, then the required transformations at each of those points are known. Between
the points, the deforming behavior is not known, so it is forced to be as ‘smooth’ as possible.
There are a number of methods for modeling this smoothness. The simplest models include fitting
splines through the points in order to minimize bending energy [5, 4]. More complex forms of
interpolation, such as viscous fluid models, are often used when the labels are surfaces [35, 16].
Intensity (non-label) based approaches identify a spatial transformation that optimizes some
voxel-similarity measure between a source and reference image, where both are treated as unla-
beled continuous processes. The matching criterion is usually based upon minimizing the sum of
squared differences or maximizing the correlation between the images. For this criterion to be
successful, it requires the reference to appear like a warped version of the source image. In other
words, there must be correspondence in the grey levels of the different tissue types between the
source and reference images. In order to warp together images of different modalities, a few inten-
sity based methods have been devised that involve optimizing an information theoretic measure
[31, 33]. Intensity matching methods are usually very susceptible to poor starting estimates, so
more recently a number of hybrid approaches have emerged that combine intensity based methods
with matching user defined features (typically sulci).
A potentially enormous number of parameters are required to describe the nonlinear trans-
formations that warp two images together (i.e., the problem is very high dimensional). However,
much of the spatial variability can be captured using just a few parameters. Some research groups
use only a nine- or twelve-parameter affine transformation to approximately register images of
different subjects, accounting for differences in position, orientation and overall brain dimensions.
Low spatial frequency global variability of head shape can be accommodated by describing defor-
mations by a linear combination of low frequency basis functions. One widely used basis function
registration method is part of the AIR package [36, 37], which uses polynomial basis functions to
model shape variability. For example, a two dimensional third order polynomial basis function
Other low-dimensional registration methods may employ a number of other forms of basis function
to parameterize the warps. These include Fourier bases [10], sine and cosine transform basis
functions [9, 3], B-splines [31, 33], and piecewise affine or trilinear basis functions (see [25] for a
review). The small number of parameters will not allow every feature to be matched exactly, but
it will permit the global head shape to be modeled. The rationale for adopting a low dimensional
approach is that it allows rapid modeling of global brain shape.
The deformations required to transform images to the same space are not clearly defined.
Unlike rigid-body transformations, where the constraints are explicit, those for warping are more
arbitrary. Regularization schemes are therefore necessary when attempting image registration
with many parameters, thus ensuring that voxels remain close to their neighbors. Regularization
is often incorporated by some form of Bayesian scheme, using estimators such as the maximum a
posteriori (MAP) estimate or the minimum variance estimate (MVE). Often, the prior probability
distributions used by registration schemes are linear, and include minimizing the membrane energy
of the deformation field [1, 24], the bending energy [5] or the linear-elastic energy [28, 16]. None
of these linear penalties explicitly preserves the topology¹ of the warped images, although cost
functions that incorporate this constraint have been devised [17, 2]. A number of methods involve
repeated Gaussian smoothing of the estimated deformation fields [14]. These methods can be
classed among the elastic registration methods because convolving a deformation field is a form
of linear regularization [8].
An alternative to using a Bayesian scheme incorporating some form of elastic prior could
be to use a viscous fluid model [11, 12, 8, 34] to estimate the warps. In these models, finite
difference methods are often used to solve the partial differential equations that model one image
as it “flows” to the same shape as the other. The major advantage of these methods is that
they are able to account for large deformations and also ensure that the topology of the warped
image is preserved. Viscous fluid models are able to warp almost any image so that it looks
like any other image, while still preserving the original topology. These methods can be classed
as “plastic” as it is not the deformation fields themselves that are regularized, but rather the
increments to the deformations at each iteration.
3.2 Method
This chapter describes the steps involved in registering images of different subjects into roughly
the same co-ordinate system, where the co-ordinate system is defined by a template image (or
series of images).
1 The word “topology” is used in the same sense as in “Topological Properties of Smooth Anatomical Maps”
[13]. If spatial transformations are not one-to-one and continuous, then the topological properties of different
structures can change.
where p(q) is the prior probability of parameters q, p(b|q) is the conditional probability that
b is observed given q and p(q|b) is the posterior probability of q, given that measurement b
has been made. The maximum a posteriori (MAP) estimate for parameters q is the mode of
p(q|b). The maximum likelihood (ML) estimate is a special case of the MAP estimate, in which
Figure 3.1: This figure illustrates a hypothetical example with one parameter, where the prior
probability distribution is better described than the likelihood. The solid Gaussian curve (a)
represents the prior probability distribution (p.d.f), and the dashed curve (b) represents a maxi-
mum likelihood parameter estimate (from fitting to observed data) with its associated certainty.
The true parameter is known to be drawn from distribution (a), but it can be estimated with
the certainty described by distribution (b). Without the MAP scheme, a more precise estimate
would probably be obtained for the true parameter by taking the most likely a priori value, rather
than the value obtained from a maximum likelihood fit to the data. This would be analogous to
cases where the number of parameters is reduced in a maximum likelihood registration model in
order to achieve a better solution (e.g., see page 7). The dotted line (c) shows the posterior p.d.f
obtained using Bayesian statistics. The maximum value of (c) is the MAP estimate. It combines
previously known information with that from the data to give a more accurate estimate.
p(q) is uniform over all values of q. For our purposes, p(q) represents a known prior probability
distribution from which the parameters are drawn, p(b|q) is the likelihood of obtaining the data
b given the parameters, and p(q|b) is the function to be maximized. The optimization can be
simplified by assuming that all probability distributions can be approximated by multi-normal
(multidimensional and normal) distributions, and can therefore be described by a mean vector
and a covariance matrix.
A probability is related to its Gibbs form by $p(a) \propto e^{-H(a)}$. Therefore the posterior probability
is maximized when its Gibbs form is minimized. This is equivalent to minimizing H(b|q) +
H(q) (the posterior potential). In this expression, H(b|q) (the likelihood potential) is related
to the residual sum of squares. If the parameters are assumed to be drawn from a multi-normal
distribution described by a mean vector q0 and covariance matrix C0 , then H(q) (the prior
potential) is simply given by:
$$H(q) = (q - q_0)^T C_0^{-1} (q - q_0)$$
Eqn. ?? gives the following maximum likelihood updating rule for the parameter estimation:
$$q_{ML}^{(n+1)} = q^{(n)} - (A^T A)^{-1} A^T b \qquad (3.2)$$
Assuming equal variance for each observation ($\sigma^2$) and ignoring covariances among them, the
formal covariance matrix of the fit on the assumption of normally distributed errors is given by
$\sigma^2 (A^T A)^{-1}$. When the distributions are normal, the MAP estimate is simply the average of the
prior and likelihood estimates, weighted by the inverses of their respective covariance matrices:
$$q^{(n+1)} = \left(C_0^{-1} + A^T A/\sigma^2\right)^{-1} \left(C_0^{-1} q_0 + A^T A\, q_{ML}^{(n+1)}/\sigma^2\right) \qquad (3.3)$$
The MAP optimization scheme is obtained by combining Eqns. 3.2 and 3.3:
$$q^{(n+1)} = \left(C_0^{-1} + A^T A/\sigma^2\right)^{-1} \left(C_0^{-1} q_0 + A^T A\, q^{(n)}/\sigma^2 - A^T b/\sigma^2\right) \qquad (3.4)$$
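The updating rules of Eqns. 3.2 to 3.4 can be illustrated with a minimal NumPy sketch. It is not the implementation used for the registration itself: the function names, the toy linear model (residuals of the form A q - y, so that the derivative matrix is exactly A) and the prior values are all assumptions chosen for illustration.

```python
import numpy as np

def ml_update(q, A, b):
    """Eqn. 3.2: unregularized (maximum likelihood) Gauss-Newton step."""
    return q - np.linalg.solve(A.T @ A, A.T @ b)

def map_update(q, A, b, q0, C0_inv, sigma2):
    """Eqn. 3.4: Gauss-Newton step regularized by the prior N(q0, C0)."""
    AtA = A.T @ A / sigma2
    Atb = A.T @ b / sigma2
    return np.linalg.solve(C0_inv + AtA, C0_inv @ q0 + AtA @ q - Atb)

# Hypothetical toy problem: residuals b(q) = A q - y, so db/dq = A.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
y = A @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=50)

q = np.zeros(3)                 # current estimate q^(n)
b = A @ q - y                   # residuals at the current estimate
q0 = np.zeros(3)                # assumed prior mean
C0_inv = 10.0 * np.eye(3)       # assumed inverse prior covariance
sigma2 = np.sum(b ** 2) / (b.size - q.size)  # residual variance, nu = I - J

q_ml = ml_update(q, A, b)
q_map = map_update(q, A, b, q0, C0_inv, sigma2)
```

With a zero inverse prior covariance the MAP step reduces to the ML step, and with a tight prior the estimate is pulled towards q0, which is the smooth transition between fitted and fixed parameters that the scheme is intended to provide.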
For the sake of the registration, it is assumed that the exact form for the prior probability
distribution (N (q0 , C0 )) is known. However, because the registration may need to be done on a
wide range of different image modalities, with differing contrasts and signal to noise ratios, it is
not possible to easily and automatically know what value to use for σ 2 . In practice, σ 2 is assumed
to be the same for all observations, and is estimated from the sum of squared differences from
the current iteration:
$$\sigma^2 = \sum_{i=1}^{I} b_i(q)^2 / \nu$$
where $\nu$ refers to the degrees of freedom. If the sampling is sparse relative to the smoothness,
then $\nu \simeq I - J$, where $I$ is the number of sampled locations in the images and $J$ is the number
of estimated parameters².
However, complications arise because the images are smooth, resulting in the observations not
being independent, and a reduction in the effective number of degrees of freedom. The degrees of
freedom are corrected using the principles described by [23] (although this approach is not strictly
correct [38], it gives an estimate that is close enough for these purposes). The effective degrees of
freedom are estimated by assuming that the difference between f and g approximates a continuous,
zero-mean, homogeneous, smoothed Gaussian random field. The approximate parameter of a
Gaussian point spread function describing the smoothness in direction k (assuming that the axes
of the Gaussian are aligned with the axes of the image co-ordinate system) can be obtained by
[29]:
$$w_k = \sqrt{\frac{\sum_{i=1}^{I} b_i(q)^2}{2 \sum_{i=1}^{I} \left(\nabla_k b_i(q)\right)^2}}$$
Multiplying $w_k$ by $\sqrt{8 \log_e(2)}$ produces an estimate of the full width at half maximum of the
Gaussian. If the images are sampled on a regular grid where the spacing in each direction is $s_k$,
the number of effective degrees of freedom³ becomes approximately:
$$\nu = (I - J) \prod_k \frac{s_k}{w_k (2\pi)^{1/2}}$$
2 Strictly speaking, the computation of the degrees of freedom should be more complicated than this, as this
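The smoothness and degrees-of-freedom estimates can be sketched as follows. This is only an illustration under stated assumptions: the residual image is taken to be a NumPy array on a regular grid, gradients are approximated with `np.gradient`, and each factor of the product is capped at one so that the result never exceeds I - J (one reading of the accompanying footnote). The names `effective_dof` and `box_smooth` are hypothetical.

```python
import numpy as np

def effective_dof(resid, spacing, n_params):
    """Return (nu, sigma2, fwhm) for a residual image b_i(q).

    resid    : N-D array of residuals sampled on a regular grid
    spacing  : grid spacing s_k along each axis
    n_params : number of estimated parameters J
    """
    ss = np.sum(resid ** 2)
    factor, fwhm = 1.0, []
    for k, s_k in enumerate(spacing):
        grad_k = np.gradient(resid, s_k, axis=k)        # finite-difference gradient
        w_k = np.sqrt(ss / (2.0 * np.sum(grad_k ** 2)))
        fwhm.append(w_k * np.sqrt(8.0 * np.log(2.0)))
        # Assumed cap: a factor above one would give nu > I - J.
        factor *= min(1.0, s_k / (w_k * np.sqrt(2.0 * np.pi)))
    nu = (resid.size - n_params) * factor
    return nu, ss / nu, fwhm

def box_smooth(img, width):
    """Crude separable box smoothing, used only to make a smooth test image."""
    kernel = np.ones(width) / width
    for axis in range(img.ndim):
        img = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, img)
    return img

rng = np.random.default_rng(0)
noise = rng.normal(size=(64, 64))
nu_rough, _, _ = effective_dof(noise, (1.0, 1.0), n_params=10)
nu_smooth, sigma2, fwhm = effective_dof(box_smooth(noise, 5), (1.0, 1.0), n_params=10)
```

Smoothing the residuals lowers the estimated degrees of freedom, which in turn raises the variance estimate and gives the prior more weight in the MAP update.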
3.2.2 Affine Registration
Almost all between subject co-registration or spatial normalization methods for brain images
begin by determining the optimal nine or twelve parameter affine transformation that registers the
images together. This step is normally performed automatically by minimizing (or maximizing)
some mutual function of the images. The objective of affine registration is to fit the source image
f to a template image g, using a twelve parameter affine transformation. The images may be
scaled quite differently, so an additional intensity scaling parameter is included in the model.
Without constraints and with poor data, simple ML parameter optimization (similar to that
described in Section ??) can produce some extremely unlikely transformations. For example,
when there are only a few slices in the image, it is not possible for the algorithms to determine
an accurate zoom in the out of plane direction. Any estimate of this value is likely to have very
large errors. When a regularized approach is not used, it may be better to assign a fixed value
for this difficult-to-determine parameter, and simply fit for the remaining ones.
By incorporating prior information into the optimization procedure, a smooth transition be-
tween fixed and fitted parameters can be achieved. When the error for a particular fitted param-
eter is known to be large, then that parameter will be based more upon the prior information. In
order to adopt this approach, the prior distribution of the parameters should be known. This can
be derived from the zooms and shears determined by registering a large number of brain images
to the template.
3 Note that this only applies when $s_k < w_k (2\pi)^{1/2}$, otherwise $\nu = I - J$. Alternatively, to circumvent this
problem the degrees of freedom can be better estimated by $(I - J) \prod_k \mathrm{erf}(2^{-3/2} s_k / w_k)$. This gives a similar result
to the approximation by [23] for smooth images, but never allows the computed value to exceed $I - J$.
3.2.3 Nonlinear Registration
The nonlinear spatial normalization approach described here assumes that the image has al-
ready been approximately registered with the template according to a twelve-parameter affine
registration. This section illustrates how the parameters describing global shape differences (not
accounted for by affine registration) between an image and template can be determined.
The model for defining nonlinear warps uses deformations consisting of a linear combination
of low-frequency periodic basis functions. The spatial transformation from co-ordinates xi , to
co-ordinates yi is:
$$y_{1i} = x_{1i} + u_{1i} = x_{1i} + \sum_j q_{j1} d_j(x_i)$$
$$y_{2i} = x_{2i} + u_{2i} = x_{2i} + \sum_j q_{j2} d_j(x_i)$$
$$y_{3i} = x_{3i} + u_{3i} = x_{3i} + \sum_j q_{j3} d_j(x_i)$$
where qjk is the jth coefficient for dimension k, and dj (x) is the jth basis function at position x.
The choice of basis functions depends upon the distribution of warps likely to be required, and
also upon how translations at borders should behave. If points at the borders over which the
transform is computed are not required to move in any direction, then the basis functions should
consist of the lowest frequencies of the three dimensional discrete sine transform (DST). If there
are to be no constraints at the borders, then a three dimensional discrete cosine transform (DCT)
is more appropriate. Both of these transforms use the same set of basis functions to represent
warps in each of the directions. Alternatively, a mixture of DCT and DST basis functions can
be used to constrain translations at the surfaces of the volume to be parallel to the surface only
(sliding boundary conditions). By using a different combination of DCT and DST basis functions,
the corners of the volume can be fixed and the remaining points on the surface can be free to
move in all directions (bending boundary conditions) [9]. These various boundary conditions are
illustrated in Figure 3.2.
The basis functions used here are the lowest frequency components of the three (or two)
dimensional DCT. In one dimension, the DCT of a function is generated by pre-multiplication
with the matrix DT , where the elements of the I × M matrix D are defined by:
$$d_{i1} = \frac{1}{\sqrt{I}}, \qquad i = 1..I$$
$$d_{im} = \sqrt{\frac{2}{I}} \cos\left(\frac{\pi (2i-1)(m-1)}{2I}\right), \qquad i = 1..I,\; m = 2..M$$
A set of low frequency two dimensional DCT basis functions are shown in Figure 3.3, and a
schematic example of a two dimensional deformation based upon the DCT is shown in Figure
3.4.
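As a minimal sketch, the matrix D defined above can be generated directly from its definition, and a two dimensional basis function formed as the outer product of two one dimensional ones. The function names are hypothetical and are not taken from any particular software package.

```python
import numpy as np

def dct_basis(I, M):
    """I x M matrix holding the M lowest-frequency DCT basis functions."""
    i = np.arange(1, I + 1)[:, None]      # sample index, 1..I
    m = np.arange(1, M + 1)[None, :]      # basis function index, 1..M
    D = np.sqrt(2.0 / I) * np.cos(np.pi * (2 * i - 1) * (m - 1) / (2 * I))
    D[:, 0] = 1.0 / np.sqrt(I)            # first column is the constant term
    return D

def basis_image(D1, D2, m, n):
    """Two dimensional basis function (zero-based indices m and n)."""
    return np.outer(D1[:, m], D2[:, n])

D1 = dct_basis(16, 5)
D2 = dct_basis(12, 4)
img = basis_image(D1, D2, 1, 2)           # a 16 x 12 basis image
```

The columns are orthonormal, so the product of the transposed matrix with the matrix itself is the identity; this property is what later makes some of the regularization matrices diagonal.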
As for affine registration, the optimization involves minimizing the sum of squared differences
between a source (f ) and template image (g). The images may be scaled differently, so an
additional parameter (w) is needed to accommodate this difference. The minimized function is
then:
$$\sum_i \left(f(y_i) - w\, g(x_i)\right)^2$$
Figure 3.2: Different boundary conditions. Above left: fixed boundaries (generated purely from
DST basis functions). Above right: sliding boundaries (from a mixture of DCT and DST basis
functions). Below left: bending boundaries (from a different mixture of DCT and DST basis
functions). Below right: free boundary conditions (purely from DCT basis functions).
Figure 3.3: The lowest frequency basis functions of a two dimensional Discrete Cosine Transform.
Figure 3.4: In two dimensions, a deformation field consists of two scalar fields. One for horizontal
deformations, and the other for vertical deformations. The images on the left show deformations
as a linear combination of basis images (see Figure 3.3). The center column shows the same
deformations in a more intuitive sense. The deformation is applied by overlaying it on a source
image, and re-sampling (right).
$\alpha = 0$
$\beta = 0$
for $j = 1 \ldots J$
    $C = {d_2}_{j,:}^{T}\, {d_2}_{j,:}$
    $E_1 = \mathrm{diag}(\nabla_1 f_{:,j})\, D_1$
    $E_2 = \mathrm{diag}(\nabla_2 f_{:,j})\, D_1$
Figure 3.5: A two dimensional illustration of the fast algorithm for computing AT A (α) and
AT b (β).
The approach involves iteratively computing AT A and AT b. However, because there are
many parameters to optimize, these computations can be very time consuming. There now
follows a description of a very efficient way of computing these matrices.
A Fast Algorithm
A fast algorithm for computing AT A and AT b is shown in Figure 3.5. The remainder of this
section explains the matrix terminology used, and why it is so efficient.
For simplicity, the algorithm is only illustrated in two dimensions. Images f and g are consid-
ered as I × J matrices F and G respectively. Row i of F is denoted by fi,: , and column j by f:,j .
The basis functions used by the algorithm are generated from a separable form from matrices D1
and D2 , with dimensions I × M and J × N respectively. By treating the transform coefficients
as M × N matrices Q1 and Q2 , the deformation fields can be rapidly constructed by computing
D1 Q1 D2 T and D1 Q2 D2 T .
Between each iteration, image F is re-sampled according to the latest parameter estimates.
The derivatives of F are also re-sampled to give ∇1 F and ∇2 F. The ith element of each of these
matrices contain f (yi ), ∂f (yi )/∂y1i and ∂f (yi )/∂y2i respectively.
The notation $\mathrm{diag}(\nabla_1 f_{:,j}) D_1$ simply means multiplying each element of row i of $D_1$ by $\nabla_1 f_{i,j}$,
and the symbol ‘⊗’ refers to the Kronecker tensor product. If $D_2$ is a matrix of order $J \times N$, and
$D_1$ is a second matrix, then:
$$D_2 \otimes D_1 = \begin{pmatrix} {d_2}_{11} D_1 & \cdots & {d_2}_{1N} D_1 \\ \vdots & \ddots & \vdots \\ {d_2}_{J1} D_1 & \cdots & {d_2}_{JN} D_1 \end{pmatrix}$$
The advantage of the algorithm shown in Figure 3.5 is that it utilizes some of the useful
properties of Kronecker tensor products. This is especially important when the algorithm is
implemented in three dimensions. The limiting factor to the algorithm is no longer the time
taken to create the curvature matrix (AT A), but is now the amount of memory required to store
it and the time taken to invert it.
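The Kronecker properties that the algorithm relies on can be checked numerically. The sketch below uses small random matrices as stand-ins for the basis function matrices (an assumption made purely for illustration) and compares the separable computation with the explicit Kronecker form, using column-major vectorization of the coefficient matrix.

```python
import numpy as np

def field_separable(D1, D2, Q):
    """Deformation field as an I x J matrix, computed as D1 Q D2^T."""
    return D1 @ Q @ D2.T

def field_kron(D1, D2, Q):
    """The same field as a vector, via the explicit Kronecker product."""
    return np.kron(D2, D1) @ Q.flatten(order="F")

rng = np.random.default_rng(1)
D1 = rng.normal(size=(6, 3))   # stand-in for an I x M basis matrix
D2 = rng.normal(size=(5, 2))   # stand-in for a J x N basis matrix
Q = rng.normal(size=(3, 2))    # M x N coefficient matrix

same_field = np.allclose(field_separable(D1, D2, Q).flatten(order="F"),
                         field_kron(D1, D2, Q))
K = np.kron(D2, D1)
same_gram = np.allclose(K.T @ K, np.kron(D2.T @ D2, D1.T @ D1))
```

The separable form never builds the large matrix, which is why it scales to three dimensions where the explicit Kronecker product would be prohibitively large.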
3.2.4 Linear Regularization for Nonlinear Registration
Membrane Energy
The simplest model used for linear regularization is based upon minimizing the membrane energy
of the deformation field u [1, 24]. By summing over i points in three dimensions, the membrane
energy of u is given by:
$$\lambda \sum_i \sum_{j=1}^{3} \sum_{k=1}^{3} \left(\frac{\partial u_{ji}}{\partial x_{ki}}\right)^2$$
4 Although the cost function associated with these priors is quadratic, the priors are linear in the sense that
they minimize the sum of squares of a linear combination of the model parameters. This is analogous to solving a
set of linear equations by minimizing a quadratic cost function.
Figure 3.6: The image shown at the top-left is the template image. At the top-right is an
image that has been registered with it using a 12-parameter affine registration. The image at
the bottom-left is the same image registered using the 12-parameter affine registration, followed
by a regularized global nonlinear registration. It should be clear that the shape of the image
approaches that of the template much better after nonlinear registration. At the bottom right is
the image after the same affine transformation and nonlinear registration, but this time without
using any regularization. The mean squared difference between the image and template after
the affine registration was 472.1. After the regularized nonlinear registration this was reduced to
302.7. Without regularization, a mean squared difference of 287.3 is achieved, but this is at the
expense of introducing a lot of unnecessary warping.
where λ is simply a scaling constant. The membrane energy can be computed from the coefficients
of the basis functions by q1 T Hq1 + q2 T Hq2 + q3 T Hq3 , where q1 , q2 and q3 refer to vectors
containing the parameters describing translations in the three dimensions. The matrix H is
defined by:
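$$\begin{aligned}
H &= \lambda\, (\dot{D}_3^T \dot{D}_3) \otimes (D_2^T D_2) \otimes (D_1^T D_1) \\
  &+ \lambda\, (D_3^T D_3) \otimes (\dot{D}_2^T \dot{D}_2) \otimes (D_1^T D_1) \\
  &+ \lambda\, (D_3^T D_3) \otimes (D_2^T D_2) \otimes (\dot{D}_1^T \dot{D}_1)
\end{aligned}$$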
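With the parameters ordered as the three sets of spatial coefficients followed by the intensity scaling parameter, the inverse prior covariance matrix $C_0^{-1}$ of Eqn. 3.4 is constructed from H by:
$$C_0^{-1} = \begin{pmatrix} H & 0 & 0 & 0 \\ 0 & H & 0 & 0 \\ 0 & 0 & H & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$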
H is all zeros, except for the diagonal. Elements on the diagonal represent the reciprocal of the a
priori variance of each parameter. If all the DCT matrices are I × M , then each diagonal element
is given by:
$$h_{j+M(k-1+M(l-1))} = \lambda \pi^2 I^{-2} \left((j-1)^2 + (k-1)^2 + (l-1)^2\right)$$
over j = 1 . . . M , k = 1 . . . M and l = 1 . . . M .
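The diagonal form of the membrane energy matrix can be verified numerically. The sketch below assumes that the column-wise derivative of the DCT basis is taken analytically with respect to the (continuous) sample index, and that all three basis matrices are I x M with M no larger than I; the function names are hypothetical.

```python
import numpy as np

def dct_and_derivative(I, M):
    """DCT basis D (I x M) and its analytic column-wise first derivative."""
    i = np.arange(1, I + 1)[:, None]
    m = np.arange(1, M + 1)[None, :]
    arg = np.pi * (2 * i - 1) * (m - 1) / (2 * I)
    D = np.sqrt(2.0 / I) * np.cos(arg)
    D[:, 0] = 1.0 / np.sqrt(I)
    dD = -np.sqrt(2.0 / I) * (np.pi * (m - 1) / I) * np.sin(arg)
    return D, dD

def membrane_H(I, M, lam):
    """H built from Kronecker products of the small M x M matrices."""
    D, dD = dct_and_derivative(I, M)
    S, dS = D.T @ D, dD.T @ dD
    return lam * (np.kron(dS, np.kron(S, S))
                  + np.kron(S, np.kron(dS, S))
                  + np.kron(S, np.kron(S, dS)))

def membrane_diag(I, M, lam):
    """Closed-form diagonal; j varies fastest, then k, then l."""
    idx = np.arange(M)                    # holds (j-1), (k-1), (l-1)
    l, k, j = np.meshgrid(idx, idx, idx, indexing="ij")
    return (lam * np.pi ** 2 / I ** 2 * (j ** 2 + k ** 2 + l ** 2)).ravel()

H = membrane_H(I=8, M=4, lam=0.5)
matches = np.allclose(np.diag(H), membrane_diag(8, 4, 0.5))
```

Because H is diagonal, only its M³ diagonal elements need to be stored, and adding the prior to the curvature matrix costs almost nothing.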
Bending Energy
Bookstein’s thin plate splines [6, 5] minimize the bending energy of deformations. For a two
dimensional deformation, the bending energy is defined by:
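$$\begin{aligned}
&\lambda \sum_i \left( \left(\frac{\partial^2 u_{1i}}{\partial x_{1i}^2}\right)^2 + \left(\frac{\partial^2 u_{1i}}{\partial x_{2i}^2}\right)^2 + 2 \left(\frac{\partial^2 u_{1i}}{\partial x_{1i} \partial x_{2i}}\right)^2 \right) + \\
&\lambda \sum_i \left( \left(\frac{\partial^2 u_{2i}}{\partial x_{1i}^2}\right)^2 + \left(\frac{\partial^2 u_{2i}}{\partial x_{2i}^2}\right)^2 + 2 \left(\frac{\partial^2 u_{2i}}{\partial x_{1i} \partial x_{2i}}\right)^2 \right)
\end{aligned}$$
In terms of the basis function coefficients, this is:
$$\begin{aligned}
&\lambda q_1^T (\ddot{D}_2 \otimes D_1)^T (\ddot{D}_2 \otimes D_1) q_1 + \lambda q_1^T (D_2 \otimes \ddot{D}_1)^T (D_2 \otimes \ddot{D}_1) q_1 + \\
&2\lambda q_1^T (\dot{D}_2 \otimes \dot{D}_1)^T (\dot{D}_2 \otimes \dot{D}_1) q_1 + \lambda q_2^T (\ddot{D}_2 \otimes D_1)^T (\ddot{D}_2 \otimes D_1) q_2 + \\
&\lambda q_2^T (D_2 \otimes \ddot{D}_1)^T (D_2 \otimes \ddot{D}_1) q_2 + 2\lambda q_2^T (\dot{D}_2 \otimes \dot{D}_1)^T (\dot{D}_2 \otimes \dot{D}_1) q_2
\end{aligned}$$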
where the notation Ḋ1 and D̈1 refer to the column-wise first and second derivatives of D1 . This
is simplified to q1 T Hq1 + q2 T Hq2 where:
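$$H = \lambda \left( (\ddot{D}_2^T \ddot{D}_2) \otimes (D_1^T D_1) + (D_2^T D_2) \otimes (\ddot{D}_1^T \ddot{D}_1) + 2\, (\dot{D}_2^T \dot{D}_2) \otimes (\dot{D}_1^T \dot{D}_1) \right)$$
The diagonal elements of H are given by:
$$h_{j+(k-1) \times M} = \lambda \left( \left(\frac{\pi (j-1)}{I}\right)^4 + \left(\frac{\pi (k-1)}{I}\right)^4 + 2 \left(\frac{\pi (j-1)}{I}\right)^2 \left(\frac{\pi (k-1)}{I}\right)^2 \right)$$
over j = 1 . . . M and k = 1 . . . M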
Linear-Elastic Energy
where λ and µ are the Lamé elasticity constants. The elastic energy of the deformations can be
computed by:
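$$\begin{aligned}
&(\mu + \lambda/2)\, q_1^T (D_2 \otimes \dot{D}_1)^T (D_2 \otimes \dot{D}_1) q_1 + (\mu + \lambda/2)\, q_2^T (\dot{D}_2 \otimes D_1)^T (\dot{D}_2 \otimes D_1) q_2 \\
&+ \mu/2\; q_1^T (\dot{D}_2 \otimes D_1)^T (\dot{D}_2 \otimes D_1) q_1 + \mu/2\; q_2^T (D_2 \otimes \dot{D}_1)^T (D_2 \otimes \dot{D}_1) q_2 \\
&+ \mu/2\; q_1^T (\dot{D}_2 \otimes D_1)^T (D_2 \otimes \dot{D}_1) q_2 + \mu/2\; q_2^T (D_2 \otimes \dot{D}_1)^T (\dot{D}_2 \otimes D_1) q_1 \\
&+ \lambda/2\; q_1^T (D_2 \otimes \dot{D}_1)^T (\dot{D}_2 \otimes D_1) q_2 + \lambda/2\; q_2^T (\dot{D}_2 \otimes D_1)^T (D_2 \otimes \dot{D}_1) q_1
\end{aligned}$$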
A regularization based upon this model requires an inverse covariance matrix that is not a
simple diagonal matrix. This matrix is constructed as follows:
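$$C_0^{-1} = \begin{pmatrix} H_1 & H_3 & 0 \\ H_3^T & H_2 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
where:
$$H_1 = (\mu + \lambda/2)\, (D_2^T D_2) \otimes (\dot{D}_1^T \dot{D}_1) + \mu/2\; (\dot{D}_2^T \dot{D}_2) \otimes (D_1^T D_1)$$
$$H_2 = (\mu + \lambda/2)\, (\dot{D}_2^T \dot{D}_2) \otimes (D_1^T D_1) + \mu/2\; (D_2^T D_2) \otimes (\dot{D}_1^T \dot{D}_1)$$
$$H_3 = \lambda/2\; (D_2^T \dot{D}_2) \otimes (\dot{D}_1^T D_1) + \mu/2\; (\dot{D}_2^T D_2) \otimes (D_1^T \dot{D}_1)$$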
3.2.5 Templates and Intensity Transformations
Sections 3.2.2 and 3.2.3 have modeled a single intensity scaling parameter ($q_{13}$ and $w$ respectively),
but more generally, the optimization can be assumed to estimate two sets of parameters: those
that describe spatial transformations ($q_s$), and those for describing intensity transformations ($q_t$).
This means that the difference function can be expressed in the generic form:
$$b_i(q) = f(s(x_i, q_s)) - t(x_i, q_t)$$
where f is the source image, s() is a vector function describing the spatial transformations based
upon parameters qs and t() is a scalar function describing intensity transformations based on
parameters qt . xi represents the co-ordinates of the ith sampled point.
The previous subsections simply considered matching one image to a scaled version of another,
in order to minimize the sum of squared differences between them. For this case, t(xi , qt ) is simply
Figure 3.7: Example template images. Above: T1 weighted MRI, T2 weighted MRI and PD
weighted MRI. Below: Grey matter probability distribution, White matter probability distribu-
tion and CSF probability distribution. All the data were generated at the McConnell Brain Imaging
Center, Montréal Neurological Institute at McGill University, and are based on the averages
of about 150 normal brains. The original images were reduced to 2mm resolution and convolved
with an 8mm FWHM Gaussian kernel to be used as templates for spatial normalization.
Figure 3.8: Two dimensional histograms of template images (intensities shown as log(1+n), where
n is the value in each bin). The histograms were based on the whole volumes of the template
images shown in the top row of Figure 3.7.
equal to qt1 g(xi ), where qt1 is a simple scaling parameter and g is a template image. This is most
effective when there is a linear relation between the image intensities. Typically, the template
images used for spatial normalization will be similar to those shown in the top row of Figure 3.7.
The simplest least squares fitting method is not optimal when there is not a linear relationship
between the images. Examples of nonlinear relationships are illustrated in Figure 3.8, which
shows histograms (scatter-plots) of image intensities plotted against each other.
An important idea is that a given image can be matched not to one reference image, but to
a series of images that all conform to the same space. The idea here is that (ignoring the spatial
differences) any given image can be expressed as a linear combination of a set of reference images.
For example these reference images might include different modalities (e.g., PET, SPECT, 18F-DOPA,
18F-deoxy-glucose, T1-weighted MRI, T2*-weighted MRI, etc.) or different anatomical
tissues (e.g., grey matter, white matter, and CSF segmented from the same T1-weighted MRI)
or different anatomical regions (e.g., cortical grey matter, sub-cortical grey matter, cerebellum,
etc.) or finally any combination of the above. Any given image, irrespective of its modality, could
be approximated with a function of these images. A simple example using two images would be:
bi (q) = f (s(xi , qs )) − (qt1 g1 (xi ) + qt2 g2 (xi ))
In Figure 3.9, a plane of a T1 weighted MRI is modeled by a linear combination of the five other
template images shown in Figure 3.7. Similar models were used to simulate T2 and PD weighted
MR images. The linearity of the scatter-plots (compared to those in Figure 3.8) shows that MR
images of a wide range of different contrasts can be modeled by a linear combination of a limited
number of template images. Visual inspection shows that the simulated images are very similar
to those shown in Figure 3.7.
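Modeling an image as a linear combination of templates amounts to an ordinary least squares problem once the spatial transformation is held fixed. The sketch below makes that simplifying assumption (in the full scheme the intensity and spatial parameters are estimated together) and uses small random arrays as hypothetical stand-ins for the template images; `fit_template_weights` is an illustrative name.

```python
import numpy as np

def fit_template_weights(f_resampled, templates):
    """Least squares weights q_t such that sum_k q_tk * g_k approximates f.

    f_resampled : source image already resampled into template space
    templates   : list of template images g_k with the same shape
    Returns the weights and the residual image b_i.
    """
    G = np.column_stack([g.ravel() for g in templates])
    q_t, *_ = np.linalg.lstsq(G, f_resampled.ravel(), rcond=None)
    b = f_resampled.ravel() - G @ q_t
    return q_t, b.reshape(f_resampled.shape)

# Hypothetical stand-ins for two templates and an image built from them.
rng = np.random.default_rng(2)
g1 = rng.random((8, 8))
g2 = rng.random((8, 8))
f = 0.7 * g1 + 0.2 * g2

q_t, b = fit_template_weights(f, [g1, g2])
```

Because the model is linear in the intensity parameters, adding further templates only adds columns to the design matrix and leaves the rest of the optimization unchanged.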
Alternatively, the intensities could vary spatially (for example due to inhomogeneities in the
MRI scanner). Linear variations in intensity over the field of view can be accounted for by
optimizing a function of the form:
bi (q) = f (s(xi , qs )) − (qt1 g(xi ) + qt2 x1i g(xi ) + qt3 x2i g(xi ) + qt4 x3i g(xi ))
More complex variations could be included by modulating with other basis functions (such as the
DCT basis function set described in Section 3.2.3) [22]. The examples shown so far have been
linear in their parameters describing intensity transformations. A simple example of an intensity
transformation that is nonlinear would be:
$$b_i(q) = f(s(x_i, q_s)) - q_{t1}\, g(x_i)^{q_{t2}}$$
Figure 3.9: Simulated images of T1, T2 and PD weighted images, and histograms of the real
images versus the simulated images.
[15] suggested that – rather than matching the image itself to the template – some function of
the image should be matched to a template image transformed in the same way. He found that
the use of gradient magnitude transformations led to more robust solutions, especially in cases
of limited brain coverage or intensity inhomogeneity artifacts (in MR images). Other rotationally
invariant moments also contain useful matching information [30]. The algorithms described here
perform most efficiently with smooth images. Much of the high frequency information in the
images is lost in the smoothing step, but information about important image features may be
retained in separate (smoothed) moment images. Simultaneous registrations using these extracted
features may be a useful technique for preserving information, while still retaining the advantages
of using smooth images in the registration.
Another idea for introducing more accuracy by making use of internal consistency would be to
simultaneously spatially normalize co-registered images to corresponding templates. For example,
by simultaneously matching a PET image to a PET template, at the same time as matching a
structural MR image to a corresponding MR template, more accuracy could be obtained than
by matching the images individually. A similar approach could be devised for simultaneously
matching different tissue types from classified images together [26], although a more powerful
approach is to incorporate tissue classification and registration into the same Bayesian model
[20].
3.3 Discussion
The criteria for ‘good’ spatial transformations can be framed in terms of validity, reliability and
computational efficiency. The validity of a particular transformation device is not easy to define
or measure and indeed varies with the application. For example a rigid body transformation may
be perfectly valid for realignment but not for spatial normalization of an arbitrary brain into a
standard stereotaxic space. Generally the sorts of validity that are important in spatial transfor-
mations can be divided into (i) Face validity, established by demonstrating the transformation
does what it is supposed to and (ii) Construct validity, assessed by comparison with other tech-
niques or constructs. Face validity is a complex issue in functional mapping. At first glance, face
validity might be equated with the co-registration of anatomical homologues in two images. This
would be complete and appropriate if the biological question referred to structural differences or
modes of variation. In other circumstances however this definition of face validity is not appro-
priate. For example, the purpose of spatial normalization (either within or between subjects) in
functional mapping studies is to maximize the sensitivity to neuro-physiological change elicited
by experimental manipulation of sensorimotor or cognitive state. In this case a better definition
of a valid normalization is that which maximizes condition-dependent effects relative to error
(and, if relevant, inter-subject) effects. This will probably be achieved when functional anatomy is
congruent. This may or may not be the same as registering structural anatomy.
Because the deformations are only defined by a few hundred parameters, the nonlinear regis-
tration method described here does not have the potential precision of some other methods. High
frequency deformations cannot be modeled because the deformations are restricted to the lowest
spatial frequencies of the basis functions. This means that the current approach is unsuitable for
attempting exact matches between fine cortical structures (see Figures 3.10 and 3.11).
Figure 3.10: Images of six subjects registered using a 12-parameter affine registration (see also
Figure 3.11). The affine registration matches the positions and sizes of the images.
Figure 3.11: Six subjects' brains registered with both affine and basis function registration (see
also Figure 3.10). The basis function registration estimates the global shapes of the brains, but
is not able to account for high spatial frequency warps.
The current method is relatively fast (taking on the order of 30 seconds per iteration,
depending upon the number of basis functions used). The speed is partly a result of the small
number of parameters involved, and the simple optimization algorithm that assumes an almost
quadratic error surface. Because the images are first matched using a simple affine transformation,
there is less ‘work’ for the algorithm to do, and a good registration can be achieved with only a
few iterations (fewer than 20). The method does not rigorously enforce a one-to-one match between
the brains being registered. However, by estimating only the lowest frequency deformations and
by using appropriate regularization, this constraint is rarely broken.
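Whether a given set of coefficients still yields a one-to-one mapping can be checked directly. The following is a minimal one-dimensional sketch, assuming a DCT-II style cosine basis and treating a mapping as one-to-one when it is strictly increasing. The function names are illustrative and are not taken from the chapter or from any software package. In three dimensions the analogous check is that the Jacobian determinant of the deformation stays positive everywhere.

```python
import math

def dct_basis(n_points, n_basis):
    """Lowest-frequency DCT-II basis vectors sampled at n_points positions."""
    basis = []
    for m in range(n_basis):
        # Assumed orthonormal scaling: 1/sqrt(N) for the constant term,
        # sqrt(2/N) for the cosine terms.
        scale = math.sqrt((1.0 if m == 0 else 2.0) / n_points)
        basis.append([scale * math.cos(math.pi * (2 * i + 1) * m / (2 * n_points))
                      for i in range(n_points)])
    return basis

def deform(coeffs, n_points):
    """Map each sample position i to i plus a weighted sum of basis functions."""
    basis = dct_basis(n_points, len(coeffs))
    return [i + sum(c * b[i] for c, b in zip(coeffs, basis))
            for i in range(n_points)]

def is_one_to_one(mapping):
    """A 1D mapping is one-to-one here if it is strictly increasing (no folding)."""
    return all(later > earlier for earlier, later in zip(mapping, mapping[1:]))

# Hypothetical coefficient sets: a gentle low-frequency warp, and a warp with
# a single large coefficient on a higher-frequency basis function.
gentle = deform([0.0, 2.0, -1.0], 64)
folded = deform([0.0] * 7 + [200.0], 64)
```

With these illustrative values the gentle warp passes the check and the large higher-frequency warp does not, which is the behavior that restricting the basis set and regularizing the coefficients is meant to make rare.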
The approach in this chapter searches for a MAP estimate of the parameters defining the
warps. However, optimization problems for complex nonlinear models such as those used for
image registration can easily get caught in local minima, so there is no guarantee that the estimate
determined by the algorithm is globally optimum. Even if the best MAP estimate is achieved,
there will be many other potential solutions that have similar probabilities of being correct. A
further complication arises from the fact that there is no one-to-one match between the small
structures (especially gyral and sulcal patterns) of any two brains. This means that it is not
possible to obtain a single objective high frequency match however good an algorithm is for
determining the best MAP estimate. Because of these issues, registration using the minimum
variance estimate (MVE) may be more appropriate. Rather than searching for the single most
probable solution, the MVE is the average of all possible solutions, weighted by their individual
posterior probabilities. Although useful approximations have been devised [28, 9], this estimate is
still difficult to achieve in practice because of the enormous amount of computing power required.
The MVE is probably more appropriate than the MAP estimate for spatial normalization, as it
is (on average) closer to the “true” solution. However, if the errors associated with the parameter
estimates and also the priors are normally distributed, then the MVE and the MAP estimate are
identical. This is partially satisfied by smoothing the images before registering them.
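The difference between the two estimates can be illustrated with a single parameter evaluated on a grid. This is only a sketch under stated assumptions: the posterior densities below are invented for illustration, the grid replaces the high-dimensional integral that makes the MVE expensive in practice, and all names are hypothetical.

```python
import math

def normalize(density, grid):
    """Evaluate an unnormalized density on a grid and scale it to sum to one."""
    values = [density(q) for q in grid]
    total = sum(values)
    return [v / total for v in values]

def map_estimate(grid, probs):
    """Single most probable grid value (the posterior mode)."""
    best = max(range(len(grid)), key=lambda i: probs[i])
    return grid[best]

def mve_estimate(grid, probs):
    """Average of all grid values, weighted by posterior probability."""
    return sum(q * p for q, p in zip(grid, probs))

def gaussian(q, mean, sd):
    """Unnormalized Gaussian bump."""
    return math.exp(-0.5 * ((q - mean) / sd) ** 2)

grid = [-5.0 + 0.01 * i for i in range(1201)]  # covers -5 to 7

# Normally distributed posterior: mode and mean coincide.
unimodal = normalize(lambda q: gaussian(q, 1.0, 0.5), grid)

# Invented two-mode posterior: the mode sits on the larger peak,
# while the mean lies between the peaks.
bimodal = normalize(
    lambda q: 0.6 * gaussian(q, -1.0, 0.3) + 0.4 * gaussian(q, 2.0, 0.3), grid)
```

For the Gaussian case `map_estimate` and `mve_estimate` agree up to the grid resolution. For the two-mode case they differ, which is the situation in which the choice of estimate matters.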
When higher spatial frequency warps are to be fitted, more DCT coefficients are required to
describe the deformations. There are practical problems that occur when more than about the
8 × 8 × 8 lowest frequency DCT components are used. One of these is the problem of storing and
inverting the curvature matrix ($A^T A$). Even with deformations limited to 8 × 8 × 8 coefficients,
there are at least 1537 unknown parameters, requiring a curvature matrix of about 18 Mbytes
(using double precision floating point arithmetic). High-dimensional registration methods that
search for more parameters should be used when more precision is required in the deformations.
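The storage figure quoted above can be reproduced with a short calculation. The sketch below assumes three displacement components per basis function plus one additional parameter (which gives the 1537 quoted in the text), and a dense square matrix of 8-byte values. The function name and the single extra parameter are assumptions made for illustration.

```python
def curvature_matrix_size(n_per_axis, n_dims=3, extra_params=1, bytes_per_value=8):
    """Return (number of parameters, bytes for a dense square curvature matrix)."""
    # One coefficient per basis function for each displacement component,
    # plus any additional parameters (assumed to be one here).
    n_params = n_dims * n_per_axis ** n_dims + extra_params
    return n_params, n_params * n_params * bytes_per_value

n_params, n_bytes = curvature_matrix_size(8)
megabytes = n_bytes / 2 ** 20  # roughly 18 for the 8 x 8 x 8 case
```

Because the matrix grows with the square of the parameter count, doubling the number of coefficients along each axis multiplies the parameter count by about 8 and the storage by about 64, which takes the dense matrix beyond a gigabyte.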
In practice however, it may be meaningless to even attempt an exact match between brains
beyond a certain resolution. There is not a one-to-one relationship between the cortical structures
of one brain and those of another, so any method that attempts to match brains exactly must
be folding the brain to create sulci and gyri that do not really exist. Even if an exact match is
possible, because the registration problem is not convex, the solutions obtained by high-dimensional
warping techniques may not be truly optimum. High-dimensional registration methods
are often very good at registering grey matter with grey matter (for example), but there is no
guarantee that the registered grey matter arises from homologous cortical features.
Also, structure and function are not always tightly linked. Even if structurally equivalent
regions can be brought into exact register, it does not mean that the same is true for regions that
perform the same or similar functions. For inter-subject averaging, an assumption is made that
functionally equivalent regions lie in approximately the same parts of the brain. This leads to
the current rationale for smoothing images from multi-subject functional imaging studies prior
to performing statistical analyses. Constructive interference of the smeared activation signals
then has the effect of producing a signal that is roughly in an average location. In order to
account for substantial fine scale warps in a spatial normalization, it is necessary for some voxels
to increase their volumes considerably, and for others to shrink to an almost negligible size. The
contribution of the shrunken regions to the smoothed images is tiny, and the sensitivity of the
tests for detecting activations in these regions is reduced. This is another argument in favor of
spatially normalizing only on a global scale.
The constrained normalization described here assumes that the template resembles a warped
version of the image. Modifications are required in order to apply the method to diseased or
lesioned brains. One possible approach is to assume different weights for different brain regions
[7]. Lesioned areas can be assigned lower weights, so that they have much less influence on the
final solution.
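The weighting idea can be sketched as a weighted sum of squared differences, in which each voxel's contribution to the objective is scaled by a weight. This is a minimal illustration and not the implementation described in [7]. The function name and the toy intensity values are hypothetical.

```python
def weighted_ssd(source, template, weights):
    """Sum of squared differences with a per-voxel weight on each term."""
    return sum(w * (s - t) ** 2 for s, t, w in zip(source, template, weights))

# Toy 1D "images": the third voxel of the source is abnormal (a lesion).
template = [1.0, 2.0, 3.0, 4.0]
source = [1.0, 2.0, 9.0, 4.0]

unmasked = weighted_ssd(source, template, [1.0, 1.0, 1.0, 1.0])
masked = weighted_ssd(source, template, [1.0, 1.0, 0.0, 1.0])
```

With uniform weights the lesioned voxel dominates the mismatch. Setting its weight to zero removes its influence, so the optimization is no longer driven to distort the warp in order to explain the lesion.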
The registration scheme described in this chapter is constrained to describe warps with a
few hundred parameters. More powerful and less expensive computers are rapidly evolving,
so algorithms that are currently applicable will become increasingly redundant as it becomes
feasible to attempt more precise registrations. Scanning hardware is also improving, leading
to improvements in the quality and diversity of images that can be obtained. Currently, most
registration algorithms only use the information from a single image from each subject. This is
typically a T1 MR image, which provides limited information that simply delineates grey and
white matter. For example, further information that is not available in the more conventional
sequences could be obtained from diffusion weighted imaging. Knowledge of major white matter
tracts should provide structural information more directly related to connectivity and implicitly
function, possibly leading to improved registration of functionally specialized areas.
Bibliography
[1] Y. Amit, U. Grenander, and M. Piccioni. Structural image restoration through deformable
templates. Journal of the American Statistical Association, 86:376–387, 1991.
[2] J. Ashburner and K. J. Friston. High-dimensional nonlinear image registration. NeuroImage,
7(4):S737, 1998.
[3] J. Ashburner and K. J. Friston. Nonlinear spatial normalization using basis functions. Human
Brain Mapping, 7(4):254–266, 1999.
[4] F. L. Bookstein. Principal warps: Thin-plate splines and the decomposition of deformations.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6):567–585, 1989.
[5] F. L. Bookstein. Landmark methods for forms without landmarks: Morphometrics of group
differences in outline shape. Medical Image Analysis, 1(3):225–243, 1997.
[6] F. L. Bookstein. Quadratic variation of deformations. In J. Duncan and G. Gindi, editors,
Proc. Information Processing in Medical Imaging, pages 15–28, Berlin, Heidelberg, New
York, 1997. Springer-Verlag.
[7] M. Brett, A. P. Leff, C. Rorden, and J. Ashburner. Spatial normalization of brain images
with focal lesions using cost function masking. NeuroImage, 14(2):486–500, 2001.
[8] M. Bro-Nielsen and C. Gramkow. Fast fluid registration of medical images. Lecture Notes
in Computer Science, 1131:267–276, 1996.
[9] G. E. Christensen. Deformable shape models for anatomy. Doctoral thesis, Washington
University, Sever Institute of Technology, 1994.
[10] G. E. Christensen. Consistent linear elastic transformations for image matching. In A. Kuba
et al., editors, Proc. Information Processing in Medical Imaging, pages 224–237, Berlin,
Heidelberg, 1999. Springer-Verlag.
[11] G. E. Christensen, R. D. Rabbitt, and M. I. Miller. 3D brain mapping using a deformable
neuroanatomy. Physics in Medicine and Biology, 39:609–618, 1994.
[12] G. E. Christensen, R. D. Rabbitt, and M. I. Miller. Deformable templates using large
deformation kinematics. IEEE Transactions on Image Processing, 5:1435–1447, 1996.