www.elsevier.com/locate/ynimg
NeuroImage 38 (2007) 95 – 113

A fast diffeomorphic image registration algorithm

John Ashburner
Wellcome Trust Centre for Neuroimaging, 12 Queen Square, London, WC1N 3BG, UK
Received 26 October 2006; revised 14 May 2007; accepted 3 July 2007
Available online 18 July 2007

This paper describes DARTEL, which is an algorithm for diffeomorphic image registration. It is implemented for both 2D and 3D image registration and has been formulated to include an option for estimating inverse consistent deformations. Nonlinear registration is considered as a local optimisation problem, which is solved using a Levenberg–Marquardt strategy. The necessary matrix solutions are obtained in reasonable time using a multigrid method. A constant Eulerian velocity framework is used, which allows a rapid scaling and squaring method to be used in the computations. DARTEL has been applied to intersubject registration of 471 whole brain images, and the resulting deformations were evaluated in terms of how well they encode the shape information necessary to separate male and female subjects and to predict the ages of the subjects.

© 2007 Elsevier Inc. All rights reserved.

Introduction

At its simplest, image registration involves estimating a smooth, continuous mapping between the points in one image and those in another. The relative shapes of the images can then be determined from the parameters that encode the mapping. The objective is usually to determine the single “best” set of values for these parameters. There are many ways of modelling such mappings, but these fit into two broad categories of parameterisation (Miller et al., 1997).

• The small-deformation framework does not necessarily preserve topology, although if the deformations are relatively small, then it may still be preserved.
• The large-deformation framework generates deformations (diffeomorphisms) that have a number of elegant mathematical properties, such as enforcing the preservation of topology.

Many registration approaches still use a small deformation model. These models parameterise a displacement field (u), which is simply added to an identity transform (x).

$$\Phi(x) = x + u(x) \tag{1}$$

In such parameterisations, the inverse transformation is sometimes approximated by subtracting the displacement. It is worth noting that this is only a very approximate inverse, which fails badly for larger deformations. As shown in Fig. 1, compositions of these forward and “inverse” deformations do not produce an identity transform. Small deformation models do not necessarily enforce a one-to-one mapping, particularly if the model assumes the displacements are drawn from a multivariate Gaussian probability density.

The large-deformation or diffeomorphic setting is a much more elegant framework. A diffeomorphism is a globally one-to-one (bijective) smooth and continuous mapping with derivatives that are invertible (i.e. nonzero Jacobian determinant). If the mapping is not diffeomorphic, then topology¹ is not necessarily preserved. A key element of a diffeomorphic setting is that it enforces consistency under compositions of the deformations. A composition of two functions is essentially taking one function of the other in order to produce a new function. For two functions, Φ2 and Φ1, this would be denoted by

$$\Phi_2 \circ \Phi_1 \circ x = \Phi_2(\Phi_1(x)) \tag{2}$$

For deformations, the composition operation is achieved by resampling one deformation field by another.² If the deformations are diffeomorphic, then the result of the composition will also be diffeomorphic. In reality though, deformations are generally represented discretely with a finite number of parameters, so there may be some small violations, particularly if the composition is done using low degree interpolation methods. Perfect (i.e. infinitely dimensional) diffeomorphisms form a Lie group under the composition operation, as they satisfy the requirements of closure, associativity, inverse and identity (see Fig. 2).
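The weakness of the subtract-the-displacement inverse can be illustrated with a minimal one-dimensional sketch. Everything here is an assumption made for illustration (the grid, the sinusoidal displacement, the helper `compose`, and the use of `np.interp` as the resampling step); it is not the paper's implementation.

```python
import numpy as np

# 1D grid on [0, 1] and a smooth displacement that vanishes at both ends
# (illustrative choice; x + u stays monotonic for this amplitude).
x = np.linspace(0.0, 1.0, 201)
u = 0.08 * np.sin(2.0 * np.pi * x)

phi = x + u             # forward small-deformation transform, Eq. (1)
phi_approx_inv = x - u  # "inverse" obtained by subtracting the displacement

def compose(outer, inner, grid):
    """Return outer(inner(x)) by resampling `outer` at the points `inner`."""
    return np.interp(inner, grid, outer)

# Composition of the forward transform with its approximate inverse.
residual_approx = compose(phi, phi_approx_inv, x) - x

# A proper inverse of a monotonic 1D map: swap the roles of x and phi.
phi_inv = np.interp(x, phi, x)
residual_true = compose(phi, phi_inv, x) - x

max_err_approx = float(np.max(np.abs(residual_approx)))
max_err_true = float(np.max(np.abs(residual_true)))
print(max_err_approx, max_err_true)
```

In this sketch the composition with the subtracted displacement leaves a residual that is orders of magnitude larger than the one left by the true inverse, which is limited only by interpolation error.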

¹ The word “topology” is used here in the same sense as in “Topological Properties of Smooth Anatomical Maps” (Christensen et al., 1995).
² Particular care is needed when dealing with the boundaries, particularly if the boundary conditions are circulant.

E-mail address: [email protected].
Available online on ScienceDirect (www.sciencedirect.com).

1053-8119/$ - see front matter © 2007 Elsevier Inc. All rights reserved.
doi:10.1016/j.neuroimage.2007.07.007
Fig. 1. Inversion and composition in a small deformation setting. Top-left: a diffeomorphic deformation field. A displacement field was derived by subtracting the identity transform: u(x) = Φ(x) − x. Top-right: an attempt at obtaining an inverse by subtracting the displacement. Although a forward transform may be one-to-one, an inverse obtained by subtracting the displacement may not be. Bottom row: compositions of the forward and “inverse” transformations. If the inverse was correct, then these would both be identity transforms.

Fig. 2. Inversion and composition in a diffeomorphic setting. Top-left: a forward deformation. Top-right: the corresponding inverse deformation. Both the forward and inverse transforms are one-to-one. Bottom row: compositions of the forward and inverse transformations produce deformations that are close to the identity transform.

The early diffeomorphic registration approaches were based on the greedy “viscous fluid” registration method of Christensen et al. (1994, 1996). In these models, finite difference methods are used to solve the differential equations that model one image as it “flows” to match the shape of the other. At the time, the advantage of these methods was that they were able to account for large displacements while ensuring that the topology of the warped image was preserved. They also provided a useful foundation from which later methods arose. Viscous fluid methods require the solutions to large sets of partial differential equations. The earliest implementations were computationally expensive because solving the equations used successive over-relaxation. Such relaxation methods are inefficient when there are large low frequency components to estimate. Since then, a number of faster ways of solving the differential equations have been devised. These include the use of Fourier transforms to convolve with the impulse response of the linear regularisation operator (Bro-Nielsen and Gramkow, 1996), or by convolving with a separable approximation (Thirion, 1995).

More recent algorithms for large deformation registration aim to find the smoothest possible solution. For example, the LDDMM (large deformation diffeomorphic metric mapping) algorithm (Beg et al., 2005) does not fix the deformation parameters once they have been estimated. It continues to update them using a gradient descent algorithm such that a geodesic distance measure is minimised. In principle, such models could be parameterised by an initial “momentum” field (Miller et al., 2006; Vaillant et al., 2004), which fully specifies how the velocities (and hence the deformations) evolve over unit time. Unfortunately though, the differential equations involved are difficult to work with, and it is easier to parameterise using a number of velocity fields corresponding to different time periods over the course of the evolution of the diffeomorphism. If u(t) is a velocity field at time t, then the diffeomorphism evolves by

$$\frac{d\Phi}{dt} = u^{(t)}\left(\Phi^{(t)}\right) \tag{3}$$

Diffeomorphisms are generated by initialising with an identity transform (Φ(0) = x) and integrating over unit time to obtain Φ(1).

The framework described in this paper involves a single flow (velocity) field, which remains constant over unit time. It is similar to the log-Euclidean framework of Arsigny et al. (2006b,a). The algorithm is called DARTEL, standing for “Diffeomorphic Anatomical Registration using Exponentiated Lie algebra”.

DARTEL has the advantage, over the small deformation setting, that the resulting deformations are diffeomorphic, easily invertible and can be rapidly computed. It does, however, have a number of disadvantages when compared to variable velocity models. To further understand these limitations, one needs to consider a single point in a brain as the deforming image evolves over unit time. As this point passes through different locations of the flow field, it is assigned different velocities. Therefore, each of the parameters of such a model will relate to a position in the background space over which the brain deforms, rather than to points within the brain itself. Each voxel in the flow field corresponds to different brain structures at different times during the propagation of the deforming image. Because there is no simple association between a point in the flow field and a point in the brain, the model parameterisation is less ideally suited to computational anatomy studies.

The parameterisation of the variable velocity framework has a more useful physical interpretation, which relates to the velocity of
each point in the brain at each time during the course of the evolution. Registration involves simultaneously minimising a measure of difference between the image and the warped template, while also minimising an “energy” measure of the deformations used to warp the template. This energy, often thought of as a squared geodesic distance, is obtained by integrating the energy of the velocity fields over unit time. The fixed velocity field used by DARTEL has to encode the whole trajectory of an evolving diffeomorphism. This constraint may force the diffeomorphism to take very circuitous and high energy trajectories in order to achieve good correspondence between images. In fact, some diffeomorphic configurations, which would easily be achieved if velocities could vary over time, are impossible to reach using DARTEL's constant velocity framework.

A further limitation of the DARTEL model can be seen by registering an image pair and then registering the same image pair, but after first translating one of the images by a few pixels. Providing translations are not explicitly penalised, an ideal registration approach should produce deformation energy measures that are the same in both cases. Unfortunately, this does not happen within the fixed velocity DARTEL framework. Similarly, the shape of the deforming template at particular times during the evolution of the diffeomorphism is not invariant with respect to such an initial translation.

In the Method section, the basic theory behind the constant velocity framework used by DARTEL will be covered. The remainder of this section describes the algorithm that can be used to warp one image to match another. This algorithm involves optimising an objective function that consists of a prior term and a likelihood term. Optimisation is done using a method that uses the first and second derivatives of these terms, with respect to the parameterisation of the deformation. The large number of parameters means that computationally efficient methods are needed for solving the equations, so there is a specific focus on computationally efficient schemes that can handle extremely large, if sparse, matrices. Although the DARTEL model is technically inferior to variable velocity diffeomorphic models, it does have practical advantages in terms of the speed of execution.

The Results and discussion section applies the DARTEL registration scheme to 471 anatomical MR images. The resulting flow fields are used in order to assess the level of internal consistency of the method. The same 471 MR images are also brought into register with a small-deformation model, and the parameterisation of the small-deformation and DARTEL models is compared in terms of how well the information encoded can be used by pattern recognition procedures. A quantitative comparison of fixed velocity DARTEL registration with variable velocity diffeomorphic registration methods will be left for future work.

Method

The DARTEL model assumes a flow field (u) that remains constant over time. With this model, the differential equation describing the
evolution of a deformation is

$$\frac{d\Phi}{dt} = u\left(\Phi^{(t)}\right) \tag{4}$$

Generating a deformation involves starting with an identity transform (Φ(0) = x) and integrating over unit time to obtain Φ(1). The Euler
method is a simple integration approach, which involves computing new solutions after many successive small time-steps (h).

$$\Phi^{(t+h)} = \Phi^{(t)} + h\,u\left(\Phi^{(t)}\right) \tag{5}$$

Each of these Euler steps is equivalent to

$$\Phi^{(t+h)} = (x + hu) \circ \Phi^{(t)} \tag{6}$$

The small deformation setting can be conceptualised as an Euler integration with a single time step. The use of a large number of small
time steps will produce a more accurate solution, such that the trajectory of the points follows a curved path over unit time (Fig. 3). For
example, with eight time steps, the Euler integration method is equivalent to

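$$\begin{aligned}
\Phi^{(1/8)} &= x + u(x)/8 \\
\Phi^{(2/8)} &= \Phi^{(1/8)} \circ \Phi^{(1/8)} \\
\Phi^{(3/8)} &= \Phi^{(1/8)} \circ \Phi^{(2/8)} \\
&\;\;\vdots \\
\Phi^{(8/8)} &= \Phi^{(1/8)} \circ \Phi^{(7/8)}
\end{aligned} \tag{7}$$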

If the number of time steps is a power of two, then the solution can be determined by a scaling and squaring approach (Moler and Van Loan,
2003; Arsigny et al., 2006b,a).
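
$$\begin{aligned}
\Phi^{(1/8)} &= x + u(x)/8 \\
\Phi^{(1/4)} &= \Phi^{(1/8)} \circ \Phi^{(1/8)} \\
\Phi^{(1/2)} &= \Phi^{(1/4)} \circ \Phi^{(1/4)} \\
\Phi^{(1)} &= \Phi^{(1/2)} \circ \Phi^{(1/2)}
\end{aligned} \tag{8}$$
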
Fig. 3. Points follow a curved trajectory as the differential equation is integrated.

In practice, rather more than eight time steps would be used to compute a more accurate solution. In group theory, the flow field may be considered as a member of the Lie algebra, which is exponentiated to produce a deformation, which is a member of a Lie group. A useful heuristic here is that the Jacobian of a deformation that conforms to an exponentiated flow field is always positive (in the same way that the exponential of a real number is always positive). This ensures the mapping is diffeomorphic and, implicitly, assures that the forward and inverse transformations can be generated from the same flow field (Fig. 4):

$$\Phi^{(1)} = \mathrm{Exp}(u) \tag{9}$$
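The scaling and squaring idea can be sketched in one dimension as follows. This is only an illustration under stated assumptions: the flow field, the grid size, the choice K = 8, and the helper names `compose` and `exp_flow` are all hypothetical, and linear interpolation via `np.interp` stands in for the resampling used to compose deformations.

```python
import numpy as np

def compose(outer, inner, grid):
    """outer o inner: resample the deformation `outer` at the points `inner`."""
    return np.interp(inner, grid, outer)

def exp_flow(u, grid, K=8):
    """Scaling and squaring: approximate Exp(u) using 2**K implied Euler steps."""
    phi = grid + u / (2 ** K)   # small-deformation start, Phi^(1/2^K)
    for _ in range(K):          # K squarings reach Phi^(1)
        phi = compose(phi, phi, grid)
    return phi

x = np.linspace(0.0, 1.0, 513)
u = 0.2 * np.sin(2.0 * np.pi * x)   # assumed flow field, zero at both boundaries

phi_fwd = exp_flow(u, x)    # Phi^(1), exponentiating u
phi_bwd = exp_flow(-u, x)   # Phi^(-1), exponentiating -u (backward integration)

# Both compositions of forward and backward maps should be close to the identity.
err_a = float(np.max(np.abs(compose(phi_fwd, phi_bwd, x) - x)))
err_b = float(np.max(np.abs(compose(phi_bwd, phi_fwd, x) - x)))

# Finite-difference Jacobian (a scalar derivative in 1D) of the exponentiated flow.
jac = np.gradient(phi_fwd, x)

# For comparison, the single-step map x + u folds for this flow field.
jac_small = np.gradient(x + u, x)
print(err_a, err_b, jac.min(), jac_small.min())
```

For this particular flow field the single-step map x + u has a negative derivative near the middle of the grid, whereas the exponentiated map keeps a positive derivative everywhere, in line with the heuristic described above.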

Inverse consistency (Christensen, 1999) is an area of interest within the field of image registration. The extreme case of an inconsistency
between a forward and inverse transformation is when the one-to-one mapping between the images breaks down. This can be avoided by
using a framework that is diffeomorphic. In order to implement inverse consistent algorithms, it is useful to be able to integrate backwards as
well as forwards (see Fig. 5). The inverse of the spatial transformation Φ(− 1) can be achieved by backward integration

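$$\begin{aligned}
\Phi^{(-1/8)} &= x - u(x)/8 \\
\Phi^{(-1/4)} &= \Phi^{(-1/8)} \circ \Phi^{(-1/8)} \\
\Phi^{(-1/2)} &= \Phi^{(-1/4)} \circ \Phi^{(-1/4)} \\
\Phi^{(-1)} &= \Phi^{(-1/2)} \circ \Phi^{(-1/2)}
\end{aligned} \tag{10}$$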

Fig. 4. A scaling and squaring procedure can be used for computing a deformation by exponentiating a flow field (left), as well as an inverse deformation (right).

Fig. 5. A deformation at different times (top-left), shown next to the logarithms of the corresponding Jacobian determinants (top-right). A one-to-one mapping is
preserved, as illustrated by the Jacobian determinants being greater than zero. The bottom row shows a pair of images transformed with the deformations shown
at the top. Note that f (Φ(0)) (the undeformed version) matches g(Φ(− 1)) and f (Φ(1)) matches g(Φ(0)). In general, g(Φ(t)) matches f (Φ(t+1)).

If Φ(0) = x (the identity transform) and sufficient time steps are used, then the following should hold within this framework.

$$\Phi^{(1)} \circ \Phi^{(-1)} = \Phi^{(-1)} \circ \Phi^{(1)} = \Phi^{(0)} \tag{11}$$

The derivatives (Jacobian matrices) of the deformations form a second order tensor field.
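
$$J_\Phi(x) = \begin{pmatrix}
\dfrac{\partial \phi_1(x)}{\partial x_1} & \dfrac{\partial \phi_1(x)}{\partial x_2} & \dfrac{\partial \phi_1(x)}{\partial x_3} \\
\dfrac{\partial \phi_2(x)}{\partial x_1} & \dfrac{\partial \phi_2(x)}{\partial x_2} & \dfrac{\partial \phi_2(x)}{\partial x_3} \\
\dfrac{\partial \phi_3(x)}{\partial x_1} & \dfrac{\partial \phi_3(x)}{\partial x_2} & \dfrac{\partial \phi_3(x)}{\partial x_3}
\end{pmatrix} \tag{12}$$
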
These Jacobian matrices encode the local stretching, shearing and rotating of the deformation field. Useful measures that can be derived from
the matrices are the determinants, which indicate relative volumes before and after spatially transforming. A region of negative determinants
would indicate that the one-to-one mapping has been lost.

If ΦC is the deformation that results from the composition of two deformations ΦB and ΦA (i.e. ΦC = ΦB ○ ΦA), then the resulting Jacobian
field can be obtained by the matrix multiplication JΦC = (JΦB ○ ΦA) JΦA. This leads to a similar scaling and squaring approach that can be used
for computing the Jacobian matrices of deformations.
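A one-dimensional sketch of this Jacobian recursion is given below, where the Jacobian matrix reduces to a scalar derivative. The flow field, grid and K are assumptions chosen for illustration, and the result is compared against an independent finite-difference derivative of the final deformation.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1025)
u = 0.1 * np.sin(2.0 * np.pi * x)                  # assumed flow field
du = 0.1 * 2.0 * np.pi * np.cos(2.0 * np.pi * x)   # its analytic derivative, J_u

K = 6
phi = x + u / 2 ** K       # small deformation start
jac = 1.0 + du / 2 ** K    # I + J_u / 2^K (a scalar in 1D)

for _ in range(K):
    # J_C = (J_B o Phi_A) J_A with Phi_A = Phi_B = phi: resample, then multiply.
    # The Jacobian is updated first because it needs the not-yet-squared phi.
    jac = np.interp(phi, x, jac) * jac
    phi = np.interp(phi, x, phi)

jac_fd = np.gradient(phi, x)   # independent finite-difference estimate
max_diff = float(np.max(np.abs(jac - jac_fd)))
print(max_diff, jac.min())
```

The two estimates agree up to interpolation and finite-difference error, and the recursively computed derivative stays positive for this example.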

Optimisation

Image registration procedures use a mathematical model to explain the data. Such a model will contain a number of unknown parameters
that describe how an image is deformed. A true diffeomorphism has an infinite number of dimensions and is infinitely differentiable. The
implementation described here, and which is used to generate the examples, is based on a finite dimensional approximation for a fixed lattice.
Bi- or trilinear interpolation is used so that the functions can be treated as continuous, but this renders them differentiable only once. It would
be possible to use a higher-degree interpolation (see e.g. Thévenaz et al., 2000), but linear interpolation was used for speed. This discrete
parameterisation of the velocity field, u(x), can be considered as a linear combination of basis functions

$$u(x) = \sum_i v_i \rho_i(x) \tag{13}$$


where v is a vector of coefficients and ρi (x) is the ith first degree B-spline basis function at position x. The algorithm is implemented so that
functions wrap around at the boundary, so as a point disappears off the right side of field of view, it will appear again on the left. Fixed or sliding
boundary conditions could also have been used, but boundaries that are completely free to move are precluded because the necessary
compositions cannot easily be performed.
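As a small illustration of this parameterisation, the sketch below evaluates a one-dimensional flow field as a sum of first degree B-spline (triangle) basis functions on a unit-spaced lattice with wrap-around boundaries. The function names `hat` and `eval_flow` and the coefficient values are hypothetical.

```python
import numpy as np

def hat(t):
    """First-degree B-spline (triangle function) centred at 0, support [-1, 1]."""
    return np.maximum(0.0, 1.0 - np.abs(t))

def eval_flow(v, x):
    """u(x) = sum_i v_i rho_i(x) on a unit-spaced lattice with wrap-around."""
    n = len(v)
    x = np.asarray(x, dtype=float)
    u = np.zeros_like(x)
    for i in range(n):
        # Shortest signed distance from x to node i on a circle of length n.
        d = (x - i + n / 2.0) % n - n / 2.0
        u += v[i] * hat(d)
    return u

v = np.array([0.0, 1.0, 0.5, -1.0])            # illustrative coefficients
pts = np.array([0.0, 1.25, 2.5, 3.5, 4.0])
vals = eval_flow(v, pts)
print(vals)   # 3.5 interpolates between node 3 and node 0; 4.0 wraps onto node 0
```

With first degree B-splines this is simply linear interpolation of the coefficients, with a point leaving one side of the lattice reappearing on the other.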
The aim is to estimate the single “best” set of values for these parameters (v). The objective function, which is the measure of “goodness”,
is formulated as the most probable deformation, given the data (D).

$$p(v \mid D) = \frac{p(D \mid v)\, p(v)}{p(D)} \tag{14}$$

This posterior probability of the parameters given the image data (p(v|D)) is proportional to the probability of the image data given the
parameters (p(D|v)—the likelihood), times the prior probability of the parameters (p(v)). The probability of the data (p(D)) is a constant.
The objective is to find the most probable parameter values and not the actual probability density, so this factor is ignored. The single
most probable estimate of the parameters is known as the maximum a posteriori (MAP) estimate. There is a monotonic relationship
between a value and its logarithm and, in practice, the objective function is normally the logarithm of the probability (in which case it is
maximised) or the negative logarithm (which is minimised). It can therefore be considered as the sum of two terms: a prior term and a
likelihood term.

$$-\log p(v, D) = -\log p(v) - \log p(D \mid v) \tag{15}$$

$$E(v) = E_1(v) + E_2(v) \tag{16}$$

Many nonlinear registration approaches search for a maximum a posteriori (MAP) estimate of the parameters defining the warps, which
corresponds to the mode of the probability density. In practice, there are a number of technical difficulties that can preclude a simple Bayesian
interpretation of the problem, as probability densities of continuous functions do not really exist. For this reason, it is more straightforward to
interpret registration as a minimum energy estimation procedure. There are many optimisation algorithms that try to find the mode, but most of
them only perform a local search. It is possible to use relatively simple strategies for fitting models with few parameters, but as the number of
parameters increases, the time required to estimate them will increase dramatically.
The Levenberg–Marquardt (LM) algorithm is a very good general purpose optimisation strategy (see Press et al. (1992) for more
information). The procedure is a local optimisation, so it needs reasonable initial starting estimates. It uses an iterative scheme to update the
parameter estimates in such a way that the objective function is usually improved each time. Each iteration requires the first and second
derivatives of the objective function, with respect to the parameters. In the following scheme, I is an identity matrix and ζ is a scaling factor.
The choice of ζ is a trade-off between speed of convergence and stability. A value of zero for ζ gives the Newton–Raphson or Gauss–Newton
optimisation scheme, which may be unstable if the probability density is not well approximated by a Gaussian. Increasing ζ will slow down
the convergence, but increase the stability of the algorithm.

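$$v^{(n+1)} = v^{(n)} - \left( \left.\frac{\partial^2 E(v)}{\partial v^2}\right|_{v^{(n)}} + \zeta I \right)^{-1} \left.\frac{\partial E(v)}{\partial v}\right|_{v^{(n)}} \tag{17}$$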
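The effect of the scaling factor ζ can be seen on a toy problem. The sketch below applies the update to a two-parameter quadratic objective, which is an assumption made purely for illustration (the matrix `Q`, the vector `c`, and the helper names are hypothetical), and counts the iterations needed to converge.

```python
import numpy as np

def lm_step(v, grad, hess, zeta):
    """One damped update: v - (hess + zeta*I)^(-1) grad."""
    return v - np.linalg.solve(hess + zeta * np.eye(len(v)), grad)

# Toy objective E(v) = 0.5 v'Qv - c'v, with gradient Qv - c and Hessian Q.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([1.0, -1.0])
v_star = np.linalg.solve(Q, c)   # exact minimiser, for reference

def iterations_needed(zeta, tol=1e-10, max_iter=1000):
    v = np.zeros(2)
    for n in range(1, max_iter + 1):
        v = lm_step(v, Q @ v - c, Q, zeta)
        if np.linalg.norm(v - v_star) < tol:
            return n
    return max_iter

n_newton = iterations_needed(0.0)   # zeta = 0: a pure Newton step
n_damped = iterations_needed(1.0)   # zeta = 1: damped, slower steps
print(n_newton, n_damped)
```

On a quadratic objective the undamped step lands on the minimum immediately, while the damped version needs more iterations. In a real registration problem the objective is not quadratic, which is where the extra stability from ζ becomes useful.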

The prior term and its derivatives

The prior term reflects the prior probability of a deformation occurring—effectively biasing the deformations to be realistic. The
probability of the parameterisation of a flow field (v) can most easily be approximated by a probability density that is close to a zero-mean

multivariate Gaussian distribution. In the maximum entropy characterisation of Pennec et al. (2006), the matrix H is known as a
concentration matrix and is analogous to the inverse of a covariance matrix. Z is a normalisation constant.

$$p(v) = \frac{1}{Z} \exp\left(-\frac{1}{2} v^T H v\right) \tag{18}$$

By taking the negative logarithm of this probability, we obtain

$$E_1(v) = -\log p(v) = \log Z + \frac{1}{2} v^T H v \tag{19}$$

The first and second derivatives of E_1(v), with respect to the parameters, are required for the registration. These are

$$\frac{\partial E_1}{\partial v} = Hv \quad \text{and} \quad \frac{\partial^2 E_1}{\partial v^2} = H \tag{20}$$

In most implementations, the matrix H has a simple numerical form that assumes a similar amount of variability in all spatial locations. In
reality, the best model of anatomical variability is very likely to differ from region to region (Lester et al., 1999), so a matrix that models
nonstationary variability could, in theory, be a more accurate model. If the true variability of the parameters is known (somehow derived from
a large number of subjects), then a suitable model could be determined empirically. The choice of prior will influence how the estimated
deformations interpolate between features in the images. As this variability is unknown, the implementation of DARTEL is currently able to
use a variety of priors (defined by matrix H). These are based on either membrane, bending or linear elastic energy.

• The membrane energy model is also known as the Laplacian model and is given in 3D by

$$E_1(v) = \frac{\lambda}{2} \int_{x \in \Omega} \sum_{i=1}^{3} \sum_{j=1}^{3} \left( \frac{\partial u_i(x)}{\partial x_j} \right)^2 dx \tag{21}$$

In the above equations, λ is a constant that encodes the amount of variability. Larger values of λ indicate that the flow field should be
smoother. The matrix H is very large and sparse, but because the operation of Hv is actually a convolution, it is relatively straightforward to
compute. The function with which v is convolved can be derived from the rows of H. For example, in the case of the membrane energy
model in two dimensions, Hv would be obtained by convolving the horizontal and vertical components of v by
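
$$\begin{pmatrix}
0 & -\lambda\delta_1^{-2} & 0 \\
-\lambda\delta_2^{-2} & 2\lambda(\delta_1^{-2} + \delta_2^{-2}) & -\lambda\delta_2^{-2} \\
0 & -\lambda\delta_1^{-2} & 0
\end{pmatrix} \tag{22}$$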

where δ1 is the height of a voxel and δ2 is the width.
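A minimal sketch of the membrane-energy convolution is shown below. The kernel is applied with wrapped boundaries through `np.roll`, and the quadratic form ½ vᵀHv is compared with the energy computed directly from forward finite differences. The helper names, field size and parameter values are assumptions for illustration only, and voxel-volume scaling is ignored.

```python
import numpy as np

def membrane_Hv(v, lam, d1, d2):
    """Apply the 3x3 membrane kernel to one component of v (wrapped boundaries).
    Axis 0 is vertical (voxel height d1), axis 1 is horizontal (voxel width d2)."""
    centre = 2.0 * lam * (d1 ** -2 + d2 ** -2) * v
    vert = -lam * d1 ** -2 * (np.roll(v, 1, axis=0) + np.roll(v, -1, axis=0))
    horiz = -lam * d2 ** -2 * (np.roll(v, 1, axis=1) + np.roll(v, -1, axis=1))
    return centre + vert + horiz

def membrane_energy(v, lam, d1, d2):
    """(lam/2) * sum of squared forward differences, with wrap-around."""
    g1 = (np.roll(v, -1, axis=0) - v) / d1
    g2 = (np.roll(v, -1, axis=1) - v) / d2
    return 0.5 * lam * float(np.sum(g1 ** 2 + g2 ** 2))

rng = np.random.default_rng(0)
v = rng.standard_normal((8, 6))   # one component of an illustrative flow field
lam, d1, d2 = 0.7, 1.5, 1.0

Hv = membrane_Hv(v, lam, d1, d2)
quad = 0.5 * float(np.sum(v * Hv))          # 0.5 * v' H v
energy = membrane_energy(v, lam, d1, d2)    # direct finite-difference energy
const_response = membrane_Hv(np.ones((8, 6)), lam, d1, d2)
print(quad, energy)
```

The two energies agree, and a constant field gives a zero response, which reflects the fact that the membrane model does not penalise uniform translations of the flow field.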

• The bending energy (biharmonic or thin plate model) is given by

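$$E_1(v) = \frac{\lambda}{2} \int_{x \in \Omega} \sum_{i=1}^{3} \sum_{j=1}^{3} \sum_{k=1}^{3} \left( \frac{\partial^2 u_i(x)}{\partial x_j \partial x_k} \right)^2 dx \tag{23}$$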

In two dimensions, the multiplication Hv is obtained by convolving each component of v with

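$$\begin{pmatrix}
0 & 0 & \lambda\delta_1^{-4} & 0 & 0 \\
0 & 2\lambda\delta_1^{-2}\delta_2^{-2} & -4\lambda\delta_1^{-2}(\delta_1^{-2} + \delta_2^{-2}) & 2\lambda\delta_1^{-2}\delta_2^{-2} & 0 \\
\lambda\delta_2^{-4} & -4\lambda\delta_2^{-2}(\delta_1^{-2} + \delta_2^{-2}) & \lambda(6\delta_1^{-4} + 6\delta_2^{-4} + 8\delta_1^{-2}\delta_2^{-2}) & -4\lambda\delta_2^{-2}(\delta_1^{-2} + \delta_2^{-2}) & \lambda\delta_2^{-4} \\
0 & 2\lambda\delta_1^{-2}\delta_2^{-2} & -4\lambda\delta_1^{-2}(\delta_1^{-2} + \delta_2^{-2}) & 2\lambda\delta_1^{-2}\delta_2^{-2} & 0 \\
0 & 0 & \lambda\delta_1^{-4} & 0 & 0
\end{pmatrix} \tag{24}$$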
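One way to sanity-check the entries of the 5 × 5 bending energy kernel is to note that the biharmonic operator is the Laplacian applied twice, so the kernel should equal λ times the unit-λ membrane kernel convolved with itself. The sketch below performs this check numerically. The helper `conv2_full` and the parameter values are assumptions for illustration.

```python
import numpy as np

def conv2_full(a, b):
    """Full 2D convolution of two small arrays (plain loops, NumPy only)."""
    out = np.zeros((a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1))
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i:i + b.shape[0], j:j + b.shape[1]] += a[i, j] * b
    return out

lam, d1, d2 = 0.5, 1.5, 1.0
p, q = d1 ** -2, d2 ** -2   # delta_1^-2 and delta_2^-2

# Membrane kernel with lambda = 1: the negative discrete Laplacian.
neg_lap = np.array([[0.0, -p, 0.0],
                    [-q, 2.0 * (p + q), -q],
                    [0.0, -p, 0.0]])

# Bending energy kernel written out entry by entry.
bend = lam * np.array([
    [0.0, 0.0, p * p, 0.0, 0.0],
    [0.0, 2 * p * q, -4 * p * (p + q), 2 * p * q, 0.0],
    [q * q, -4 * q * (p + q), 6 * p * p + 6 * q * q + 8 * p * q,
     -4 * q * (p + q), q * q],
    [0.0, 2 * p * q, -4 * p * (p + q), 2 * p * q, 0.0],
    [0.0, 0.0, p * p, 0.0, 0.0],
])

bend_from_lap = lam * conv2_full(neg_lap, neg_lap)
print(np.allclose(bend, bend_from_lap), abs(bend.sum()))
```

The entries also sum to zero, so a constant flow field has zero bending energy.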

• The linear elastic energy is given by

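$$E_1(v) = \frac{1}{2} \int_{x \in \Omega} \sum_{j=1}^{3} \sum_{k=1}^{3} \left( \lambda \frac{\partial u_j(x)}{\partial x_j} \frac{\partial u_k(x)}{\partial x_k} + \frac{\mu}{2} \left( \frac{\partial u_j(x)}{\partial x_k} + \frac{\partial u_k(x)}{\partial x_j} \right)^2 \right) dx \tag{25}$$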

Here, λ encodes the variance of the trace of the Jacobian matrix (the divergence) at each point in v. Larger values will tend to cause volumes
to be preserved during the transformation. Jacobian matrices can be decomposed into the sum of symmetric and skew-symmetric (anti-
symmetric) matrices. The μ parameter encodes the amount of variance in the elements of the symmetric component and this tends toward
penalising scaling and shearing, while allowing rotations to occur more freely. Again, the multiplication Hv is performed as a convolution

operation (see Fig. 6), but it is more complex as it involves mixing computations on the vertical and horizontal components of the flow
fields. In order to obtain the convolved vertical component, it is convolved with
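
$$\begin{pmatrix}
0 & -(2\mu + \lambda)\delta_1^{-2} & 0 \\
-\mu\delta_2^{-2} & \mu(4\delta_1^{-2} + 2\delta_2^{-2}) + 2\lambda\delta_1^{-2} & -\mu\delta_2^{-2} \\
0 & -(2\mu + \lambda)\delta_1^{-2} & 0
\end{pmatrix} \tag{26}$$

and this is added to the horizontal component convolved with

$$\begin{pmatrix}
-\dfrac{\mu + \lambda}{4}\delta_1^{-1}\delta_2^{-1} & 0 & \dfrac{\mu + \lambda}{4}\delta_1^{-1}\delta_2^{-1} \\
0 & 0 & 0 \\
\dfrac{\mu + \lambda}{4}\delta_1^{-1}\delta_2^{-1} & 0 & -\dfrac{\mu + \lambda}{4}\delta_1^{-1}\delta_2^{-1}
\end{pmatrix} \tag{27}$$

The convolved horizontal component is obtained by convolving the vertical component with the array in Eq. (27) and adding it to the horizontal component convolved with

$$\begin{pmatrix}
0 & -\mu\delta_1^{-2} & 0 \\
-(2\mu + \lambda)\delta_2^{-2} & \mu(4\delta_2^{-2} + 2\delta_1^{-2}) + 2\lambda\delta_2^{-2} & -(2\mu + \lambda)\delta_2^{-2} \\
0 & -\mu\delta_1^{-2} & 0
\end{pmatrix} \tag{28}$$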

Currently, the best form of regularisation is unknown. Future work will attempt to learn the optimal settings for the priors from image data
itself. In principle, this is just a Type-II Maximum Likelihood problem (with Laplace approximations). Unfortunately, there are a number of
technical challenges to overcome before the approach could become practically feasible for problems of this scale.

The likelihood term and its derivatives

This section only considers a likelihood term based upon the mean-squared difference between a pair of images. The model assumes that
the individual image g is generated from the template image f by

$$g(x) = f(\Phi^{(1)}(x)) + \epsilon(x) \tag{29}$$

where ϵ(x) is drawn from a zero mean Gaussian distribution, which is assumed to be independent and identically distributed over voxels. Ignoring the constant terms and assuming the variance of ϵ(x) is one, the negative log-likelihood is obtained by summing over the centres of the I voxels

$$E_2 = \frac{1}{2} \sum_{i=1}^{I} \left( g_i - f_i(\Phi^{(1)}) \right)^2 \tag{30}$$

where fi (Φ(1)) denotes the ith voxel of the warped template.
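The likelihood term can be sketched for one-dimensional "images" as follows. The Gaussian-bump template, the sinusoidal deformation and the function names are assumptions for illustration, with linear interpolation used to resample the template.

```python
import numpy as np

def template(pts):
    """Hypothetical 1D template image f: a Gaussian bump centred at 0.5."""
    return np.exp(-((pts - 0.5) ** 2) / 0.02)

def neg_log_likelihood(g, f_vals, phi, grid):
    """Half the sum of squared differences between the individual image g and
    the template resampled at phi (unit noise variance, constants ignored)."""
    f_warped = np.interp(phi, grid, f_vals)   # f(Phi(x)) by linear interpolation
    return 0.5 * float(np.sum((g - f_warped) ** 2))

x = np.linspace(0.0, 1.0, 101)
phi_true = x + 0.05 * np.sin(2.0 * np.pi * x)   # assumed "true" deformation
g = template(phi_true)                           # noise-free individual image
f_vals = template(x)                             # template sampled on the lattice

e2_identity = neg_log_likelihood(g, f_vals, x, x)          # no warping
e2_true = neg_log_likelihood(g, f_vals, phi_true, x)       # correct warp
print(e2_identity, e2_true)
```

With the correct deformation the residual is reduced to interpolation error, whereas leaving the template unwarped gives a much larger value.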

Fig. 6. The H matrix for computing the linear elastic energy of a 2D 6 × 6 flow field, with wrapped boundaries. This matrix uses a value for μ that is twice that
used for λ.

For clarity, in what follows, the dependence of the flow and other quantities on location x is dropped. The diffeomorphic mapping Φ(1) is the
solution to dΦ/dt = u(Φ) at unit time. The starting point of the integration is an identity transform (Φ(0) = x). For ease of terminology, this section
assumes that the images, flow fields, deformations, etc., are all smooth continuous fields. Implementational details relating to interpolation and
boundary conditions are ignored.
The first derivatives of the likelihood term, with respect to changes in velocity are a vector field b(x). Within a continuous time
representation, the first derivative at any point is given by
$$b = -\int_{t=0}^{1} \left| J_\Phi^{(-t)} \right| \left( g^{(-t)} - f^{(1-t)} \right) \left( \nabla f^{(1-t)} \right) dt \tag{31}$$

where $g^{(-t)} \equiv g(\Phi^{(-t)})$, and $f^{(1-t)} \equiv f(\Phi^{(1-t)}) \equiv f(\Phi^{(1)} \circ \Phi^{(-t)}) \equiv f(\Phi^{(-t)} \circ \Phi^{(1)})$. The image gradients and Jacobian matrices are denoted by the ∇ and J operators. At each point in a vector field, there is assumed to be a column vector of values.
The second derivatives can be treated as a symmetric second order tensor field A(x). Ignoring second derivatives in the image data,
these can be obtained in a similar way (see Appendix A).
$$A = \int_{t=0}^{1} \left| J_\Phi^{(-t)} \right| \left( \nabla f^{(1-t)} \right)^T \left( \nabla f^{(1-t)} \right) dt \tag{32}$$

Within a discrete time representation, the registration can be conceptualised as a series of intermediate small deformation registration
steps, which are optimised simultaneously. The first and second derivatives are then

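$$b = -\frac{1}{N} \sum_{n=0}^{N-1} \left| J_\Phi^{(-n/N)} \right| \left( g^{(-n/N)} - f^{((N-n)/N)} \right) \left( (\nabla f^{((N-1-n)/N)}) \circ \Phi^{(1/N)} \right) \tag{33}$$

$$A = \frac{1}{N} \sum_{n=0}^{N-1} \left| J_\Phi^{(-n/N)} \right| \left( (\nabla f^{((N-1-n)/N)}) \circ \Phi^{(1/N)} \right)^T \left( (\nabla f^{((N-1-n)/N)}) \circ \Phi^{(1/N)} \right) \tag{34}$$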

where g(−n/N) ≡ g(Φ(−n/N)). Note that if N = 1, these are equivalent to the derivatives for registration within the small-deformation setting.
The DARTEL algorithm uses a recursive procedure for computing an approximation to the derivatives, using a value for N which is a power of two ($N = 2^K$). It begins by first computing $\Phi^{(1)}$ and the Jacobian matrix $J_\Phi^{(1)}$, according to the current estimates of the flow field. This is done by a scaling and squaring procedure, which begins with a small deformation approximation.

$$\Phi^{(1/2^K)} = x + \frac{1}{2^K} u \tag{35}$$

$$J_\Phi^{(1/2^K)} = I + \frac{1}{2^K} J_u \tag{36}$$

Then for k = 0..K − 1 steps, the small deformation approximation is recursively squared in order to generate $\Phi^{(1)}$ and $J_\Phi^{(1)}$.

$$\Phi^{(2^{k+1}/2^K)} = \Phi^{(2^k/2^K)} \circ \Phi^{(2^k/2^K)} \tag{37}$$

$$J_\Phi^{(2^{k+1}/2^K)} = \left( J_\Phi^{(2^k/2^K)} \circ \Phi^{(2^k/2^K)} \right) J_\Phi^{(2^k/2^K)} \tag{38}$$

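The first and second derivatives are initialised by

$$b^{(0)} = -\frac{1}{2^K} \left( g^{(0)} - f^{(1)} \right) h \tag{39}$$

$$A^{(0)} = \frac{1}{2^K} h^T h \tag{40}$$

where

$$h = \left( \left( J_\Phi^{(1)} \right) \left( J_\Phi^{(1/2^K)} \right)^{-1} \right)^T \left( (\nabla f^{(0)}) \circ \Phi^{(1)} \right) \tag{41}$$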

Backward transforms are initialised by

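$$\Phi^{(-1/2^K)} = x - \frac{1}{2^K} u \tag{42}$$

$$J_\Phi^{(-1/2^K)} = I - \frac{1}{2^K} J_u \tag{43}$$

Then the following are computed recursively for k = 0..K − 1 steps

$$b^{(k+1)} = b^{(k)} + \left| J_\Phi^{(-2^k/2^K)} \right| \left( J_\Phi^{(-2^k/2^K)} \right)^T \left( b^{(k)} \circ \Phi^{(-2^k/2^K)} \right) \tag{44}$$

$$A^{(k+1)} = A^{(k)} + \left| J_\Phi^{(-2^k/2^K)} \right| \left( J_\Phi^{(-2^k/2^K)} \right)^T \left( A^{(k)} \circ \Phi^{(-2^k/2^K)} \right) J_\Phi^{(-2^k/2^K)} \tag{45}$$

$$\Phi^{(-2^{k+1}/2^K)} = \Phi^{(-2^k/2^K)} \circ \Phi^{(-2^k/2^K)} \tag{46}$$

$$J_\Phi^{(-2^{k+1}/2^K)} = \left( J_\Phi^{(-2^k/2^K)} \circ \Phi^{(-2^k/2^K)} \right) J_\Phi^{(-2^k/2^K)} \tag{47}$$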

If K = 0, the derivatives are exactly equivalent to those used for small deformation registration. Larger values of K produce the derivatives
for diffeomorphic registration. In practice, these recursively computed derivatives are only an approximation because of the effect of
iteratively resampling (Eqs. (44) and (45)). In particular, it is not really clear how to optimally and efficiently resample (interpolate) the tensor
field A(x) such that the positive definite (and other) properties are best retained (Pennec et al., 2006; Arsigny et al., 2006c). Currently, the
individual scalar fields that comprise both b(x) and A(x) are sampled using trilinear interpolation.
DARTEL has been implemented to include an option for inverse consistent registration. In this formulation, the likelihood part of the
objective function is

$$E_2 = \frac{1}{2} \sum_{i=1}^{I} \left( g_i - f_i(\Phi^{(1)}) \right)^2 + \frac{1}{2} \sum_{i=1}^{I} \left( f_i - g_i(\Phi^{(-1)}) \right)^2 \tag{48}$$

This inverse consistency was achieved by making the first and second derivatives exactly symmetric by adding them to derivatives computed by
integrating the other way. This forward integration is very similar to that shown for backward integration, except for a few small changes. The
results of such a formulation are exactly inverse consistent spatial transformations.
This section has treated the first and second derivatives as smooth continuous vector and tensor fields. In the next section, the vector field
of first derivatives will be treated as a column vector (b) and the tensor field of second derivatives as a large sparse matrix (A). This discrete
representation corresponds to sampling the fields on a fine regular grid and assumes a good lattice approximation.

Solving the equations

Each Levenberg–Marquardt iteration involves the update

$$v^{(n+1)} = v^{(n)} - (A + H + \zeta I)^{-1} (b + Hv^{(n)}) \tag{49}$$

This requires the solution to the following set of equations

$$(A + H + \zeta I)^{-1} (b + Hv^{(n)}) \tag{50}$$

The model is very high-dimensional, so storing a full matrix of second derivatives is not possible because of memory limitations. For this
reason, the optimisation uses a method for solving systems of sparse equations. Initial attempts used a conjugate gradient approach (Gilbert et
al., 1992), but this was found to be slow. Instead, a full multigrid (FMG) approach (Haber and Modersitzki, 2006) is used to solve the update
equations. This is based upon the explanations in Chapter 19 of Press et al. (1992).
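As a minimal sketch of the update in Eq. (49), the following uses small dense matrices and a direct solver. This is only an illustration: the toy matrices A and H, the vector b and the function name `lm_update` are assumptions, and DARTEL itself works with very large sparse systems solved by multigrid rather than a dense solve.

```python
import numpy as np

def lm_update(v, A, H, b, zeta):
    """One update of Eq. (49): v <- v - (A + H + zeta*I)^{-1} (b + H v)."""
    M = A + H + zeta * np.eye(v.size)
    rhs = b + H @ v
    # Solve M * step = rhs instead of forming the inverse explicitly.
    step = np.linalg.solve(M, rhs)
    return v - step

# Toy problem (hypothetical numbers, only to exercise the function).
rng = np.random.default_rng(0)
n = 6
G = rng.standard_normal((n, n))
A = G.T @ G                                              # stand-in for the likelihood second derivatives
H = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # simple smoothness penalty
b = rng.standard_normal(n)
v0 = np.zeros(n)
v1 = lm_update(v0, A, H, b, zeta=0.1)
```

Solving the linear system directly avoids ever storing an explicit inverse, which is the same motivation that leads to the iterative solvers described next.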
FMG approaches are based on relaxation methods, which are performed at multiple scales in order to enhance the speed. Relaxation
methods for obtaining a least-squares solution to a set of equations of the form Mw = c involve splitting the matrix into M = E + F, where E is
easy to invert and F is the remainder. The procedures are iterative and involve assigning initial estimates for w, and then updating at iteration
n according to

$$w^{(n+1)} = E^{-1} (c - F w^{(n)}) \qquad (51)$$

Usually, E is simply a diagonal matrix. Providing M is strictly diagonally dominant3, then the updates of Eq. (51) are guaranteed to converge.
This is the case when using a membrane energy model for the prior potential.
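The relaxation of Eq. (51) with a diagonal E is Jacobi's method. A minimal sketch follows, assuming a small strictly diagonally dominant matrix chosen purely for illustration (the matrix, the vector c and the function name `jacobi` are not from the paper).

```python
import numpy as np

def jacobi(M, c, n_iter=200):
    """Iterate Eq. (51), w <- E^{-1} (c - F w), with E = diag(M) and F = M - E."""
    e = np.diag(M)                 # E is diagonal, so applying E^{-1} is an elementwise divide
    F = M - np.diag(e)
    w = np.zeros_like(c)
    for _ in range(n_iter):
        w = (c - F @ w) / e
    return w

# Strictly diagonally dominant toy matrix: |4| > |-1| + |-1| in every row.
M = np.array([[ 4.0, -1.0,  0.0, -1.0],
              [-1.0,  4.0, -1.0,  0.0],
              [ 0.0, -1.0,  4.0, -1.0],
              [-1.0,  0.0, -1.0,  4.0]])
c = np.array([1.0, 2.0, 3.0, 4.0])
w = jacobi(M, c)
```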

3 For each row, the magnitude of the diagonal element must be greater than the sum of the magnitudes of the off-diagonal elements.

A different update strategy is required if diagonal dominance conditions are not satisfied, as is the case when modelling the prior potential
with bending energy or linear elasticity. This can be derived by re-writing Eq. (51) as

$$w^{(n+1)} = w^{(n)} + E^{-1} (c - F w^{(n)} - E w^{(n)}) \qquad (52)$$

Providing M is positive definite, then the following regularised update strategy will ensure convergence, where s is chosen to ensure diagonal
dominance of M + sI. This is a similar stabilising strategy to that used by Levenberg–Marquardt optimisation.

$$w^{(n+1)} = w^{(n)} + (E + sI)^{-1} (c - M w^{(n)}) \qquad (53)$$
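The regularised update of Eq. (53) can be sketched on a matrix that is positive definite but not diagonally dominant. The example below assumes a 1D circulant matrix with a bending-energy-like stencil (1, −4, 6, −4, 1) plus a small multiple of the identity; this matrix, the value of s and the function name are illustrative assumptions only.

```python
import numpy as np

def regularised_relaxation(M, c, s, n_iter=3000):
    """Iterate Eq. (53), w <- w + (E + s*I)^{-1} (c - M w), with E = diag(M)."""
    d = np.diag(M) + s
    w = np.zeros_like(c)
    for _ in range(n_iter):
        w = w + (c - M @ w) / d
    return w

# Circulant stand-in matrix: diagonal 6.1, off-diagonal magnitudes summing to 10,
# so it is positive definite (smallest eigenvalue 0.1) but not diagonally dominant.
n = 8
stencil = np.zeros(n)
stencil[[0, 1, 2, -2, -1]] = [6.1, -4.0, 1.0, 1.0, -4.0]
M = np.array([np.roll(stencil, i) for i in range(n)])
c = np.arange(1.0, n + 1.0)

s = 4.0        # chosen so that 6.1 + s exceeds 10, making M + s*I diagonally dominant
w = regularised_relaxation(M, c, s)
```

The many iterations needed here reflect the slow convergence of the lowest spatial frequency, which is what motivates the multigrid scheme described later in this section.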

In practice, the updates are performed in place so that the updated values can be used immediately in the current iteration. This is the
Gauss–Seidel method, as opposed to Jacobi's method, which uses only the values of w from the previous iteration. The Gauss–Seidel method
is faster than Jacobi's method and also requires less memory (only one copy of w instead of two). The ordering of the updates of a Gauss–Seidel
iteration can be tuned to optimise performance. A red–black checkerboard updating scheme is best if using membrane energy. This
involves alternating between updates of all the “red” voxels and then all the “black” voxels. For other prior potential models, the updates can
be done by sweeping through w along a variety of different directions.
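A red–black Gauss–Seidel sweep can be sketched for a scalar field on a periodic 2D grid. The operator used here, (4 + ζ)w minus the sum of the four neighbours, is an assumed scalar stand-in for a membrane-type model, not the operator used in DARTEL, and the grid dimensions must be even for the checkerboard to be consistent under wrap-around.

```python
import numpy as np

def neighbour_sum(w):
    """Sum of the four nearest neighbours with circulant (wrap-around) boundaries."""
    return (np.roll(w, 1, 0) + np.roll(w, -1, 0) +
            np.roll(w, 1, 1) + np.roll(w, -1, 1))

def red_black_sweep(w, c, zeta):
    """One in-place Gauss-Seidel sweep for (4 + zeta) * w - neighbour_sum(w) = c."""
    ii, jj = np.indices(w.shape)
    for colour in (0, 1):                       # 0 = "red" voxels, 1 = "black" voxels
        mask = (ii + jj) % 2 == colour
        nb = neighbour_sum(w)                   # red voxels only have black neighbours
        w[mask] = (c[mask] + nb[mask]) / (4.0 + zeta)
    return w                                    # black update already used the new red values

rng = np.random.default_rng(4)
zeta = 0.5
c = rng.standard_normal((8, 8))
w = np.zeros((8, 8))
for _ in range(200):
    w = red_black_sweep(w, c, zeta)
residual = (4.0 + zeta) * w - neighbour_sum(w) - c
```

Because each colour depends only on the other colour, all voxels of one colour can be updated at once while still being an exact Gauss–Seidel iteration.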
In most descriptions of relaxation methods, the E matrix is diagonal, but this is not the case in the current implementation. For volumetric
registration, inverting E consists of inverting a series of symmetric 3 × 3 matrices, whereas a series of 2 × 2 matrices would be inverted for a
2D implementation (see Fig. 7).
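Applying the inverse of a block-diagonal E amounts to solving one small symmetric system per voxel. A minimal sketch using NumPy's batched solver is given below; the random 3 × 3 blocks and right-hand sides are hypothetical stand-ins for the per-voxel blocks of E and the residual vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
n_vox = 5

# One symmetric positive definite 3x3 block per voxel (hypothetical values).
G = rng.standard_normal((n_vox, 3, 3))
E_blocks = G @ np.swapaxes(G, 1, 2) + 0.5 * np.eye(3)

# One 3-vector of residuals per voxel.
r = rng.standard_normal((n_vox, 3))

# np.linalg.solve broadcasts over the leading axis, solving all blocks in one call.
x = np.linalg.solve(E_blocks, r[..., None])[..., 0]
```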
Relaxation methods take a very long time to estimate the low spatial frequency components of w, whereas the higher frequency
components are estimated relatively quickly. Multigrid methods are a way of achieving more rapid convergence by using relaxation methods
at various different spatial scales. The full-multigrid (FMG) method is a recursive approach, which involves cycling through the scales. Press
et al. (1992) describe the full-multigrid method for solving a relatively simple problem. This algorithm was extended so that the FMG method
can be applied to 3D images of any dimensions, with circulant boundary conditions and more complex second derivatives of the types
described above. The full details of the approach are omitted, but a brief summary of the procedure is illustrated in Fig. 8 and some of the
ideas are elaborated below.
Multigrid methods usually begin by estimating the field at the coarsest scale, and then zooming this coarse estimate to the next higher
resolution (prolongation). The lower frequencies of the zoomed version tend to be fairly accurate, but the high-frequencies require a few
iterations of relaxation to refine them. This refined version is then prolongated to the next higher resolution and so on until the highest
resolution solution is reached.
Such a single ascent through the various scales is rarely enough to achieve an accurate solution. Further refinement is needed, and this is
obtained by computing the field that needs to be added to w, such that the defect (the residuals, c − Mw) is minimised. This is achieved by

Fig. 7. This figure shows a schematic of the matrices involved in the optimisation. Because of the large dimensions involved, the matrices shown here are only
for 2D registration of 4 × 4 images. Top-left: the H matrix (for linear elasticity, where μ = 1 and λ = 0), which is used to regularise the registration. Top-right:
the A matrix that encodes the second derivatives of the likelihood term. Bottom-left: the E matrix, which contains selected diagonals of M (where M consists
of A + H + ζI). Inverting this matrix involves inverting a series of symmetric 2 × 2 matrices. Bottom-right: the F matrix (M − E).

Fig. 8. A schematic of the FMG algorithm for two cycles and four different scales. The algorithm proceeds from left to right. The different heights of the boxes
indicate the grid coarseness, where the coarsest grid is at the bottom, and the finest at the top. (A) A coarse solution for w is interpolated up to the resolution of the
current grid (prolongation), and added to the current estimate for w. This solution is refined by a few relaxation iterations, and the residuals (defect) computed.
This defect is then projected down to a coarser grid (restriction) for use as the c vector in the next step. (B) A coarse solution for w is prolongated to the current
grid and added to the current w. Relaxation is used to refine the solution, before it is prolongated for use in the next step. (C) The c vector is a restricted version of
the residuals from the previous step. The initial estimate for w is uniformly zero, but this is refined by relaxation. The residuals are computed, and these restricted
to a coarser grid for use as the c vector in the next step. (E) An exact solution is obtained on the coarsest grid.

computing the defect and restricting it to a coarser grid. At this coarse scale, it serves as the c vector. The equations are solved on this grid,
and the solution is prolongated back to a finer grid and added to the existing w. This step reduces more of the low frequency signal in the
defect. A few relaxation steps are then performed in order to reduce the high frequencies. This procedure can be cycled through a number of
times, each time reducing the defect.
Solving the equations on the coarse grid would involve restricting the defect to an even coarser grid, solving, prolongating the coarser
solution and adding it to the coarse solution. This would be done at all spatial scales, but at the coarsest scale, the solution can be obtained
directly because the matrices are much smaller.
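The defect-correction idea described above can be sketched as a recursive V-cycle in 1D. This is not the full-multigrid implementation used in DARTEL: it assumes a simple periodic operator (ζI minus a discrete Laplacian), damped Jacobi relaxation, full-weighting restriction and linear prolongation, and all function names are illustrative.

```python
import numpy as np

def apply_M(w, h, zeta):
    """(zeta*I - Laplacian) on a periodic 1D grid with spacing h."""
    return zeta * w + (2.0 * w - np.roll(w, 1) - np.roll(w, -1)) / h**2

def relax(w, c, h, zeta, sweeps=2, omega=2.0 / 3.0):
    """A few damped Jacobi sweeps, which mainly reduce the high frequencies."""
    d = zeta + 2.0 / h**2
    for _ in range(sweeps):
        w = w + omega * (c - apply_M(w, h, zeta)) / d
    return w

def restrict(r):
    """Full-weighting restriction onto every second grid point."""
    return 0.25 * np.roll(r, 1)[::2] + 0.5 * r[::2] + 0.25 * np.roll(r, -1)[::2]

def prolong(e):
    """Linear interpolation (prolongation) back to the finer grid."""
    fine = np.zeros(2 * e.size)
    fine[::2] = e
    fine[1::2] = 0.5 * (e + np.roll(e, -1))
    return fine

def v_cycle(w, c, h, zeta):
    n = w.size
    if n <= 4:
        # Coarsest grid: build the small dense matrix and solve exactly.
        M = np.array([apply_M(col, h, zeta) for col in np.eye(n)]).T
        return np.linalg.solve(M, c)
    w = relax(w, c, h, zeta)
    defect = c - apply_M(w, h, zeta)            # residuals on the current grid
    e = v_cycle(np.zeros(n // 2), restrict(defect), 2.0 * h, zeta)
    w = w + prolong(e)                          # coarse-grid correction
    return relax(w, c, h, zeta)

rng = np.random.default_rng(3)
n, h, zeta = 64, 1.0, 0.1
c = rng.standard_normal(n)
w = np.zeros(n)
for _ in range(10):
    w = v_cycle(w, c, h, zeta)
relative_defect = np.linalg.norm(c - apply_M(w, h, zeta)) / np.linalg.norm(c)
```

Each cycle restricts the defect, solves for a correction on the coarser grid (recursively), prolongates it and relaxes again, so both low and high frequencies of the error are reduced.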

Results and discussion

Evaluation of warping methods is a complex area. Generally, the results of an evaluation are specific only to the data used to evaluate the model. MR images vary a great deal with different subjects, field strengths, scanners, sequences, etc., so a model that is good for one set of data may not be appropriate for another. Validation should therefore relate to both the data and the algorithm. The question should be about whether it is appropriate to apply a model to a data set, given the assumptions made by the model.
Very soon, the Non-rigid Image Registration Evaluation Project (Christensen et al., 2006) will be ready for use. This framework will allow the developers of nonlinear registration algorithms to compare their algorithms on the same data sets, using the same measures of accuracy. Currently, most developers use their own data and measures to assess registration accuracy, which precludes proper comparisons between competing models.
There are various approaches that are currently used for evaluating registration models. An assessment based on colocalisation of manually defined landmarks would be a useful validation strategy, because it allows the models to be compared with human expertise (Hellier et al., 2001, 2002, 2003). The use of simulated images with known underlying deformations is not really appropriate for proper validation of nonlinear registration methods. This is particularly true if the deformation model is the same for the simulations as it is for the registration, because this only illustrates whether or not the optimisation strategy works. Another commonly used form of “evaluation” involves examining the residual difference after registration. Such a strategy would ignore the possibility of over-fitting and tends to favour those models with less regularisation.
The appropriateness of an evaluation depends on the particular application that the deformations are to be used for. For example, if the application was spatial normalisation of functional images of different subjects, then the most appropriate evaluation may be based on assessing the sensitivity of voxel-wise statistical tests (Gee et al., 1997; Miller et al., 2005). Because the warping procedure is based only on structural information, it is blind to the locations of functional activation. If the locations of activations can be brought into close correspondence in different subjects, then it is safe to say that the spatial normalisation procedure works well.
Another application of intersubject registration may involve identifying shape differences among populations of subjects. In this case, the usefulness of the warping algorithm could be assessed by how well the deformation fields can be used to distinguish between the populations (Lao et al., 2004). This approach can be considered as forms of cross-validation, because it assesses how well the registration helps to predict additional information that it is not explicitly provided with. This is the main evaluation strategy described in this section.

Group-wise registration

Instead of simply matching a pair of images, the objective of intersubject registration is often to align the images of multiple subjects. This is sometimes done by registering all the images with a single template image. Such a procedure would produce different results depending upon the choice of template, so this is an area where internal consistency should be considered. A more optimal template would be some form of average (Avants and Gee, 2004; Davis et al., 2004; Lorenzen et al., 2004). Registering such a template with a brain image generally requires smaller (and therefore less error prone) deformations than would be necessary for registering to an unusually shaped template. Such averages generally lack some of the detail present in the individual subjects' images. Structures that are more difficult to match are generally slightly blurred in the average, whereas those structures that can be more reliably matched are sharper. Such an average generated from a large population of subjects would be ideal for use as a general purpose template.
Four hundred and seventy-one T1 weighted MRI scans were used to create such a template. Details of acquisition parameters, etc. can be found in Good et al. (2001). This experiment used the same 465 scans, plus a few others. The subjects consisted of 264 males and 207 females, with ages ranging from 17 to 79. The mean age was 31.8 (see Fig. 9 for more details).
These data were segmented using the algorithm in SPM5 (Ashburner and Friston, 2005). This procedure includes a component whereby pre-defined tissue probability maps (generated from a large number of subjects) are approximately registered with each image undergoing segmentation. A rigid body transformation was extracted from the nonlinear deformations estimated by the segmentation algorithm using a Procrustes method, weighted by a grey matter tissue probability map (Ashburner et al., 1998). These rigid-body transformations were used to reslice tissue probability images of grey and white matter for each subject, such that they were in approximate alignment. The resliced images were of size 121 × 145 × 121 voxels, and had an isotropic resolution of 1.5 mm.
To illustrate an application of internally consistent registration, the DARTEL algorithm is demonstrated through the construction of average image templates. The scheme involves iterations of DARTEL to map the scans above to their average, to form a new average. This cycle is repeated 18 times in the hope of improving the spatial precision of the average and selecting those features that are conserved and are informative for registering over subjects.
Intensity averages of the grey and white matter images were generated to serve as an initial template for DARTEL registration (see top row of Fig. 10). An inverse-consistent formulation4 was used to register each individual brain with the template. The likelihood term of the registration was based on the sum of squared difference between the grey matter and grey matter mean, plus that of the white matter and that of the remainder (i.e. one minus grey and white). The prior term was based on linear elasticity, with a value for μ that was 10 times greater than the value for λ. A value for K of 6 was used, which would be analogous to an Euler integration scheme using 64 time points. Registration was done with eighteen Gauss–Newton iterations and, after every three iterations, the mean was re-computed. For the first six iterations, μ was set to 0.5. For the second six, it was 0.25, and for the last six, it was set to 0.125.
The initial template was quite smooth, but it became sharper each time it was re-generated, resulting in a natural coarse-to-fine registration scheme (see Fig. 10). The aim of the heavier regularisation in the early iterations was to avoid some of the potential local minima. Registration of all 471 images took 2 weeks on a standard 3 year old desktop PC.5 Spatially normalised images of selected subjects are shown in Fig. 11.
The whole procedure was repeated in an identical way, except that a small deformation setting was used. All settings were identical, except that K was set to zero in order to achieve small deformation registrations. The resulting displacement fields were later compared with those generated using the diffeomorphic setting.

Exponentiation

Computational precision is finite. For example, using double
precision floating point representation (64 bits), a value of about 1 + 2.2 × 10^−16 is indistinguishable from a value of 1. Similarly, for

4 From a generative modelling perspective, it would have been more appropriate to use an asymmetric formulation whereby the template was warped to match each individual image. The objective, however, was to demonstrate the ease with which exactly inverse consistent registration could be achieved with DARTEL. Within the functional imaging field, it is also common practice to “spatially normalise” by warping the individual images to match a common template, rather than match the template to the individual images. The recommended strategy would normally be to use the correct asymmetric model.
5 A single iteration of the asymmetric formulation of DARTEL is rather faster than the symmetric formulation. On the same PC, each iteration (with K = 6) takes 1 min. An iteration of the small deformation model (K = 0) is faster than this, taking about 8.7 s. Much of the work in many current registration methods consists of convolving gradients with the Green's function of the regularisation operator. In three-dimensions, this requires six 3D Fourier transforms. To obtain an idea of the speed of the PC, the MATLAB fftn function requires 8 s to compute these six Fourier transforms on a 128 × 128 × 128 volume.

Fig. 9. The age distribution of the 471 subjects.

Fig. 10. This figure shows the intensity averages of the grey (left) and white (right) matter images after different numbers of iterations. The top row shows the
average after initial rigid-body alignment. The middle row shows the images after three iterations, and the bottom row shows them after all 18 iterations.

single precision representations (32 bits), the relative accuracy is about 1.1 × 10^−7. For this reason, a scaling and squaring algorithm for exponentiating a deformation can only involve squaring a finite number of times. Exponentiating with too many squaring steps leads to numerical problems. The ensuing challenge is to determine a suitable number (K) of steps.
A typical flow field used for matching brains was exponentiated using a range of values of K. Image sampling during each squaring step was done using trilinear interpolation. The root-mean squared (RMS) difference between the deformations derived using K steps and K − 1 steps was then computed. An optimal value for K was chosen around the point where the RMS difference was minimal. The results are plotted in Fig. 12, showing that, for these data and using single precision floating point representations, a value of around 6 or 7 appears to be optimal (i.e. 64 to 128 time steps).

Inverse consistency

This section assesses the inverse consistency of the deformations. The composition of a transform with its inverse should result in an identity transform. In practice, this is rarely achieved exactly because of the discrete representation of the deformations. The resulting disparity (with the identity transform) was compared with the inverse consistency that would be achieved by using a small deformation approximation.
A typical flow field is exponentiated to produce a forward deformation $\Phi^{(1)}$, and the negative of the flow field is exponentiated to produce the inverse deformation $\Phi^{(-1)}$. Six squaring steps (i.e. 64 time points) were used during the exponentiation. These were composed both ways (i.e. $\Phi^{(1)} \circ \Phi^{(-1)}$ and $\Phi^{(-1)} \circ \Phi^{(1)}$) and the mean and maximum RMS deviation from an identity transform was measured. The RMS differences were found to be 0.023 and 0.022 voxels (0.034 and 0.032 mm), and the maximum differences anywhere within the volumes were 0.40 and 0.30 voxels.
A small deformation inverse was generated by $2x - \Phi^{(1)}$, and this was composed both ways with $\Phi^{(1)}$. Similarly, $(2x - \Phi^{(-1)}) \circ \Phi^{(-1)}$ and $\Phi^{(-1)} \circ (2x - \Phi^{(-1)})$ were computed. The RMS deviations of these small deformation approximations from the identity were 0.15, 0.16, 0.16 and 0.17 voxels, and the maximum differences were 2.4, 3.4, 2.5 and 4.0 voxels.
This demonstrates a clear advantage of the current framework over that of the small deformation setting.

Fig. 11. The left panel shows rigid-body aligned grey matter tissue probability maps of four subjects: an 18 year old female (top left), a 79 year old female (top right), a 17 year old male (bottom left) and a 67 year old male (bottom right). These represent the extremes in age of the subjects. The right panel shows the same subjects data, but after spatial normalisation by warping to their average using the DARTEL algorithm.

Kernel pattern recognition

In this section, we address one aspect of validity using pattern recognition schemes. The idea here is that large-scale deformations should capture or encode relevant and important anatomical features. This means that we can use classification performance as a surrogate measure of the quality of the features encoded by DARTEL. To demonstrate this validation approach, support-vector machines were used to classify images according to sex, and relevance-vector machines to estimate the ages of subjects based upon their images. In brief, the assessment is of whether the diffeomorphic setting will enable pattern recognition approaches to attain better performance, relative to the small-deformation setting. Clearly, this does not represent an exhaustive validation of DARTEL; however, it does show how one can establish the utility of DARTEL in the context of pattern recognition problems.
The first challenge was to predict the sexes of the subjects. An off-the-shelf linear support vector classification (SVC) algorithm6 was used (setting C, the regularisation constant, to infinity). The kernel matrix was generated from inner products of the flow fields, such that

$$K = V^T H V \qquad (54)$$

where V is a matrix, with each column containing the parameters of the flow field for a subject. H is as in Eq. (19), and encodes linear elasticity with μ = 1 and λ = 0.
Cross-validation (with smoothing) was used to assess the classification accuracy. This involved training with a random selection of 400 of the subjects, and then making predictions about the remaining 71 subjects. Training and testing were done by picking out the appropriate rows and columns of the K matrix for the whole data set. Accuracy was assessed by how well the predictions matched known information about those 71 subjects. Cross-validation was repeated 50 times in order to obtain a more precise measure of accuracy.
Nonlinear classification was also performed using a radial basis function (RBF) classifier. The “kernel trick”7 was used to convert the inner products into distance measures, which were then used to compute the radial basis function kernels. For flow fields parameterised by $v_A$ and $v_B$, the value in the corresponding element of the kernel matrix is

$$\exp\left( -\frac{1}{2\sigma^2} (v_A - v_B)^T H (v_A - v_B) \right) \qquad (55)$$

A range of values for $\sigma^2$ were tried, which varied from half the variance of the distances, through to 32 times this variance. Results are shown in Table 1.
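The kernel construction of Eqs. (54) and (55), and the selection of rows and columns for cross-validation, can be sketched as follows. The random V, the stand-in positive definite H, the value of σ² and the train/test split are all hypothetical; in the paper, H encodes linear elasticity and V holds the flow-field parameters of the 471 subjects.

```python
import numpy as np

rng = np.random.default_rng(2)
n_params, n_subj = 12, 10
V = rng.standard_normal((n_params, n_subj))    # one column of parameters per subject
B = rng.standard_normal((n_params, n_params))
H = B.T @ B + np.eye(n_params)                 # stand-in symmetric positive definite H

# Linear kernel, Eq. (54).
K = V.T @ H @ V

# "Kernel trick": squared distances recovered from the inner products alone.
d = np.diag(K)
dist2 = d[:, None] + d[None, :] - 2.0 * K

# RBF kernel, Eq. (55); this sigma2 is an arbitrary illustrative scale.
sigma2 = dist2.mean()
K_rbf = np.exp(-dist2 / (2.0 * sigma2))

# Cross-validation only needs sub-blocks of the precomputed kernel matrix.
train = np.arange(0, 7)
test = np.arange(7, 10)
K_train = K_rbf[np.ix_(train, train)]          # training subjects against each other
K_test = K_rbf[np.ix_(test, train)]            # test subjects against training subjects
```

Computing the kernel matrix once for all subjects means that repeated random splits require only index selection, not re-evaluation of any inner products.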

6 The quadratic programming algorithm was the implementation of A. J. Smola, using the wrapper written by R. Vanderbei and S. Gunn. It can be downloaded from http://www.isis.ecs.soton.ac.uk/isystems/kernel/.
7 The “kernel trick” is based on $(v_A - v_B)^T H (v_A - v_B)$ being equivalent to $v_A^T H v_A + v_B^T H v_B - 2 v_A^T H v_B$, so the required distances can be derived from the inner products. Note that H is symmetric.
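Returning to the Exponentiation and Inverse consistency sections, the scaling and squaring procedure and the inverse-consistency check can be sketched in 1D. This is a toy illustration under stated assumptions: a periodic grid, linear interpolation via `numpy.interp`, a hypothetical sinusoidal flow field, and function names (`compose`, `exponentiate`) that are not from the paper.

```python
import numpy as np

def compose(u_a, u_b):
    """Displacement of (x + u_a) o (x + u_b): x -> x + u_b(x) + u_a(x + u_b(x))."""
    n = u_a.size
    x = np.arange(n, dtype=float)
    # Linear interpolation of u_a at the displaced points, with wrap-around.
    return u_b + np.interp(x + u_b, x, u_a, period=n)

def exponentiate(v, K=6):
    """Scaling and squaring: start from x + v / 2^K, then square K times."""
    u = v / 2.0**K
    for _ in range(K):
        u = compose(u, u)
    return u

def rms(u):
    return float(np.sqrt(np.mean(u**2)))

n = 64
x = np.arange(n, dtype=float)
v = 3.0 * np.sin(2.0 * np.pi * x / n)      # hypothetical smooth flow field (in voxels)

u_fwd = exponentiate(v)                    # displacement of the forward deformation
u_inv = exponentiate(-v)                   # displacement of the inverse deformation

# Deviation from the identity of forward o inverse (diffeomorphic setting) ...
rms_diffeo = rms(compose(u_fwd, u_inv))
# ... and of forward o (2x - forward), the small deformation "inverse".
rms_small = rms(compose(u_fwd, -u_fwd))
```

In this toy setting the composition with the exponentiated negative flow stays much closer to the identity than the composition with the subtracted displacement, which mirrors the qualitative comparison reported in the Inverse consistency section.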

Table 2
Sex prediction from the small deformation model

Kernel     Percent M       Percent F       Percent classed   Percent classed   Overall percent   κ statistic
           identified      identified      as M being M      as F being F      correct
           as M            as F
Linear     90.9            82.2            86.4              87.9              87.0              0.736
RBF 0.5    90.9            80.2            85.1              87.6              86.1              0.717
RBF 1.0    90.9            81.7            86.1              87.8              86.8              0.731
RBF 2.0    90.8            82.2            86.4              87.8              87.0              0.734
RBF 4.0    90.8            82.0            86.3              87.7              86.9              0.733
RBF 8.0    90.8            82.2            86.4              87.8              87.0              0.734
RBF 16     90.9            82.3            86.5              87.8              87.0              0.736
RBF 32     90.9            82.3            86.5              87.8              87.1              0.737

Fig. 12. Determining the optimal number of squaring steps by finding the value of K that produces the lowest RMS difference between deformations generated with K and K − 1 squaring steps. The RMS difference is given in units of voxels.

A virtually identical procedure was repeated, but using the displacement fields derived from a small deformation setting. The objective was to compare the classification accuracy in the diffeomorphic setting, with the accuracy obtained from a comparable small-deformation model. Cross-validation was done for linear, as well as RBF classification, and the results are shown in Table 2. Overall, the DARTEL registration produced slightly more accurate results than the small deformation model, but the improvement was only in the region of about half of a percent and may not be significant.
The second challenge involved a comparison of how accurately the subjects' ages could be predicted both with and without using the diffeomorphic setting. Relevance-vector regression (Tipping, 2001) was used for making the predictions. This approach is based on kernel matrices similar to those employed by SVMs, and the kernels that were used were the same as those for the sex classification. Cross-validation was performed in a similar way to that for the classification (i.e. repeatedly training with 400 scans and testing with 71, repeating 50 times).
Both linear and RBF regression were performed, both for small deformation and diffeomorphic models, and the results reported as the root mean squared error (in years) and as correlation coefficients. Brain shape changes with age tend to require higher spatial frequency distortions to encode them (cortical thinning, ventricular enlargements, etc.) than the sex effects (total brain size encodes much of the sex differences). This means that predicting the ages of subjects may be a better test for the high-spatial frequency deformations. The results of these tests are presented in Table 3. A plot of true ages, versus estimated ages using the diffeomorphic framework with the optimal RBF regression is shown in Fig. 13. The small deformation model gave slightly better predictions for linear regression, whereas the predictions were slightly more accurate for the diffeomorphic model when a RBF kernel was used. Again, the differences are small and may not be significant.
The constant velocity framework of DARTEL may limit the power of using such flow fields with pattern recognition approaches. Others have suggested a variable velocity framework for computational anatomy, whereby the analyses are based upon “initial momentum” maps (Miller et al., 2006; Vaillant et al., 2004). Future work will evaluate DARTEL with respect to a variable velocity registration strategy, and examine the feasibility of using DARTEL registration results to approximate such initial momentum maps.

Conclusions

In this paper, we have described DARTEL, a principled and efficient diffeomorphic framework for registering images. Optimisation is performed by a Levenberg–Marquardt strategy, and

Table 1
Sex prediction from the diffeomorphic model

Kernel     Percent M       Percent F       Percent classed   Percent classed   Overall percent   κ statistic
           identified      identified      as M being M      as F being F      correct
           as M            as F
Linear     91.0            83.6            87.4              88.1              87.7              0.749
RBF 0.5    91.0            80.7            85.5              87.7              86.4              0.722
RBF 1.0    91.1            82.4            86.6              88.1              87.2              0.739
RBF 2.0    91.1            82.9            86.9              88.2              87.5              0.745
RBF 4.0    91.0            83.2            87.1              88.1              87.5              0.746
RBF 8.0    91.0            83.3            87.2              88.1              87.5              0.747
RBF 16     91.0            83.4            87.2              88.1              87.6              0.748
RBF 32     91.0            83.4            87.2              88.2              87.6              0.748

Table 3
Age prediction accuracy for both the small deformation and diffeomorphic models

           Small deformation            Large deformation
Kernel     RMS error    Correlation     RMS error    Correlation
Linear     7.55         0.826           7.90         0.813
RBF 0.5    7.64         0.816           7.34         0.830
RBF 1.0    7.07         0.842           6.84         0.850
RBF 2.0    6.84         0.851           6.64         0.857
RBF 4.0    6.74         0.854           6.56         0.859
RBF 8.0    6.70         0.856           6.52         0.860
RBF 16     6.68         0.856           6.50         0.861
RBF 32     6.80         0.849           6.64         0.854

The standard deviation of the subjects' ages was 12.24, so the RMS errors all show clear improvements over this figure.
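The κ statistic column of the sex prediction tables is not defined in the text. Assuming it is Cohen's kappa, the following sketch shows how such a value relates to the per-class percentages. It uses the rounded "Linear" row of the diffeomorphic table together with the overall counts of 264 males and 207 females, which is only an approximation to the repeated 400/71 cross-validation splits actually used.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: true class, columns: predicted)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / total
    expected = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(k)
    )
    return (observed - expected) / (1.0 - expected)

# Assumed inputs: overall class sizes and the rounded "Linear" percentages.
n_m, n_f = 264, 207
sens_m, sens_f = 0.910, 0.836                  # M identified as M, F identified as F

confusion = [
    [n_m * sens_m, n_m * (1.0 - sens_m)],      # true M: classed as M, classed as F
    [n_f * (1.0 - sens_f), n_f * sens_f],      # true F: classed as M, classed as F
]
kappa = cohen_kappa(confusion)
accuracy = (confusion[0][0] + confusion[1][1]) / (n_m + n_f)
```

Under these assumptions the computed kappa and overall accuracy come out close to the tabulated values for that row, which suggests (but does not confirm) that Cohen's kappa is the statistic reported.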

requires matrix solutions for some very large sparse matrices. The main contribution of this work is the efficient recursive approach used to compute the first and second derivatives used by the optimisation, and the use of a full-multigrid method for solving the equations. This report has focused on underlying theory, the algorithm and operational details.
The performance of this constant velocity diffeomorphic registration scheme has been evaluated in relation to a small-deformation approach, using classification and regression based upon anatomical features encoded by the deformations. The flow fields computed within this constant velocity diffeomorphic framework appeared to confer only a slight advantage for pattern recognition approaches, when compared to displacement fields of a small deformation model.

Fig. 13. A plot of true versus estimated ages derived from diffeomorphic flow fields and relevance vector regression (RBF 16).

Acknowledgments

I would like to thank Prof. Karl Friston and three reviewers for reading through this manuscript and suggesting a number of improvements. This work was supported by the Wellcome Trust, and much of the writing was done while based in the Psychology Department at Maastricht University.

Appendix A. Deriving derivatives

Rigorous derivations of first derivatives in a continuous time representation are given by Beg et al. (2005), but an alternative derivation is provided here. Derivatives are computed with respect to the parameterisation of a flow field (v), from which the mapping $\Phi^{(1)}$ is computed. Within a continuous spatial representation, the objective function is obtained by

$$E_2 = \frac{1}{2} \int_{x \in \Omega} \left( g(x) - f(\Phi^{(1)}(x)) \right)^2 dx \qquad (56)$$

The introduction of a second, arbitrary, diffeomorphism (θ), renders the objective function unchanged, provided that the Jacobian determinant of θ is accounted for by a Jacobian change of variables.

$$E_2 = \frac{1}{2} \int_{x \in \Omega} |J^{\theta}(x)| \left( g(\theta(x)) - f(\Phi^{(1)}(\theta(x))) \right)^2 dx \qquad (57)$$

Similarly, it can also be obtained by considering the evolution of some θ over time

$$E_2 = \frac{1}{2} \int_{t=0}^{1} \int_{x \in \Omega} |J^{\theta^{(t)}}(x)| \left( g(\theta^{(t)}(x)) - f(\Phi^{(1)}(\theta^{(t)}(x))) \right)^2 dx \, dt \qquad (58)$$

Within a discrete time representation, a large deformation can be considered as a composition of a series of small deformations. This is analogous to an Euler integration, and becomes increasingly accurate as N, the number of time steps, approaches infinity. First and second derivatives of $E_2$ will be derived for a variable velocity framework, before constraining the model to constant velocity. In the following, each of the small deformation displacements will be denoted by $u_n$, where n runs from 0 to N − 1. The notation $\Phi^{(A,B)}$ is used to denote the composition of $(x + u_A) \circ (x + u_{A-1}) \circ \dots \circ (x + u_B)$. If the number of components is zero, then $\Phi^{(A,B)}$ is simply the identity transform. Similarly, for the evolving second diffeomorphism, $\theta^{(B,A)}$ is used to denote $(x - u_B) \circ (x - u_{B+1}) \circ \dots \circ (x - u_A)$.

$$E_2 = \frac{1}{2N} \sum_{n=0}^{N-1} \int_{x \in \Omega} |J^{\theta^{(0,n-1)}}(x)| \left( g(\theta^{(0,n-1)}(x)) - f(\Phi^{(N-1,0)} \circ \theta^{(0,n-1)} \circ x) \right)^2 dx \qquad (59)$$

For any value of n, $\Phi^{(N-1,0)}$ is equivalent to $\Phi^{(N-1,n+1)} \circ (x + u_n) \circ \Phi^{(n-1,0)}$. Under the assumption of infinitesimally small steps, $(x + u_n) \circ (x - u_n)$ will approach the identity transform, so $\Phi^{(n-1,0)} \circ \theta^{(0,n-1)}$ will also approach the identity.

$$E_2 = \frac{1}{2N} \sum_{n=0}^{N-1} \int_{x \in \Omega} |J^{\theta^{(0,n-1)}}(x)| \left( g(\theta^{(0,n-1)}(x)) - f(\Phi^{(N-1,n+1)} \circ (x + u_n(x))) \right)^2 dx \qquad (60)$$

The discrete parameterisation of a field, $u_n(x)$, is normally by a linear combination of basis functions. Even the so-called free-form models, which usually obtain continuity via trilinear interpolation, are essentially parameterised by a set of first degree B-spline basis functions.

$$u_n(x) = \sum_k v_{kn} \rho_k(x) \qquad (61)$$

Therefore

$$\frac{\partial}{\partial v_{in}} f\left( \Phi^{(N-1,n+1)} \circ (x + u_n(x)) \right) = \left( \nabla f(\Phi^{(N-1,n+1)}) \circ (x + u_n(x)) \right)^T \rho_i(x) \qquad (62)$$

For a variable velocity framework, the first derivatives of $E_2$ are therefore

$$\frac{\partial E_2}{\partial v_{in}} = -\frac{1}{N} \int_{x \in \Omega} |J^{\theta^{(0,n-1)}}(x)| \left( g(\theta^{(0,n-1)}(x)) - f(\Phi^{(N-1,n)}(x)) \right) \left( \nabla f(\Phi^{(N-1,n+1)}) \circ (x + u_n(x)) \right)^T \rho_i(x) \, dx \qquad (63)$$

Rather than using the exact second derivatives for optimisation, it is more practical to use an approximation that is guaranteed to be positive definite. This is the approximation used by the Gauss–Newton optimisation algorithm, as opposed to the Newton–Raphson algorithm. Press et al. (1992) says more about the pros and cons of one version over the other.

$$\frac{\partial^2 E_2}{\partial v_{in} \partial v_{jn}} = \frac{1}{N} \int_{x \in \Omega} |J^{\theta^{(0,n-1)}}(x)| \left( \left( \nabla f(\Phi^{(N-1,n+1)}) \circ (x + u_n(x)) \right)^T \rho_i(x) \right) \left( \left( \nabla f(\Phi^{(N-1,n+1)}) \circ (x + u_n(x)) \right)^T \rho_j(x) \right) dx \qquad (64)$$

The derivatives for a constant velocity framework are simply obtained by summing over the derivatives that would be used for variable velocity. Note that the notation is changed to the simpler version that can be used for the constant velocity model.

$$\frac{\partial E_2}{\partial v_i} = -\frac{1}{N} \sum_{n=0}^{N-1} \int_{x \in \Omega} |J^{\theta^{(n/N)}}| \left( g(\Phi^{(-n/N)}(x)) - f(\Phi^{((N-n)/N)}(x)) \right) \left( \nabla f(\Phi^{((N-1-n)/N)}) \circ \Phi^{(1/N)}(x) \right)^T \rho_i(x) \, dx \qquad (65)$$

(WBIR'06), Utrecht, The Netherlands, 9–11 July 2006. Lecture Notes in Computer Science, vol. 4057. Springer Verlag, pp. 120–127.
Arsigny, V., Fillard, P., Pennec, X., Ayache, N., 2006c. Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM J. Matrix Anal. Appl. 29 (1), 328–347.
Ashburner, J., Friston, K.J., 2005. Unified segmentation. NeuroImage 26, 839–851.
Ashburner, J., Hutton, C., Frackowiak, R.S.J., Johnsrude, I., Price, C., Friston, K.J., 1998. Identifying global anatomical differences: deformation-based morphometry. Hum. Brain Mapp. 6 (5), 348–357.
Avants, B., Gee, J.C., 2004. Geodesic estimation for large deformation anatomical shape averaging and interpolation. NeuroImage 23, S139–S150.
Beg, M.F., Miller, M.I., Trouvé, A., Younes, L., 2005. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. Int. J. Comput. Vis. 61 (2), 139–157 (February).
Bro-Nielsen, M., Gramkow, C., 1996. Fast fluid registration of medical images. In: Höhne, K.-H., Kikinis, R. (Eds.), Proc. Visualization in Biomedical Computing (VBC). Lecture Notes in Computer Science, vol. 1131. Springer-Verlag, Berlin, pp. 267–276.
Christensen, G.E., 1999. Consistent linear elastic transformations for image matching. In: Kuba, A., Samal, M., Todd-Pokropek, A. (Eds.), Proc. Information Processing in Medical Imaging (IPMI). Lecture Notes in Computer Science, vol. 1613. Springer-Verlag, Berlin, pp. 224–237.
Christensen, G.E., Rabbitt, R.D., Miller, M.I., 1994. 3D brain mapping using a deformable neuroanatomy. Phys. Med. Biol. 39, 609–618.
Christensen, G.E., Rabbitt, R.D., Miller, M.I., Joshi, S.C., Grenander, U., Coogan, T.A., Van Essen, D.C., 1995. Topological properties of smooth anatomic maps. In: Bizais, Y., Barillot, C., Di Paola, R. (Eds.), Proc. Information Processing in Medical Imaging (IPMI). Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. 101–112.
Christensen, G.E., Rabbitt, R.D., Miller, M.I., 1996. Deformable templates using large deformation kinematics. IEEE Trans. Image Process. 5, 1435–1447.
Christensen, G.E., Geng, X., Kuhl, J.G., Bruss, J., Grabowski, T.J., Pirwani, I.A., Vannier, M.W., Allen, J.S., Damasio, H., 2006. Introduction to the non-rigid image registration evaluation project (NIREP). In: Pluim, J.P.W., Likar, B., Gerritsen, F.A. (Eds.), Proc.
and .27wThird International Workshop on Biomedical Image Registration.
Lecture Notes in Computer Science, vol. 4057. Springer-Verlag, Berlin,
N 1 Z T
BE 2 1X ðn=N Þ pp. 128–135.
¼ jJΦ j jf ðΦððN 1nÞ=N ÞÞ BΦð1=N ÞðxÞ ρi ðxÞ Davis, B., Lorenzen, P., Joshi, S., 2004. Large deformation minimum mean
Bvi Bvj N n¼0
xaX squared error template estimation for computational anatomy. Proc.
T IEEE Int. Symp. Biomed. Imag. (ISBI) 173–176.
jf ðΦððN 1nÞ=N Þ Þ BΦð1=N Þ ðxÞ ρj ðxÞ dx Gee, J.C., Alsop, D.C., Aguirre, G.K., 1997. Effect of spatial normalization
on analysis of functional data. In: Hanson, K.M. (Ed.), Proc. SPIE
ð66Þ Medical Imaging 1997: Image Processing, pp. 312–322.
Gilbert, J.R., Moler, C., Schreiber, R., 1992. Sparse matrices in MATLAB:
When working with continuous functions, the main text treats design and implementation. SIAM J. Matrix Anal. Appl. 13 (1), 333–356
the first derivatives as a continuous vector field (b(x)), and the (URL http://citeseer.csail.mit.edu/article/gilbert92sparse.html).
second derivatives as a tensor field (A(x)). For the actual Good, C.D., Johnsrude, I.S., Ashburner, J., Henson, R.N.A., Friston, K.J.,
optimisation of the parameters (v), these derivatives are considered Frackowiak, R.S.J., 2001. A voxel-based morphometric study of ageing
as a vector and a square matrix, respectively. For simplification, the in 465 normal adult human brains. NeuroImage 14, 21–36.
indexing by x is often omitted. Haber, E., Modersitzki, J., 2006. A multilevel method for image registration.
SIAM J. Sci. Comput. 27, 1594–1607.
Hellier, P., Barillot, C., Corouge, I., Gibaud, B., Le Goualher, G., Collins,
References D.L., Evans, A.C., Malandain, G., Ayache, N., 2001. Retrospective
evaluation of inter-subject brain registration. In: Niessen, W.J., Viergever,
Arsigny, V., Commowick, O., Pennec, X., Ayache, N., 2006a. A Log– M.A. (Eds.), Proc. Medical Image Computing and Computer-Assisted
Euclidean framework for statistics on diffeomorphisms. Proc. of the 9th Intervention (MICCAI). Lecture Notes in Computer Science, vol. 2208.
International Conference on Medical Image Computing and Computer Springer-Verlag, Berlin, pp. 258–265.
Assisted Intervention (MICCAI'06), Lecture Notes in Computer Hellier, P., Ashburner, J., Corouge, I., Barillot, C., Friston, K.J., 2002. Inter
Science, 2–4 October 2006a. To appear. subject registration of functional and anatomical data using SPM. Proc.
Arsigny, V., Commowick, O., Pennec, X., Ayache, N., 2006b. A Log– Medical Image Computing and Computer-Assisted Intervention (MIC-
Euclidean polyaffine framework for locally rigid or affine registration. CAI). Lecture Notes in Computer Science, vol. 2489. Springer-Verlag,
In: Pluim, J.P.W., Likar, B., Gerritsen, F.A. (Eds.), Proceedings of the Berlin, pp. 587–590.
Third International Workshop on Biomedical Image Registration Hellier, P., Barillot, C., Corouge, I., Gibaud, B., Le Goualher, G., Collins, D.
J. Ashburner / NeuroImage 38 (2007) 95–113 113

L., Evans, A., Malandain, G., Ayache, N., Christensen, G.E., Johnson, Miller, M.I., Trouvé, A., Younes, L., 2006. Geodesic shooting for
H.J., 2003. Retrospective evaluation of inter-subject brain registration. computational anatomy. J. Math. Imaging Vis. 24 (2), 209–228 (ISSN
IEEE Trans. Med. Imag. 22 (9), 1120–1130. 0924-9907).
Lao, Z., Shen, D., Xue, Z., Karacali, B., Resnick, S.M., Davatzikos, C., Moler, C., Van Loan, C., 2003. Nineteen dubious ways to compute the
2004. Morphological classification of brains via high-dimensional shape exponential of a matrix, twenty-five years later. SIAM Rev. 45 (1),
transformations and machine learning methods. NeuroImage 21 (1), 3–49.
46–57. Pennec, X., Fillard, P., Ayache, N., 2006. A Riemannian framework for tensor
Lester, H., Arridge, S.R., Jansons, K.M., Lemieux, L., Hajnal, J.V., computing. Int. J. Comput. Vis. 66 (1), 41–66 (January) URL http://
Oatridge, A., 1999. Non-linear registration with the variable viscosity springerlink.metapress.com/openurl.asp?genre=article&issn=0920-
fluid algorithm. In: Kuba, A., Samal, M., Todd-Pokropek, A. (Eds.), 5691&volume=66&issue=1&spage=41. A preliminary version appeared
Proc. Information Processing in Medical Imaging (IPMI). Lecture Notes as INRIA Research Report 5255, July 2004.
in Computer Science, vol. 1613. Springer-Verlag, Berlin, pp. 238–251. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 1992.
Lorenzen, P., Davis, B., Gerig, G., Bullitt, E., Joshi, S., 2004. Multi-class Numerical Recipes in C, 2nd ed. Cambridge Univ. Press, Cambridge,
posterior atlas formation via unbiased Kullback–Leibler template UK.
estimation. In: Barillot, C., Haynor, D.R., Hellier, P. (Eds.), Proc. Thévenaz, P., Blu, T., Unser, M., 2000. Interpolation revisited. IEEE Trans.
Medical Image Computing and Computer-Assisted Intervention (MIC- Med. Imag. 19 (7), 739–758.
CAI). Lecture Notes in Computer Science, vol. 3216. Springer-Verlag, Thirion, J.-P. 1995. Fast non-rigid matching of 3D medical images. Technical
Berlin, pp. 95–102. Report 2547, Institut National de Recherche en Informatique et en
Miller, M., Banerjee, A., Christensen, G., Joshi, S., Khaneja, N., 1997. Automatique, May 1995. Available from http://www.inria.fr/rrrt/rr-2547.
Statistical methods in computational anatomy. Stat. Methods Med. Res. html.
6, 267–299. Tipping, M.E., 2001. Sparse bayesian learning and the relevance vector
Miller, M.I., Beg, M.F., Ceritoglu, C., Stark, C.E.L., 2005. Increasing the machine. J. Mach. Learn. Res. 1, 211–244.
power of functional maps of the medial temporal lobe using large Vaillant, M., Miller, M.I., Younes, L., Trouvé, A., 2004. Statistics on
deformation metric mapping. Proc. Natl. Acad. Sci. U. S. A. 102, diffeomorphisms via tangent space representations. NeuroImage 23,
9685–9690. S161–S169.