Elastix Manual v4.4
$$\hat{T} = \arg\min_{T} C(T; I_F, I_M), \quad \text{with} \qquad (2.1)$$
$$C(T; I_F, I_M) = -S(T; I_F, I_M) + \gamma P(T), \qquad (2.2)$$
where $\gamma$ weighs similarity against regularity. To solve the above minimisation problem, there are basically
two approaches: parametric and nonparametric. The reader is referred to Fischer and Modersitzki [2004]
for an overview on nonparametric methods, which are not discussed in this manual. The elastix software
is based on the parametric approach. In parametric methods, the number of possible transformations is
limited by introducing a parametrisation (model) of the transformation. The original optimisation problem
thus becomes:
$$\hat{T}_\mu = \arg\min_{T_\mu} C(T_\mu; I_F, I_M), \qquad (2.3)$$
where the subscript $\mu$ indicates that the transform has been parameterised. The vector $\mu$ contains the
values of the transformation parameters. For example, when the transformation is modelled as a 2D rigid
transformation, the parameter vector contains one rotation angle and the translations in x and y direction.
We may write Equation (2.3) also as:
$$\hat{\mu} = \arg\min_{\mu} C(\mu; I_F, I_M). \qquad (2.4)$$
From this equation it becomes clear that the original problem (2.1) has been simplified. Instead of optimising
over a space of functions $T$, we now optimise over the elements of $\mu$. Examples of other transformation
models are given in Section 2.6.
Figure 2.2 shows the general components of a parametric registration algorithm in a block scheme. The
scheme is a slightly extended version of the scheme introduced in Ibáñez et al. [2005]. Several components
can be recognised from Equations (2.1)-(2.4); some will be introduced later. First of all, we have the images.
The concept of an image needs to be defined. This is done in Section 2.2. Then we have the cost function
C, or metric, which defines the quality of alignment. As mentioned earlier, the cost function consists of
a similarity measure S and a regularisation term P. The regularisation term P is not discussed in this
chapter, but in Chapter 6. The similarity measure S is discussed in Section 2.3. The definition of the
similarity measure introduces the sampler component, which is treated in Section 2.4. Some examples of
transformation models $T_\mu$ are given in Section 2.6. The optimisation procedure to actually solve the problem
(2.4) is explained in Section 2.7. During the optimisation, the value $I_M(T_\mu$ [...]

[Figure 2.3: Geometrical concepts associated with the ITK image. Adopted from Ibáñez et al. [2005]. Labels in the figure: Size=7x6, Spacing=( 20.0, 30.0 ), Physical extent=( 140.0, 180.0 ), Origin=(60.0,70.0), Image Origin, Voronoi Region, Pixel Coverage, Delaunay Region, Linear Interpolation Region, Pixel Coordinates, Spacing[0], Spacing[1].]

[...] $$\sum_{x_i \in \Omega_F} \left( I_F(x_i) - I_M(T_\mu(x_i)) \right)^2, \qquad (2.5)$$
with $\Omega_F$ the domain of the fixed image $I_F$, and $|\Omega_F|$ the number of voxels. Given a transformation $T_\mu$,
this measure can easily be implemented by looping over the voxels in the fixed image, taking $I_F(x_i)$,
calculating $I_M(T_\mu(x_i))$ by interpolation, and adding the squared difference to the sum.
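Normalised Correlation Coefficient (NCC): (AdvancedNormalizedCorrelation) The NCC is defined
as:
$$\mathrm{NCC}(\mu; I_F, I_M) = \frac{\sum_{x_i \in \Omega_F} \left( I_F(x_i) - \bar{I}_F \right) \left( I_M(T_\mu(x_i)) - \bar{I}_M \right)}{\sqrt{\sum_{x_i \in \Omega_F} \left( I_F(x_i) - \bar{I}_F \right)^2 \sum_{x_i \in \Omega_F} \left( I_M(T_\mu(x_i)) - \bar{I}_M \right)^2}}, \qquad (2.6)$$
with the average grey-values $\bar{I}_F = \frac{1}{|\Omega_F|} \sum_{x_i \in \Omega_F} I_F(x_i)$ and $\bar{I}_M = \frac{1}{|\Omega_F|} \sum_{x_i \in \Omega_F} I_M(T_\mu(x_i))$.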
Mutual Information (MI): (AdvancedMattesMutualInformation) For MI [Maes et al., 1997, Viola and
Wells III, 1997, Mattes et al., 2003] we use a definition given by Thevenaz and Unser [2000]:
$$\mathrm{MI}(\mu; I_F, I_M) = \sum_{m \in L_M} \sum_{f \in L_F} p(f, m; \mu) \log_2 \left( \frac{p(f, m; \mu)}{p_F(f) \, p_M(m; \mu)} \right), \qquad (2.7)$$
where $L_F$ and $L_M$ are sets of regularly spaced intensity bin centres, $p$ is the discrete joint probability,
and $p_F$ and $p_M$ are the marginal discrete probabilities of the fixed and moving image, obtained by
summing $p$ over $m$ and $f$, respectively. The joint probabilities are estimated using B-spline Parzen
windows:
$$p(f, m; \mu) = \frac{1}{|\Omega_F|} \sum_{x_i \in \Omega_F} w_F\left( f/\sigma_F - I_F(x_i)/\sigma_F \right) \times w_M\left( m/\sigma_M - I_M(T_\mu(x_i))/\sigma_M \right), \qquad (2.8)$$
where $w_F$ and $w_M$ represent the fixed and moving B-spline Parzen windows. The scaling constants
$\sigma_F$ and $\sigma_M$ must equal the intensity bin widths defined by $L_F$ and $L_M$. These follow directly from the
grey-value ranges of $I_F$ and $I_M$ and the user-specified number of histogram bins $|L_F|$ and $|L_M|$.
Normalized Mutual Information (NMI): (NormalizedMutualInformation)
NMI is defined by $\mathrm{NMI} = (H(I_F) + H(I_M)) / H(I_F, I_M)$, with $H$ denoting entropy. This expression
can be compared to the definition of MI in terms of $H$: $\mathrm{MI} = H(I_F) + H(I_M) - H(I_F, I_M)$. Again,
with the joint probabilities defined by (2.8) (using B-spline Parzen windows), NMI can be written as:
$$\mathrm{NMI}(\mu; I_F, I_M) = \frac{\sum_{f \in L_F} p_F(f) \log_2 p_F(f) + \sum_{m \in L_M} p_M(m; \mu) \log_2 p_M(m; \mu)}{\sum_{m \in L_M} \sum_{f \in L_F} p(f, m; \mu) \log_2 p(f, m; \mu)}$$
$$= \frac{\sum_{m \in L_M} \sum_{f \in L_F} p(f, m; \mu) \log_2 \left( p_F(f) \, p_M(m; \mu) \right)}{\sum_{m \in L_M} \sum_{f \in L_F} p(f, m; \mu) \log_2 p(f, m; \mu)}. \qquad (2.9)$$
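Kappa Statistic (KS): (AdvancedKappaStatistic) KS is defined as:
$$\mathrm{KS}(\mu; I_F, I_M) = \frac{2 \sum_{x_i \in \Omega_F} 1_{I_F(x_i) > 0, \, I_M(T_\mu(x_i)) > 0}}{\sum_{x_i \in \Omega_F} \left( 1_{I_F(x_i) > 0} + 1_{I_M(T_\mu(x_i)) > 0} \right)}, \qquad (2.10)$$
where $1$ is the indicator function.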
The SSD measure is only suited for two images with an equal intensity distribution,
i.e. for images from the same modality. NCC is less strict: it assumes a linear relation between the intensity
values of the fixed and moving image, and can therefore be used more often. The MI measure is even more
general: only a relation between the probability distributions of the intensities of the fixed and moving image
is assumed. MI is well known to be suited not only for mono-modal, but also for multi-modal image
pairs. This measure is often a good choice for image registration. The NMI measure is, just like MI, suitable
for mono- and multi-modality registration. The results of Studholme et al. [1999] seem to indicate better
performance than MI in some cases. The KS measure is specifically meant to register binary images
(segmentations). It measures the overlap of the segmentations.
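The SSD, NCC and KS measures can be sketched in a few lines of Python. This is only an illustration, not the elastix implementation: the function names are hypothetical, the moving-image values are assumed to be already interpolated at the transformed positions, and normalising the SSD by the number of samples is an assumption made here.

```python
import math


def ssd(fixed_values, moving_values):
    """Mean of the squared intensity differences over the samples, cf. (2.5)."""
    n = len(fixed_values)
    return sum((f - m) ** 2 for f, m in zip(fixed_values, moving_values)) / n


def ncc(fixed_values, moving_values):
    """Normalised correlation coefficient, cf. (2.6)."""
    n = len(fixed_values)
    mean_f = sum(fixed_values) / n
    mean_m = sum(moving_values) / n
    numerator = sum((f - mean_f) * (m - mean_m)
                    for f, m in zip(fixed_values, moving_values))
    denominator = math.sqrt(
        sum((f - mean_f) ** 2 for f in fixed_values)
        * sum((m - mean_m) ** 2 for m in moving_values))
    return numerator / denominator


def kappa(fixed_values, moving_values):
    """Overlap of two binary segmentations, cf. (2.10)."""
    pairs = list(zip(fixed_values, moving_values))
    both = sum(1 for f, m in pairs if f > 0 and m > 0)
    total = sum((f > 0) + (m > 0) for f, m in pairs)
    return 2.0 * both / total


# fixed_values[i] plays the role of I_F(x_i); moving_values[i] that of
# I_M(T(x_i)), assumed to be interpolated beforehand.
fixed = [1.0, 2.0, 3.0, 4.0]
moving = [3.0, 5.0, 7.0, 9.0]  # exactly 2 * fixed + 1
ssd_value = ssd(fixed, moving)
ncc_value = ncc(fixed, moving)
kappa_value = kappa([1, 1, 0, 0], [1, 0, 0, 1])
```

In this toy example the moving intensities are a linear function of the fixed ones, so the NCC is maximal while the SSD is far from zero, which matches the remark that NCC only assumes a linear intensity relation.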
2.4 Image samplers
In Equations (2.5)-(2.8) we observe a loop over the fixed image: $\sum_{x_i \in \Omega_F}$. Until now, we assumed that the
loop goes over all voxels of the fixed image. In general, this is not necessary. A subset may suffice [Thevenaz
and Unser, 2000, Klein et al., 2007]. The subset may be selected in different ways: random, on a grid, etc.
The sampler component represents this process.
The following samplers are often used:
Full: (Full) A full sampler simply selects all voxel coordinates $x_i$ of the fixed image.
Grid: (Grid) The grid sampler defines a regular grid on the fixed image and selects the coordinates $x_i$ on
the grid. Effectively, the grid sampler thus downsamples the fixed image (not preceded by smoothing).
The size of the grid (or equivalently, the downsampling factor, which is the original fixed image size
divided by the grid size) is a user input.
Random: (Random) A random sampler randomly selects a user-specified number of voxels from the fixed
image, whose coordinates form $x_i$. Every voxel has equal chance to be selected. A sample is not
necessarily selected only once.
Random Coordinate: (RandomCoordinate) A random coordinate sampler is similar to a random sampler.
It also randomly selects a user-specified number of coordinates $x_i$. However, the random coordinate
sampler is not limited to voxel positions. Coordinates between voxels can also be selected. The
grey-value $I_F(x_i)$ at those locations must of course be obtained by interpolation.
While at first sight the full sampler seems the most obvious choice, in practice it is not always used,
because of its computational costs in large images. The random samplers are especially useful in combination
with a stochastic optimisation method [Klein et al., 2007]. See also Section 2.7. The use of the random
coordinate sampler makes the cost function C a smoother function of $\mu$, which makes the optimisation
problem (2.4) easier to solve. This has been shown in Thevenaz and Unser [2008].
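The four samplers described above can be sketched for a 2D image as follows. This is a simplified illustration in plain Python; the function names and the (columns, rows) size convention are assumptions made for the example and do not correspond to the elastix classes.

```python
import random


def full_sampler(size):
    """All voxel indices of a 2D image of the given (columns, rows) size."""
    return [(x, y) for y in range(size[1]) for x in range(size[0])]


def grid_sampler(size, factor):
    """Voxel indices on a regular grid; factor is the downsampling factor per dimension."""
    return [(x, y)
            for y in range(0, size[1], factor[1])
            for x in range(0, size[0], factor[0])]


def random_sampler(size, count, rng):
    """Randomly chosen voxel indices; the same voxel may be drawn more than once."""
    voxels = full_sampler(size)
    return [rng.choice(voxels) for _ in range(count)]


def random_coordinate_sampler(size, count, rng):
    """Random continuous coordinates, not restricted to voxel positions."""
    return [(rng.uniform(0, size[0] - 1), rng.uniform(0, size[1] - 1))
            for _ in range(count)]


rng = random.Random(0)  # fixed seed only to make the example repeatable
size = (7, 6)
full = full_sampler(size)
grid = grid_sampler(size, (2, 2))
random_voxels = random_sampler(size, 10, rng)
random_coordinates = random_coordinate_sampler(size, 10, rng)
```

For the random coordinate sampler the fixed image value at each sample would still have to be obtained by interpolation, which is left out of this sketch.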
2.5 Interpolators
As stated previously, during the optimisation the value $I_M(T_\mu$ [...]

[...] $$T_\mu(x) = x + \sum_{x_k \in N_x} p_k \beta^3\left( \frac{x - x_k}{\sigma} \right), \qquad (2.15)$$
with $x_k$ the control points, $\beta^3(x)$ the cubic multidimensional B-spline polynomial [Unser, 1999], $p_k$ the
B-spline coefficient vectors (loosely speaking, the control point displacements), $\sigma$ the B-spline control
point spacing, and $N_x$ the set of all control points within the compact support of the B-spline at $x$.
The control points $x_k$ are defined on a regular grid, overlaid on the fixed image. In this context we
talk about the control point grid that is put on the fixed image, and about control points that are
moved around. Note that $T_\mu(x_k) \neq x_k + p_k$, a common misunderstanding. Calling $p_k$ the control
point displacements is, therefore, actually somewhat misleading. Also note that the control point grid
is entirely unrelated to the grid used by the Grid image sampler, see Section 2.4.
The control point grid is defined by the amount of space between the control points $\sigma = (\sigma_1, \ldots, \sigma_d)$
(with $d$ the image dimension), which can be different for each direction. B-splines have local support
($|N_x|$ is small), which means that the transformation of a point can be computed from only a couple
of surrounding control points. This is beneficial both for modelling local transformations, and for fast
computation. The parameters $\mu$ are formed by the B-spline coefficients $p_k$. The number of control
points $P = (P_1, \ldots, P_d)$ determines the number of parameters $M$, by $M = (P_1 \times \ldots \times P_d) \times d$. $P_i$ in
turn is determined by the image size $s$ and the B-spline grid spacing, i.e. $P_i \approx s_i / \sigma_i$ (where we use $\approx$
since some additional control points are placed just outside the image). For 3D images, $M \approx 10000$
parameters is not an unusual case, and $M$ can easily grow to $10^5 - 10^6$. The parameter vector (for 2D
images) is composed as follows: $\mu = (p_{1x}, p_{2x}, \ldots, p_{P_1}, p_{1y}, p_{2y}, \ldots, p_{P_2})^T$.
Thin-plate splines: (SplineKernelTransform) Thin-plate splines are another well-known representation
for nonrigid transformations. The thin-plate spline is an instance of the more general class of
kernel-based transforms (Davis et al.; Brooks and Arbel [2007]). The transformation is based on a set of
$k = 1 \ldots K$ corresponding landmarks in the fixed and moving image. The landmark displacements
$d_k = x_k^{mov} - x_k^{fix}$ form the parameter vector $\mu$. The fixed landmark positions $x_k^{fix}$ are given by the user. The
transformation is expressed as a sum of an affine component and a nonrigid component:
$$T_\mu(x) = x + Ax + t + \sum_{x_k^{fix}} c_k G(x - x_k^{fix}), \qquad (2.16)$$
[Figure 2.5, panels: (a) fixed, (b) moving, (c) translation, (d) rigid, (e) affine, (f) B-spline.]
Figure 2.5: Different transformations. (a) the fixed image, (b) the moving image with a grid overlaid, (c)
the deformed moving image $I_M(T_\mu$ [...]

[...] $$T_\mu(x) = T_\mu^{NR}(x) + T_0(x) - x \qquad (2.17)$$
composition: $$T_\mu(x) = T_\mu^{NR}(T_0(x)) = (T_\mu^{NR} \circ T_0)(x) \qquad (2.18)$$

Figure 2.6: Iterative optimisation. Example for registration with a translation transformation model. The
arrows indicate the steps $a_k d_k$ taken in the direction of the optimum, which is the minimum of the cost
function.
The latter method is in general to be preferred, because it makes several postregistration analysis tasks
somewhat more straightforward.
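A one-dimensional sketch may help to make the B-spline transform (2.15) and the two ways of combining transforms, (2.17) and (2.18), more concrete. The control point positions, coefficients and the initial translation below are made-up values, and the function names are hypothetical. For simplicity the sum runs over all control points rather than only those in $N_x$; this gives the same result because the cubic B-spline is zero outside its support.

```python
def cubic_bspline(u):
    """Cubic B-spline basis function; it is non-zero only for |u| < 2."""
    u = abs(u)
    if u < 1.0:
        return 2.0 / 3.0 - u ** 2 + 0.5 * u ** 3
    if u < 2.0:
        return (2.0 - u) ** 3 / 6.0
    return 0.0


def bspline_transform(x, control_points, coefficients, spacing):
    """1D version of (2.15): T(x) = x + sum_k p_k * beta3((x - x_k) / sigma)."""
    displacement = sum(p * cubic_bspline((x - xk) / spacing)
                       for xk, p in zip(control_points, coefficients))
    return x + displacement


# Illustrative values only.
control_points = [0.0, 10.0, 20.0, 30.0, 40.0]
coefficients = [0.0, 0.0, 3.0, 0.0, 0.0]
spacing = 10.0


def t_nr(x):
    """Nonrigid (B-spline) transform."""
    return bspline_transform(x, control_points, coefficients, spacing)


def t_0(x):
    """Initial transform: a translation by 5."""
    return x + 5.0


def combined_by_addition(x):
    """Equation (2.17)."""
    return t_nr(x) + t_0(x) - x


def combined_by_composition(x):
    """Equation (2.18)."""
    return t_nr(t_0(x))


at_control_point = t_nr(20.0)
added = combined_by_addition(15.0)
composed = combined_by_composition(15.0)
```

The control point at 20 has coefficient 3, yet it is mapped to 22 and not to 23, which illustrates why calling the coefficients control point displacements is somewhat misleading. The two combination methods also give different results for the same input point.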
2.7 Optimisers
To solve the optimisation problem (2.4), i.e. to obtain the optimal transformation parameter vector $\hat{\mu}$,
commonly an iterative optimisation strategy is employed:
$$\mu_{k+1} = \mu_k + a_k d_k, \quad k = 0, 1, 2, \ldots, \qquad (2.19)$$
with $d_k$ the search direction at iteration $k$, $a_k$ a scalar gain factor controlling the step size along the search
direction. The optimisation process is illustrated in Figure 2.6. Klein et al. [2007] give an overview of
various optimisation routines the literature offers. Examples are quasi-Newton (QN), nonlinear conjugate
gradient (NCG), gradient descent (GD), and Robbins-Monro (RM). Gradient descent and Robbins-Monro
are discussed below. For details on other optimisation methods we refer to [Klein et al., 2007, Nocedal and
Wright, 1999].
Gradient descent (GD): (StandardGradientDescent or RegularStepGradientDescent) Gradient descent
optimisation methods take the search direction as the negative gradient of the cost function:
$$\mu_{k+1} = \mu_k - a_k g(\mu_k), \qquad (2.20)$$
with $g(\mu_k) = \partial C / \partial \mu$ evaluated at the current position $\mu_k$. Several choices exist for the gain factor $a_k$.
It can for example be determined by a line search or by using a predefined function of $k$.
Robbins-Monro (RM): (StandardGradientDescent or FiniteDifferenceGradientDescent) The RM
optimisation method replaces the calculation of the derivative of the cost function $g(\mu_k)$ by an
approximation $\tilde{g}_k$:
$$\mu_{k+1} = \mu_k - a_k \tilde{g}_k. \qquad (2.21)$$
The approximation is potentially faster to compute, but might deteriorate convergence properties of
the GD scheme, since every iteration an approximation error $g(\mu_k) - \tilde{g}_k$ is made. Klein et al. [2007]
showed that using only a small random subset of voxels ($\approx 2000$) from the fixed image accelerates
registration significantly, without compromising registration accuracy. The Random or RandomCoordinate
samplers, described in Section 2.4, are examples of samplers that pick voxels randomly. It is important
that a new subset of fixed image voxels is selected every iteration $k$, so that the approximation error
has zero mean. The RM method is usually combined with $a_k$ as a predefined decaying function of $k$:
$$a_k = \frac{a}{(k + A)^\alpha}, \qquad (2.22)$$
where $a > 0$, $A \geq 1$, and $0 \leq \alpha \leq 1$ are user-defined constants. In our experience, a reasonable choice
is $\alpha \approx 0.6$ and $A$ approximately 10% of the user-defined maximum number of iterations, or less. The
choice of the overall gain, $a$, depends on the expected ranges of $\mu$ and $g$ and is thus problem-specific. In
our experience, the registration result is not very sensitive to small perturbations of these parameters.
Section 5.3.6 gives some more advice.
Note that GD and RM are in fact very similar. Running RM with a Full sampler (see Section 2.4),
instead of a Random sampler, is equivalent to performing GD. We recommend the use of RM over GD, since
it is much faster, without compromising on accuracy. In that case, the parameter $a$ is the parameter
that is to be tuned for your application. A more advanced version of the StandardGradientDescent is
the AdaptiveStochasticGradientDescent, which requires fewer parameters to be set and tends to be more
robust [Klein et al., 2009].
Other optimisers available in elastix are: FullSearch, ConjugateGradient, ConjugateGradientFRPR,
QuasiNewtonLBFGS, RSGDEachParameterApart, SimultaneousPerturbation, CMAEvolutionStrategy.
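The Robbins-Monro scheme (2.21) with the decaying gain (2.22) can be illustrated on a toy problem. The sketch below is not a registration: the cost is assumed to be the mean squared distance between a scalar parameter and a list of made-up target values, so that its minimum lies at their mean. A new random subset is drawn every iteration, as recommended above. All names and constants are illustrative.

```python
import random


def gain(k, a, A, alpha):
    """Decaying gain sequence a_k = a / (k + A)^alpha, cf. (2.22)."""
    return a / (k + A) ** alpha


def robbins_monro(targets, mu0, iterations, n_samples, a, A, alpha, rng):
    """Minimise the mean of (mu - t)^2 over targets using approximate gradients."""
    mu = mu0
    for k in range(iterations):
        # A new random subset every iteration keeps the approximation error zero-mean.
        subset = [rng.choice(targets) for _ in range(n_samples)]
        approximate_gradient = sum(2.0 * (mu - t) for t in subset) / n_samples
        mu = mu - gain(k, a, A, alpha) * approximate_gradient
    return mu


rng = random.Random(42)  # fixed seed only to make the example repeatable
targets = [float(i) for i in range(101)]  # the exact minimiser is the mean, 50.0
estimate = robbins_monro(targets, mu0=0.0, iterations=500, n_samples=20,
                         a=2.0, A=50.0, alpha=0.6, rng=rng)
```

Here $A$ is 10% of the number of iterations and $\alpha = 0.6$, following the rule of thumb above; the value of $a$ was picked by hand for this toy cost.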
2.8 Multi-resolution
For a good overview of multi-resolution strategies see Lester and Arridge [1999]. Two hierarchical methods
are distinguished: reduction of data complexity, and reduction of transformation complexity.
2.8.1 Data complexity
It is common to start the registration process using images that have lower complexity, e.g., images that are
smoothed and possibly downsampled. This increases the chance of successful registration. A series of images
with increasing amount of smoothing is called a scale space. If the images are not only smoothed, but also
downsampled, the data is not only less complex, but the amount of data is actually reduced. In that case,
we talk about a pyramid. However, confusingly, the word pyramid is used by us also to refer to a scale
space. Several scale spaces or pyramids are found in the literature, amongst others Gaussian and Laplacian
pyramids, morphological scale space, and spline and wavelet pyramids. The Gaussian pyramid is the most
common one. In elastix we have:
Gaussian pyramid: (FixedRecursiveImagePyramid and MovingRecursiveImagePyramid) Applies smoothing
and down-sampling.
Gaussian scale space: (FixedSmoothingImagePyramid and MovingSmoothingImagePyramid) Applies smoothing
and no down-sampling.
Shrinking pyramid: (FixedShrinkingImagePyramid and MovingShrinkingImagePyramid) Applies no smoothing,
but only down-sampling.
[Figure 2.7, panels: (a) resolution 0, (b) resolution 1, (c) resolution 2, (d) original; (e) resolution 0, (f) resolution 1, (g) resolution 2, (h) original.]
Figure 2.7: Two multi-resolution strategies using a Gaussian pyramid ($\sigma = 8.0, 4.0, 2.0$ voxels). The first
row shows multi-resolution with down-sampling (FixedRecursiveImagePyramid), the second row without
(FixedSmoothingImagePyramid). Note that in the first row, for each dimension, the image size is halved
every resolution, but that the voxel size increases with a factor 2, so physically the images are of the same
size every resolution.
Figure 2.7 shows the Gaussian pyramid with and without downsampling. In combination with a Full
sampler (see Section 2.4), using a pyramid with downsampling will save a lot of time in the first resolution
levels, because the image contains far fewer voxels. In combination with a Random sampler, or
RandomCoordinate, the downsampling step is not necessary, since the random samplers select a user-defined number
of samples anyway, independent of the image size.
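The difference between the three pyramid types can be sketched on a one-dimensional signal. This is a simplified illustration in plain Python, not the ITK or elastix implementation; the kernel truncation at three standard deviations, the border handling and the schedule values are assumptions chosen for the example.

```python
import math


def gaussian_kernel(sigma):
    """Normalised, truncated Gaussian kernel (radius of about 3 sigma)."""
    radius = max(1, int(math.ceil(3.0 * sigma)))
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Gaussian smoothing; samples beyond the border repeat the edge value."""
    if sigma <= 0.0:
        return list(signal)
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    result = []
    for i in range(n):
        value = 0.0
        for j, w in enumerate(kernel):
            index = min(max(i + j - radius, 0), n - 1)
            value += w * signal[index]
        result.append(value)
    return result


def pyramid_level(signal, sigma, factor):
    """Smooth with sigma (0 means no smoothing), then keep every factor-th sample."""
    return smooth(signal, sigma)[::factor]


signal = [float(i % 8) for i in range(64)]
gaussian_pyramid = pyramid_level(signal, sigma=2.0, factor=4)  # smoothing and down-sampling
scale_space = pyramid_level(signal, sigma=2.0, factor=1)       # smoothing only
shrinking = pyramid_level(signal, sigma=0.0, factor=4)         # down-sampling only
```

With a Full sampler the first and third variants reduce the number of samples by the down-sampling factor, while the scale space keeps the original number of voxels.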
2.8.2 Transformation complexity
The second multiresolution strategy is to start the registration with fewer degrees of freedom for the
transformation model. The degrees of freedom of the transformation equals the length (number of elements) of
the parameter vector $\mu$.
An example of this was already mentioned in Section 2.6: the use of a rigid transformation prior to
nonrigid (B-spline) registration. We may even use a three-level strategy: first rigid, then affine, then nonrigid
B-spline.
Another example is to increase the number of degrees of freedom within the transformation model. With
a B-spline transformation, it is often good practice to start registration with a coarse control point grid, only
capable of modelling coarse deformations. In subsequent resolutions the B-spline grid is gradually refined,
thereby introducing the capability to match smaller structures. See Section 5.3.5.
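To get a feeling for how quickly the number of degrees of freedom grows when the B-spline grid is refined, the relation between image size, grid spacing and number of parameters given in Section 2.6 can be evaluated directly. The sketch below is an approximation: it ignores the additional control points placed just outside the image, and the image size and spacings are made-up values.

```python
import math


def bspline_parameter_count(image_size, grid_spacing):
    """Approximate number of parameters: (P_1 x ... x P_d) x d, with P_i ~ s_i / sigma_i.

    image_size and grid_spacing must be given in the same units (voxels here).
    Control points placed just outside the image are ignored.
    """
    points_per_dimension = [math.ceil(s / g) for s, g in zip(image_size, grid_spacing)]
    return math.prod(points_per_dimension) * len(image_size)


image_size = (256, 256, 256)
# Illustrative coarse-to-fine schedule of grid spacings, one per resolution level.
counts = [bspline_parameter_count(image_size, (spacing,) * 3)
          for spacing in (64, 32, 16)]
```

Halving the grid spacing in 3D multiplies the number of parameters by roughly eight, which is one reason to start with a coarse grid.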
2.9 Evaluating registration
How do you verify that your registration was successful? This is a difficult problem. In general, you don't
know for each voxel where it should map to. Here are some hints:
The deformed moving image $I_M(T_\mu$ [...] that relates the fixed and the moving image (TransformParameters.?.txt), and,
optionally, the resulting registered image $I_M(T_\mu$ [...] $T_\mu(x)$ at some points $x$. If you want to deform a set of
user-specified points, the appropriate call is:
transformix -def inputPoints.txt -out outputDirectory -tp TransformParameters.txt
This will create a file outputpoints.txt containing the input points $x$ and the transformed points $T_\mu(x)$
(both given as voxel indices of the fixed image and as physical coordinates), the displacement vector $T_\mu(x) - x$
(in physical coordinates), and, if -in inputImage.ext is also specified, the transformed output points as
indices of the input image [1].
[1] The downside of this is that the input image is also deformed, which consumes time and may not be needed by the user.
If this is a problem, just run transformix without -in and compute the voxel indices yourself, based on the $T_\mu(x)$ physical
coordinate data.
The inputPoints.txt file should have the following structure:
<index, point>
<number of points>
point1 x point1 y [point1 z]
point2 x point2 y [point2 z]
. . .
The first line indicates whether the points are given as indices (of the fixed image), or as points (in
physical coordinates). The second line stores the number of points that will be specified. After that the
point data is given.
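A points file in this layout can be generated with a few lines of Python. The helper below is hypothetical and not part of elastix or transformix; it assumes that the first line contains the literal word index or point, which is how the structure above is read here.

```python
def format_points(points, kind="point"):
    """Return the text of a points file: kind, number of points, one point per line."""
    if kind not in ("index", "point"):
        raise ValueError("kind must be 'index' or 'point'")
    lines = [kind, str(len(points))]
    for coordinates in points:
        lines.append(" ".join(str(c) for c in coordinates))
    return "\n".join(lines) + "\n"


# Two 2D points given in physical coordinates (made-up values).
text = format_points([(1.0, 2.0), (3.5, 4.5)])

# Writing the file is left to the caller, for example:
# with open("inputPoints.txt", "w") as handle:
#     handle.write(text)
```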
Instead of the custom .txt format for the input points, transformix also supports .vtk files:
transformix -def inputPoints.vtk -out outputDirectory -tp TransformParameters.txt
The output is then saved as outputpoints.vtk. The support for .vtk files is still a bit limited. Currently,
only ASCII files are supported, with triangle meshes. Any meta point data is lost in the output file.
If you want to know the deformation at all voxels of the fixed image, simply use -def all:
transformix -def all -out outputDirectory -tp TransformParameters.txt
The deformation field is stored as a vector image deformationField.mhd. Each voxel contains the
displacement vector $T_\mu(x) - x$ in physical coordinates. The elements of the vectors are stored as float values.
In addition to computing the deformation field, transformix has the capability to compute the spatial
Jacobian of the transformation. The determinant of the spatial Jacobian identifies the amount of local
compression or expansion and can be quite useful, for example in lung ventilation studies. The determinant
of the spatial Jacobian can be computed on the entire image only using:
transformix -jac all -out outputDirectory -tp TransformParameters.txt
The complete spatial Jacobian matrix can also be computed:
transformix -jacmat all -out outputDirectory -tp TransformParameters.txt
where each voxel is filled with a $d \times d$ matrix, with $d$ the image dimension, instead of simply a scalar value.
With the command-line option -threads unsigned int the user can specify the maximum number of
threads that transformix will use.
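To illustrate what the determinant of the spatial Jacobian expresses, the sketch below estimates it for a simple 2D transform using central finite differences. This is an illustration only and not how transformix computes it; the function names, the step size and the example transform are assumptions.

```python
def spatial_jacobian_determinant(transform, x, y, h=1e-3):
    """Determinant of the 2x2 spatial Jacobian dT/dx, by central differences."""
    tx_plus, tx_minus = transform(x + h, y), transform(x - h, y)
    ty_plus, ty_minus = transform(x, y + h), transform(x, y - h)
    j11 = (tx_plus[0] - tx_minus[0]) / (2.0 * h)  # dT_1 / dx_1
    j21 = (tx_plus[1] - tx_minus[1]) / (2.0 * h)  # dT_2 / dx_1
    j12 = (ty_plus[0] - ty_minus[0]) / (2.0 * h)  # dT_1 / dx_2
    j22 = (ty_plus[1] - ty_minus[1]) / (2.0 * h)  # dT_2 / dx_2
    return j11 * j22 - j12 * j21


def stretch(x, y):
    """Example transform: 20% expansion along x, 50% compression along y."""
    return (1.2 * x, 0.5 * y)


def identity(x, y):
    return (x, y)


determinant = spatial_jacobian_determinant(stretch, 10.0, 20.0)
determinant_identity = spatial_jacobian_determinant(identity, 10.0, 20.0)
```

A determinant of 1 means that the local area (or volume) is preserved, values below 1 indicate local compression and values above 1 local expansion.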
4.3 The transform parameter file
The result of a registration is the transformation $T_\mu$ [...]

[...] $$C(T_\mu; I_F, I_M) = \frac{1}{\sum_{i=1}^{N} \omega_i} \sum_{i=1}^{N} \omega_i C_i(T_\mu; I_F, I_M), \qquad (6.1)$$
with $\omega_i$ the weights. This way the same fixed and moving image is used for every sub-metric $C_i$. This
way one can for example simultaneously optimise the SSD and MI during a registration.
elastix should be called like:
elastix -f fixed.ext -m moving.ext -out outDir -p parameterFile.txt
multi-image In this case the registration cost function is defined as:
$$C(T_\mu; I_F, I_M) = \frac{1}{\sum_{i=1}^{N} \omega_i} \sum_{i=1}^{N} \omega_i C(T_\mu; I_F^i, I_M^i). \qquad (6.2)$$
This way one can simultaneously register all channels of multi-spectral input data, using a single type
of cost function for all channels.
elastix should be called like:
elastix -f0 fixed0.ext -f1 fixed1.ext -f<>... -m0 moving0.ext -m1 moving1.ext
-m<>... -out outDir -p parameterFile.txt
both In this case the registration cost function is defined as:
$$C(T_\mu; I_F, I_M) = \frac{1}{\sum_{i=1}^{N} \omega_i} \sum_{i=1}^{N} \omega_i C_i(T_\mu; I_F^i, I_M^i). \qquad (6.3)$$
This is the most general way of registration supported by elastix. This will make it possible for
example to register two lung CT data sets with MI, while simultaneously registering the fissure
segmentations with the kappa statistic. The two may help each other in getting a better registration
compared to only using a single channel.
All three scenarios use the multi-metric registration method, which is selected in the parameter file with:
(Registration "MultiMetricMultiResolutionRegistration")
Other parts of the parameter file should look like:
(FixedImagePyramid "FixedSmoothingImagePyramid" "FixedSmoothingImagePyramid" ...)
(MovingImagePyramid "MovingSmoothingImagePyramid" "MovingSmoothingImagePyramid" ... )
(Interpolator "BSplineInterpolator" "BSplineInterpolator" ...)
(Metric "AdvancedMattesMutualInformation" "AdvancedMeanSquareDifference" ...)
(ImageSampler "RandomCoordinate" "RandomCoordinate" ...)
(Metric0Weight 0.125)
(Metric1Weight 0.125)
(Metric2Weight 0.125)
etc
Another way of registering multi-spectral data is to use the $\alpha$-mutual information measure, described
below.
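The weighted combination used in Equations (6.1)-(6.3) amounts to a normalised weighted sum of the sub-metric values. A minimal sketch, with a hypothetical function name and made-up metric values:

```python
def combined_cost(metric_values, weights):
    """Weighted sum of sub-metric values, normalised by the sum of the weights."""
    weighted_sum = sum(w * c for w, c in zip(weights, metric_values))
    return weighted_sum / sum(weights)


# Two sub-metrics with equal weights, as in the parameter file fragment above.
equal_weights = combined_cost([2.0, 4.0], [0.125, 0.125])
# Giving the first sub-metric three times the weight of the second.
unequal_weights = combined_cost([2.0, 4.0], [3.0, 1.0])
```

Because of the normalisation, only the ratios between the weights matter in these equations: equal weights reduce to the plain average of the sub-metric values.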
6.1.2 $\alpha$-mutual information
The $\alpha$-mutual information metric computes true multi-channel $\alpha$-mutual information. It does not use
high-dimensional joint histograms, but instead relies on k-nearest neighbour graphs to estimate $\alpha$-MI. Details can
be found in Staring et al. [2009]. It is specified in the parameter file with:
(Registration "MultiResolutionRegistrationWithFeatures")
(FixedImagePyramid "FixedSmoothingImagePyramid" "FixedSmoothingImagePyramid")
(MovingImagePyramid "MovingSmoothingImagePyramid" "MovingSmoothingImagePyramid")
(Interpolator "BSplineInterpolator" "BSplineInterpolator")
(Metric "KNNGraphAlphaMutualInformation")
(ImageSampler "MultiInputRandomCoordinate")
// KNN specific
(Alpha 0.99)
(AvoidDivisionBy 0.0000000001)
(TreeType "KDTree")
(BucketSize 50)
(SplittingRule "ANN_KD_STD")
(ShrinkingRule "ANN_BD_SIMPLE")
(TreeSearchType "Standard")
(KNearestNeighbours 20)
(ErrorBound 10.0)
A complete list of the available parameters can be found in the doxygen documentation
elx::KNNGraphAlphaMutualInformationMetric.
6.1.3 Penalty terms
This paragraph requires extension and modification.
In order to regularise the transformation $T_\mu$ [...] $\frac{\partial}{\partial \mu} \frac{\partial T}{\partial x}$ and $\frac{\partial}{\partial \mu} \frac{\partial^2 T}{\partial x \partial x^T}$. These we call
the JacobianOfSpatialJacobian and JacobianOfSpatialHessian, respectively.
The transform class as defined in the ITK does not support the computation of spatial derivatives $\partial T / \partial x$
and $\partial^2 T / \partial x^2$, and their derivatives with respect to $\mu$. Initially, we created non-generic classes that combine mutual
information and the mentioned penalty terms specifically (the MattesMutualInformationWithRigidityPenalty
component in elastix version 4.3 and earlier). In 2010, however, we created a more advanced version of
the ITK transform that does implement these spatial derivatives. Additionally, we created a bending energy
regularisation class that takes advantage of these functions, see Section 6.1.4. We also reimplemented the
rigidity penalty term, see Section 6.1.5; currently, however, it does not yet use these spatial derivatives. More
detailed information can be found in Staring and Klein [2010a].
This all means that it is possible in elastix to combine any similarity metric with any of the available
penalty terms (currently the bending energy and the rigidity penalty term).
6.1.4 Bending energy penalty
The bending energy penalty term is defined in 2D as:
$$P_{BE}(\mu) = \frac{1}{P} \sum_{\tilde{x}_i} \left\| \frac{\partial^2 T}{\partial x \partial x^T}(\tilde{x}_i) \right\|_F^2 \qquad (6.5)$$
$$= \frac{1}{P} \sum_{\tilde{x}_i} \sum_{j=1}^{2} \left( \frac{\partial^2 T_j}{\partial x_1^2}(\tilde{x}_i) \right)^2 + 2 \left( \frac{\partial^2 T_j}{\partial x_1 \partial x_2}(\tilde{x}_i) \right)^2 + \left( \frac{\partial^2 T_j}{\partial x_2^2}(\tilde{x}_i) \right)^2, \qquad (6.6)$$
where $P$ is the number of points $\tilde{x}_i$, and the tilde denotes the difference between a variable and a given
point over which a term is evaluated. As you can see it penalises sharp deviations of the transformation
(e.g. no high compression followed by a nearby high expansion). You can use it to regularize your nonrigid
transformation if you experience problems such as foldings. In our current implementation the computation
time of this term is relatively large, though.
It can be selected in elastix using
(Metric "AnySimilarityMetric" "TransformBendingEnergy")
(Metric0Weight 1.0)
(Metric1Weight <weight>)
and has no further parameters.
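The 2D bending energy of Equation (6.6) can be approximated numerically for any smooth transform. The sketch below uses central finite differences for the second derivatives and averages over a small set of evaluation points. It illustrates the formula only and is not the elastix implementation, which uses the spatial derivatives of the transform; the function names, the step size and the example transform are assumptions.

```python
def second_derivatives(f, x1, x2, h):
    """Central finite-difference estimates of f_11, f_12 and f_22 at (x1, x2)."""
    d11 = (f(x1 + h, x2) - 2.0 * f(x1, x2) + f(x1 - h, x2)) / h ** 2
    d22 = (f(x1, x2 + h) - 2.0 * f(x1, x2) + f(x1, x2 - h)) / h ** 2
    d12 = (f(x1 + h, x2 + h) - f(x1 + h, x2 - h)
           - f(x1 - h, x2 + h) + f(x1 - h, x2 - h)) / (4.0 * h ** 2)
    return d11, d12, d22


def bending_energy(components, points, h=0.5):
    """Average over the points of the sum in (6.6) over the transform components T_j."""
    total = 0.0
    for x1, x2 in points:
        for f in components:
            d11, d12, d22 = second_derivatives(f, x1, x2, h)
            total += d11 ** 2 + 2.0 * d12 ** 2 + d22 ** 2
    return total / len(points)


def t1(x1, x2):
    """First component of an example transform, with constant second derivative 0.1."""
    return x1 + 0.05 * x1 ** 2


def t2(x1, x2):
    """Second component of the example transform, with constant mixed derivative 0.02."""
    return x2 + 0.02 * x1 * x2


points = [(x, y) for x in (1.0, 2.0, 3.0) for y in (1.0, 2.0)]
energy = bending_energy([t1, t2], points)
affine_energy = bending_energy([lambda a, b: 2.0 * a + b, lambda a, b: a - b], points)
```

For the example transform the result is $0.1^2 + 2 \cdot 0.02^2 = 0.0108$, while any affine transform has zero bending energy, since all its second derivatives vanish.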
6.1.5 Rigidity penalty
Some more advanced metrics, not found in the ITK, are available in elastix: The rigidity penalty term
$P^{rigid}(T_\mu; I_M)$ described in Staring et al. [2007a]. It is specified in the parameter file with:
(Metric "AnySimilarityMetric" "TransformRigidityPenalty")
// normal similarity metric parameters
...
// Rigidity penalty parameters:
(RigidityPenaltyWeight 0.1)
(LinearityConditionWeight 10.0)
(OrthonormalityConditionWeight 1.0)
(PropernessConditionWeight 100.0)
(MovingRigidityImageName "movingRigidityImage.mhd")
A complete list of the available parameters can be found in the doxygen documentation
elx::TransformRigidityPenalty. See also Section 6.1.3.
6.1.6 DisplacementMagnitudePenalty: inverting transformations
The DisplacementMagnitudePenalty is a cost function that penalises $||T_\mu(x) - x||^2$. You can use this
to invert transforms, by setting the transform to be inverted as an initial transform (using -t0), setting
(HowToCombineTransforms "Compose"), and running elastix with this metric. After that you can manually
set the initial transform in the last parameter file to "NoInitialTransform", and voilà, you have the
inverse transform! Strictly speaking, you should then also change the Size/Spacing/Origin/Index settings to
match that of the moving image. Select it with:
(Metric "DisplacementMagnitudePenalty")
Note that inverting a transformation becomes conceptually very similar to performing an image registration
in this way. Consequently, the same choices are relevant: optimisation algorithm, multiresolution etc...
Note that this procedure was described and evaluated in Metz et al. [in press].
6.1.7 Corresponding points: help the registration
Most of the similarity measures in elastix are based on corresponding characteristics of the fixed and
moving image. It is possible, however, to register based on point correspondence. Therefore, in elastix
4.4 we introduced a metric that minimizes the distance of two point sets with known correspondence. It is
defined as:
$$S_{CP} = \frac{1}{P} \sum_{x_{F_i}} \left\| x_{M_i} - T_\mu(x_{F_i}) \right\| \qquad (6.7)$$
where $P$ is the number of points $x_i$, and $x_{F_i}$, $x_{M_i}$ corresponding points from the fixed and moving image
point sets, respectively. The metric can be used to help in a difficult image registration task that fails if
performed fully automatically. A user can manually click corresponding points (or maybe extract them
automatically), and set up elastix to not only minimize based on intensity, but also take into account that some
positions are known to correspond. The derivative of $S_{CP}$ reads:
$$\frac{\partial}{\partial \mu} S_{CP} = -\frac{1}{P} \sum_{x_{F_i}} \frac{1}{\left\| x_{M_i} - T_\mu(x_{F_i}) \right\|} \left( x_{M_i} - T_\mu(x_{F_i}) \right)^T \frac{\partial T_\mu}{\partial \mu}(x_{F_i}). \qquad (6.8)$$
In elastix this metric can be selected using:
(Metric "AnySimilarityMetric" "CorrespondingPointsEuclideanDistancePointMetric")
(Metric0Weight 1.0)
(Metric1Weight <weight>)
Note that this metric must be specified as the last metric, due to some technical constraints. The fixed and
moving point set can be specified on the command line:
elastix ... -fp fixedPointSet.txt -mp movingPointSet.txt
The point set files have to be defined in a specific format, identical to supplying points to transformix, see
Section 4.2.
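The corresponding points metric and its derivative can be sketched for the simplest possible case, a 2D translation $T_\mu(x) = x + \mu$, for which $\partial T_\mu / \partial \mu$ is the identity matrix. The function names and point coordinates below are made up for illustration, and skipping pairs at zero distance (where the derivative is undefined) is a choice made in this sketch.

```python
import math


def translate(point, mu):
    """Translation transform T_mu(x) = x + mu in 2D."""
    return (point[0] + mu[0], point[1] + mu[1])


def corresponding_points_metric(fixed_points, moving_points, mu):
    """Mean Euclidean distance between x_M and T_mu(x_F), cf. (6.7)."""
    total = 0.0
    for xf, xm in zip(fixed_points, moving_points):
        tx = translate(xf, mu)
        total += math.hypot(xm[0] - tx[0], xm[1] - tx[1])
    return total / len(fixed_points)


def corresponding_points_derivative(fixed_points, moving_points, mu):
    """Derivative of the metric with respect to mu for a translation, cf. (6.8)."""
    gradient = [0.0, 0.0]
    for xf, xm in zip(fixed_points, moving_points):
        tx = translate(xf, mu)
        difference = (xm[0] - tx[0], xm[1] - tx[1])
        distance = math.hypot(difference[0], difference[1])
        if distance == 0.0:
            continue  # derivative undefined at zero distance; skipped in this sketch
        gradient[0] -= difference[0] / distance
        gradient[1] -= difference[1] / distance
    n = len(fixed_points)
    return (gradient[0] / n, gradient[1] / n)


fixed_points = [(0.0, 0.0), (10.0, 0.0)]
moving_points = [(3.0, 4.0), (13.0, 4.0)]
value = corresponding_points_metric(fixed_points, moving_points, (0.0, 0.0))
derivative = corresponding_points_derivative(fixed_points, moving_points, (0.0, 0.0))
aligned = corresponding_points_metric(fixed_points, moving_points, (3.0, 4.0))
```

The derivative points away from the correct translation (3, 4), so a gradient descent step, which moves against the derivative, reduces the distance between the point sets.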
6.1.8 VarianceOverLastDimensionMetric: aligning time series
This metric is explained in Metz et al. [in press]. Example parameter files can be found on the wiki parameter
file database, entry par0012.
This metric should be used to estimate the motion in dynamic imaging data (time series). The variance
of intensities over time is measured. Two- to four-dimensional imaging data is supported.
6.2 Image samplers
RandomSparseMask This variant of the random sampler is useful if the fixed image mask is sparse (i.e.
consists of many zeros).
6.3 Interpolators
ReducedDimensionBSplineInterpolator This is a variant of the normal B-spline interpolator, which
uses a 0th order spline in the last dimension. This saves time when aligning time-series, when you do
not have to interpolate in the last (time) dimension anyway. Its usage is illustrated in entry par0012
of the parameter file database.
6.4 Transforms
DeformationFieldTransform This transform serves as a wrapper around existing deformation field vector
images. It computes the transformation by interpolating the deformation field image. The relevant
tags in the transform parameter file are as follows:
(Transform "DeformationFieldTransform")
(DeformationFieldFileName "deformationField.mhd")
(DeformationFieldInterpolationOrder 1)
(NumberOfParameters 0)
The deformation field image's pixel type should be a vector of float elements. It could be a deformation
field that is the result of transformix -def all, for example. Since this transform does not have any
parameters ($\mu$ has zero length), it makes no sense to use it for registration. It can just be used as
an initial transformation (supplied by the option -t0) or as input for transformix.
SplineKernelTransform As an alternative to the B-spline transform, elastix includes a SplineKernelTransform,
which implements a thin-plate spline type of transform; see also Section 2.6. This transformation
requires a list of fixed image landmarks (control points) to be specified, by means of an input points
file which has the same format as the -def file used by transformix (Section 4.2). See the doxygen
documentation on the website for a list of its parameters.
WeightedCombinationTransform This is a transformation that is modelled as a weighted combination
of user-specified transformations: $T_\mu(x) = \sum_i w_i T_i(x)$. The weights $w_i$ form the parameter vector
$\mu$. The sub-transforms $T_i(x)$ may for example follow from a statistical deformation model, obtained
by principal component analysis. See the doxygen documentation on the website for a list of its
parameters.
BSplineTransformWithDiffusion This transform implements the work described in Staring et al. [2007b].
BSplineStackTransform This transformation model defines a stack of independent B-spline transformations.
Its usage is illustrated in Metz et al. [in press]. Example parameter files can be found on the
wiki parameter file database, entry par0012.
6.5 Optimisation methods
AdaptiveStochasticGradientDescent This optimizer is very similar to the StandardGradientDescent,
but implements an adaptive step size mechanism and estimates a proper initial value for SP_a. See
Klein et al. [2009] for more details. In practice this optimizer works in many applications with its
default settings. Only the number of iterations must be specified by the user:
(Optimizer "AdaptiveStochasticGradientDescent")
(MaximumNumberOfIterations 500)
(SigmoidInitialTime 4.0)
(MaximumStepLength 1.0)
The last two options are not mandatory. SigmoidInitialTime corresponds to $t_0$ in the article; its
default value is 0. The MaximumStepLength corresponds to $\delta$ in the article; its default value equals the
average voxel spacing of the fixed and moving image.
Conjugate gradient ConjugateGradientFRPR
CMAEvolutionStrategy
FiniteDifferenceGradientDescent
Full search
Quasi Newton
RegularStepGradientDescent RSGDEachParameterApart
SimultaneousPerturbation
Chapter 7
Developers guide
7.1 Relation to ITK
A large part of the elastix code is based on the ITK [Ibáñez et al., 2005]. The use of the ITK implies that
the low-level functionality (image classes, memory allocation etc.) is thoroughly tested. Naturally, all image
formats supported by the ITK are supported by elastix as well. The C++ source code can be compiled
on multiple operating systems (Windows XP, Linux, Mac OS X), using various compilers (MS Visual Studio
up to version 2010, GCC up to version 4.3), and supports both 32-bit and 64-bit systems.
In addition to the existing ITK image registration classes, elastix implements new functionality. The
most important enhancements are listed in Table 7.1.
* A modular framework for sampling strategies. See Staring and Klein [2010b] for more details.
* Several new optimisers: Kiefer-Wolfowitz, Robbins-Monro, adaptive stochastic gradient descent,
  evolutionary strategy. Complete rework of existing ITK optimisers, adding more user control and
  better error handling: quasi-Newton, nonlinear conjugate gradient.
* Several new or more flexible cost functions: (normalised) mutual information, implemented with
  Parzen windowing similar to Thevenaz and Unser [2000], multifeature $\alpha$-mutual information,
  bending energy penalty term, rigidity penalty term.
* The ability to concatenate any number of geometric transformations.
* The transformations support computation of not only $\partial T / \partial \mu$, but also of the
  spatial derivatives $\partial T / \partial x$ and $\partial^2 T / \partial x^2$, and their
  derivatives with respect to $\mu$, frequently required for the computation of regularisation terms.
  Additionally, the compact support of certain transformations is integrated more generally. See
  Staring and Klein [2010a] for more details.
* Linear combinations of cost functions, instead of just a single cost function.
Table 7.1: The most important enhancements and additions in elastix, compared to the ITK.
7.2 Overview of the elastix code
The elastix source code consists roughly of two layers, both written in C++: A) ITK-style classes that
implement image registration functionality, and B) elastix wrappers that take care of reading and setting
parameters, instantiating and connecting components, saving (intermediate) results, and similar administrative
tasks. The modular design enables adding new components, without changing the elastix core.
Adding a new component starts by creating the layer A class, which can be compiled and tested independently
of layer B. Subsequently, a small layer B wrapper needs to be written, which connects the layer A class to
the other parts of elastix.
The image samplers, for example, are implemented as ITK classes that all inherit from a base class
itk::ImageSamplerBase. These can be found in src/Common/ImageSamplers. This is layer A in elastix.
For each sampler (random, grid, full, ...) a wrapper is written, located in src/Components/ImageSamplers,
which takes care of configuring the sampler before each new resolution of the registration process. This is
layer B of elastix.
7.2.1 Directory structure
The basic directory structure is as follows:
dox
src/Common: ITK classes, Layer A stuff. This directory also contains some external libraries, unrelated
to ITK, like xout (which is written by us) and the ANNlib.
src/Core: this is the main elastix kernel, responsible for the execution flow, connecting the classes,
reading parameters etc.
src/Components: this directory contains the components and their elastix wrappers (layer B). Very
component-specific layer A code can also be found here.
In elastix 4.4 and later versions, it is also possible to add your own Component directories. These can
be located anywhere outside the elastix source tree. See Section 7.3 for more details about this.
7.3 Creating new components
If you want to create your own component, it is natural to start writing the layer A class, without bothering
about elastix. The layer A filter should implement all basic functionality and you can test in a separate
ITK program if it does what it is supposed to do. Once you have this ITK class working, it is trivial to write
the layer B wrapper in elastix (start by copy-pasting from existing components).
With CMake, you can tell elastix in which directories the source code of your new components is
located, using the ELASTIX_USER_COMPONENT_DIRS option. elastix will search all subdirectories of these
directories for CMakeLists.txt files that contain the command ADD_ELXCOMPONENT( <name> ... ). The
CMakeLists.txt file that accompanies an elastix component typically looks like this:
ADD_ELXCOMPONENT( AdvancedMeanSquaresMetric
elxAdvancedMeanSquaresMetric.h
elxAdvancedMeanSquaresMetric.hxx
elxAdvancedMeanSquaresMetric.cxx
itkAdvancedMeanSquaresImageToImageMetric.h
itkAdvancedMeanSquaresImageToImageMetric.hxx )
The ADD_ELXCOMPONENT command is a macro defined in src/Components/CMakeLists.txt. The first argument
is the name of the layer B wrapper class, which is declared in elxAdvancedMeanSquaresMetric.h.
After that, you can specify the source files on which the component relies. In the example above, the files
that start with itk form the layer A code. Files that start with elx are the layer B code. The file
elxAdvancedMeanSquaresMetric.cxx is particularly simple. It just consists of two lines:
#include "elxAdvancedMeanSquaresMetric.h"
elxInstallMacro( AdvancedMeanSquaresMetric );
The elxInstallMacro is defined in src/Core/Install/elxMacro.h.
The files elxAdvancedMeanSquaresMetric.h/hxx together define the layer B wrapper class. That class
inherits from the corresponding layer A class, but also from an elx::BaseComponent. This gives us the
opportunity to add a common interface to all elastix components, regardless of the ITK classes from which
they inherit. Examples of this interface are the following methods:
void BeforeAll(void)
void BeforeRegistration(void)
void BeforeEachResolution(void)
void AfterEachResolution(void)
void AfterEachIteration(void)
void AfterRegistration(void)
These methods are automatically invoked at the moments indicated by the name of the function. This gives
you a chance to read/set some parameters, print some output, save some results etc.
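The order in which these callbacks fire can be pictured with a small mock. The sketch below is written in Python rather than C++ and is not part of elastix: RecordingComponent and run_registration are hypothetical names, and the nesting of the calls (resolutions containing iterations) is an assumption inferred from the method names listed above, not taken from the elastix source.

```python
class RecordingComponent:
    """Hypothetical stand-in for a component; it records each callback it receives."""

    def __init__(self):
        self.calls = []

    def BeforeAll(self):
        self.calls.append("BeforeAll")

    def BeforeRegistration(self):
        self.calls.append("BeforeRegistration")

    def BeforeEachResolution(self):
        self.calls.append("BeforeEachResolution")

    def AfterEachResolution(self):
        self.calls.append("AfterEachResolution")

    def AfterEachIteration(self):
        self.calls.append("AfterEachIteration")

    def AfterRegistration(self):
        self.calls.append("AfterRegistration")


def run_registration(component, iterations_per_resolution):
    """Invoke the callbacks in the order suggested by their names (an assumption)."""
    component.BeforeAll()
    component.BeforeRegistration()
    for number_of_iterations in iterations_per_resolution:
        component.BeforeEachResolution()
        for _ in range(number_of_iterations):
            # One optimiser step would take place here.
            component.AfterEachIteration()
        component.AfterEachResolution()
    component.AfterRegistration()
    return component.calls


# Two resolutions, with 2 and 1 iterations respectively.
for name in run_registration(RecordingComponent(), [2, 1]):
    print(name)
```

A real component would use these hooks to do its own work, for example reading a parameter in BeforeEachResolution or writing intermediate output in AfterEachIteration.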
Appendix A
Example parameter file
//ImageTypes
(FixedInternalImagePixelType "float")
(FixedImageDimension 2)
(MovingInternalImagePixelType "float")
(MovingImageDimension 2)
//Components
(Registration "MultiResolutionRegistration")
(FixedImagePyramid "FixedRecursiveImagePyramid")
(MovingImagePyramid "MovingRecursiveImagePyramid")
(Interpolator "BSplineInterpolator")
(Metric "AdvancedMattesMutualInformation")
(Optimizer "StandardGradientDescent")
(ResampleInterpolator "FinalBSplineInterpolator")
(Resampler "DefaultResampler")
(Transform "EulerTransform")
// ********** Pyramid
// Total number of resolutions
(NumberOfResolutions 3)
// ********** Transform
//(CenterOfRotation 128 128) center by default
(AutomaticTransformInitialization "true")
(AutomaticScalesEstimation "true")
(HowToCombineTransforms "Compose")
// ********** Optimizer
// Maximum number of iterations in each resolution level:
(MaximumNumberOfIterations 300 300 600)
//SP: Param_a in each resolution level. a_k = a/(A+k+1)^alpha
(SP_a 0.001)
//SP: Param_alpha in each resolution level. a_k = a/(A+k+1)^alpha
(SP_alpha 0.602)
//SP: Param_A in each resolution level. a_k = a/(A+k+1)^alpha
(SP_A 50.0)
// ********** Metric
//Number of grey level bins in each resolution level:
(NumberOfHistogramBins 32)
(FixedKernelBSplineOrder 1)
(MovingKernelBSplineOrder 3)
// ********** Several
(WriteTransformParametersEachIteration "false")
(WriteTransformParametersEachResolution "false")
(ShowExactMetricValue "false")
(ErodeMask "true")
// ********** ImageSampler
// Number of spatial samples used to compute the
// mutual information in each resolution level:
(ImageSampler "RandomCoordinate")
(NumberOfSpatialSamples 2048)
(NewSamplesEveryIteration "true")
// ********** Interpolator and Resampler
//Order of B-Spline interpolation used in each resolution level:
(BSplineInterpolationOrder 1)
//Order of B-Spline interpolation used for applying the final deformation:
(FinalBSplineInterpolationOrder 3)
//Default pixel value for pixels that come from outside the picture:
(DefaultPixelValue 0)
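The comments on SP_a, SP_alpha and SP_A in the parameter file above quote the step size formula a_k = a/(A+k+1)^alpha. As a rough illustration of how these three values shape the step size over the iterations, the following Python sketch evaluates that formula directly. The function name gain is hypothetical and the code is not taken from elastix; it only reproduces the formula from the comments.

```python
def gain(k, a, A, alpha):
    """Step size a_k = a / (A + k + 1)**alpha at iteration k (k = 0, 1, 2, ...)."""
    if A + k + 1 <= 0:
        raise ValueError("A + k + 1 must be positive")
    return a / (A + k + 1) ** alpha


# Values taken from the example parameter file above.
SP_a, SP_A, SP_alpha = 0.001, 50.0, 0.602

iterations = (0, 100, 299)
steps = [gain(k, SP_a, SP_A, SP_alpha) for k in iterations]
for k, step in zip(iterations, steps):
    print(f"k={k:3d}  a_k={step:.3e}")
```

The step size decreases monotonically with k. A larger SP_A flattens the decay during the first iterations, while a larger SP_alpha makes the decay steeper.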
Appendix B
Example transform parameter file
(Transform "EulerTransform")
(NumberOfParameters 3)
(TransformParameters -0.000000 -4.564513 -2.091174)
(InitialTransformParametersFileName "NoInitialTransform")
(HowToCombineTransforms "Compose")
// Image specific
(FixedImageDimension 2)
(MovingImageDimension 2)
(FixedInternalImagePixelType "float")
(MovingInternalImagePixelType "float")
(Size 256 256)
(Index 0 0)
(Spacing 1.0000000000 1.0000000000)
(Origin 0.0000000000 0.0000000000)
// EulerTransform specific
(CenterOfRotationPoint 128.0000000000 128.0000000000)
// ResampleInterpolator specific
(ResampleInterpolator "FinalBSplineInterpolator")
(FinalBSplineInterpolationOrder 3)
// Resampler specific
(Resampler "DefaultResampler")
(DefaultPixelValue 0.000000)
(ResultImageFormat "mhd")
(ResultImagePixelType "short")
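To make the role of the entries above more concrete, the following Python sketch reads a few of them and applies the resulting 2D rigid transform to a point. It is a minimal illustration, not elastix code: parse_parameters and apply_euler_2d are hypothetical helper names, the parser only handles the simple "(Key value ...)" lines shown in this appendix, and the convention T(x) = R(x - c) + t + c, with parameter order (angle, translation in x, translation in y), is assumed here.

```python
import math
import re

# A token is either a double-quoted string or a run of non-whitespace characters.
_TOKEN = re.compile(r'"[^"]*"|\S+')


def parse_parameters(text):
    """Parse '(Key value ...)' lines into {key: [values as strings]}."""
    parameters = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip comment lines and anything that is not a parenthesised entry.
        if line.startswith("//") or not (line.startswith("(") and line.endswith(")")):
            continue
        tokens = [token.strip('"') for token in _TOKEN.findall(line[1:-1])]
        if tokens:
            parameters[tokens[0]] = tokens[1:]
    return parameters


def apply_euler_2d(point, parameters):
    """Evaluate T(x) = R(x - c) + t + c for a 2D point (assumed convention)."""
    angle, tx, ty = (float(v) for v in parameters["TransformParameters"])
    cx, cy = (float(v) for v in parameters["CenterOfRotationPoint"])
    dx, dy = point[0] - cx, point[1] - cy
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (cos_a * dx - sin_a * dy + cx + tx,
            sin_a * dx + cos_a * dy + cy + ty)


# A subset of the entries from the transform parameter file above.
EXAMPLE = """
(Transform "EulerTransform")
(NumberOfParameters 3)
(TransformParameters -0.000000 -4.564513 -2.091174)
// EulerTransform specific
(CenterOfRotationPoint 128.0000000000 128.0000000000)
"""

parameters = parse_parameters(EXAMPLE)
mapped = apply_euler_2d((128.0, 128.0), parameters)
print(parameters["Transform"], mapped)
```

Because the rotation angle in this example is zero, the centre of rotation (128, 128) is simply shifted by the translation, to approximately (123.44, 125.91).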
Appendix C
Software License
Overview:
Elastix was developed by Stefan Klein and Marius Staring under
supervision of Josien P.W. Pluim, initially under contract to the
Image Sciences Institute, University Medical Center Utrecht, The
Netherlands.
Elastix is distributed under the new and simplified BSD license
approved by the Open Source Initiative (OSI)
[http://www.opensource.org/licenses/bsd-license.php].
The software is partially derived from the Insight Segmentation and
Registration Toolkit (ITK), which is also distributed under the new
and simplified BSD licence. The ITK is required by Elastix for
compilation of the source code.
The copyright of the files in the Common/KNN/ann_1.1 subdirectory
is held by a third party, the University of Maryland. The ANN
package is distributed under the GNU Lesser Public Licence. Please
read the content of the subdirectory for specific details on this
third-party license.
Elastix Copyright Notice:
Copyright (c) 2004-2010 University Medical Center Utrecht
All rights reserved.
License:
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the University Medical Center Utrecht nor the names of
its contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
Bibliography
R. Brooks and T. Arbel. Improvements to the itk::KernelTransform and subclasses. The Insight Journal,
January - June, 2007. URL http://hdl.handle.net/1926/494.
W. R. Crum, O. Camara, and D. L. G. Hill. Generalized overlap measures for evaluation and validation in
medical image analysis. IEEE Trans. Med. Imag., 25(11):1451-1461, 2006.
M. H. Davis, A. Khotanzad, D. P. Flamig, and S. E. Harms. A physics-based coordinate transformation for
3-D image matching. IEEE Trans. Med. Imag.
B. Fischer and J. Modersitzki. A unified approach to fast image registration and a new curvature based
registration technique. Linear Algebra Appl., 380:107-124, 2004.
J. V. Hajnal, D. L. G. Hill, and D. J. Hawkes, editors. Medical Image Registration. CRC Press,
2001. ISBN 0849300649.
D. L. G. Hill, P. G. Batchelor, M. Holden, and D. J. Hawkes. Medical image registration. Phys. Med. Biol.,
46(3):R1-R45, 2001.
L. Ibáñez, W. Schroeder, L. Ng, and J. Cates. The ITK Software Guide. Kitware, Inc. ISBN 1-930934-15-7,
second edition, 2005.
S. Klein, M. Staring, and J. P. W. Pluim. Evaluation of optimisation methods for nonrigid medical image
registration using mutual information and B-splines. IEEE Trans. Image Process., 16(12):2879-2890,
December 2007.
S. Klein, U. A. van der Heide, I. M. Lips, M. van Vulpen, M. Staring, and J. P. W. Pluim. Automatic
segmentation of the prostate in 3D MR images by atlas matching using localized mutual information.
Med. Phys., 35(4):1407-1417, April 2008.
S. Klein, J. P. W. Pluim, M. Staring, and M. A. Viergever. Adaptive stochastic gradient descent optimisation
for image registration. International Journal of Computer Vision, 81(3):227-239, March 2009.
H. Lester and S. R. Arridge. A survey of hierarchical non-linear medical image registration. Pattern Recognit.,
32(1):129-149, 1999.
F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens. Multimodality image registration by
maximization of mutual information. IEEE Trans. Med. Imag., 16(2):187-198, 1997.
J. B. A. Maintz and M. A. Viergever. A survey of medical image registration. Med. Image Anal., 2(1):1-36,
1998.
D. Mattes, D. R. Haynor, H. Vesselle, T. K. Lewellen, and W. Eubank. PET-CT image registration in the
chest using free-form deformations. IEEE Trans. Med. Imag., 22(1):120-128, 2003.
C.T. Metz, S. Klein, M. Schaap, T. van Walsum, and W.J. Niessen. Nonrigid registration of dynamic medical
imaging data using nD+t B-splines and a groupwise optimization approach. Medical Image Analysis, in
press.
J. Modersitzki. Numerical Methods for Image Registration. Oxford University Press, 2004. ISBN
978-0198528418.
K. Murphy, B. van Ginneken, J. P. W. Pluim, S. Klein, and M. Staring. Semi-automatic reference standard
construction for quantitative evaluation of lung CT registration. In Medical Image Computing and
Computer-Assisted Intervention (MICCAI), volume 5242 of Lecture Notes in Computer Science, pages
1006-1013, 2008.
J. Nocedal and S. J. Wright. Numerical optimization. Springer-Verlag, New York, 1999. ISBN 0-387-98793-2.
T. Rohlfing, R. Brandt, R. Menzel, and C. R. Maurer Jr. Evaluation of atlas selection strategies for atlas-based
image segmentation with application to confocal microscopy images of bee brains. NeuroImage,
21(4):1428-1442, 2004.
D. Rueckert, L. I. Sonoda, C. Hayes, D. L. G. Hill, M. O. Leach, and D. J. Hawkes. Nonrigid registration
using free-form deformations: Application to breast MR images. IEEE Trans. Med. Imag., 18(8):712-721,
1999.
M. Staring and S. Klein. itk::transforms supporting spatial derivatives. The Insight Journal, 2010a. URL
http://hdl.handle.net/10380/3215.
M. Staring and S. Klein. An image sampling framework for the itk. The Insight Journal, 2010b. URL
http://hdl.handle.net/10380/3190.
M. Staring, S. Klein, and J. P. W. Pluim. A rigidity penalty term for nonrigid registration. Medical Physics,
34(11):4098-4108, 2007a.
M. Staring, S. Klein, and J. P. W. Pluim. Nonrigid registration with tissue-dependent filtering of the
deformation field. Physics in Medicine and Biology, 52(23):6879-6892, 2007b.
M. Staring, U. A. van der Heide, S. Klein, M. A. Viergever, and J. P. W. Pluim. Registration of cervical
MRI using multifeature mutual information. IEEE Transactions on Medical Imaging, 28(9):1412-1421,
2009. In press.
C. Studholme, D. L. G. Hill, and D. J. Hawkes. An overlap invariant entropy measure of 3D medical image
alignment. Pattern Recognit., 32:71-86, 1999.
P. Thevenaz and M. Unser. Halton sampling for image registration based on mutual information. Sampling
Theory in Signal and Image Processing, 7(2):141-171, 2008.
P. Thevenaz and M. Unser. Optimization of mutual information for multiresolution image registration. IEEE
Trans. Image Process., 9(12):2083-2099, 2000.
M. Unser. Splines: A perfect fit for signal and image processing. IEEE Signal Process. Mag., 16(6):22-38,
1999.
P. Viola and W. M. Wells III. Alignment by maximization of mutual information. Int. J. Comput.
Vis., 24(2):137-154, 1997.
B. Zitova and J. Flusser. Image registration methods: a survey. Image Vis. Comput., 21(11):977-1000, 2003.