Navigation Filter Best Practices - NASA Report 2018
NASA/TP–2018–219822
J. Russell Carpenter
Goddard Space Flight Center, Greenbelt, Maryland
Christopher N. D’Souza
Johnson Space Center, Houston, Texas
April 2018
The use of trademarks or names of manufacturers in this report is for accurate reporting and
does not constitute an official endorsement, either expressed or implied, of such products or
manufacturers by the National Aeronautics and Space Administration.
This material is declared a work of the US Government and is not subject to copyright
protection in the United States, but may be subject to US Government copyright in other
countries.
If there be two subsequent events, the probability of the second b/N and the probability
of both together P/N , and it being first discovered that the second event has also happened,
from hence I guess that the first event has also happened, the probability I am right is P/b.
Thomas Bayes, c. 1760
The explicit calculation of the optimal estimate as a function of the observed variables
is, in general, impossible.
Rudolph Kalman, 1960.
The use of Kalman Filtering techniques in the on-board navigation systems for the
Apollo Command Module and the Apollo Lunar Excursion Module was an important factor
in the overwhelming success of the Lunar Landing Program.
Peter Kachmar, 2002.
Dedicated to the memory of Gene Muller, Emil Schiesser, and Bill Lear.
Contents

Foreword
Editor’s Preface
Notational Conventions
Foreword

It should come as no surprise to the reader that navigation systems are at the heart of almost all of NASA's missions, whether on our launch vehicles, on robotic science spacecraft, or on our crewed exploration vehicles. Navigation is fundamental to operating our space systems across the wide spectrum of mission regimes. Safe and reliable navigation systems are essential for routine low Earth orbiting science missions, for rendezvous and proximity operations or precision formation flying missions (where relative navigation is a necessity), for navigation through the outer planets of the solar system, for accomplishing pinpoint landing on planets and small bodies, and for many other mission types.
I believe the reader will find that the navigation filter best practices the team has
collected, documented, and shared in this first edition book will be of practical value in
your work designing, developing, and operating modern navigation systems for NASA’s
challenging future missions. I want to thank the entire team that has diligently worked
to create this NASA Engineering and Safety Center (NESC) GN&C knowledge capture
report. I especially want to acknowledge the dedication, care, and attention to detail as
well as the energy that both Russell Carpenter and Chris D’Souza, the report editors,
have invested in producing this significant product for the GN&C community of practice.
It was Russell and Chris who had the inspiration to create this report and they have
done a masterful job in not only directly technically contributing to the report but also
coordinating its overall development. It should be mentioned that some high-level limited
work was previously performed under NESC sponsorship to capture the lessons learned
over the course of the several decades NASA has been navigating space vehicles. This
report however fills a unique gap by providing extensive technical details and, perhaps
more importantly, providing the underlying rationale for each of the navigation filter best
practices presented here. Capturing these rationales has proven to be a greatly needed but
very challenging task. I congratulate the team for taking this challenge on.
The creation, and the wide dissemination of this report, is absolutely consistent with the
NESC’s commitment to engineering excellence by capturing and passing along, to NASA’s
next generation of engineers, the lessons learned emerging from the collective professional
experiences of NASA’s navigation system subject matter experts. I believe this book will
not only provide relevant tutorial-type guidance for early career GN&C engineers that have
limited real-world on the job experience but it should also serve as a very useful memory
aid for more experienced GN&C engineers, especially as a handy reference to employ for
technical peer reviews of navigation systems under development.
As the NASA Technical Fellow for GN&C, I urge the reader (especially the "navigators" among you, obviously) to invest the time to digest and consider how the best practices
provided in this report should influence your own work developing navigation systems for
the Agency’s future missions. The editors and I recognize this will be a living document
and we sincerely welcome your feedback on this first edition of the report, especially your
constructive recommendations on ways to improve and/or augment this set of best practices.
Cornelius J. Dennehy
NASA Technical Fellow for GN&C
January 2018
Editor’s Preface
As the era of commercial spaceflight begins, NASA must ensure that lessons the US
has learned over the first 50 years of the Space Age will continue to positively influence the
continuing exploration and development of space. Of the many successful strands of this
legacy, onboard navigation stands out as an early triumph of technology whose continuous
development and improvement remains as important to future exploration and commercial
development as it was in the era of Gemini and Apollo. The key that opened the door
to practical and reliable onboard navigation was the discovery and development of the extended Kalman filter (EKF) in the 1960s, a story that has been well-chronicled by Stanley Schmidt [65], and Kalman filtering has far outgrown NASA's applications over the intervening decades. Less well-documented are the accumulated art and lore, tips and
tricks, and other institutional knowledge that NASA navigators have employed to design
and operate EKFs in support of dozens of missions in the Gemini/Apollo era, well over one
hundred Space Shuttle missions, and numerous robotic missions, without a failure ever attributed to an EKF. To document the best of these practices is the purpose that motivates
the contributors of the present document.
Kernels of such best practices have appeared scattered throughout the open technical literature, but such contributions are limited by organizational publication policies, and in some cases by technology export considerations. Even within NASA, there has heretofore not been any attempt to codify this knowledge into a readily available design handbook that could continue to evolve along with the navigation community of practice. As a result, even within the Agency, it is possible for isolated practitioners "not to know any better": to fail to appreciate the subtleties of successful and robust navigation filter design, and to lack an understanding of the motivations for, and the implied cost/benefit trades of, many of the tried and true approaches to filter design.
Some limited progress toward filling this void has been made at a summary level in
reports and briefings prepared for the NASA Engineering and Safety Center (NESC) [13].
In particular, one of a series of recommendations in Reference 13 “...directed towards the
development of future non-human rated [rendezvous] missions...” included as its fourteenth
recommendation the admonishment to “[u]tilize best practices for rendezvous navigation
filter design.” This recommendation listed eight such practices, as follows:
• Non-simultaneous measurements
• Backward smoothing (for BETs)
• Error Budgets
• Sensitivity Analysis
• IMU/Accelerometer processing
• Observability
While these summary-level lists give the community a place to start, they are lacking in
some respects. They lack sufficient rationale that would motivate a designer to adopt them.
Even if so motivated, a designer needs much more detailed information concerning how to
implement the recommendations.
The present work is an attempt to address these shortcomings. Each contributor has
selected one aspect of navigation filter design, or several closely related ones, as the basis
of a chapter. Each chapter clearly identifies best practices, where a consensus of the community of practice exists. While it is sometimes difficult to cast aside one's opinions and express such a consensus, each contributor has made a best effort in this regard. Where a
diversity of opinion exists, the chapter will summarize the arguments for and against each
approach. Also, if promising new developments are currently afoot, the chapter will assess
their prospects.
While the contributors strive for consistency of convention and notation, each has his
own preferences, and readers may need to accommodate subtle differences along these lines
as they traverse the book. The first chapter, which summarizes the EKF, sets the stage,
and should be briefly perused by even seasoned navigators in order to become familiar with
the conventions adopted for this work. Subsequent chapters should stand on their own, and
may be consulted in any order.
While this is a NASA document concerned with space navigation, it is likely that many
of the principles would apply equally to the wider navigation community. That said, readers
should keep in mind that hard-earned best practices of a particular discipline do not always
carry over to others, even though they may be seemingly similar. To assume so is a classic
example of the logical fallacy argumentum ad verecundiam, or the argument from [false]
authority.
Finally, the contributors intend for this work to be a living document, which will continue
to evolve with the state of the practice.
Notational Conventions

A              A set
a, α, A        Scalars
q, M           An array of scalars, e.g. column, row, matrix
x              A point in an abstract vector space
~r             A physical vector, i.e. an arrow in 3-D space
F              A coordinate frame
M^T            The transpose of the array M
‖x‖            The (2-)norm of the vector x
y              A random variable
z              A random vector
p_x(x)         The probability density function of the random variable x evaluated at the realization x
Pr(y < Y)      The probability that y < Y
E[z]           The expectation of the random vector z
exp(t)         The exponential function of t
e^t            exp(t) written as Euler's number raised to the t power
dx             Leibniz' (total) differential of x
dy/dx          (First) (total) derivative of y with respect to x
d^n y/dx^n     nth (total) derivative of y with respect to x
F d^n~r/dt^n   nth (total) derivative of ~r with respect to t in frame F
∂M/∂x          (First) partial derivative of M with respect to x
∂M/∂x |_{x_o}  (First) partial of M with respect to x, evaluated at x_o
CHAPTER 1

The Extended Kalman Filter
As described in the preface, use of the Extended Kalman Filter (EKF) for navigation has
a long history of flight-proven success. The EKF thus forms the foundational best practice
advocated by this work, and it forms the basis for many of the best practices later chapters
describe. The purpose of the present chapter is not to derive the EKF and its relations, but
rather to present them in a basic form, as a jumping-off point for the rest of the material
we shall present. As we shall show, while the EKF is a powerful and robust algorithm, it
is based on a few ad hoc assumptions, which can lead to misuses and misunderstandings.
Many of the best practices we shall describe are tricks of the trade that address such issues.
1.1. The Additive Extended Kalman Filter
The additive EKF is distinguished from the multiplicative EKF (MEKF) by the form
of its measurement update. The additive EKF is the usual and original form of the EKF,
and when we refer to the EKF without a modifier, one may assume we mean the additive
form.
1.1.1. The Dynamics Model Suppose we have a list of n real quantities that we need to know in order to perform navigation, and we have a differential equation that tells us how these quantities evolve through time, such as

Ẋ(t) = f(X(t), t)    (1.1)

where X ∈ R^n, which we call the state vector, contains the quantities of interest; we shall call f(X, t) the dynamics function. If we knew these quantities perfectly at any time, (1.1) would allow us to know them at any other time. For a variety of reasons, however, this is not the case; both the initial conditions and the dynamics function are corrupted by uncertainty.
Suppose instead that the quantities of interest are realizations of a random process, X(t), whose distribution at some initial time t_o is known to us, and whose evolution (forward) in time follows the stochastic differential equation given by

dX(t) = f(X(t), t) dt + B(t) dw(t)    (1.2)

where the presence of the process noise dw(t) reflects uncertainty in the dynamics. To interpret (1.2), imagine dw(t) as the limit of a discrete sequence of random increments, as the time between increments goes to zero. The result will be a continuous but non-differentiable process; hence the notation Ẋ(t) has ambiguous meaning. Henceforth, we shall define our notation such that when we write

Ẋ(t) = f(X(t), t) + B(t) w(t)    (1.3)
any particular time in the future depends only on its present value, and its accumulated
diffusion due to the process noise over the interval between now and the future time of
interest. Random processes such as this are said to possess the Markov Property. Using
this property, we can write (1.11) in terms of the measurement history as follows:

p_{X_i|Y_i}(X_i | Y_i) = p_{Y_i|X_i}(Y_i | X_i) p_{X_i|Y_{i−1}}(X_i | Y_{i−1}) / p_{Y_i|Y_{i−1}}(Y_i | Y_{i−1})    (1.12)
Even if we could compute all of the PDFs in (1.12), we are not guaranteed that the
sequence of measurements provide sufficient information to reduce the initial uncertainty
of all the modes of (1.1). If the system given by (1.1) and (1.10) is such that use of (1.12)
results in uncertainty in all the modes going asymptotically to zero in finite time, from any
initial condition, then we say the system is globally asymptotically observable. If at least all
of the unstable modes are observable, then we say the system is detectable. Unfortunately,
for nonlinear systems, there is no known way to compute global observability. At best, under certain restrictions on (1.1) and (1.10), we can in principle establish local observability, in the neighborhood of a particular initial condition. However, this is a laborious calculation,
often numerically unstable to evaluate. Also, note that observability is a property of the
structure of (1.1) and (1.10), and hence is dependent on how one chooses to represent the
navigation problem. Hence, a system that is observable with one representation may be
unobservable with a different representation.
Kalman’s original filter, which we now usually call the linear Kalman filter (LKF),
is the result when the dynamics and measurement models are linear, Markov, Gaussian,
and observable. An appreciation of the linear Kalman filter is essential to understanding
the strengths and weaknesses of the EKF, although it is almost never the case that such
assumptions are valid for real-world navigation problems.
1.1.3. The Linear Kalman Filter Suppose the dynamics and measurements are given by the following discrete-time linear models:

x_i = Φ_{i,i−1} x_{i−1} + Γ_i u_i    (1.13)
y_i = H_i x_i + v_i    (1.14)

with

E[x_o] = x̄_o  and  E[(x_o − x̄_o)(x_o − x̄_o)^T] = P_o    (1.15)
E[u_i] = 0  and  E[Γ_i u_i u_j^T Γ_i^T] = S_i δ_{ij}    (1.16)

and the moments of v_i as given by (1.8). This system will be globally observable if the observability Gramian is strictly positive definite,

W_k = Σ_{i=1}^{k} Φ_{i,1}^T H_i^T H_i Φ_{i,1} > 0    (1.17)

i.e. it has full rank.
With such assumptions, Kalman showed [36] that Algorithm 1.1 provides an optimal (both minimum variance and maximum likelihood) estimate of the moments of the PDFs appearing in (1.11). Note that in Algorithm 1.1, the covariance recursion given by (1.19) and (1.22) does not depend on the measurement history, and hence one may compute the gain sequence, K_i, off-line and store it as a time-indexed table or schedule, along with Φ_{i,i−1} and H_i. Also note that because the system is globally observable, there is no chance that it
will fail to converge from any initial condition, except perhaps due to buildup of numerical truncation and/or roundoff error.
If we further suppose the dynamics and measurements are given by linear time-invariant (LTI) models,

x_i = Φ x_{i−1} + Γ u_i    (1.23)
y_i = H x_i + v_i    (1.24)

then we may test its global observability using a somewhat simpler calculation than (1.17), as follows:

W = [ H ; HΦ ; HΦ² ; … ; HΦ^{n−1} ]  must have full rank    (1.25)
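The rank test in (1.25) is straightforward to evaluate numerically. A minimal sketch follows; the two-state constant-velocity model and the position-only and velocity-only measurement matrices are illustrative assumptions, not examples from this report:

```python
import numpy as np

# Illustrative LTI model: constant-velocity motion sampled at dt.
dt = 1.0
Phi = np.array([[1.0, dt],
                [0.0, 1.0]])        # state transition matrix
H_pos = np.array([[1.0, 0.0]])      # measure position only
H_vel = np.array([[0.0, 1.0]])      # measure velocity only

def is_observable(Phi, H):
    """Stack H, H*Phi, ..., H*Phi^(n-1) as in (1.25) and test for full rank."""
    n = Phi.shape[0]
    W = np.vstack([H @ np.linalg.matrix_power(Phi, k) for k in range(n)])
    return np.linalg.matrix_rank(W) == n

print(is_observable(Phi, H_pos))  # True: velocity is inferred from successive positions
print(is_observable(Phi, H_vel))  # False: absolute position never enters the measurements
```

The velocity-only case illustrates the closing remark of Section 1.1.2: observability depends on the structure of the dynamics and measurement models together, not on either one alone.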
If the system is detectable, then it turns out that the covariance recursion given by (1.19) and (1.22) reaches a steady state, which we denote P_∞. The corresponding gain is K_∞ = P_∞ H^T R^{−1}. There exist numerous software packages that will compute such quantities, e.g. the Matlab Control Systems Toolbox, which may unfortunately lead to their misuse in inappropriate contexts. Perhaps worse, experts from other domains, who are familiar with techniques such as pole placement for control of LTI systems, may recognize that the steady-state linear Kalman filter is "just a pole placement algorithm," and may infer that the EKF is not much more than a clever pole placement algorithm as well. As we shall show below, this is far from being the case; the EKF operates directly on the nonlinear system of interest, for which such LTI concepts have dubious applicability.
1.1.4. The Linearized Kalman Filter An immediately apparent generalization of
the linear Kalman Filter is to use it to solve for small corrections to a nonlinearly propagated
reference trajectory. While such an approach may have certain applications over limited
time horizons, and/or for ground-based applications where an operator can periodically
intervene, experience with onboard navigation systems has shown that such corrections can
fail to remain small enough to justify the required approximations.
1.1.5. The Extended Kalman Filter There are a number of ways to proceed from
Algorithm 1.1 to “derive” the EKF, but all contain a variety of ad hoc assumptions that
are not guaranteed to hold in all circumstances. Most weaknesses and criticisms of the
EKF arise from such assumptions. Rather than reproduce one or more of such derivations,
we will simply point out that if one replaces (1.18) with an integral of (1.1) over the time
between measurements, and computes the coefficient matrices appearing in (1.13) and (1.14)
as Jacobians evaluated at the current solution of (1.1), then the result is Algorithm 1.2,
which bears more than a passing resemblance to the Kalman filter.
Algorithm 1.2:

A(t) = ∂f/∂X |_{X̂(t)},    H_i = ∂h_i/∂X |_{X̂_i^−}    (1.27)

Φ(t_i, t_{i−1}) = I + ∫_{t_{i−1}}^{t_i} A(τ) Φ(τ, t_{i−1}) dτ,    Φ(t_{i−1}, t_{i−1}) = I    (1.28)

S_i = ∫_{t_{i−1}}^{t_i} ∫_{t_{i−1}}^{t_i} Φ(t_i, τ) B(τ) E[w(τ) w^T(τ′)] B^T(τ′) Φ^T(t_i, τ′) dτ dτ′    (1.29)

P_i^− = Φ(t_i, t_{i−1}) P_{i−1}^+ Φ^T(t_i, t_{i−1}) + S_i,    P_0^+ = P_o    (1.30)

K_i = P_i^− H_i^T ( H_i P_i^− H_i^T + R_i )^{−1}    (1.31)

X̂_i^+ = X̂_i^− + K_i ( Y_i − h_i(X̂_i^−) )    (1.32)

P_i^+ = P_i^− − K_i H_i P_i^−    (1.33)
This implies that an initially Gaussian distribution for the state cannot in general remain Gaussian. At best, all we can hope is that X̂_i^− ≈ E[X_i | Y_{i−1}].
• Let us define the estimation error as e(t) = X(t) − X̂(t). Then since X̂(t) ≠ E[X(t) | Y_{i−1}],

P(t) ≠ E[e(t) e^T(t) | Y_{i−1}]    (1.35)

At best, all we can hope is that P(t) ≈ E[e(t) e^T(t) | Y_{i−1}].
• Let us define the innovation as r_i = Y_i − h(X̂_i^−). Then since h(X̂_i^−) ≠ E[Y_i],

H_i P_i^− H_i^T + R_i ≠ E[r_i r_i^T]    (1.36)

At best, all we can hope is that the above will hold approximately.
• Taken together, the approximations listed above imply that (1.32) and (1.33) can at best satisfy (1.12) only approximately, not only because the mean and covariance are approximations, but also because the PDFs fail to remain Gaussian, and hence fail to be characterized completely by only their first two moments.
• Even if all of the above are reasonable approximations, there is a problem with (1.33). The posterior covariance should be approximated by

P_i^+ ≈ E[e_i^+ (e_i^+)^T | Y_i]    (1.37)
• In general, one would need to simultaneously integrate (1.26) and (1.28) due to their interdependence via the Jacobian A. If the time between measurements is small enough, then if one were to employ a suitable approximation for (1.28), perhaps as simple as

Φ(t_i, t_{i−1}) ≈ I + A(t_i)(t_i − t_{i−1})    (1.44)

then one may reasonably expect that a carefully chosen approximation would be no worse than the many other approximations inherent in the EKF. One may also consider the same or simpler approximations when considering approximations to (1.42).
• Because there is no way to prove global observability for a nonlinear system, the EKF may fail to converge from some initial conditions, even if the system is locally observable in particular neighborhoods.
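The quality of the first-order approximation (1.44) is easy to examine for dynamics whose state transition matrix is known in closed form. The harmonic-oscillator model and the 90-minute-period rate below are illustrative assumptions for such a check:

```python
import numpy as np

# Harmonic oscillator xdot = A x with A = [[0, 1], [-w^2, 0]]; its STM is
# known in closed form, so the error of Phi ≈ I + A*dt from (1.44) can be
# measured directly.
w = 2.0 * np.pi / 5400.0    # ~90-minute period, an illustrative orbit-like rate

A = np.array([[0.0,   1.0],
              [-w**2, 0.0]])

def phi_exact(dt):
    # Closed-form STM of the undamped oscillator
    return np.array([[np.cos(w*dt),     np.sin(w*dt)/w],
                     [-w*np.sin(w*dt),  np.cos(w*dt)]])

def phi_approx(dt):
    # First-order approximation (1.44)
    return np.eye(2) + A*dt

for dt in (1.0, 10.0, 100.0):
    err = np.abs(phi_exact(dt) - phi_approx(dt)).max()
    print(dt, err)   # error grows rapidly with the propagation interval
```

For intervals short relative to the dynamical period the first-order STM is essentially exact, consistent with the text's expectation that such an approximation can be no worse than the other approximations inherent in the EKF.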
In light of the above observations, we conclude this section by presenting a slightly
improved version of the EKF as Algorithm 1.3. In subsequent chapters, we shall describe
additional improvements to the EKF.
Algorithm 1.3:

A(t) = ∂f/∂X |_{X̂(t)},    H_i = ∂h_i/∂X |_{X̂_i^−}    (1.46)

Φ(t_i, t_{i−1}) = a suitable approximation to (1.28)    (1.47)

S_i = a suitable approximation to (1.42)    (1.48)

P_i^− = Φ(t_i, t_{i−1}) P_{i−1}^+ Φ^T(t_i, t_{i−1}) + S_i,    P_0^+ = P_o    (1.49)

K_i = P_i^− H_i^T ( H_i P_i^− H_i^T + R_i )^{−1}    (1.50)

X̂_i^+ = X̂_i^− + K_i ( Y_i − h_i(X̂_i^−) )    (1.51)

P_i^+ = (I − K_i H_i) P_i^− (I − K_i H_i)^T + K_i R_i K_i^T    (1.52)
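The measurement update (1.50)-(1.52) can be sketched in a few lines; the scalar range measurement of a 2-D position state below is an illustrative assumption, not an example from this report:

```python
import numpy as np

def ekf_update(x, P, y, h, H_of, R):
    """One additive-EKF measurement update per (1.50)-(1.52).  The Joseph
    form of (1.52) keeps the posterior covariance symmetric and positive
    semidefinite even when the gain is only approximately optimal."""
    H = H_of(x)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)       # (1.50)
    x_post = x + K @ (y - h(x))                        # (1.51)
    I_KH = np.eye(len(x)) - K @ H
    P_post = I_KH @ P @ I_KH.T + K @ R @ K.T           # (1.52), Joseph form
    return x_post, P_post

# Illustrative nonlinear measurement: range to a 2-D position.
h = lambda x: np.array([np.hypot(x[0], x[1])])
H_of = lambda x: np.array([[x[0], x[1]]]) / np.hypot(x[0], x[1])
x0 = np.array([3.0, 4.0])          # prior estimate (predicted range 5.0)
P0 = np.eye(2)
R = np.array([[0.01]])
x1, P1 = ekf_update(x0, P0, np.array([5.2]), h, H_of, R)
print(np.trace(P1) < np.trace(P0))   # the update reduces total variance
```

Note that only the component of uncertainty along the measurement direction is reduced; the cross-range component is unobservable from a single range measurement, echoing the observability caveats above.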
CHAPTER 2

The Covariance Matrix

As Chapter 1 pointed out, the EKF estimate X̂(t) is at best an approximation for E[X(t) | Y], and hence the EKF symbol P(t) is at best an approximation for E[e(t) e^T(t) | Y]. In the present chapter we discuss best practices for maintaining such approximations, and henceforth we will simply refer to P(t) as the covariance matrix.
2.1. Metrics for Orbit State Covariances
To discuss what makes one covariance a better or worse approximation than another, we must be able to compare matrices to one another, and hence we must adopt metrics. For matrices that serve as coefficients, there exist various matrix norms that serve. For covariance matrices, we are usually more interested in measures of the range of possible realizations that could be drawn from the probability distribution characterized by the covariance. The square root of the trace of the covariance is often a reasonable choice, since it is the root sum squared (RSS) formal estimation error. A drawback to the trace for orbital applications is that coordinates and their derivatives typically differ by several orders of magnitude, so that for example the RSS position error will dominate the RSS velocity error if the trace is taken over a 6 × 6 Cartesian position and velocity state error covariance, unless some scaling is introduced. Although arbitrary scalings are possible, we discuss several metrics herein that have been found to be especially suitable to space applications.
Orbit determination is distinguishable from other types of positioning and navigation
not only by the use of dynamics suitable to orbiting bodies, but also by a fundamental need
to produce states that predict accurately. This need arises because spacecraft operations
require accurate predictions for acquisition by communications assets, for planning future
activities such as maneuvers and observations, for predicting conjunctions with other space
objects, etc. For closed, i.e. elliptical, orbits about most planetary bodies, the two-body
potential dominates all other forces by several orders of magnitude. Thus, in most cases, the
ability of an orbit estimate to predict accurately is dominated by semi-major axis (SMA) error, Δa. This is because SMA error translates into period error through Kepler's third law, and an error in orbit period translates into a secularly increasing error in position along the orbit track. As Reference [6] shows, the along-track drift per orbit revolution, Δs, for an elliptical orbit with eccentricity e is bounded by

Δs = 3π Δa √[(1 + e)/(1 − e)]  from periapse to periapse    (2.1)

Δs = 3π Δa √[(1 − e)/(1 + e)]  from apoapse to apoapse    (2.2)
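For a sense of scale, (2.1) and (2.2) are simple to evaluate; the 10 m SMA error used below is an illustrative value:

```python
import math

def alongtrack_drift(da, e, periapse_to_periapse=True):
    """Bound on along-track drift per revolution due to SMA error da,
    per (2.1) and (2.2).  The result carries the units of da."""
    ratio = (1 + e) / (1 - e) if periapse_to_periapse else (1 - e) / (1 + e)
    return 3.0 * math.pi * da * math.sqrt(ratio)

# A 10 m SMA error in a circular orbit (e = 0) drifts about 94 m per revolution.
print(round(alongtrack_drift(10.0, 0.0), 1))  # 94.2
```

The factor-of-ten amplification from SMA error to per-revolution drift is why the SMA metric of Section 2.1.2 is so useful for judging predictive accuracy.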
This phenomenon is especially significant for rendezvous and formation flying applications,
where relative positions must be precisely controlled.
For a central body whose gravitational constant is μ, the SMA of a closed Keplerian orbit, a, may be found from the vis viva equation,

−μ/(2a) = −μ/r + v²/2    (2.3)
from which one can see that achieving SMA accuracy requires good knowledge of both
radius, r, and speed, v. What is less obvious from (2.3) is that radius and speed errors
must also be both well-balanced and well-correlated to maximize SMA accuracy [6, 9, 27],
as Figure 1 illustrates. In this figure, radius error, σ_r, has been normalized by the squared ratio of radius to SMA, and speed error, σ_v, by n̂v̂_c/v̂.

[Figure 1: Contours of SMA error as a function of normalized radius error σ_r/(r̂²/â²) and normalized speed error σ_v/(n̂v̂_c/v̂), for correlation coefficients ρ_rv = 0, −0.5, −0.9, and −0.99.]
figures of merit for evaluating orbit determination performance, particularly for relative
navigation applications.
In fact, it is easy to show that any function of two (scalar) random variables possesses a similar correlation and balance structure, at least to first order. For example, navigation requirements for atmospheric entry are often stated in terms of flight-path angle error, Δγ. Since sin γ = ~r · ~v /(‖~r‖ ‖~v‖), from geometrical considerations we should expect that Δγ depends on the component of position error which is in the local horizontal plane, in the direction of the velocity vector, and on the component of velocity error that is normal to both velocity and angular momentum, i.e. binormal to the velocity vector. These are of course the in-plane components of position and velocity that are normal to radius and speed. Thus by using the pair Δa and Δγ as metrics, we can fully characterize the correlation and balance of the in-plane covariance components. The following subsections derive these relationships.
2.1.1. Variance of an Arbitrary Function of Two Random Variables Suppose there exists a random variable z which is a possibly nonlinear function of two other random variables, x and y, such that

z = f(x, y)    (2.4)

and let the joint covariance of x and y be given by

P = [ σ_x²           ρ_xy σ_x σ_y ]
    [ ρ_xy σ_x σ_y   σ_y²         ]    (2.5)

The variance of z is given by

σ_z² = E[(z − E[z])²]    (2.6)

where

E[z] = ∫_{−∞}^{∞} ζ p_z(ζ) dζ = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(ξ, η) p_{xy}(ξ, η) dξ dη    (2.7)
Let x̂ = E[x] and ŷ = E[y]. Then E[x − x̂] = 0 and E[y − ŷ] = 0. Expanding f(x, y) around f(x̂, ŷ) in a Taylor series to first order, we find that

f(x, y) ≈ f(x̂, ŷ) + f_x · (x − x̂) + f_y · (y − ŷ)    (2.8)

where f_x and f_y are the partials of f with respect to x and y, respectively; so, to first order,

ẑ = E[z] = E[f(x, y)] ≈ f(x̂, ŷ)    (2.9)

Now let us similarly expand (z − E[z])²:

(z − E[z])² = (f(x, y) − f(x̂, ŷ))²    (2.10)
            ≈ f_x² · (x − x̂)² + 2 f_x f_y · (x − x̂)(y − ŷ) + f_y² · (y − ŷ)²    (2.11)

Taking expectations on both sides yields

E[(z − E[z])²] = f_x² E[(x − x̂)²] + 2 f_x f_y E[(x − x̂)(y − ŷ)] + f_y² E[(y − ŷ)²]    (2.12)

σ_z² = f_x² σ_x² + 2 f_x f_y ρ_xy σ_x σ_y + f_y² σ_y²    (2.13)
     = F P F^T    (2.14)
where F = [f_x, f_y]. Since −1 < ρ_xy < 1, it is clear that a high negative correlation between x and y will minimize σ_z for given values of σ_x and σ_y, but if either f_x σ_x ≫ f_y σ_y or f_x σ_x ≪ f_y σ_y, the impact of the negative correlation will be insignificant. Thus, the only way to simultaneously achieve σ_z ≪ f_x σ_x and σ_z ≪ f_y σ_y is when ρ_xy ≈ −1 and f_x σ_x ≈ f_y σ_y, which are the correlation and balance conditions mentioned above, and which occur along the diagonal of Figure 1.

Note also that by defining new variables scaled by their respective partial derivatives, x̃ = x f_x and ỹ = y f_y, and correspondingly σ̃_x = f_x σ_x and σ̃_y = f_y σ_y, then a normalization of the fashion described above is also possible:

σ_z = √( σ̃_x² + 2 ρ_xy σ̃_x σ̃_y + σ̃_y² )    (2.15)
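The first-order formula (2.13) can be checked against a Monte Carlo draw; the function f(x, y) = xy and the moments below are illustrative choices:

```python
import numpy as np

# Check (2.13) against Monte Carlo for an example function z = f(x, y) = x*y.
xh, yh = 10.0, 2.0                  # means
sx, sy, rho = 0.1, 0.05, -0.9       # standard deviations and correlation

fx, fy = yh, xh                     # partials of f evaluated at the mean
var_lin = fx**2*sx**2 + 2*fx*fy*rho*sx*sy + fy**2*sy**2    # (2.13)

P = np.array([[sx**2,      rho*sx*sy],
              [rho*sx*sy,  sy**2    ]])                    # (2.5)
rng = np.random.default_rng(0)
xy = rng.multivariate_normal([xh, yh], P, size=200_000)
var_mc = np.var(xy[:, 0] * xy[:, 1])

print(round(var_lin, 6))     # first-order variance
print(var_mc)                # Monte Carlo variance; close for these small sigmas
```

Note how the negative correlation pulls the variance well below the sum of the two individual contributions, illustrating the correlation-and-balance effect discussed above.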
2.1.2. Semi-Major Axis Variance To derive a relationship for semi-major axis variance, let us take variations on (2.3), which results in

Δa/a² = 2 ( Δr/r² + v Δv/μ )    (2.16)

If we replace the variations with deviations of random variables from their expectations, and the non-deviated terms with their expected values, we find that

(a − â) = 2â² ( (r − r̂)/r̂² + v̂(v − v̂)/μ )    (2.17)

which by squaring and taking expectation yields the following equation for the SMA variance:

σ_a² = 4â⁴ { σ_r²/r̂⁴ + 2 (v̂/(μr̂²)) ρ_rv σ_r σ_v + (v̂²/μ²) σ_v² }    (2.18)

For the normalization used in Figure 1, rewrite (2.18) as

σ_a = 2 √[ (σ_r/(r̂²/â²))² + 2 ρ_rv (σ_r/(r̂²/â²)) (σ_v/(μ/(â²v̂))) + (σ_v/(μ/(â²v̂)))² ]    (2.19)
and note that μ/(â²v̂) = n̂v̂_c/v̂. As mentioned above, normalizing radius and speed standard deviation in this manner permits comparison of data across all points in all closed orbits.

If the orbit is exactly circular, then further simplification of (2.18) is possible. In this case, a = r and â²v̂/μ = T_p/(2π), where T_p is the orbit period. Then (2.18) may be rewritten as

σ_a = 2 √[ σ_r² + 2 (T_p/2π) ρ_rv σ_r σ_v + (T_p/2π)² σ_v² ]    (2.20)
For orbit determination applications, the state representation most often chosen is a Cartesian inertial state vector, x = [~r^T, ~v^T]^T. Rewriting (2.3) as

a(x) = ( 2/‖~r‖ − ‖~v‖²/μ )^{−1}    (2.21)

and taking partials yields

∂a/∂x |_{x̂} = F_a(x̂) = 2â² [ ~r̂^T/r̂³   ~v̂^T/μ ]    (2.22)

so that

σ_a² = F_a(x̂) P_x F_a^T(x̂)    (2.23)

where P_x is the state error covariance.
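Equations (2.21)-(2.23) map directly to code. A sketch follows, assuming an Earth gravitational constant and an illustrative circular 7000 km orbit with 10 m position and 1 cm/s velocity errors:

```python
import numpy as np

MU = 398600.4418   # km^3/s^2, Earth gravitational constant (assumed value)

def sigma_sma(x, P):
    """SMA standard deviation from a Cartesian inertial state x = [r; v]
    (km, km/s) and its 6x6 covariance P, per (2.21)-(2.23)."""
    r, v = x[:3], x[3:]
    rn = np.linalg.norm(r)
    a = 1.0 / (2.0/rn - v @ v / MU)                   # (2.21)
    Fa = 2.0 * a**2 * np.hstack([r / rn**3, v / MU])  # (2.22)
    return float(np.sqrt(Fa @ P @ Fa))                # (2.23)

# Illustrative circular orbit with uncorrelated 10 m / 1 cm/s axis errors.
r = np.array([7000.0, 0.0, 0.0])
v = np.array([0.0, np.sqrt(MU/7000.0), 0.0])
P = np.diag([0.01**2]*3 + [1e-5**2]*3)
print(sigma_sma(np.hstack([r, v]), P))   # ~0.027 km: radius and speed both contribute
```

With zero correlation in P, the radial and along-velocity error contributions simply add in quadrature; introducing the negative radius-speed correlation discussed above would reduce the result, per (2.18).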
2.1.3. Flight-Path Angle Variance The flight-path angle, γ, is the angle between the velocity vector and the local horizontal plane; it is therefore the complement of the angle between the position and velocity vectors, so that

γ = arcsin( u_~r · u_~v )    (2.24)

where u_~r = ~r/r and u_~v = ~v/v. Taking partials with respect to x, we find that

F_γ(x̂) = ∂γ/∂x |_{x̂} = (1/√(1 − sin²γ̂)) [ (u_v̂^T − sin γ̂ u_r̂^T)/r̂   (u_r̂^T − sin γ̂ u_v̂^T)/v̂ ],   −π/2 < γ̂ < π/2    (2.25)

so that

σ_γ² = F_γ(x̂) P_x F_γ^T(x̂)    (2.26)

which is a form suitable for use in an OD filter estimating a Cartesian inertial state.
For analysis, a simpler form of (2.26) is as follows. Let us define two vectors that are normal to both the position vector and a vector normal to the orbit plane, u_~n: the unit in-track vector,

u_~t = u_~n × u_~r = (u_~v − u_~r sin γ)/√(1 − sin²γ)    (2.27)

which defines a unit vector in the orbit plane that is along the orbit track at apoapsis and periapsis, and the unit bi-normal vector,

u_~b = u_~v × u_~n = (u_~r − u_~v sin γ)/√(1 − sin²γ)    (2.28)

which defines a unit vector in the orbit plane that is along the position vector at apoapsis and periapsis. Let us next define a composite transformation matrix as follows. Let

M_rtn = [ u_~r_I   u_~t_I   u_~n_I ]    (2.29)

where the subscript I indicates that the specified vector is expressed in an inertial basis. The unitary matrix M_rtn^T thus transforms coordinates of vectors in physical space, which are given in the frame I, to coordinates defined in the basis given by the radial, transverse, and normal unit vectors. Similarly, define M_vnb as

M_vnb = [ u_~v_I   u_~n_I   u_~b_I ]    (2.30)

Now define the block diagonal transformation matrix M as

M = [ M_rtn   0_3×3 ]
    [ 0_3×3   M_vnb ]    (2.31)

Using the matrix M, we can transform the state error covariance, given in inertial coordinates, such that its position error covariance is expressed in the "RTN" frame, and its velocity error covariance is in the "VNB" frame. Using the preceding results in (2.26) results in considerable simplification:

σ_γ² = [ u_t̂^T/r̂   u_b̂^T/v̂ ] M ( M^T P_x M ) M^T [ u_t̂/r̂ ; u_b̂/v̂ ]    (2.32)
     = (σ_rt/r̂)² + 2 ρ_{rt,vb} (σ_rt/r̂)(σ_vb/v̂) + (σ_vb/v̂)²    (2.33)
which demonstrates the aforementioned assertion that flight-path angle error depends on
the in-track component of position error, and the bi-normal component of velocity error.
Note that (2.33) possesses the desirable feature that the relevant covariance information
(in-track position variance and bi-normal velocity variance) is normalized by radius and
speed, allowing differing orbital conditions to be readily compared with one another. In most applications, σ_rt ≪ r̂ and σ_vb ≪ v̂, so that these ratios can be taken as small angles and expressed in angular measures commensurate with the units chosen for flight-path angle
itself. Figure 2 employs this convention.
[Figure 2: flight-path angle error as a function of the normalized in-track position and bi-normal velocity errors, shown on logarithmic axes spanning 10⁻⁶ to 10⁻¹.]
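The chain from (2.25) to (2.33) is compact enough to sketch numerically. The following is a minimal illustration (function and variable names are ours, not the report's), computing the flight-path angle error standard deviation from a 6×6 inertial-frame covariance:

```python
import numpy as np

def flight_path_angle_sigma(r, v, Px):
    """Flight-path angle error sigma via sigma_gamma^2 = F Px F^T, per (2.25)-(2.26).

    r, v : Cartesian inertial position and velocity; Px : 6x6 state error covariance.
    """
    r = np.asarray(r, float)
    v = np.asarray(v, float)
    ur, uv = r / np.linalg.norm(r), v / np.linalg.norm(v)
    sg = ur @ uv                      # sin(gamma) = (r . v) / (r v)
    cg = np.sqrt(1.0 - sg**2)        # sqrt(1 - sin^2(gamma))
    ut = (uv - ur * sg) / cg         # unit in-track vector, (2.27)
    ub = (ur - uv * sg) / cg         # unit bi-normal vector, (2.28)
    F = np.hstack([ut / np.linalg.norm(r), ub / np.linalg.norm(v)])  # (2.25)
    return float(np.sqrt(F @ Px @ F))
```

For a circular orbit (sin γ = 0) with isotropic position and velocity covariance, this reduces to the quadrature sum of the normalized position and velocity sigmas, consistent with (2.33).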
2.2.1. Matrix Riccati Equation Many textbooks on Kalman filtering derive (2.40) as the solution of a matrix Riccati equation; using the notation of Chapter 1, this takes the following form:
    Ṗ(t) = A(t)P(t) + P(t)Aᵀ(t) + Q(t)   (2.41)
Use of (2.41) would seem to avoid the need to perform the integrations required to compute
the state transition matrix (STM) and process noise covariance (PNC). In orbit determina-
tion practice however, (2.40) has been found to be more numerically stable and also, despite
the need to compute or approximate the state transition and process noise integrals, more
efficient than (2.41).
2.2.2. State Transition Matrix A common approach in ground-based OD, especially
in the batch least squares context, is to simultaneously integrate the STM along with the
state vector,
    Ẋ(t) = f(X(t), t),   X(t₀) = X₀   (2.42)
    Φ̇(t, t₀) = A(t)Φ(t, t₀),   Φ(t₀, t₀) = I   (2.43)
When coupled with a good numerical integration algorithm, this method has excellent fidelity, a level of fidelity that is rarely necessary in onboard OD applications. Lear [40] studied a number
of practical methods for computing the STM in the onboard OD context. As his report is
not widely available, we will summarize some key findings here.
Lear’s approach was to compare various orders of truncated Taylor series and Runge-
Kutta approximations to the solution of (2.43). He used these STM approximations to
propagate an initially diagonal covariance for one revolution in a two-body circular orbit
around a point mass with the GM of Earth. By comparing these results to those he obtained
using an analytic STM, Lear could compute the maximum step size that would result in a
given relative accuracy for radius and speed formal standard deviations. Table 1 lists a few
of Lear’s results. Notably, Method H has the desirable feature that simply saving the value
of the state Jacobian from the previous propagation step allows for more than an order of
magnitude increase in allowable time step, with essentially the same computational burden
as Method B. If it is not too burdensome to compute A at the midpoint of the propagation
step, then Method G o↵ers nearly equivalent performance without the need to retain the
previous value of A. For higher-rate propagations, Method B o↵ers far more accuracy
than Method A with only a small additional computational burden. While Method A
appears to be a poor choice for many applications, it does play a central role in some useful
approximations to the process noise covariance, as the sequel shows.
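Lear's Table 1 is not reproduced here, but the flavor of his comparison can be sketched as follows for Methods A and B, the 1st- and 2nd-order Taylor truncations of the STM; the orbit radius, step size, and reference order are our illustrative assumptions, not Lear's numbers:

```python
import numpy as np

MU = 3.986004418e14  # Earth GM, m^3/s^2

def plant_matrix(r_vec):
    """A(t) = [[0, I], [G, 0]] with the two-body gravity gradient G."""
    r = np.linalg.norm(r_vec)
    u = r_vec / r
    G = MU / r**3 * (3.0 * np.outer(u, u) - np.eye(3))
    A = np.zeros((6, 6))
    A[:3, 3:] = np.eye(3)
    A[3:, :3] = G
    return A

def stm_taylor(A, dt, order):
    """Truncated Taylor-series STM: Phi = sum_{k=0..order} (A dt)^k / k!."""
    Phi, term = np.eye(6), np.eye(6)
    for k in range(1, order + 1):
        term = term @ (A * dt) / k
        Phi = Phi + term
    return Phi

A = plant_matrix(np.array([7000e3, 0.0, 0.0]))
dt = 10.0
Phi_ref = stm_taylor(A, dt, 20)                          # high-order reference
err_A = np.linalg.norm(stm_taylor(A, dt, 1) - Phi_ref)   # Method A: I + A dt
err_B = np.linalg.norm(stm_taylor(A, dt, 2) - Phi_ref)   # Method B: adds (A dt)^2/2
```

As expected, the second-order truncation tracks the reference more closely at a given step size, which is the trade Lear quantified against allowable time step.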
2.2.3. Process Noise Covariance As stated in Chapter 1, nearly all practical methods for computing the process noise covariance assume that E[w(t)wᵀ(τ)] = Q(t)δ(t − τ), so that (1.29) simplifies to a single integral for the process noise covariance¹, given by (1.42),
¹A notable exception is the work of Wright [76], which describes a correlated process noise model that is intended to account for gravity modeling errors in a “physically realistic” manner. Although this method
Tapley, Schutz, and Born [69] describe two approximations to the portion of (2.44) which
corresponds to position and velocity state noise, which have proven useful in both ground-
and onboard-OD applications. Reference 69 refers to these methods as “State Noise Com-
pensation” (SNC) and “Dynamic Model Compensation” (DMC). Before describing SNC
and DMC however, we will consider some inappropriate models.
Chapter 1 pointed out that
    E[ ∫_{t_{i−1}}^{t_i} ∫_{t_{i−1}}^{t_i} Φ(t_i, τ) B(τ) w(τ) wᵀ(σ) Bᵀ(σ) Φᵀ(t_i, σ) dτ dσ ]
        ≠ E[ ∫_{t_{i−1}}^{t_i} Φ(t_i, τ) B(τ) w(τ) dτ ∫_{t_{i−1}}^{t_i} wᵀ(τ) Bᵀ(τ) Φᵀ(t_i, τ) dτ ]   (2.45)
Let us explore the implications of assuming equality of the expression above. Suppose we assume that the process noise increments are approximately constant over some particular interval Δt = t_i − t_{i−1}, and that E[w(t_i)wᵀ(t_j)] = W(t_i)δ_ij, where δ_ij denotes the Kronecker delta. Then,
    E[ ∫_{t_{i−1}}^{t_i} Φ(t_i, τ) B(τ) w(τ) dτ ∫_{t_{i−1}}^{t_i} wᵀ(τ) Bᵀ(τ) Φᵀ(t_i, τ) dτ ]
        = ∫_{t_{i−1}}^{t_i} Φ(t_i, τ) B(τ) dτ E[w(t_i)wᵀ(t_i)] ∫_{t_{i−1}}^{t_i} Bᵀ(τ) Φᵀ(t_i, τ) dτ   (2.46)
        = Γ_i W_i Γ_iᵀ
There is a subtlety with (2.46) that can lead to issues: if the time interval associated with the assumption that E[w(t_i)wᵀ(t_j)] = W(t_i)δ_ij is not the same as the time interval associated with the integral Γ_i = ∫_{t_{i−1}}^{t_i} Φ(t_i, τ)B(τ) dτ, then the process noise will not be consistently applied. For example, if the EKF is tuned at a particular time step using a particular noise covariance W, and then for some reason the time step is changed, then one must retune the value of W.
A similar issue occurs when the process noise covariance is chosen without regard for
the dynamics, e.g. by setting it equal to a diagonal matrix of user-specified parameters.
Whatever careful tuning has been done to choose such parameters will be invalidated by a
change in the time step.
2.2.3.1. State Noise Compensation For SNC, as applied to OD, we assume velocity error is an uncorrelated random walk with fixed intensity in orbit-fixed coordinates, such as the RTN or VNB coordinates described above. Thus, assuming RTN coordinates without loss of generality, the process noise spectral density matrix becomes
    Q(t) = Q_rtn = [ q_r  0    0  ]
                   [ 0    q_t  0  ]
                   [ 0    0    q_n ]   (2.47)
has had occasional onboard application, it is more widely known for its inclusion in commercial-off-the-shelf software for ground-based OD.
We assume the transformation from orbit-fixed coordinates to the coordinates used for navigation, which are typically inertial coordinates, is approximately constant over the interval Δt = t_i − t_{i−1}, and ignore the correlation-inducing dependence of this transformation on the estimated position and velocity. This results in
    B(t) = B_rtn = [ 0₃ₓ₃  ]
                   [ M_rtn ]   (2.48)
We also assume that Δt is small enough that a 1st-order Taylor series truncation (Lear's Method A) is adequate for modeling the STM Φ(t_i, τ) in the integrand of (2.44), and that M_rtn is constant. With these assumptions, (2.44) becomes
    S_i = [ Q̃ Δt³/3   Q̃ Δt²/2 ]
          [ Q̃ Δt²/2   Q̃ Δt   ]   (2.49)
where Q̃ = M_rtn Q_rtn Mᵀ_rtn. Note that with the SNC model, velocity covariance grows linearly with time, as expected for a random walk model, and hence we should expect the units of √q_i, i = r, t, n, to be meters per second^(3/2).
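Equation (2.49) is straightforward to assemble in code. A minimal sketch (names are ours; any inertial-to-RTN rotation may be supplied for M_rtn):

```python
import numpy as np

def snc_process_noise(q_rtn, M_rtn, dt):
    """SNC process-noise covariance per (2.49).

    q_rtn : (q_r, q_t, q_n) velocity random-walk intensities, units m^2/s^3.
    M_rtn : 3x3 rotation whose columns are the RTN unit vectors in inertial axes.
    dt    : propagation step, s.
    """
    Qt = M_rtn @ np.diag(q_rtn) @ M_rtn.T   # Q-tilde, rotated into inertial axes
    S = np.zeros((6, 6))
    S[:3, :3] = Qt * dt**3 / 3.0
    S[:3, 3:] = Qt * dt**2 / 2.0
    S[3:, :3] = Qt * dt**2 / 2.0
    S[3:, 3:] = Qt * dt
    return S
```

The velocity block grows linearly in Δt, exhibiting the random-walk behavior noted above; doubling the step doubles the velocity covariance increment.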
2.2.3.2. State Noise Compensation for Maneuvers During powered flight, it is often necessary to include additional process noise to accommodate maneuver magnitude and direction errors. One approach is to simply define an additional SNC process noise covariance, with intensities that are sized to the maneuvering errors. While this works fine for modeling maneuver magnitude errors, direction errors may be more accurately modeled by recognizing that a misaligned maneuver vector may be represented by (I₃ − θ^×)Δv⃗_nom, where θ^× represents a skew-symmetric matrix of small angle misalignments, and Δv⃗_nom is the nominal maneuver vector. Thus, Δv⃗^×_nom θ is the error in the velocity increment due to maneuver direction errors, or, if sensed accelerations are being fed forward into the dynamics, due to IMU misalignments. To model these direction errors as a process noise term, let
    B(t) = [ 0₃ₓ₃      ]
           [ Δv⃗^×_nom ]   (2.50)
and let q_θ be the intensity of the maneuver direction noise. Then the SNC-style process noise for accommodating maneuver direction errors becomes
    S_i = q_θ [ −(Δv⃗^×_nom)² Δt³/3   −(Δv⃗^×_nom)² Δt²/2 ]
              [ −(Δv⃗^×_nom)² Δt²/2   −(Δv⃗^×_nom)² Δt   ]   (2.51)
since Δv⃗^×_nom (Δv⃗^×_nom)ᵀ = −(Δv⃗^×_nom)². A version of this method was used by the Space Shuttle during powered flight with IMU-sensed accelerations.
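A sketch of (2.51) in code (names ours). Note that −(Δv^×)² = Δv^×(Δv^×)ᵀ is positive semi-definite, with a null space along the nominal burn direction, so misalignment noise is injected only perpendicular to the burn:

```python
import numpy as np

def cross_matrix(w):
    """Skew-symmetric matrix such that cross_matrix(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def maneuver_direction_noise(dv_nom, q_theta, dt):
    """SNC-style process noise (2.51) for maneuver direction errors."""
    Vx = cross_matrix(np.asarray(dv_nom, float))
    VV = -Vx @ Vx                 # -(dv^x)^2 = dv^x (dv^x)^T, positive semi-definite
    S = np.zeros((6, 6))
    S[:3, :3] = VV * dt**3 / 3.0
    S[:3, 3:] = VV * dt**2 / 2.0
    S[3:, :3] = VV * dt**2 / 2.0
    S[3:, 3:] = VV * dt
    return q_theta * S
```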
2.2.3.3. Dynamic Model Compensation The DMC approach assumes the presence of
exponentially-correlated acceleration biases, which are included as additional solve-fors in
the filter state. As Chapter 5 and Appendix A discuss, a model for such biases is given by
    b(t + Δt) = e^(−Δt/τ) b(t) + ϖ(t)   (2.52)
where b(t₀) ~ N(0, p_b₀) and ϖ(t) ~ N(0, (qτ/2)(1 − e^(−2Δt/τ))). As Chapter 5 discusses, τ
is a time constant controlling the “smoothness” of the random process, and q is a power
spectral density that describes the intensity of the random input. While Chapter 5 discusses
a variety of other bias models that might be used, the exponentially-correlated model has
proved to be a best practice for applications in which there are measurements continually
available to persistently excite it. Refer to Chapter 5 for a fuller discussion of the relative
merits of various bias modeling approaches.
As above, without loss of generality we can assume the acceleration biases are aligned with the RTN frame, and again assume that Δt is small enough that a 1st-order Taylor series truncation is adequate for modeling the portion of the STM corresponding to position and velocity errors. With these assumptions, the terms appearing in the integrand of (2.44) become
    B(t) = B_rtn = [ 0₃ₓ₃  ]
                   [ 0₃ₓ₃  ]
                   [ M_rtn ]   (2.53)
and
    Φ(t + Δt, t) = [ I₃     Δt I₃   (τΔt − τ²(1 − e^(−Δt/τ))) I₃ ]
                   [ 0₃ₓ₃   I₃      τ(1 − e^(−Δt/τ)) I₃          ]
                   [ 0₃ₓ₃   0₃ₓ₃    e^(−Δt/τ) I₃                 ]   (2.54)
with
    S_i = [ φ_pp Q̃   φ_pv Q̃   φ_pa Q̃ ]
          [ φ_pv Q̃   φ_vv Q̃   φ_va Q̃ ]
          [ φ_pa Q̃   φ_va Q̃   φ_aa Q̃ ]²   (2.55)
where
    φ_pp = (τ⁵/2){ 1 − e^(−2Δt/τ) + 2(Δt/τ)(1 − 2e^(−Δt/τ)) − 2(Δt/τ)² + (2/3)(Δt/τ)³ }   (2.56)
    φ_pv = (τ⁴/2){ (e^(−Δt/τ) − 1)² + 2(Δt/τ)(e^(−Δt/τ) − 1) + (Δt/τ)² }   (2.57)
    φ_pa = (τ³/2){ 1 − e^(−2Δt/τ) − 2(Δt/τ)e^(−Δt/τ) }   (2.58)
    φ_vv = (τ³/2){ 1 − e^(−2Δt/τ) − 4(1 − e^(−Δt/τ)) + 2(Δt/τ) }   (2.59)
    φ_va = (τ²/2)(1 − e^(−Δt/τ))²   (2.60)
    φ_aa = (τ/2)(1 − e^(−2Δt/τ))   (2.61)
²In (2.55), the matrix Q̃ has the same form as it does for the SNC method, but with √q_i, i = r, t, n, now representing acceleration intensities, with units of meters per second^(5/2). Also note that (2.55) assumes the same time constant is applicable to all three acceleration channels. While this is usually sufficient, it is straightforward to extend (2.55) to accommodate separate time constants for each channel.
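The exponential coefficients (2.56)–(2.61) are easy to mistype; a sketch in code (our naming), which can be checked against their small-Δt limits (Δt⁵/20, Δt⁴/8, Δt³/6, Δt³/3, Δt²/2, and Δt, respectively):

```python
import math

def dmc_phi(dt, tau):
    """Coefficients (2.56)-(2.61) for the DMC process-noise covariance blocks."""
    b = dt / tau
    e1, e2 = math.exp(-b), math.exp(-2.0 * b)
    return {
        "pp": tau**5 / 2 * (1 - e2 + 2*b*(1 - 2*e1) - 2*b**2 + (2/3)*b**3),
        "pv": tau**4 / 2 * ((e1 - 1)**2 + 2*b*(e1 - 1) + b**2),
        "pa": tau**3 / 2 * (1 - e2 - 2*b*e1),
        "vv": tau**3 / 2 * (1 - e2 - 4*(1 - e1) + 2*b),
        "va": tau**2 / 2 * (1 - e1)**2,
        "aa": tau / 2 * (1 - e2),
    }
```

For Δt ≪ τ the coefficients reduce to the familiar white-acceleration (SNC-like) growth rates, while for Δt ≫ τ the acceleration variance coefficient saturates at τ/2, reflecting the bounded Gauss-Markov bias.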
2.2.3.4. Explicit Dynamic Biases While the DMC approach allows for quite general
estimation of otherwise unmodeled forces on the spacecraft, it is often the case that the
domain of application provides context that can narrow the filter designer’s focus. For
example, it may be the case that the only under-modeled force of appreciable significance
on the spacecraft is drag, or perhaps solar radiation pressure, within the context of the
application. Alternatively, the application may require much higher resolution models than
DMC, which might necessitate estimation of smaller forces with larger uncertainties such
as Earth radiation pressure, spacecraft thermal emission, etc., or panel-based modeling of
drag and/or solar radiation pressure, etc. In such cases, it is often useful to tailor the DMC approach so that it estimates model-specific biases, such as drag or SRP corrections, rather than modeling three general RTN biases. Similarly, during powered flight, maneuver
magnitude and direction errors might be more successfully modeled explicitly.
As an example of an explicit bias, consider estimating a multiplicative correction to the
density; a similar approach may be used for drag or solar radiation pressure coefficients.
Let t denote geocentric coordinate time. Let R denote a planetary-body-fixed, body-centric system of coordinates, aligned with the central body's rotation axis. Let I denote a body-centric, celestially-referenced system of coordinates, aligned with R at an epoch t₀. Let r⃗ represent the position of the center of gravity of a satellite, expressed in I. Let v⃗ represent the satellite's velocity within I. Let v⃗_r represent the satellite's velocity within frame R. Assume that r⃗ evolves with respect to t and I in the vicinity of t₀ according to
    ᴵd²r⃗/dt² = −(μ/r³) r⃗ − (1/2) C_D (A/m) ρ (1 + Δρ/ρ) v_r v⃗_r   (2.62)
where r = ‖r⃗‖, v_r = ‖v⃗_r‖, Δρ is an atmospheric density disturbance, ρ is the undisturbed atmospheric density, A is the area of the satellite in a plane normal to v⃗_r, m is the satellite mass, and C_D is the satellite's coefficient of drag. Assume that Δρ/ρ is a random process that formally evolves as a first-order Gauss-Markov process, similar to a DMC bias:
    d/dt (Δρ/ρ) = −(1/τ)(Δρ/ρ) + w_ρ   (2.63)
where q_ρ is the intensity of w_ρ. Let the state vector be x = [r⃗ᵀ, v⃗ᵀ, (Δρ/ρ)]ᵀ. With these
assumptions, the state dynamics and noise input partials are
    A(t) = [ 0₃ₓ₃            I₃       0₃ₓ₁  ]
           [ G(t) + D_r(t)   D_v(t)   d⃗(t) ]
           [ 0₁ₓ₃            0₁ₓ₃     −1/τ  ] ,   B(t) = [ 0₃ₓ₁ ]
                                                         [ 0₃ₓ₁ ]
                                                         [ 1    ]   (2.64)
where G(t) is the gravity gradient matrix, d⃗(t) = −(1/2) C_D (A/m) ρ v_r v⃗_r is the nominal drag acceleration, and D_r and D_v are partials of the drag acceleration with respect to position and velocity, respectively³.
Using the DMC results from above, with the assumption that the nominal drag acceleration is approximately constant over the integration time, Δt, the term Φ(t + Δt, t)B(t)
³The sensitivity D_r contains terms that are roughly proportional to the drag acceleration magnitude divided by the atmospheric scale height, and to the product of drag acceleration magnitude and the ratio of planetary rotation rate to speed relative to the atmosphere. For nearly all spacecraft, these terms will be many orders of magnitude smaller than the gravity gradient, which is proportional to the gravity acceleration divided by the radius. So D_r can usually be neglected. For reference, D_v = −(1/2) C_D (A/m) ρ v_r (u⃗_vr u⃗ᵀ_vr + I₃), and D_r = (d/R_s)(u⃗_r u⃗ᵀ_vr)ᵀ − D_v ω⃗^×, where ω⃗^× is the skew-symmetric “cross-product” matrix formed from the central body's rotation rate vector, and the notation u⃗_(·) indicates the unit vector of its subscript.
2.2.3.5. Episodic Dynamic Biases It has sometimes been found to be the case, particu-
larly for crewed missions, that episodic spacecraft activities can produce un-modeled accel-
erations. In the early days of manned spaceflight, events such as vents, momentum unloads,
RCS firings, etc. that may perturb a spacecraft’s trajectory were not well modeled, and came
to be described as FLAK, which was supposed to be an acronym for (un)-Fortunate Lack
of Acceleration Knowledge.
If the mean time between such activities can be characterized, along with the expected intensity of the acceleration, a compound Poisson-Gaussian process noise model may be effective. Conveniently, it turns out that the covariance of a linear system driven by a train of Gaussian-distributed impulses whose arrival times follow a Poisson distribution is the same as the covariance of the same system driven by a white noise input process, except for the scaling of the process noise covariance by the Poisson process rate parameter [24].
To understand this result, consider a linear model of the error in a spacecraft trajectory
as follows
ẋ(t) = A(t)x(t) + B(t)u(t) (2.67)
where x represents the deviation of the actual position/velocity state from its estimated or
nominal value, and u represents the FLAK. Then, if we make use of inertial coordinates,
    A(t) = [ 0      I ]
           [ G(t)   0 ] ,   B(t) = [ 0    ]
                                   [ M(t) ]   (2.68)
where G represents the gravity gradient matrix, and M is the direction cosine matrix rotating the supposed body-fixed FLAK into the inertial frame. There is no general solution to this differential equation, but over short time intervals, we can assume that
    x(t_k) = Φ(t_k, t_{k−1}) x_{k−1} + ∫_{t_{k−1}}^{t_k} Φ(t_k, τ) B(τ) u(τ) dτ   (2.69)
where
    Φ(t_k, t_{k−1}) = I + A(t_k)(t_k − t_{k−1})   (2.70)
Since the input is a sequence of a (random) number n_k of impulses,
    u(t) = Σ_{i=1}^{n_k} u_i δ(t − t_i)   (2.71)
with t_{k−1} < t_i < t_k, the delta functions annihilate the integral and our model becomes (with Δt_i = t_k − t_i):
    x(t_k) = Φ(t_k, t_{k−1}) x_{k−1} + Σ_{i=1}^{n_k} u_i [ η⃗_i Δt_i ]
                                                         [ η⃗_i     ]   (2.72)
where η⃗_i is the unit inertial direction vector for the FLAK event. Note that the input response is in some sense fundamentally an increment to the entire state vector at each t_k; we can however compute an equivalent zero-order hold acceleration by dividing the velocity increment by the time step t_k − t_{k−1}.
Since we assume each impulse is Gaussian, the input response has zero mean. To find a tractable form for the covariance, assume that the direction of the FLAK event is constant over each interval t_k − t_{k−1}. This assumption assures that the impulses are identically distributed over each sampling interval. Then, the process noise covariance is given by Reference 24:
    S(t_k) = qλ ∫_{t_{k−1}}^{t_k} [ η⃗_k(t_k − τ) ] [ η⃗ᵀ_k(t_k − τ)   η⃗ᵀ_k ] dτ   (2.73)
                                  [ η⃗_k          ]
where q is the intensity of the Gaussian impulses and λ is the rate parameter of the Poisson process. Carrying out the integration results in
    S(t_k) = qλ [ η⃗_k η⃗ᵀ_k Δt³/3   η⃗_k η⃗ᵀ_k Δt²/2 ]
                [ η⃗_k η⃗ᵀ_k Δt²/2   η⃗_k η⃗ᵀ_k Δt   ]   (2.74)
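A sketch of (2.74) in code (names ours; q and λ would come from characterizing the vents or unloads in question):

```python
import numpy as np

def flak_process_noise(q, lam, eta, dt):
    """Process noise (2.74): Poisson-arriving Gaussian impulses along unit vector eta.

    q   : intensity of the Gaussian impulse magnitudes.
    lam : Poisson rate parameter (expected events per second).
    eta : unit inertial direction of the episodic acceleration.
    dt  : propagation interval t_k - t_{k-1}, s.
    """
    eta = np.asarray(eta, float)
    E = np.outer(eta, eta)          # rank-one direction dyad
    S = np.zeros((6, 6))
    S[:3, :3] = E * dt**3 / 3.0
    S[:3, 3:] = E * dt**2 / 2.0
    S[3:, :3] = E * dt**2 / 2.0
    S[3:, 3:] = E * dt
    return q * lam * S
```

The result has the same Δt³/3, Δt²/2, Δt block structure as SNC, but is rank-deficient: noise is injected only along the assumed event direction.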
2.2.3.6. Computational Considerations The primary computational issues that can affect covariance propagation may be broadly characterized as underflow and overflow. Underflow can occur especially because many of the process noise parameters described above may often have values that approach computational truncation limits, which can lead to non-positive-definite process noise covariances. Overflow can similarly occur when truncation limits are approached by very large covariance values, such as can occur with long propagation times. Because the orbital dynamics are at best marginally stable, even propagating without process noise can result in differences of many orders of magnitude between the largest and smallest eigenvalues. This problem will be exacerbated if process noise is present, since all of the process noise models described above introduce unbounded position covariance error growth⁴.
Simple tricks like enforcing symmetry, or adding a small positive diagonal matrix, will
not always ensure positive eigenvalues in such cases. A better solution is to maintain the
covariance in factorized form, for example as Chapter 7 describes. In lieu of a fully factorized filtering approach, process noise covariances may be computed from their factorizations. Chapter 5 shows a few examples of Cholesky factorizations that may be employed in this fashion.
2.2.4. Tuning the Covariance Propagation Since even the best practices this
Chapter has discussed are at best approximations, it is inevitable that EKF designers must
perform some artful tuning of the free parameters to achieve acceptable results. Further-
more, computational limitations of flight computers often lead to the need for compromises
in modeling fidelity. What one generally hopes to accomplish via tuning of the covariance
propagation is that any approximations or compromises the EKF has had to endure to be
implementable have not impaired its covariance’s accuracy too much. In particular, one
would like to compute an idealized “truth” covariance matrix, based on the best-available
models and data, and adjust the EKF’s “formal” covariance via the tuning process to yield
some semblance of a match.
⁴Reference 8 proposes an approximate “solution” to this problem via a Floquet analysis of a modified set of covariance propagation dynamics that include artificially-introduced damping.
    Ê_e{e(t_i)eᵀ(t_i)} = (1/(K − 1)) Σ_{j=1}^{K} e_j(t_i) e_jᵀ(t_i)   (2.78)
⁵Stationary data are those for which the statistics do not change when the time origin shifts.
The ensemble statistics will generally give the best indication of performance if
the data are non-stationary, so long as we use an adequate number of Monte Carlo
cases.
Since there is nothing analogous to an ensemble of Monte Carlo cases for flight data, we cannot use ensemble statistics as defined above. Let d represent the random difference vector between the quantity of interest and its comparison value. We can apply time series statistics, but as the mission evolves, the span of the time series continually extends, so we have to decide which subsets of the entire mission span to use, e.g., the time series extending back over the entire history of the mission, extending back only over some shorter interval, etc., and also how frequently to recompute the time series statistics, e.g., continuously, once per day, etc. There are some other approximations to the expectation that we might use here.
Sliding Window Time Series Expectation: We can take statistics of the difference realizations, d(t_i), across a sliding window extending Δt into the past from each observation, for each case, to get the Δt-sliding window time series expectations:
    Ê_{t,Δt}(d) = (1/n) Σ_{i=0}^{n−1} d(t_{N−i})   (2.79)
    Ê_{t,Δt}(ddᵀ) = (1/(n − 1)) Σ_{i=0}^{n−1} d(t_{N−i}) d(t_{N−i})ᵀ   (2.80)
where n is the number of time samples in the window Δt.
Period-Folding Expectation: If the data are periodic, we can break up the data into K spans of one period in duration each, and shift the time origin of each span so that the data are “folded” into the same, one-period-long interval. We can then take ensemble statistics over times at the same phase angle, t_φᵢ, within each period.
    Ê_f{d(t_φᵢ)} = (1/K) Σ_{j=1}^{K} d_j(t_φᵢ)   (2.81)
    Ê_f{d(t_φᵢ)d(t_φᵢ)ᵀ} = (1/(K − 1)) Σ_{j=1}^{K} d_j(t_φᵢ) d_jᵀ(t_φᵢ)   (2.82)
It is often useful to fold the data into bins of equal mean anomaly. This is especially
useful for orbits with notable eccentricity, since it ensures that a roughly equal
number of time points will be present in each bin.
Sliding Window Period-Folding Expectation: Period-folding can obviously be
applied over a sliding window as well, with each window extending n periods into
the past.
    Ê_{f,n}{d(t_φᵢ)} = (1/n) Σ_{j=0}^{n−1} d_{K−j}(t_φᵢ)   (2.83)
    Ê_{f,n}{d(t_φᵢ)d(t_φᵢ)ᵀ} = (1/(n − 1)) Σ_{j=0}^{n−1} d_{K−j}(t_φᵢ) d_{K−j}ᵀ(t_φᵢ)   (2.84)
This is especially useful for identifying secular trends in periodic data sets.
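The period-folding expectations amount to reshaping the time series so that samples at the same phase line up; a scalar-valued sketch (names ours), using the uncentered second moment exactly as the folding expectations above are written:

```python
import numpy as np

def period_fold_moments(d, K):
    """Period-folding mean and second moment, per (2.81)-(2.82), scalar case.

    d : 1-D array of length K * n_phase, time-ordered; each consecutive block
        of n_phase samples spans one period, so column j collects the samples
        at the same phase angle across the K periods.
    """
    folded = np.asarray(d, float).reshape(K, -1)    # rows = periods, cols = phase bins
    mean = folded.mean(axis=0)                      # (2.81)
    second = (folded ** 2).sum(axis=0) / (K - 1)    # (2.82), uncentered, unbiased-style
    return mean, second
```

Binning by equal mean anomaly, as suggested above, corresponds to choosing the phase bins so that each column holds a comparable number of samples for eccentric orbits.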
2.2.4.2. Tuning for Along-track Error Growth As described above, the position error component along the orbit track will dominate covariance propagation error, and so the most important step in tuning the covariance propagation is to ensure that this component grows no faster or slower than it should, based on the truncations and approximations that the EKF design has employed. One may use any of the analytical or empirical methods described above to estimate the “true” covariance. For example, for preflight analysis, one may generate a time series or ensemble of time series of differences between states propagated using the formal models the filter employs and a best-available “truth” model of the system. One can then compare the appropriate empirical covariance computed from this data set to the filter's formal covariance, and adjust the process noise intensities until a reasonable match occurs. For flight data analysis, one may similarly difference across overlaps between predictive and definitive states, and compare the empirical covariances of these differences to the sum of the predictive and definitive formal covariances from the filter.
If one uses the SNC method, the primary “knob” for tuning the along-track covariance growth rate is the corresponding along-track component of the process noise intensity, q_T or q_V, depending on whether RTN or VNB components are used, respectively. Essentially, an impulse along the velocity vector, or change in speed, causes a change in SMA, corresponding to a change in period, and hence a secular growth in position error along the orbit, as discussed at the beginning of this Chapter. This mechanism is especially transparent for near-circular orbits, and some simple analysis yields a good starting point. One may find a fuller exposition of the following result in Reference 21.
For near-circular orbits, the position components of the integrand in (2.44) become, in RTN coordinates,
    Φ_rv(Δt) [ q_R  0    0  ]
             [ 0    q_T  0  ]
             [ 0    0    q_N ] Φᵀ_rv(Δt)   (2.85)
where Φ_rv(Δt) is given, per Hill, Clohessy, and Wiltshire, by
    Φ_rv(Δt) = [ sin(nΔt)/n          2(1 − cos(nΔt))/n    0          ]
               [ 2(cos(nΔt) − 1)/n   4 sin(nΔt)/n − 3Δt   0          ]
               [ 0                   0                    sin(nΔt)/n ]   (2.86)
Retaining only secular terms and carrying out the integral, the along-track component of the process noise covariance becomes
    S_T(Δt) ≈ 3Δt³ q_T   (2.87)
an approximation which holds for Δt > T_p. Thus, one may use an empirical covariance of the along-track error after one orbit period, such as σ̂_s² = Ê_f{Δs²}, to derive a starting point from which to tune q_T, as
    q_T = σ̂_s² / (3T_p³)   (2.88)
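The round trip implied by (2.87) and (2.88) is a one-liner each; a sketch (names ours):

```python
def qT_from_alongtrack_sigma(sigma_s, Tp):
    """Starting-point SNC along-track intensity (2.88), from the empirical
    one-period along-track error sigma (m) and orbit period Tp (s)."""
    return sigma_s**2 / (3.0 * Tp**3)

def alongtrack_var(qT, dt):
    """Secular along-track process-noise variance (2.87), valid for dt > Tp."""
    return 3.0 * dt**3 * qT
```

For example, a LEO period near 5400 s and a 100 m empirical one-period sigma give a q_T that, propagated back through (2.87) for one period, reproduces the 100² m² variance, which is exactly the intended starting point for further tuning.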
computational burden from O(n³) to O(n²/2), where n is the state dimension.
Although Algorithm 2.1 does not show the state update, the state may also be sequentially updated as part of the iteration. However, the order in which the scalar measurements update the state can affect the outcome, if the measurement partials are computed one column at a time, corresponding with each scalar update. This may produce undesirable or even unstable outcomes. Chapter 3 will discuss such issues further.
Despite the extensive and successful flight heritage of Algorithm 2.1, it cannot guarantee
numerical stability and positive definiteness of the covariance. Therefore, the recommended
best practice for the covariance update is to utilize the U D-factorization, which Chapter 7
describes.
2.3.2. Use of Consider States It may often be the case that unobservable states are present in the system being estimated. Most commonly, such states will be parameters whose values are unknown or uncertain. Inclusion of such parameters as solve-for states in the EKF is not a recommended practice. However, if the EKF completely ignores the uncertainty that such parameters introduce, its covariance can become overly optimistic, a condition sometimes known as “filter smugness.” One approach to addressing this problem was introduced by Schmidt [64], originally in the context of reducing the computational burden that the EKF imposed on flight computers of the 1960s. Schmidt's idea is essentially for the EKF to maintain a covariance containing all of the states whose uncertainties are significant enough to affect filter performance, but only to update a subset of those states.
The states which are not updated in this framework are typically known as “consider”
parameters, and such a filter has been called a “consider filter” or a “Schmidt-Kalman”
filter. Although most commonly the state space is simply partitioned by selecting states
as either solve-for or consider states, Reference 48 points out that partitioning using linear
combinations of the full state space is also possible.
Following Reference 48, suppose the filter produces estimates for a subset of n_s solve-for states, out of the full state of size n. The filter does not estimate the remaining n_c = n − n_s consider states. Denote the true solve-for vector by s(t), and the true consider vector by c(t). Assume that these are linear combinations of the true states, according to the following:
    s(t) = S(t)x(t) and c(t) = C(t)x(t)   (2.91)
where the n_s × n matrix S(t) and the n_c × n matrix C(t) are such that the matrix
    M = [ S ]
        [ C ]   (2.92)
is non-singular. The inverse of M is partitioned into an n × n_s matrix S̃ and an n × n_c matrix C̃:
    M⁻¹ = [ S̃, C̃ ].   (2.93)
The properties of the matrix inverse then lead immediately to the identities
    S S̃ = I_{n_s},   C C̃ = I_{n_c},   S C̃ = 0_{n_s×n_c},   C S̃ = 0_{n_c×n_s},   (2.94)
and
S̃S + C̃C = In . (2.95)
In the usual case that the elements of the solve-for and consider vectors are merely selected
and possibly permuted components of the state vector, the matrix M is an orthogonal
permutation matrix. In this case, and in any case for which M is orthogonal, the matrices
S̃ and C̃ are just the transposes of S and C, respectively, which makes inversion of M
unnecessary and simplifies many of the following equations.
It follows from Eqs. 2.91 and 2.95 that
x(t) = S̃(t)s(t) + C̃(t)c(t). (2.96)
Relations similar to Eq. 2.91 give the estimated solve-for vector ŝ(t) and the assumed con-
sider vector ĉ(t) in terms of the estimated state x̂(t). Thus, errors in the solve-for and
consider states are given by
    e_s(t) = s(t) − ŝ(t) = S(t)e(t)   (2.97)
    e_c(t) = c(t) − ĉ(t) = C(t)e(t)   (2.98)
and the true error may be written in terms of the solve-for and consider errors by
e(t) = S̃(t)es (t) + C̃(t)ec (t). (2.99)
In terms of this notation, the EKF update has the form
    ŝᵢ⁺ = ŝᵢ⁻ + Kᵢ rᵢ   (2.100)
where
    rᵢ = yᵢ − ŷᵢ = Hᵢ eᵢ + vᵢ,   (2.101)
and the subscript i is a shorthand for the time argument tᵢ. The usual EKF will not contain the full covariance, but only its solve-for part
    P_ss(t) = E[e_s(t) e_sᵀ(t)]   (2.102)
By contrast, the Schmidt-Kalman filter will use the full covariance, P(t). In the usual case, the Kalman gain is given by
    Kᵢ = P_ssᵢ Hᵀ_sᵢ [ H_sᵢ P_ssᵢ Hᵀ_sᵢ + Rᵢ ]⁻¹   (2.103)
where
    H_sᵢ = Hᵢ S̃ᵢ   (2.104)
In the Schmidt-Kalman case,
    Kᵢ = Sᵢ Pᵢ Hᵢᵀ [ Hᵢ Pᵢ Hᵢᵀ + Rᵢ ]⁻¹   (2.105)
Thus, the Schmidt-Kalman gain matrix is computed from the full covariance, but only applies measurement innovations to the solve-for states.
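A toy numerical instance of the Schmidt-Kalman gain (2.105); the state partition, covariance, and measurement values below are purely illustrative assumptions of ours:

```python
import numpy as np

def schmidt_gain(P, H, R, S):
    """Schmidt-Kalman gain (2.105): innovation covariance from the FULL
    covariance P, but the gain maps only into the solve-for partition."""
    W = H @ P @ H.T + R
    return S @ P @ H.T @ np.linalg.inv(W)

# 2 solve-for states + 1 consider state; S selects the solve-for partition,
# so S-tilde is simply S transposed (M orthogonal, as noted above).
S = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
P = np.array([[4.0, 0.5, 0.3],
              [0.5, 2.0, 0.8],
              [0.3, 0.8, 1.0]])
H = np.array([[1.0, 0.0, 1.0]])   # the measurement also senses the consider state
R = np.array([[0.25]])
K = schmidt_gain(P, H, R, S)      # shape (2, 1): no update to the consider state
```

Compare with (2.103): a filter carrying only P_ss would omit the consider-state contribution to the innovation covariance and thus be over-confident, which is precisely the “smugness” the Schmidt formulation avoids.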
2.4. Sigma-Point Methods
Before concluding this Chapter, it is worth noting that a promising method for propa-
gating and updating the covariance that is coming into greater use and acceptance within
the navigation community is the use of sigma-point methods, also known as “unscented”
transforms. A subsequent chapter covering Advanced Topics will cover these topics in a
section on sigma-point filtering.
CHAPTER 3
Processing Measurements
This chapter will discuss how to handle real-world measurement processing issues. In particular, being able to handle measurements that are not synchronous is of paramount importance to running filters in a real-time environment. In addition, navigation filters with nonlinear measurement models are susceptible to divergence depending on the order in which measurements occurring at the same epoch are processed. Therefore, a technique which provides invariance to the order of measurement processing is detailed. A technique for processing correlated measurements is presented, and brief comments on filter cascading and processing of inertial data are offered.
3.1. Measurement Latency
In general, the measurement time tags are not going to be equal to the current filter epoch time, t_k. To state it another way, the measurements do not come in at the current filter time. Rather, they may be latent by up to p seconds. Thus, a situation will arise where the filter has propagated its state and covariance to time t = t_k from time t = t_{k−1}, and is subsequently given a measurement to be filtered (denoted by subscript m) that corresponds to the time t = t_m, where
    t_m ≤ t_k   (3.1)
If Δt = t_m − t_k is not insignificant, the time difference between the measurement and the filter state and covariance will need to be accounted for during filtering in order to accurately process the measurement. This can be done in much the same way a batch filter operates (see pages 196-197 of Tapley [69]). If the measurement at time t = t_m is denoted as y_m, the nominal filter state at that time is given by X*_m ≡ X*(t_m) (* denotes the nominal), and the measurement model is denoted as h_m(X_m, t_m), then one can expand the measurement model to first order about the nominal filter state to get
    h_m(X_m, t_m) = h_m(X*_m, t_m) + H_m x_m + ν_m   (3.2)
where x_m = X_m − X*_m and H_m is defined as
    H_m = ( ∂h_m(X, t_m)/∂X )|_{X=X*_m}   (3.3)
The perturbed state at time t_m can be written in terms of the state at time t_k as follows
    x_m = Φ(t_m, t_k) x_k + Γ_m w_m   (3.4)
so we can compute the measurement as
    h_m(X_m, t_m) = h_m(X*_m, t_m) + H_m Φ(t_m, t_k) x_k + H_m Γ_m w_m + ν_m   (3.5)
where the measurement noise has the characteristics E[ν_m] = 0 and E[ν_m²] = R_m, the state process noise from t = t_m to t = t_k has the characteristics E[w_m] = 0 and E[w_m w_mᵀ] = Q_m, and the state deviation is given by
    x_k = ΔX_k = X_k − X*_k   (3.6)
(Note that the effect of the state process noise would be to increase the measurement noise variance. However, because the process noise term is very small over time periods of a few seconds, it can safely be neglected for the remainder of this analysis.)
Upon taking the conditional expectation of the measurement equation and rearranging,
the scalar residual of the measurement is given by

δy_m − H̃_m x̂_k(−) = y_m − h_m(X*_m, t_m) − H_m Φ(t_m, t_k) x̂_k(−)    (3.7)

where ˆ· denotes an estimated value,

δy_m = y_m − h_m(X*_m, t_m)
x̂_k(−) = X̂_k(−) − X*_k    (3.8)

The measurement partials that are used in the update, which map the measurement to the
state at time t = t_k, are given by

H̃_m = H_m Φ(t_m, t_k)    (3.9)

Eq. 3.9 was derived by noting that

H_m x̂_m = H_m Φ(t_m, t_k) x̂_k = H̃_m x̂_k    (3.10)

From the above discussion, it is evident that the unknown quantities needed to update
the state at time t = t_k with a measurement from time t = t_m are the nominal state at the
measurement time, X*_m, and the state transition matrix relating the two times, Φ(t_m, t_k).
Given those values, h_m(X*_m, t_m) and H̃_m can be calculated.
Thus the nominal state at the measurement time is calculated by back-propagating the
filter state from time t_k to time t_m using buffered IMU data. The same is done to
calculate the required state transition matrix. The same propagation algorithms used in
forward propagation ought to be utilized for the back-propagation, with the exception that
the smaller time step allows for a first-order approximation of the matrix exponential used
to update the state transition matrix.
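The latent-measurement update above can be sketched in a few lines of numpy. This is an illustrative linear example, not flight code: the function name, the Joseph-form update, and the assumption that the back-propagated nominal h_m(X*_m), the Jacobian H_m, and Φ(t_m, t_k) are already available are all choices made here for clarity.

```python
import numpy as np

def latent_measurement_update(x_hat_k, P_k, y_m, h_of_Xm, H_m, Phi_mk, R_m):
    """Update the deviation estimate at t_k with a measurement taken at an
    earlier time t_m, via the residual-mapping approach of Section 3.1.
    Phi_mk is Phi(t_m, t_k) from back-propagation; process noise over the
    short latency interval is neglected, as in the text."""
    H_tilde = H_m @ Phi_mk                 # Eq. (3.9): map partials to t_k
    dy = y_m - h_of_Xm                     # residual about the nominal
    W = H_tilde @ P_k @ H_tilde.T + R_m    # residual covariance
    K = P_k @ H_tilde.T @ np.linalg.inv(W)
    x_hat_plus = x_hat_k + K @ (dy - H_tilde @ x_hat_k)
    I_KH = np.eye(P_k.shape[0]) - K @ H_tilde
    P_plus = I_KH @ P_k @ I_KH.T + K @ R_m @ K.T   # Joseph form
    return x_hat_plus, P_plus
```

With Φ = I this reduces to an ordinary scalar-measurement Kalman update, which provides a simple sanity check.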
3.2. Invariance to the Order of Measurement Processing
It has long been known that the performance of an EKF is dependent on the order in
which one processes measurements. This is of particular import when a powerful
measurement is coupled with a large a priori error. The state (and covariance)
update will be large, very likely outside the linear range. Subsequent measurements which
are processed may well be outside the residual edit thresholds, and hence will be rejected.
In order to remedy this, we employ a hybrid linear/extended Kalman filter measurement
update. Recall that in an extended Kalman filter, the state is updated / relinearized
/ rectified after each measurement is processed, regardless of whether the measurements
occur at the same time. Hence, the solution is highly dependent on the order in which the
measurements are processed. This is not a desirable situation in which to be.
We obviate this difficulty simply by not updating the state until all the measurements
at a given time are processed. We accumulate the state updates in state deviations x̂, using
Algorithm 3.1. This algorithm makes use of the fact that, in the absence of process noise, a
batch/least squares algorithm is mathematically equivalent to a linear Kalman filter [25].
Algorithm 3.1 is a recommended best practice.

Algorithm 3.1 Measurement Update Invariant to Order of Processing
  for each (scalar) measurement j = 1 through k do
    δy_j = Y_j − h_j(X*_m, t_m)
    H_j = ∂h_j/∂X (X*_m, t_m)
    K_j = P_j H_j^T (H_j P_j H_j^T + R_j)^{−1}
    x̂_j ← x̂_j + K_j (δy_j − H_j x̂_j)
    P_j ← (I − K_j H_j) P_j (I − K_j H_j)^T + K_j R_j K_j^T
  end for
  X_m ← X*_m + x̂_j

Algorithm 3.1 may readily be combined with
the residual mapping approach described above when the measurements are asynchronous.
Algorithm 3.1 may also be readily combined with Algorithm 2.1, for cases in which the
preferred factorized covariance methods are precluded.
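A minimal numpy sketch of Algorithm 3.1 follows. The function name and argument layout are illustrative assumptions; the essential point is that the deviation x̂ is updated as in a linear Kalman filter, and the nominal state is rectified only after the loop, so the result does not depend on the order of the scalar measurements.

```python
import numpy as np

def order_invariant_update(X_nom, P, ys, hs, Hs, Rs):
    """Process all scalar measurements at one epoch as a linear KF on the
    deviation x_hat; apply the accumulated deviation to the nominal state
    only at the end (Algorithm 3.1).  ys, hs, Hs, Rs are lists of scalar
    measurements, predicted measurements h_j(X*), 1xn Jacobians, and
    scalar variances."""
    n = P.shape[0]
    x_hat = np.zeros(n)
    for y, h, H, R in zip(ys, hs, Hs, Rs):
        dy = y - h                                    # residual about nominal
        W = (H @ P @ H.T).item() + R                  # scalar residual variance
        K = (P @ H.T).ravel() / W                     # Kalman gain
        x_hat = x_hat + K * (dy - (H @ x_hat).item())  # linear deviation update
        I_KH = np.eye(n) - np.outer(K, H)
        P = I_KH @ P @ I_KH.T + np.outer(K, K) * R    # Joseph form
    return X_nom + x_hat, P
```

Because the deviation update is linear, swapping the order of the measurement lists leaves the result unchanged to machine precision.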
3.3. Processing Vector Measurements
If the UDU factorization is used, measurements need to be processed as scalars. If
the vector measurements are correlated, one option is to assume they are uncorrelated and
ignore the correlations between the measurements.
However, there is a better alternative. Given the measurement equation (Yj = Hj (Xj , tj )+
⌫ j ) with measurement error covariance matrix, Ri , first decompose the matrix with a
Cholesky factorization as
⇥ ⇤ 1/2 T/2
Ri = E ⌫ j ⌫ Tj = Ri Ri (3.11)
1/2
and premultiply the measurement equation by Ri to yield
Ỹj = H̃j (Xj , tj ) + ⌫
˜j (3.12)
with
1/2
Ỹj = Ri Yj (3.13)
1/2
H̃j (Xj , tj ) = Ri Hj (Xj , tj ) (3.14)
1/2
˜j
⌫ = Ri ⌫j (3.15)
so that E[˜⌫j⌫˜ j ] = I. Thus, the new measurement equation has errors which are now
decorrelated.
Alternatively, one can decompose the m ⇥ m measurement error covariance matrix
with a UDU decomposition as Ri = URi DRi UTRi so that using a similar reasoning, we
premultiply the measurement equation by URi1 so that in this case
Ỹj = URi1 Yj (3.16)
1
H̃j (Xj , tj ) = URi Hj (Xj , tj ) (3.17)
1
˜j
⌫ = U Ri ⌫ j (3.18)
so that E[˜ ˜ j ] = DRi where DRi is a diagonal matrix and, as in the case of the Cholesky
⌫j⌫
decomposition, the new measurement model has decorrelated measurement errors.
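The Cholesky variant of Eqs. (3.11)-(3.15) can be sketched as below. The function name is illustrative; in practice one applies a triangular solve rather than forming R^{−1/2} explicitly, which is what `np.linalg.solve` does here.

```python
import numpy as np

def decorrelate_cholesky(Y, H, R):
    """Whiten a correlated vector measurement: factor R = L L^T
    (Eq. 3.11) and premultiply by L^{-1} (Eqs. 3.13-3.14) so the
    transformed errors have identity covariance and the components
    can be processed as independent scalars."""
    L = np.linalg.cholesky(R)
    Y_t = np.linalg.solve(L, Y)   # L^{-1} Y
    H_t = np.linalg.solve(L, H)   # L^{-1} H
    return Y_t, H_t
```

By construction the transformed error covariance is L^{−1} R L^{−T} = I, so each row of the transformed measurement can feed a scalar UDU update.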
CHAPTER 4
Measurement Underweighting
4.1. Introduction
Given an m-dimensional random measurement y which is somehow related to an un-
known, n-dimensional random vector x, the family of affine estimators of x from y is

x̂ = a + K y    (4.1)

where a ∈ ℝ^n and K ∈ ℝ^{n×m}. The optimal, in a Minimum Mean Square Error sense, affine
estimator has

K = P_xy P_yy^{−1}    (4.2)
a = E[x] − K E[y]    (4.3)

where

P_xy = E[(x − E[x])(y − E[y])^T]    (4.4)
P_yy = E[(y − E[y])(y − E[y])^T]    (4.5)
In the presence of nonlinear measurements of the state,

y = h(x) + v    (4.6)

(where v is zero-mean measurement noise) the extended Kalman filter (EKF) [20] approxi-
mates all moments of y by linearization of the measurement function centered on the mean
of x. This methodology has proven very effective and produces very satisfactory results
in most cases. Approaches other than the EKF exist; for example, the Unscented Kalman
Filter [32] approximates the same quantities via stochastic linearization using a determin-
istic set of Sigma Points. Higher-order truncations of the Taylor series are also possible.
Underweighting [38, 40] is an ad hoc technique for compensating for nonlinearities in the
measurement models that are neglected by the EKF; it successfully flew on the Space
Shuttle and on Orion Exploration Flight Test 1.
The commonly implemented method for the underweighting of measurements for hu-
man space navigation was introduced by Lear [39] for the Space Shuttle navigation system.
In 1966, Denham and Pines showed the possible inadequacy of the linearization approx-
imation when the effect of measurement nonlinearity is comparable to the measurement
error [12]. To compensate for the nonlinearity, Denham and Pines proposed to increase the
measurement noise covariance by a constant amount. In the early seventies, in anticipation
of Shuttle flights, Lear and others developed relationships which accounted for the second-
order effects in the measurements [40]. It was noted that in situations involving large state
errors and very precise measurements, application of the standard extended Kalman filter
mechanization leads to conditions in which the state estimation error covariance decreases
more rapidly than the actual state errors. Consequently the extended Kalman filter be-
gins to ignore new measurements even when the measurement residual is relatively large.
Underweighting was introduced to slow down the convergence of the state estimation error
covariance thereby addressing the situation in which the error covariance becomes overly
optimistic with respect to the actual state errors. The original work on the application of
second-order correction terms led to the determination of the underweighting method by
trial-and-error [39].
More recently, studies on the effects of nonlinearity in sensor fusion problems with
application to relative navigation have produced a so-called "bump-up" factor [16, 44, 56,
58]. While Ferguson [16] seems to have initiated the use of the bump-up factor, the problem
of mitigating filter divergence was more fully studied by Plinval [58] and subsequently by
Mandic [44]. Mandic generalized Plinval's bump-up factor to allow flexibility and notes
that the value selected influences the steady-state convergence of the filter. In essence, it
was found that a larger factor keeps the filter from converging to the level that a lower factor
would permit. This finding prompted Mandic to propose a two-step algorithm in which the
bump-up factor is applied for a certain number of measurements only, after which the factor
is completely turned off. Finally, Perea et al. [56] summarize the findings of the previous
works and introduce several ways of computing the applied factor. In all cases, the bump-up
factor amounts in application to the underweighting factor introduced by Lear [39]. Save for
the two-step procedure of Mandic, the bump-up factor is allowed to persistently affect the
Kalman gain, which directly influences the obtainable steady-state covariance. Effectively,
the ability to remove the underweighting factor autonomously and under some convergence
condition was not introduced.
The work of Lear is not well known, as it is only documented in internal NASA memos
[39, 40]. Kriegsman and Tau [38] mention underweighting in their 1975 Shuttle navigation
paper without a detailed explanation of the technique.
4.2. Nonlinear Effects and the Need for Underweighting
We briefly review three state estimate update approaches: a linear time-varying
measurement model leading to the classical Kalman filter, a nonlinear measurement
model with first-order linearization approximations leading to the widely used extended
Kalman filter, and a nonlinear model with second-order approximations leading to the
second-order extended Kalman filter.
4.2.1. Linear Measurement Model and the Classical Kalman Filter Update
Let us briefly recap the linear Kalman filter. The measurement model is

y_i = H_i x_i + v_i,    (4.7)

where y_i ∈ ℝ^m are the m measurements at each time t_i, x_i ∈ ℝ^n is the n-dimensional state
vector, H_i ∈ ℝ^{m×n} is the known measurement mapping matrix, and v_i ∈ ℝ^m is modeled as a
all expectations are conditioned on past measurements, and we find that the state estimate
update is given by [20]

x̂_i^+ = x̂_i^− + K_i [y_i − h(x̂_i^−)].    (4.17)

Similarly, the measurement residual is given by

r_i = y_i − h(x̂_i^−) ≃ H_i e_i + v_i,    (4.18)

Computing the measurement residual covariance E[r_i r_i^T] yields

W_i = H_i P_i^− H_i^T + R_i.    (4.19)

The state estimation error covariance and Kalman gain are the same as in Eqs. (4.9) and
(4.10), respectively, with H_i given as in Eq. (4.15). The state estimation error covariances
in the forms shown in Eqs. (4.11) and (4.12) also hold in the nonlinear setting with H_i as
in Eq. (4.15).
From Eqs. (4.12) and (4.17), it is seen that reducing the Kalman gain leads to a smaller
update in both the state estimation error covariance and the state estimate, respectively.
Reducing the gain, and hence the update, is the essence of underweighting; the need for
this adjustment is illuminated in the following discussion.
Adopting the viewpoint that the state estimation error covariance matrix represents the
level of uncertainty in the state estimate, we expect that when we process a measurement
(adding new information) the uncertainty would decrease (or at least, not increase).
This is, in fact, the case and can be seen in Eq. (4.12). Under the assumption that the
symmetric matrices P_i^− > 0 and R_i > 0, it follows that

K_i [H_i P_i^− H_i^T + R_i] K_i^T ≥ 0,    (4.20)

and we can find a number α_i ≥ 0 at each time t_i such that

P_i^− − P_i^+ ≥ α_i I,    (4.21)

which shows that P_i^− − P_i^+ is non-negative definite. The same argument can be made
from the viewpoint of comparing the trace (or the matrix norm) of the a posteriori and
a priori state estimation error covariances. As each new measurement is processed by the
EKF, we expect the uncertainty in the estimation error to decrease. The question is, does
the a posteriori uncertainty as computed by the EKF represent the actual uncertainty, or
in other words, is the state estimation error covariance matrix always consistent with the
actual state errors? In the nonlinear setting, when there is a large a priori uncertainty in the
state estimate and a very accurate measurement, it can happen that the state estimation
error covariance reduction at the measurement update is too large. Underweighting is a
method to address this situation by limiting the magnitude of the state estimation error
covariance update, with the goal of retaining consistency of the filter covariance and the
actual state estimation error through situations of high nonlinearity of the measurements.
Pre- and post-multiplying the a posteriori state estimation error covariance in Eq. (4.12)
by H_i and H_i^T, respectively, yields (after some manipulation)

H_i P_i^+ H_i^T = H_i P_i^− H_i^T (H_i P_i^− H_i^T + R_i)^{−1} R_i.    (4.22)

In Eq. (4.22), we see that if H_i P_i^− H_i^T ≫ R_i, then it follows that

H_i P_i^+ H_i^T ≃ R_i.    (4.23)

The result in Eq. (4.23) is of fundamental importance and is the motivation behind un-
derweighting. What this equation expresses is the fact that when the a priori estimated
state uncertainty H_i P_i^− H_i^T is much larger than the measurement error covariance R_i, the
Kalman filter largely neglects the prior information and relies heavily on the measurement.
Therefore, the a posteriori estimated state uncertainty H_i P_i^+ H_i^T is approximately equal to
R_i.
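A quick scalar check of Eqs. (4.22)-(4.23), with assumed example values P^− = 100 and R = 1 (any values with P^− ≫ R behave the same way), shows the mapped posterior variance collapsing to nearly R:

```python
# Scalar case: H P- H^T = 100 >> R = 1, so Eq. (4.23) predicts H P+ H^T ~ R.
P_minus, H, R = 100.0, 1.0, 1.0
W = H * P_minus * H + R                      # residual variance, Eq. (4.19)
P_plus = P_minus - (P_minus * H) ** 2 / W    # scalar covariance update
HPH_plus = H * P_plus * H                    # equals P- R / (P- + R)
```

Here HPH_plus = 100/101 ≈ 0.99, within one percent of R, regardless of how nonlinear the true measurement actually was, which is exactly the risk the text describes.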
4.2.3. Nonlinear Measurement Model and the 2nd-Order Kalman Filter Update
Eq. (4.14) truncates the Taylor series expansion of the nonlinear measurement func-
tion to first order; carrying it to second order, we obtain

y_i ≃ h(x̂_i, t_i) + H_i e_i + (1/2) Σ_{k=1}^{n} e_i(k) (∂²h(x_i, t_i)/∂x_i ∂x_i(k))|_{x_i = x̂_i} e_i + v_i    (4.24)
    = h(x̂_i, t_i) + H_i e_i + b_i + v_i,    (4.25)

where e_i(k) and x_i(k) are the k-th elements of the vectors e_i and x_i, respectively. The expected
value of the measurement now includes contributions from the second-order terms, denoted
as b̂_i:

E[y_i] ≃ h(x̂_i, t_i) + b̂_i    (4.26)

Define

H_{i,k} ≜ (∂²h_k(x_i)/∂x_i ∂x_i^T)|_{x_i = x̂_i},

where h_k(x_i) is the k-th component of h(x_i). Then the k-th component of b_i is given by

b_{i,k} = (1/2)(e_i)^T H_{i,k} e_i = (1/2) tr(H_{i,k} e_i (e_i)^T),    (4.27)

where tr denotes the trace. To keep the filter unbiased, the k-th component of b̂_i is given by
b̂_{i,k} = (1/2) tr(H_{i,k} P_i^−).
The measurement residual is

r_i = y_i − E[y_i]    (4.28)

Expanding Eq. (4.28), the k-th component of the residual is obtained to be

r_{i,k} = h_{i,k}^T e_i + (1/2) tr(H_{i,k} e_i (e_i)^T) − (1/2) tr(H_{i,k} P_i^−) + v_{i,k},    (4.29)

where h_{i,k}^T is the k-th row of the measurement Jacobian and v_{i,k} is the k-th component of
the measurement noise v_i. Computing the measurement residual covariance E[r_i r_i^T] yields

W_i = H_i P_i^− H_i^T + B_i + R_i,    (4.30)

where the matrix B_i is the contribution of the second-order effects and its (kj)-th component
is given by

B_{i,kj} ≜ (1/4) E[tr(H_{i,k} e_i (e_i)^T) tr(H_{i,j} e_i (e_i)^T)] − (1/4) tr(H_{i,k} P_i^−) tr(H_{i,j} P_i^−),

where it was assumed that the third-order central moments are all zero. Assuming the
prior estimation error is distributed as a zero-mean Gaussian distribution with covariance
P_i^−, the (kj)-th component of B_i is given by

B_{i,kj} = (1/2) tr(H_{i,k} P_i^− H_{i,j} P_i^−).    (4.31)
Comparing the measurement residual covariance for the EKF in Eq. (4.19) with the
measurement residual covariance for the second-order filter in Eq. (4.30), we see that when
the nonlinearities lead to significant second-order terms which should not be neglected, the
EKF tends to provide residual covariance estimates that are not consistent with the
actual errors. Typically, we address this by tuning the EKF using R_i as a parameter to
be tweaked. If the contribution of the a priori estimation error H_i P_i^− H_i^T to the residual
covariance is much larger than the contribution of the measurement error R_i, the EKF
algorithm will produce H_i P_i^+ H_i^T ≃ R_i. If B_i is of comparable magnitude to R_i, then the
actual covariance of the posterior measurement estimate should be H_i P_i^+ H_i^T ≃ R_i + B_i.
Therefore, a large underestimation of the a posteriori covariance can occur in the presence
of nonlinearities when the a priori estimated measurement covariance is much larger than the
measurement error covariance.
The covariance update is given by the modified Gaussian second-order filter update [30]

P_i^+ = P_i^− − P_i^− H_i^T W_i^{−1} H_i P_i^−,    (4.32)

where the residual covariance W_i is given by Eq. (4.30).
4.3. Underweighting Measurements
In the prior section we saw that when "large" values of H_i P_i^− H_i^T exist (or similarly,
large values of P_i^−), and possibly "small" values of R_i, the EKF is at risk of underestimating
the posterior estimation error covariance matrix. We must repeat that this can only happen
in the presence of "large" nonlinearities. The larger P_i^−, the larger the domain of possible
values of the true state x, and hence the more likely the higher-order terms of the expansion of the
nonlinear measurement functions will become relevant. If a measurement function is highly
nonlinear, but the prior estimate is very precise, the EKF algorithm and linearization are
likely sufficiently accurate since:
(1) The posterior estimate will rely heavily on the prior and rely less on the
measurement.
(2) Since the error is small, while the Hessian matrix might be relatively large, the
actual contributions of the second-order effects are likely to remain small.
Underweighting is the process of modifying the residual covariance to reduce the update
and compensate for the second-order effects described above. In this section, we describe
three common methods for performing underweighting with the EKF algorithm.
4.3.1. Additive Compensation Method The most straightforward underweighting
scheme is to add an underweighting factor U_i as

W_i = H_i P_i^− H_i^T + R_i + U_i.    (4.33)

With the Kalman gain given by

K_i = P_i^− H_i^T W_i^{−1},    (4.34)

we see that the symmetric, positive-definite underweighting factor U_i decreases the Kalman
gain, thereby reducing the state estimate and state estimation error covariance updates.
One choice is to select U_i = B_i, which is the contribution to the covariance assuming
the prior distribution of the estimation error is Gaussian. The advantage of this choice is
its theoretical foundation based on analyzing the second-order terms of the Taylor series
expansions. The disadvantages include higher computational costs to calculate the second-
order partials and the reliance on the assumption that the estimation errors possess Gaussian
distributions. In practical applications, the matrix U_i needs to be tuned appropriately for
acceptable overall performance of the EKF. The process of tuning a positive-definite matrix
is less obvious than tuning a single scalar parameter.
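The additive scheme of Eqs. (4.33)-(4.34) is only a one-line change to the gain computation. The following sketch, with an illustrative function name and toy numbers, shows the gain reduction; U = 0 recovers the standard EKF gain.

```python
import numpy as np

def underweighted_gain(P, H, R, U):
    """Kalman gain with the additive underweighting factor U of
    Eqs. (4.33)-(4.34).  U is symmetric positive (semi-)definite;
    larger U means a smaller gain and a smaller covariance update."""
    W = H @ P @ H.T + R + U              # inflated residual covariance
    return P @ H.T @ np.linalg.inv(W)    # reduced Kalman gain
```

Choosing U = B_i ties the inflation directly to the second-order analysis above, at the cost of computing the Hessians H_{i,k}.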
4.3.3. Lear's Method Lear's choice was to make U_i a percentage of the a priori
estimation error covariance via [39]

U_i = β H_i P_i^− H_i^T.    (4.36)

Let P̄_i^− ∈ ℝ^{3×3} be the partition of the state estimation error covariance associated with
the position error states. The Space Shuttle employs underweighting when sqrt(tr P̄_i^−) > α.
The positive scalars α and β are design parameters. For the Space Shuttle, α is selected to
be 1000 meters and β is selected to be 0.2 [39]. When sqrt(tr P̄_i^−) > 1000 m, then β = 0.2;
otherwise β = 0.
Orion employs a slightly different approach: underweighting is applied when H_i P_i^− H_i^T >
α, where α is a tunable flight software parameter.
This choice of underweighting scheme is sound, since it assumes that the higher-order
effects are a fraction (or a multiple) of the first-order effects, which are a related quantity.
Some unusual nonlinear measurement cases, where the measurement Jacobian evaluates to
zero or a small value while the Hessian does not vanish, are not appropriately handled by
underweighting.
Running simulations pre-flight, the designer can calculate the time history of tr B_i / tr(H_i P_i^− H_i^T)
and choose an appropriate value of β. It is unlikely that terms of higher order than B_i will
need to be considered in designing the value of β.
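The Shuttle-style trigger can be sketched as follows. The function name is illustrative; the default α and β are the Shuttle design values quoted in the text, and P_pos stands for the 3×3 position partition P̄_i^−.

```python
import numpy as np

def lear_residual_covariance(P_pos, HPH, R, alpha=1000.0, beta=0.2):
    """Residual covariance with Lear underweighting, Eq. (4.36) with
    U = beta * H P- H^T: when the RMS position uncertainty
    sqrt(tr(P_pos)) exceeds alpha (meters), inflate the H P- H^T
    term by (1 + beta); otherwise use the standard EKF form."""
    if np.sqrt(np.trace(P_pos)) > alpha:
        return (1.0 + beta) * HPH + R
    return HPH + R
```

Because U_i scales with H_i P_i^− H_i^T itself, the inflation automatically fades as the filter converges, unlike a fixed additive term.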
CHAPTER 5
Bias Modeling
5.1.1. Random Constant The simplest type of systematic error is a constant bias
on the measurement. There are two types of such biases: deterministic constants, which
are truly constant for all time, and random constants, which are constant or very nearly so
over a particular time of interest. For example, each time a sensor is power-cycled, a bias
associated with it may change in value, but so long as the sensor remains powered on, the
bias will not change.
In some cases, we may have reason to believe that a particular systematic error source
truly is a deterministic bias, but due to limited observability, we do not have knowledge of
its true value. In such cases, we may view our estimate of the bias as a random constant,
and its variance as a measure of the imprecision of our knowledge.
Thus, we may view all constants that could be solve-for or consider parameters in orbit
determination as random constants. A model for a random constant is

ḃ(t) = 0,   b(t_o) ∼ N(0, p_{b_o}).    (5.2)

Thus the unconditional mean of b(t) is zero for all time, and its unconditional covariance is
constant for all time as well. If b(t) is a filter solve-for variable that is observable, then its
covariance conditioned on the measurement sequence will reach zero in the limit as t → ∞.
This is an undesirable characteristic for application in a sequential navigation filter.
To simulate a realization of the random constant, we need only generate a random
number according to N(0, p_{b_o}), as the previous subsection described.
5.1.2. Random Ramp The random ramp model assumes that the rate of change of
the bias is itself a random constant; thus the random ramp model is

b̈(t) = 0,   ḃ(t_o) ∼ N(0, p_{ḃ_o}).    (5.3)

Thus, the initial condition ḃ(t_o) is a random constant. For a pure random ramp, the initial
condition on b(t_o) and its covariance are taken to be zero, but an obvious and common
generalization is to allow b(t_o) to also be a random constant.
It is convenient to write this model as a first-order vector system as follows:

[ḃ(t); b̈(t)] = [ḃ(t); ḋ(t)] = [0 1; 0 0] [b(t); d(t)]    (5.4)
ẋ(t) = A x(t)    (5.5)

The resulting output equation for the total measurement error is

e = [1 0] x + v    (5.6)
  = H x + v    (5.7)

Note that the ensemble of realizations of x(t) has zero mean for all time. The unconditional
covariance evolves in time according to

P_x(t) = Φ(t − t_o) P_{x_o} Φ^T(t − t_o)    (5.8)

where

Φ(Δt) = [1 Δt; 0 1]   and   P_{x_o} = [p_{b_o} 0; 0 p_{ḃ_o}]    (5.9)

which we can also write in recursive form as

P_x(t + Δt) = Φ(Δt) P_x(t) Φ^T(Δt)    (5.10)

Thus, we can generate realizations of the random ramp with either x(t) ∼ N(0, P_x(t)) or
recursively from

x(t + Δt) = Φ(Δt) x(t)    (5.11)
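A realization of Eqs. (5.9)-(5.11) only requires one random draw for the initial state, since the propagation itself is deterministic. The following numpy sketch uses an illustrative function name and a fixed seed:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_random_ramp(p_bo, p_bdo, dt, n_steps):
    """One realization of the bias/drift random ramp: draw the
    random-constant initial state x(t_o) ~ N(0, diag(p_bo, p_bdo)),
    then propagate deterministically with Phi(dt), Eq. (5.11)."""
    Phi = np.array([[1.0, dt], [0.0, 1.0]])
    x = rng.multivariate_normal(np.zeros(2), np.diag([p_bo, p_bdo]))
    hist = [x]
    for _ in range(n_steps):
        x = Phi @ x
        hist.append(x)
    return np.array(hist)  # rows are [bias, drift] over time
```

In any realization the drift component stays constant and the bias grows linearly at that drift rate, which is a convenient check of the propagation.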
Note that the norm of the unconditional covariance becomes infinite as t becomes in-
finite. This could lead to an overflow of the representation of the covariance in a computer
program if the propagation time between measurements is large, if the bias is unobserv-
able, or if the bias is a consider parameter, and could also lead to the representation of
the covariance losing either its symmetry and/or its positive definiteness due to roundoff
and/or truncation. If the bias and drift are filter solve-for variables, then the norm of their
covariance conditioned on the measurement sequence will reach zero in the limit as t ! 1.
These are all undesirable characteristics for application in a sequential navigation filter.
5.1.3. Higher-Order Derivatives of Random Constants In principle, a random
constant may be associated with any derivative of the bias in a straightforward extension
of the models above. In practice, it is rare to need more than two or three derivatives.
Conventional terminology does not appear in the literature for derivatives of higher order
than the random ramp. The slope of the bias is most commonly described as the “bias
drift,” so that a “drift random ramp” would be one way to describe a bias whose second
derivative is a random constant. The measurement partials matrix needs to be accordingly
padded with trailing zeros for the derivatives of the bias in such cases.
5.2. Single-Input Bias State Models
The simplest non-constant systematic errors are systems with a single input that is a
random process. We can think of a random process as the result of some kind of limit in
which the intervals between an uncorrelated sequence of random variables get infinitesimally
small. In this limit, each random increment instantaneously perturbs the sequence, so that
the resulting process is continuous but non-differentiable. We call this kind of random
input "process noise."
Although such random processes are non-differentiable, there are various techniques
for generalizing the concept of integration so that something like integrals of the process
noise exist, and hence so do the differentials that appear under the integral signs. It turns
out that, so long as any coefficients of the process noise are non-random, these differentials
behave for all practical purposes as if they were differentiable.
5.2.1. Random Walk The random walk is the simplest random process of the type
described above. In terms of the "formal derivatives" mentioned above, the random walk
model for a measurement bias is

ḃ(t) = w(t),   w(t) ∼ N(0, q δ(t − s))    (5.12)

The input noise process on the right-hand side is known as "white noise," and the Dirac
delta function that appears in the expression for its variance indicates that the white noise
process consists of something like an infinitely tightly spaced set of impulses. The term q
that appears along with the delta function is the intensity of each impulse¹. The initial
condition b(t_o) is an unbiased random constant. Since b(t_o) and w(t) are zero-mean,
b(t) is also zero-mean for all time. The unconditional variance of b evolves in time according
to

p_b(t) = p_{b_o} + q(t − t_o)    (5.13)

which we can also write in recursive form as

p_b(t + Δt) = p_b(t) + q Δt    (5.14)

Thus, to generate a realization of the random walk at time t, we need only generate a
random number according to N(0, p_b(t)). Equivalently, we could also generate realizations
of ϖ(t) ∼ N(0, q Δt), and recursively add these discrete noise increments to the bias as
follows:

b(t + Δt) = b(t) + ϖ(t)    (5.15)

¹Another way to imagine the input sequence, in terms of a frequency-domain interpretation, is that
it is a noise process whose power spectral density, q, is non-zero at all frequencies, which implies infinite
bandwidth.
Note that the unconditional variance becomes infinite as t becomes infinite. This could
lead to an overflow of the representation of p_b if q is large in the following circumstances:
in a long gap between measurements, if the bias is unobservable, or if the bias is a consider
parameter. These are all somewhat undesirable characteristics for application in a sequential
navigation filter. However, because the process is persistently stimulated by the input, its
variance conditioned on a measurement history will remain positive for all time. Hence the
random walk finds frequent application in sequential navigation filters, particularly when
continuous measurements from which the bias is observable are generally available, as
often occurs for GPS data.
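The recursion of Eq. (5.15) can be sketched as below (illustrative function name, fixed seed). An ensemble of such realizations should exhibit the linear variance growth of Eq. (5.13).

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_random_walk(p_bo, q, dt, n_steps):
    """One realization of the bias random walk, Eq. (5.15): start from
    b(t_o) ~ N(0, p_bo) and add increments with variance q*dt."""
    b = rng.normal(0.0, np.sqrt(p_bo))
    hist = [b]
    for _ in range(n_steps):
        b += rng.normal(0.0, np.sqrt(q * dt))
        hist.append(b)
    return np.array(hist)
```

Sampling many realizations at a fixed time and comparing the sample variance against p_{b_o} + q(t − t_o) is a useful unit test for this kind of model code.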
5.2.2. Random Run The random run model assumes that the rate of change of the
bias is itself a random walk; thus the random run model is

b̈(t) = w(t),   w(t) ∼ N(0, q δ(t − s))    (5.16)

The initial condition ḃ(t_o) is a random constant. For a pure random run, the initial condition
on b(t_o) and its covariance are taken to be zero, but an obvious and common generalization
is to allow b(t_o) to also be a random constant.
It is convenient to write this model as a first-order vector system as follows:

[ḃ(t); b̈(t)] = [ḃ(t); ḋ(t)] = [0 1; 0 0] [b(t); d(t)] + [0; 1] w(t)    (5.17)
ẋ(t) = A x(t) + B w(t)    (5.18)

The measurement partial is the same as for the random ramp. The initial condition x(t_o)
is an unbiased random constant. Since x(t_o) and w(t) are zero-mean, x(t) is also
zero-mean for all time. The covariance evolves in time according to

P_x(t) = Φ(t − t_o) P_{x_o} Φ^T(t − t_o) + S(t − t_o)    (5.19)

where

Φ(t − t_o) = [1 (t − t_o); 0 1]   and   P_{x_o} = [p_{b_o} 0; 0 p_{ḃ_o}]    (5.20)

and

S(Δt) = q [Δt³/3 Δt²/2; Δt²/2 Δt]    (5.21)

which we can also write in recursive form as

P_x(t + Δt) = Φ(Δt) P_x(t) Φ^T(Δt) + S(Δt)    (5.22)

Thus, we can generate realizations of the random run with either x(t) ∼ N(0, P_x(t)) or
recursively from

x(t + Δt) = Φ(Δt) x(t) + ϖ(t)    (5.23)

where ϖ(t) ∼ N(0, S(Δt)) is a noise sample vector arising from formal integration of the
scalar noise input process over the sample time. A Cholesky decomposition of S(Δt) useful
for sampling is

S^C(Δt) = √q [√(Δt³/3) 0; √(3Δt)/2 √(Δt)/2]    (5.24)

Note that the norm of the unconditional covariance becomes infinite as t³ becomes
infinite, and the process is persistently stimulated by the input, so its covariance conditioned
on a measurement history will remain positive definite for all time. Hence, the random run
shares similar considerations with the random walk for application in sequential navigation
filters.
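The Cholesky factor of Eq. (5.24) makes sampling the correlated noise vector of Eq. (5.23) trivial: multiply it into a pair of independent standard normals. A sketch, with illustrative function names:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_run_noise_chol(q, dt):
    """Cholesky factor S^C(dt) of the random-run process noise,
    Eq. (5.24), so that S^C @ S^C.T = S(dt) of Eq. (5.21)."""
    return np.sqrt(q) * np.array([[np.sqrt(dt**3 / 3.0), 0.0],
                                  [np.sqrt(3.0 * dt) / 2.0, np.sqrt(dt) / 2.0]])

def step_random_run(x, q, dt):
    """One recursion of Eq. (5.23): x <- Phi(dt) x + w, w ~ N(0, S(dt)),
    sampled via the Cholesky factor and two independent unit normals."""
    Phi = np.array([[1.0, dt], [0.0, 1.0]])
    return Phi @ x + random_run_noise_chol(q, dt) @ rng.standard_normal(2)
```

Verifying S^C (S^C)^T against Eq. (5.21) is a quick check that the factor was transcribed correctly.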
²One sometimes sees τ described as the "half-life," but since 1/e < 1/2, this is not an accurate label.
and

Q^{(1,1)}(Δt) = q/(4ζω_n³) [1 − (e^{−2ζω_n Δt}/ω_d²)(ω_d² + 2ζω_n ω_d cos(ω_d Δt) sin(ω_d Δt) + 2ζ²ω_n² sin²(ω_d Δt))]    (5.40)

Q^{(2,2)}(Δt) = q/(4ζω_n) [1 − (e^{−2ζω_n Δt}/ω_d²)(ω_d² − 2ζω_n ω_d cos(ω_d Δt) sin(ω_d Δt) + 2ζ²ω_n² sin²(ω_d Δt))]    (5.41)

Q^{(1,2)}(Δt) = q/(2ω_d²) e^{−2ζω_n Δt} sin²(ω_d Δt),    (5.42)

Q^{(2,1)}(Δt) = Q^{(1,2)}(Δt)    (5.43)

where ω_d = ω_n √(1 − ζ²). In the over-damped case (ζ > 1), replace sin and cos with sinh
and cosh, respectively. In the critically-damped case,

Φ(Δt) = e^{−ω_n Δt} [1 + ω_n Δt   Δt; −ω_n² Δt   1 − ω_n Δt]    (5.44)

and

Q^{(1,1)}(Δt) = q/(4ω_n³) [1 − e^{−2ω_n Δt}(1 + 2ω_n Δt + 2ω_n² Δt²)]    (5.45)

Q^{(2,2)}(Δt) = q/(4ω_n) [1 − e^{−2ω_n Δt}(1 − 2ω_n Δt + 2ω_n² Δt²)]    (5.46)

Q^{(2,1)}(Δt) = Q^{(1,2)}(Δt) = (q Δt²/2) e^{−2ω_n Δt}    (5.47)

Note that for any damping ratio, ‖P_x‖ remains finite, since as t → ∞,

P_x(t → ∞) = q/(4ζω_n) [1/ω_n² 0; 0 1].    (5.48)

Thus, the ratio of the steady-state standard deviations of the bias and drift will be

σ_d/σ_b = ω_n,    (5.49)

and these are related to the power spectral density by

q = 4ζ σ_d³/σ_b.    (5.50)

Hence, we can choose the parameters of the SOGM so that we avoid any overflow, or loss of
symmetry and/or positive definiteness of P_x due to roundoff and/or truncation. For these
reasons, the SOGM is recommended as a best practice for bias drift modeling in sequential
navigation filters.
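Eqs. (5.48)-(5.50) give a direct tuning recipe: pick the desired steady-state bias and drift standard deviations and a damping ratio, then solve for ω_n and q. A sketch with illustrative function names:

```python
import numpy as np

def tune_sogm(sigma_b, sigma_d, zeta=0.7071):
    """Choose SOGM parameters from desired steady-state bias and drift
    standard deviations: omega_n = sigma_d / sigma_b (Eq. 5.49) and
    q = 4 zeta sigma_d^3 / sigma_b (Eq. 5.50)."""
    omega_n = sigma_d / sigma_b
    q = 4.0 * zeta * sigma_d**3 / sigma_b
    return omega_n, q

def sogm_steady_state_cov(q, zeta, omega_n):
    """Steady-state covariance of Eq. (5.48)."""
    return q / (4.0 * zeta * omega_n) * np.array([[1.0 / omega_n**2, 0.0],
                                                  [0.0, 1.0]])
```

Round-tripping through Eq. (5.48) recovers the requested standard deviations exactly, which confirms the algebra.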
where, as previously, b(t_o) ∼ N(0, p_{b_o}), and w(t) ∼ N(0, q δ(t − s)). A formal solution to
(5.51) gives

b(t) = b(t_o) e^{−(t−t_o)/τ} + b_∞ (1 − e^{−(t−t_o)/τ}) + ∫_{t_o}^{t} e^{−(t−s)/τ} w(s) ds    (5.52)

Since b(t_o) and w(t) are zero-mean,

E[b(t)] = b_∞ (1 − e^{−(t−t_o)/τ})    (5.53)

and E[b(t)] → b_∞ as t → ∞. Since E[b(t)]² is subtracted from E[b(t)²] to get the covariance,
the covariance evolves in time identically to the FOGM,

p_b(t) = e^{−2(t−t_o)/τ} p_{b_o} + s(t − t_o)    (5.54)

where, as before,

s(t − t_o) = (qτ/2)(1 − e^{−2(t−t_o)/τ})    (5.55)

Thus, to generate a realization of the Vasicek model at a particular time t, we could generate
a realization of the initial bias value, and then at each sample time generate realizations of
ϖ(t) ∼ N(0, s(Δt)), and recursively add these discrete noise sample increments to the bias
sample history as follows:

b(t + Δt) = b(t) e^{−Δt/τ} + b_∞ (1 − e^{−Δt/τ}) + ϖ(t)    (5.56)

or we could generate a random realization of N(0, p_b(t)) and add this to b_∞ (1 − e^{−(t−t_o)/τ}).
To configure or “tune” the Vasicek model, one chooses the time constant ⌧ and the
noise PSD q in a manner analogous to the FOGM process; it is less clear how one might
choose b1 . Seago et al. [66] proposed that b1 be estimated as a random constant filter
state. Casting the Vasicek model into such a two-state form results in the following model:
ḃ(t) 1/⌧ 1/⌧ b(t) w(t)
= + , (5.57)
ḃ1 0 0 b1 0
which leads to the following state transition matrix and process noise covariance:
" ⇣ 2t
⌘ #
q⌧
e t/⌧ 1 e t/⌧ 2 1 e ⌧ 0
(t) = , S(t) = . (5.58)
0 1 0 0
While the Vasicek model shares with the FOGM the desirable feature that p_b \to q\tau/2
as t \to \infty, in the two-state form just described it also has the undesirable feature that the
variance of b_\infty goes to zero as t \to \infty. Modeling b_\infty with process noise, e.g. as a random
walk with PSD of q_1, introduces an unstable integral of the process noise as occurs for the
integrated FOGM:
S(t) = \begin{bmatrix} \frac{q\tau}{2}\left(1 - e^{-2t/\tau}\right) + q_1\left(t - \frac{3\tau}{2} + 2\tau e^{-t/\tau} - \frac{\tau}{2}e^{-2t/\tau}\right) & q_1 t - q_1\tau\left(1 - e^{-t/\tau}\right) \\ q_1 t - q_1\tau\left(1 - e^{-t/\tau}\right) & q_1 t \end{bmatrix}, \qquad (5.59)
although choosing q_1 appropriately small may mitigate this concern. In any case, retaining
a steady-state bias across long data gaps may not always be warranted, depending on the
context; and if long measurement gaps are not present, such a bias, with the accompanying
complexity of maintaining an additional state, may not be necessary. We will consider
further such multi-input bias models in the sequel.
5.3. Multi-Input Bias State Models
5.3.1. Bias and Drift Random Walks (Random Walk + Random Run). A
common model for biases in clocks, gyros, and accelerometers is that the bias is driven
both by its own white noise input and by the integral of the white noise of its drift.
Such models derive from observations that the error magnitudes of these devices depend
on the time scale over which the device is observed. They are often characterized by Allan
deviation specifications, which may be heuristically associated with the white noise power
spectral densities. The model is as follows:
\begin{bmatrix} \dot b(t) \\ \dot d(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} b(t) \\ d(t) \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} w_b(t) \\ w_d(t) \end{bmatrix} \qquad (5.60)
ẋ(t) = Ax(t) + Bw(t) (5.61)
The measurement partial is the same as for the random ramp. The initial condition x(to )
is an unbiased random constant. Since x(to ) and w(t) are zero-mean, then x(t) is also
zero-mean for all time. The covariance evolves in time according to
P_x(t) = \Phi(t-t_o)\, P_{x_o}\, \Phi^T(t-t_o) + S(t-t_o) \qquad (5.62)
where
\Phi(t) = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad P_{x_o} = \begin{bmatrix} p_{b_o} & 0 \\ 0 & p_{d_o} \end{bmatrix} \qquad (5.63)
and
S(t) = \begin{bmatrix} q_b t + q_d t^3/3 & q_d t^2/2 \\ q_d t^2/2 & q_d t \end{bmatrix} \qquad (5.64)
which we can also write in recursive form as
P_x(t+\Delta t) = \Phi(\Delta t)\, P_x(t)\, \Phi^T(\Delta t) + S(\Delta t) \qquad (5.65)
Thus, we can generate realizations of the random run with either x(t) \sim N(0, P_x(t)) or
recursively from
x(t+\Delta t) = \Phi(\Delta t)\, x(t) + \varpi(t) \qquad (5.66)
where \varpi(t) \sim N(0, S(\Delta t)). Note that an upper-triangular Cholesky factor C of S(t), such that S(t) = C C^T, is
C = \begin{bmatrix} \sqrt{q_b t + q_d t^3/12} & \sqrt{q_d t^3}/2 \\ 0 & \sqrt{q_d t} \end{bmatrix} \qquad (5.67)
As with its constituent models, the norm of the unconditional covariance becomes infinite
as t^3 becomes infinite; but because the process is persistently stimulated by the input, its
covariance conditioned on a measurement history will remain positive definite for all time.
Hence, this model shares similar considerations with its constituents for application in
sequential navigation filters.
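The propagation and realization machinery of Eqs. (5.66)-(5.67) can be sketched as follows (Python; function name hypothetical). The closed-form Cholesky factor avoids a numerical decomposition at every step:

```python
import numpy as np

def rw_rr_realization(x0, qb, qd, dt, n_steps, rng=None):
    """Simulate the random walk + random run model via Eq. (5.66),
    x_{k+1} = Phi(dt) x_k + w_k, w_k ~ N(0, S(dt)), drawing w_k with
    the closed-form Cholesky factor of Eq. (5.67)."""
    rng = np.random.default_rng() if rng is None else rng
    phi = np.array([[1.0, dt], [0.0, 1.0]])
    # upper-triangular C with C C^T = S(dt)
    C = np.array([[np.sqrt(qb*dt + qd*dt**3/12.0), np.sqrt(qd*dt**3)/2.0],
                  [0.0, np.sqrt(qd*dt)]])
    x = np.empty((n_steps + 1, 2))
    x[0] = x0
    for k in range(n_steps):
        x[k + 1] = phi @ x[k] + C @ rng.standard_normal(2)
    return x
```

Multiplying the factor out reproduces S(Δt) of Eq. (5.64) term by term, which is an easy unit-test for an implementation.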
50 5. BIAS MODELING
5.3.2. Bias, Drift, and Drift Rate Random Walks (Random Walk + Random
Run + Random Zoom). Another model for biases in very-high-precision clocks, gyros,
and accelerometers is that the bias is driven by two integrals of white noise in addition
to its own white noise input. Such models are often characterized by Hadamard deviation
specifications, which may be heuristically associated with the white noise power spectral
densities. The model is as follows:
\begin{bmatrix} \dot b(t) \\ \dot d(t) \\ \ddot d(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} b(t) \\ d(t) \\ \dot d(t) \end{bmatrix} + \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} w_b(t) \\ w_d(t) \\ w_{\dot d}(t) \end{bmatrix} \qquad (5.68)
ẋ(t) = Ax(t) + Bw(t) (5.69)
The resulting output equation is
e = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} x + v \qquad (5.70)
  = H x + v \qquad (5.71)
The initial condition x(to ) is an unbiased random constant. Since x(to ) and w(t) are zero-
mean, then x(t) is also zero-mean for all time. The covariance evolves in time according
to
P_x(t) = \Phi(t-t_o)\, P_{x_o}\, \Phi^T(t-t_o) + S(t-t_o) \qquad (5.72)
where
\Phi(t) = \begin{bmatrix} 1 & t & t^2/2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad P_{x_o} = \begin{bmatrix} p_{b_o} & 0 & 0 \\ 0 & p_{d_o} & 0 \\ 0 & 0 & p_{\dot d_o} \end{bmatrix} \qquad (5.73)
and
S(t) = \begin{bmatrix} q_b t + q_d t^3/3 + q_{\dot d}\, t^5/20 & q_d t^2/2 + q_{\dot d}\, t^4/8 & q_{\dot d}\, t^3/6 \\ q_d t^2/2 + q_{\dot d}\, t^4/8 & q_d t + q_{\dot d}\, t^3/3 & q_{\dot d}\, t^2/2 \\ q_{\dot d}\, t^3/6 & q_{\dot d}\, t^2/2 & q_{\dot d}\, t \end{bmatrix} \qquad (5.74)
which we can also write in recursive form as
P_x(t+\Delta t) = \Phi(\Delta t)\, P_x(t)\, \Phi^T(\Delta t) + S(\Delta t) \qquad (5.75)
Thus, we can generate realizations of the process with either x(t) \sim N(0, P_x(t)) or
recursively from
x(t+\Delta t) = \Phi(\Delta t)\, x(t) + \varpi(t) \qquad (5.76)
where \varpi(t) \sim N(0, S(\Delta t)). Note that an upper-triangular Cholesky factor C of S(t), such that S(t) = C C^T, is
C = \begin{bmatrix} \sqrt{q_b t + q_d t^3/12 + q_{\dot d}\, t^5/720} & (t/2)\sqrt{q_d t + q_{\dot d}\, t^3/12} & (t^2/6)\sqrt{q_{\dot d}\, t} \\ 0 & \sqrt{q_d t + q_{\dot d}\, t^3/12} & (t/2)\sqrt{q_{\dot d}\, t} \\ 0 & 0 & \sqrt{q_{\dot d}\, t} \end{bmatrix} \qquad (5.77)
Similar to its constituent models, the norm of the unconditional covariance becomes
infinite as t^5 becomes infinite; but because the process is persistently stimulated by the input,
its covariance conditioned on a measurement history will remain positive definite for all time.
Hence, this model shares similar considerations with its constituents for application in
sequential navigation filters.
5.3.3. Bias and Drift Coupled First- and Second-Order Gauss-Markov. The
following model provides a stable alternative, developed in Reference 10, to the “Random
Walk + Random Run” model. Note that the following description corrects a sign error in
the process noise cross-covariance results of the cited work. The transient response of the
stable alternative can be tuned to approximate the Random Walk + Random Run model,
and its stable steady-state response can be used to avoid computational issues with long
propagation times, observability, consider states, etc. Although this model has received
limited application as of the time of this writing, due to its stability it shows promising
potential to evolve into a best practice for sequential navigation filtering applications.
The coupled first- and second-order Gauss-Markov model is as follows:
\begin{bmatrix} \dot b(t) \\ \dot d(t) \end{bmatrix} = \begin{bmatrix} -1/\tau & 1 \\ -\omega_n^2 & -2\zeta\omega_n \end{bmatrix} \begin{bmatrix} b(t) \\ d(t) \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} w_b(t) \\ w_d(t) \end{bmatrix} \qquad (5.78)
ẋ(t) = Ax(t) + Bw(t) (5.79)
The measurement partial is the same as for the random ramp. The initial condition x(to )
is an unbiased random constant. Since x(to ) and w(t) are zero-mean, then x(t) is also
zero-mean for all time. The covariance evolves in time according to
P_x(t) = \Phi(t-t_o)\, P_{x_o}\, \Phi^T(t-t_o) + S(t-t_o) \qquad (5.80)
where
\Phi(t) = \frac{e^{-\eta t}}{\nu} \begin{bmatrix} \nu\cos\nu t + (2\zeta\omega_n - \eta)\sin\nu t & \sin\nu t \\ -\omega_n^2 \sin\nu t & \nu\cos\nu t + (\beta - \eta)\sin\nu t \end{bmatrix} \qquad (5.81)
with
\beta = 1/\tau, \qquad (5.82)
\eta = \frac{1}{2}\left(\beta + 2\zeta\omega_n\right), \qquad (5.83)
\nu = \sqrt{\omega_d^2 + \beta\zeta\omega_n - \frac{1}{4}\beta^2}, \qquad (5.84)
\omega_d = \omega_n\sqrt{1 - \zeta^2}. \qquad (5.85)
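Because the closed form of Eq. (5.81) is easy to get wrong (several minus signs are load-bearing), a quick numerical check against the matrix exponential of the dynamics matrix A of Eq. (5.78) is worthwhile. The sketch below uses a truncated Taylor series for the exponential to stay self-contained; function names are hypothetical:

```python
import numpy as np

def sogm_stm(t, tau, zeta, wn):
    """Closed-form state transition matrix of the coupled first/second-
    order Gauss-Markov model, Eq. (5.81), with the auxiliary constants
    of Eqs. (5.82)-(5.85)."""
    beta = 1.0 / tau
    eta = 0.5 * (beta + 2.0 * zeta * wn)
    nu = np.sqrt(wn**2 * (1.0 - zeta**2) + beta * zeta * wn - beta**2 / 4.0)
    s, c = np.sin(nu * t), np.cos(nu * t)
    return (np.exp(-eta * t) / nu) * np.array(
        [[nu * c + (2.0 * zeta * wn - eta) * s, s],
         [-wn**2 * s, nu * c + (beta - eta) * s]])

def expm_series(A, terms=30):
    """Truncated Taylor series for the matrix exponential; adequate for
    the small, well-scaled arguments used in this check."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out
```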
CHAPTER 6
State Representations
This chapter discusses state representations, primarily for translational states; attitude representations are discussed in Chapter 8.
6.5.1. Dual Inertial State Representation. Here, we consider only two spacecraft,
but the results are easily generalized. Let x_i = [r_i^T, v_i^T]^T, i = 1, 2 denote the true state of
spacecraft i, with r_i, v_i the position and velocity vectors expressed in non-rotating coordinates
centered on the primary central gravitational body. Based on mission requirements,
any appropriate fidelity of dynamics may be directly utilized, e.g.
\dot x_i = \begin{bmatrix} v_i \\ -\mu \dfrac{r_i}{\|r_i\|^3} + \sum_j f_j \end{bmatrix} \qquad (6.2)
where the specific forces f_j may include thrust, higher-order gravity, drag, solar radiation
pressure, gravity from non-central bodies such as the moon and the sun, etc.
Let e_i = \hat x_i - x_i, where \hat x_i is an estimate for the state of spacecraft i. Then, the error
in the state estimate \hat x = [\hat x_1^T, \hat x_2^T]^T is e = [e_1^T, e_2^T]^T, and the error covariance is
P = E[e e^T] = \begin{bmatrix} P_1 & P_{12} \\ P_{12}^T & P_2 \end{bmatrix} \qquad (6.3)
Any linear unbiased estimate of x will have the following measurement update equation:
\hat x^+ = \hat x^- + K\left(y - h(\hat x^-)\right) \qquad (6.4)
where \hat x^- is the value of \hat x immediately prior to incorporating the observation, y, and h(\hat x^-)
is an unbiased prediction of the measurement’s value. The optimal gain is
K = P^- H^T \left(H P^- H^T + R\right)^{-1} \qquad (6.5)
where R is the measurement noise covariance and H = \partial h(x)/\partial x \big|_{\hat x^-}. Partition the update
as follows:
\begin{bmatrix} \hat x_1^+ \\ \hat x_2^+ \end{bmatrix} = \begin{bmatrix} \hat x_1^- \\ \hat x_2^- \end{bmatrix} + \begin{bmatrix} K_1 \\ K_2 \end{bmatrix}\left(y - h(\hat x^-)\right) \qquad (6.6)
= \begin{bmatrix} \hat x_1^- \\ \hat x_2^- \end{bmatrix} + \begin{bmatrix} P_1 H_1^T + P_{12} H_2^T \\ P_{12}^T H_1^T + P_2 H_2^T \end{bmatrix} \left(H P^- H^T + R\right)^{-1} \left(y - h(\hat x^-)\right) \qquad (6.7)
from which it is clear that the optimal update for the relative state \hat x_{rel} = \hat x_2 - \hat x_1 is
\hat x_{rel}^+ = \hat x_{rel}^- + \left(P_2 H_2^T - P_1 H_1^T - P_{12} H_2^T + P_{12}^T H_1^T\right)\left(H P^- H^T + R\right)^{-1}\left(y - h(\hat x^-)\right) \qquad (6.8)
with corresponding relative error covariance
P_{rel} = P_1 + P_2 - P_{12} - P_{12}^T \qquad (6.9)
Noting that it must be true that h(x_{rel}) = h(x), and hence that \partial h(x_{rel})/\partial x_{rel} = \partial h(x_2)/\partial x_2 =
-\partial h(x_1)/\partial x_1, let H_{rel} = H_2 = -H_1. Then it is clear that
P_{rel} H_{rel}^T = P_2 H_2^T - P_1 H_1^T - P_{12} H_2^T + P_{12}^T H_1^T \qquad (6.10)
and that
H_{rel} P_{rel} H_{rel}^T = H P^- H^T \qquad (6.11)
and hence
\hat x_{rel}^+ = \hat x_{rel}^- + P_{rel} H_{rel}^T \left(H_{rel} P_{rel} H_{rel}^T + R\right)^{-1}\left(y - h(\hat x_{rel}^-)\right) \qquad (6.12)
Therefore, the dual inertial state update is mathematically (although perhaps not compu-
tationally) equivalent to a direct update of the relative state.
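The equivalence of Eqs. (6.8) and (6.12) is easy to demonstrate numerically. The sketch below builds a hypothetical joint covariance with a nonzero cross-covariance and compares the two gain computations (all dimensions and values are illustrative, not from the report):

```python
import numpy as np

# Numerical check of the dual-inertial / relative-state equivalence,
# Eqs. (6.5)-(6.12); setup is hypothetical.
rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((2 * n, 2 * n))
P = A @ A.T                          # joint covariance with nonzero P12
P1, P2, P12 = P[:n, :n], P[n:, n:], P[:n, n:]
Hrel = rng.standard_normal((1, n))   # relative-measurement partial
H = np.hstack([-Hrel, Hrel])         # h depends only on x2 - x1
R = np.array([[0.25]])

# full-state (dual inertial) gain, Eq. (6.5); its relative part is K2 - K1
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
K_rel_dual = K[n:] - K[:n]

# direct relative-state gain, Eq. (6.12)
P_rel = P1 + P2 - P12 - P12.T
K_rel = P_rel @ Hrel.T @ np.linalg.inv(Hrel @ P_rel @ Hrel.T + R)
```

The two gains agree to machine precision, as does the innovation covariance of Eq. (6.11).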
Appendix C reproduces a memorandum that further details the benefits of the dual
inertial formulation.
6.5. Relative State Representations
Consider a spherical coordinate state x = [\rho, \dot\rho, \theta, \dot\theta, \phi, \dot\phi]^T, where \rho is the radius from
the central body, \theta is the angle traversed in a reference great-circle plane, and \phi is the
out-of-plane angle. If the only force acting on the spacecraft is point-mass gravity from the
central body, then the equations of motion are given by
\dot x = f(x) = \begin{bmatrix} \dot\rho \\ -\mu/\rho^2 + \rho\dot\phi^2 + \rho\dot\theta^2\cos^2\phi \\ \dot\theta \\ -2\dot\rho\dot\theta/\rho + 2\dot\phi\dot\theta\tan\phi \\ \dot\phi \\ -2\dot\rho\dot\phi/\rho - \dot\theta^2\cos\phi\sin\phi \end{bmatrix} \qquad (6.14)
Now consider a circular reference orbit, with radius \rho_\star, which is in the plane of the great
circle containing the \theta coordinate. Let \omega_\star = \sqrt{\mu/\rho_\star^3}. Then, the state of an object following
the circular reference orbit at any time t > t_o will be x_\star(t) = [\rho_\star, 0, \omega_\star(t-t_o) + \theta_o, \omega_\star, 0, 0]^T.
Without loss of generality, take \theta_o = t_o = 0. Letting \delta x = x - x_\star, linearization of (6.14) in
the neighborhood of x_\star yields
\delta\dot x(t) = \left.\frac{\partial f(x)}{\partial x}\right|_{x_\star(t)} \delta x(t) = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 3\omega_\star^2 & 0 & 0 & 2\omega_\star\rho_\star & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & -2\omega_\star/\rho_\star & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & -\omega_\star^2 & 0 \end{bmatrix} \delta x(t) \qquad (6.15)
6.6.1. The Gyro Model. The gyro is modeled in terms of its bias, scale factor, and
non-orthogonality. The IMU case frame is defined such that the x-axis of the gyro is
the reference direction, with the x-y plane being the reference plane; the y- and z-axes
are not mounted perfectly orthogonal to it (this is why we do not have a full misalignment/non-orthogonality
matrix, as we will in the accelerometer model). The errors in
determining these misalignments are the so-called non-orthogonality errors, expressed as a
matrix \Gamma, as
\Gamma = \begin{bmatrix} 0 & 0 & 0 \\ \gamma_{yx} & 0 & 0 \\ \gamma_{zx} & \gamma_{zy} & 0 \end{bmatrix}
The gyro scale factor represents the error in conversion from raw sensor outputs (gyro
digitizer pulses) to useful units. In general we model the scale-factor error as a first-order
Markov (or Gauss-Markov) process in terms of a diagonal matrix given as
S_g = \begin{bmatrix} s_x^g & 0 & 0 \\ 0 & s_y^g & 0 \\ 0 & 0 & s_z^g \end{bmatrix}
Similarly, the gyro bias errors are modeled as first-order vector Gauss-Markov processes
as
b_g = \begin{bmatrix} b_x^g \\ b_y^g \\ b_z^g \end{bmatrix}
6.6. Modeling Inertial Components
6.6.2. The Accumulated \Delta\theta. In order to find the accumulated angle (not as a function
of the measurement, but purely as a function of the true angular velocity), we define
\Delta\theta as
\Delta\theta_m\left(C_k^{C_{k-1}}\right) = \int_{t_{k-1}}^{t_k} \left[\omega_m^C(\tau) + \frac{1}{2}\, \phi^C(\tau)\Big|_{C_{ref}} \times \omega_m^C(\tau)\right] d\tau \qquad (6.19)
= \int_{t_{k-1}}^{t_k} \left[\omega_m^C(\tau) + \frac{1}{2}\left(\int_{t_{k-1}}^{\tau} \dot\phi^C(\lambda)\, d\lambda\right)\Bigg|_{C_{ref}} \times \omega_m^C(\tau)\right] d\tau \qquad (6.20)
= \int_{t_{k-1}}^{t_k} \left[\omega_m^C(\tau) + \frac{1}{2}\left(\int_{t_{k-1}}^{\tau} \left(\omega_m^C(\lambda) + \frac{1}{2}\, \phi^C(\lambda) \times \omega_m^C(\lambda)\right) d\lambda\right) \times \omega_m^C(\tau)\right] d\tau
Ignoring second-order terms, we get
\Delta\theta_m\left(C_k^{C_{k-1}}\right) = \int_{t_{k-1}}^{t_k} \left[\omega_m^C(\tau) + \frac{1}{2}\left(\int_{t_{k-1}}^{\tau} \omega_m^C(\lambda)\, d\lambda\right) \times \omega_m^C(\tau)\right] d\tau \qquad (6.21)
With this expression, we find that, by analogy, we can express \Delta\theta\left(C_k^{C_{k-1}}\right) in terms of the true angular velocity as
\Delta\theta\left(C_k^{C_{k-1}}\right) = \int_{t_{k-1}}^{t_k} \left[\omega^C(\tau) + \frac{1}{2}\left(\int_{t_{k-1}}^{\tau} \omega^C(\lambda)\, d\lambda\right) \times \omega^C(\tau)\right] d\tau \qquad (6.22)
6.6.3. The Accelerometer Model. The accelerometer package will likely be misaligned
relative to the IMU reference frame, because the three accelerometers (contained
in the accelerometer package) are not mounted orthogonal to each other; these errors are
expressed in terms of six different small angles as:
\Xi_a = \begin{bmatrix} 0 & \xi_{xy}^a & \xi_{xz}^a \\ \xi_{yx}^a & 0 & \xi_{yz}^a \\ \xi_{zx}^a & \xi_{zy}^a & 0 \end{bmatrix}
Similar to the gyros, the accelerometer scale factor represents the error in conversion
from raw sensor outputs (accelerometer digitizer pulses) to useful units. In general we model
the scale-factor error as a first-order (Gauss-) Markov process in terms of a diagonal matrix
given as
S_a = \begin{bmatrix} s_x^a & 0 & 0 \\ 0 & s_y^a & 0 \\ 0 & 0 & s_z^a \end{bmatrix}
Similarly, the bias errors are modeled as first-order Gauss-Markov processes as
b_a = \begin{bmatrix} b_x^a \\ b_y^a \\ b_z^a \end{bmatrix}
So, the accelerometer measurements, a_m^C, are modeled as:
a_m^C = (I_3 + \Xi_a)(I_3 + S_a)\, a^C + b_a + \nu_a \qquad (6.23)
where I_3 is a 3 \times 3 identity matrix, the superscript C indicates that this is an inertial
measurement at the ‘box level’ expressed in case-frame coordinates, and a^C is the ‘true’
non-gravitational acceleration in the case frame. The quantity \nu_a is a zero-mean white
sequence on acceleration that integrates into a velocity random walk; it is the ‘noise’ on
the accelerometer output. If we assume that the errors are small, then to first order
(I_3 + \Xi_a)(I_3 + S_a) \approx I_3 + \Xi_a + S_a
So, the linear accelerometer measurements (in the case frame) are:
a_m^C = (I_3 + \Xi_a + S_a)\, a^C + b_a + \nu_a \qquad (6.24)
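The linearized error model of Eq. (6.24) is a one-liner to apply in simulation. The sketch below (function name hypothetical) is useful for generating synthetic accelerometer measurements when testing a filter:

```python
import numpy as np

def apply_accel_errors(a_true, xi, s, b, noise):
    """Apply the linearized accelerometer error model of Eq. (6.24):
    a_m = (I + Xi_a + S_a) a + b_a + nu_a.
    xi: 3x3 misalignment matrix (zero diagonal); s: 3-vector of
    scale-factor errors; b: 3-vector of biases; noise: 3-vector sample."""
    return (np.eye(3) + xi + np.diag(s)) @ a_true + b + noise
```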
6.6.4. Accumulated \Delta v. We note that the measured \Delta v in the case frame, \Delta v_m^C, is
mapped to the end of its corresponding time interval by the sculling algorithm within the
IMU firmware, so that we can write
\Delta v_m^{C_k} = \int_{t_{k-1}}^{t_k} T_{C(t)}^{C_k}\, a_m^{C(t)}\, dt \qquad (6.25)
where \Delta v_m^{C_k} covers the time interval from t_{k-1} to t_k (t_k > t_{k-1}) and C(t) is the instantaneous
case frame§. We recall that a transformation matrix can be written in terms of the
§Or equivalently,
\Delta v_m^{B_k} = \int_{t_{k-1}}^{t_k} T_{B(t)}^{B_k}\, a_m^{B(t)}\, dt \qquad (6.26)
But since T_{B(t)}^{B_k} \approx I_3 - \left[\phi_{B(t)}^{B_k} \times\right], we find
\Delta v_m^{B_k} = \int_{t_{k-1}}^{t_k} \left(I_3 - \left[\phi_{B(t)}^{B_k} \times\right]\right) a_m^{B(t)}\, dt \qquad (6.27)
Euler axis/angle as
T(\phi) = \cos\phi\, I - \frac{\sin\phi}{\phi}\left[\phi\times\right] + \frac{1-\cos\phi}{\phi^2}\, \phi\phi^T \qquad (6.28)
= I - \frac{\sin\phi}{\phi}\left[\phi\times\right] + \frac{1-\cos\phi}{\phi^2}\left[\phi\times\right]\left[\phi\times\right] \qquad (6.29)
Finally, the accelerometer noise \nu_a, a zero-mean white process, integrates to
\int_{t_{k-1}}^{t_k} \nu_a\, dt = u_a \qquad (6.35)
where u_a is a random vector with covariance equal to the accelerometer noise spectral
density multiplied by (t_k - t_{k-1}). So, Eq. (6.32) becomes
\Delta v_m^{C_k} = \left[I_3 + \Xi_a + S_a\right] \Delta v^{C_k} + b_a \Delta t + u_a \qquad (6.36)
Since we have established that \left[I_3 + \Xi_a + S_a\right]^{-1} \approx \left[I_3 - \Xi_a - S_a\right], and neglecting terms of
second order,
\Delta v^{C_k} = \left[I_3 - \Xi_a - S_a\right] \Delta v_m^{C_k} - b_a \Delta t - u_a \qquad (6.37)
The average acceleration in the case frame is
a_{ave}^C = \frac{\Delta v^{C_k}}{\Delta t} \qquad (6.38)
and the average measured acceleration in the case frame is
a_{m,ave}^C = \frac{\Delta v_m^{C_k}}{\Delta t} \qquad (6.39)
so we find that
a_{ave}^C = \left[I_3 - \Xi_a - S_a\right] a_{m,ave}^C - b_a - u_a/\Delta t \qquad (6.40)
Recalling that the IMU measures all accelerations except gravity, the total acceleration is
a^I = g^I(r) + T_{B_{ref}}^I\, T_{B_k}^{B_{ref}}\, T_C^{B_k}\, a_{ave}^C \qquad (6.41)
6.6.5. The Gravity Call. One of the more expensive computations involved in propagating
the trajectory with IMU data is the gravity calculation. This is particularly acute
when the gravity field used is of high order, and the gravity gradient matrix requires even
more computation. Hence, minimizing the number of gravity calls makes the navigation
software more tractable. Taking advantage of the fact that propagation of the trajectory
using IMU data occurs at a high rate (usually at 40 Hz or higher), we expand the gravity
vector in a Taylor series about r_\star as
g(r) = g(r_\star) + \left.\frac{\partial g}{\partial r}\right|_{r=r_\star}\left[r - r_\star\right] + \frac{1}{2}\left[r - r_\star\right]^T \left.\frac{\partial^2 g}{\partial r^2}\right|_{r=r_\star}\left[r - r_\star\right] + \ldots \qquad (6.42)
Recognizing that the gravity gradient matrix, G, is
G(r_\star) = \left.\frac{\partial g}{\partial r}\right|_{r=r_\star} \qquad (6.43)
and truncating after the first order in r, we find that
g(r) \approx g(r_\star) + G(r_\star)\left[r - r_\star\right] \qquad (6.44)
\approx G(r_\star)\, r + \left[g(r_\star) - G(r_\star)\, r_\star\right] \qquad (6.45)
where now the gravity vector and the gravity gradient matrix need only be evaluated at
the beginning of the major cycle.
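The caching pattern implied by Eq. (6.45) can be sketched as follows. The class and function names are hypothetical, and a simple point-mass field stands in for the expensive high-order spherical-harmonic model:

```python
import numpy as np

class GravityLinearizer:
    """Cache one high-fidelity gravity evaluation per major cycle and
    serve the fast propagation loop with the first-order expansion of
    Eq. (6.45): g(r) ~= G(r*) r + [g(r*) - G(r*) r*]."""

    def __init__(self, gravity_fn, gradient_fn):
        self.gravity_fn = gravity_fn      # expensive g(r) model
        self.gradient_fn = gradient_fn    # expensive G(r) model

    def start_major_cycle(self, r_star):
        g_star = self.gravity_fn(r_star)
        self.G = self.gradient_fn(r_star)
        self.c = g_star - self.G @ r_star  # constant term in Eq. (6.45)

    def gravity(self, r):
        # cheap per-minor-cycle evaluation (e.g., at 40 Hz)
        return self.G @ r + self.c

# stand-in point-mass field for illustration (km, km^3/s^2)
mu = 398600.4418
pm_g = lambda r: -mu * r / np.linalg.norm(r)**3
pm_G = lambda r: mu * (3.0 * np.outer(r, r) / np.linalg.norm(r)**5
                       - np.eye(3) / np.linalg.norm(r)**3)
```

At the expansion point the approximation is exact, and nearby it is accurate to second order in the displacement.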
CHAPTER 7
Factorization Methods
Of the various covariance factorization methods⇤, the UDU covariance factorization
technique is among the most commonly used in practice. It is implemented in GEONS,
flew on MMS, and is at the heart of the Orion Absolute Navigation System. This chapter
is intended to present the UDU triangular factorization method and the rationale for its use.
Above all, we demonstrate that the UDU factorization results in a significant reduction
in the arithmetic operations (specifically adds and multiplies) compared with the usual
\Phi P \Phi^T + Q time update and the Joseph measurement update.
In the next section, we present some notation and preliminary operations for the
matrix factors U and D. In the section that follows, we derive the time update equations
for the covariance matrix factors. Next, we derive the measurement update equations for
the covariance matrix factors. Finally, we present some concluding comments.
7.1. Why Use the UDU Factorization?
The usual Kalman filter equations work well for rather simple problems. But once
the state space becomes large, the condition number of the covariance matrix becomes
large, and nonlinear effects begin to affect the numerical characteristics; problems such as
filter divergence and loss of positive definiteness of the covariance matrix occur. These issues
began to be observed almost as soon as Kalman filters began to be used in real problems.
Matrix factorization techniques were introduced to solve at least some of these issues. The
earliest was the Potter square root factorization, which was used in the on-board Apollo
navigation filters.
In fact, Bierman and Thornton, in a 1976 JPL report, rather cheekily compared those who
insist on using the conventional Kalman filtering and batch least-squares algorithms (contra
the matrix factorization algorithms) to unrepentant smokers, describing “an attitude often
encountered among estimation practitioners [is] that they will switch to the more accurate
and stable algorithms if and when numerical problems occur. An analogy comes to mind of
a smoker who promises to stop when cancer or heart ailment symptoms are detected. To
expand on the analogy, one may note the following:
• Most smokers do not get cancer or heart disease. (Most applications of the Kalman
algorithms work.)
⇤Other options include the Square Root Covariance Factorization and the Square Root Information
Filter (SRIF).
• Even when catastrophic illness does not occur, there is diminished health. (Even
when algorithms work, performance may be degraded.)
• Smokers can take precautions to lessen the danger, such as smoking low tar or
filtered cigarettes. (Engineers can scale their variables to reduce the dynamic range
or use double-precision arithmetic.)
• Lung cancer may not be diagnosed until it is too advanced for treatment. (Numer-
ical problems may not be detected in time to be remedied.) ” [71]
In addition, a little-advertised but incredibly useful feature of the UDU factorization
is the ability to interrogate the covariance matrix for positive definiteness for ‘free’,
since the definiteness of the D matrix may be trivially evaluated.
This sets the stage for the need for matrix factorization techniques, and the UDU
technique in particular.
7.2. Preliminaries
Let us factor a covariance matrix, P, into the following form
P = U D U^T \qquad (7.1)
where U is an upper triangular matrix with 1’s on the diagonal and 0’s below the diagonal,
and D is a diagonal matrix. We can write U and D compactly as
U = \{u_{ij}\}, \quad i < j \qquad (7.2)
D = \{d_{ii}\} \qquad (7.3)
with, as well,
u_{ii} = 1 \qquad (7.4)
It should be noted that Eq. (7.2) gives the upper triangular portion; the lower triangular
portion can be obtained by reflection or by evaluation of Eq. (7.2), with u_{lm} = 0 for l > m.
Equally valid is
p_{ij} = \sum_{k=i}^{n} u_{ik}\, d_{kk}\, u_{jk}, \quad j < i \qquad (7.5)
and for the diagonals we find
p_{ii} = \sum_{k=i}^{n} u_{ik}^2\, d_{kk} \qquad (7.6)
So, given an n \times n symmetric, positive semi-definite matrix P, the unit upper triangular
factor U and the diagonal factor D (such that P = U D U^T) are obtained using the following
equations. We begin with the (n, n) element and work upwards (along the columns).
The algorithm is as follows:
d_{n,n} = p_{n,n}
u_{n,n} = 1
for j = n-1 : -1 : 1 do
    u_{j,n} = p_{j,n} / d_{n,n}
end for
for j = n : -1 : 2 do
    d_{j,j} = p_{j,j}
    for k = j+1 : n do
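The complete factorization, including the inner accumulation loop, can be sketched compactly in code. The following is a standard Bierman-style UD decomposition (the function name is hypothetical, and the loop organization is the usual in-place form rather than a line-by-line transcription of the pseudocode above):

```python
import numpy as np

def ud_factorize(P):
    """Factor a symmetric positive (semi-)definite P as P = U diag(d) U^T,
    with U unit upper triangular; works right-to-left over the columns."""
    P = P.copy().astype(float)
    n = P.shape[0]
    U = np.eye(n)
    d = np.zeros(n)
    for j in range(n - 1, -1, -1):
        d[j] = P[j, j]
        alpha = 1.0 / d[j] if d[j] > 0.0 else 0.0
        for i in range(j):
            beta = P[i, j]
            U[i, j] = alpha * beta
            # remove column j's rank-one contribution from the remainder
            for k in range(i + 1):
                P[k, i] -= beta * U[k, j]
    return U, d
```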
7.3. The Time Update of the Covariance
7.3.1. The General Time Update Problem. Consider a state, x, that evolves according
to
x(t_k) = \Phi(t_k, t_{k-1})\, x(t_{k-1}) + G_k w_k
where w_k is the process noise at time t_k, x is an n \times 1 vector, and w is an m \times 1
vector. With this in hand, the general problem is as follows [70]: we wish to propagate the
covariance matrix defined by
\bar P(t_k) = \Phi(t_k, t_{k-1})\, P(t_{k-1})\, \Phi^T(t_k, t_{k-1}) + G_k Q_k G_k^T \qquad (7.7)
where \bar P is the propagated covariance (the overbar indicates a propagated quantity), P
is the updated covariance at the prior time step, Q_k is the diagonal process noise covariance
matrix, and G_k is the mapping of the noise to the state. To save memory, since Q_k and D_k
are diagonal matrices, in the implementation we pass Q_k and D_k as vectors.
We want to find the propagated factors \bar U_k and \bar D_k such that \bar P_k = \bar U_k \bar D_k \bar U_k^T. For
compactness, we now drop the time subscripts where possible. Given the UDU factorization
of the covariance matrices, we can write Eq. (7.7) as
\bar U_k \bar D_k \bar U_k^T = \Phi_k U_{k-1} D_{k-1} U_{k-1}^T \Phi_k^T + G_k Q_k G_k^T \qquad (7.8)
= \begin{bmatrix} \Phi_k U_{k-1} & G_k \end{bmatrix} \begin{bmatrix} D_{k-1} & 0 \\ 0 & Q_k \end{bmatrix} \begin{bmatrix} \Phi_k U_{k-1} & G_k \end{bmatrix}^T \qquad (7.9)
Since x is an n \times 1 vector and w is an m \times 1 vector, \Phi_k is an n \times n matrix, G_k is an n \times m
matrix, and Q_k is an m \times m matrix.
We recall that both \bar U_k and U_{k-1} are n \times n upper triangular matrices with 1’s on
the diagonal, and D and \bar D are purely diagonal matrices. So, we have some work to do
on Eq. (7.9), because \begin{bmatrix} \Phi_k U_{k-1} & G_k \end{bmatrix} is an n \times (n+m) matrix and \begin{bmatrix} D_{k-1} & 0 \\ 0 & Q_k \end{bmatrix} is an
(n+m) \times (n+m) diagonal matrix.
Defining the first matrix on the right-hand side of Eq. (7.9) as
Y_k = \begin{bmatrix} \Phi_k U_{k-1} & G_k \end{bmatrix} \qquad (7.10)
we seek a matrix T_k that transforms Y_k such that
Y_k T_k^T = \begin{bmatrix} \bar U_k & 0_{n \times m} \end{bmatrix} \qquad (7.11)
where \bar U_k is an n \times n upper triangular matrix with 1’s on the diagonal. In order to find the
desired matrix T_k, we perform a Gram-Schmidt orthogonalization‡. We define the diagonal
matrix \tilde D_k as
\tilde D_k = \begin{bmatrix} D_{k-1} & 0 \\ 0 & Q_k \end{bmatrix} \qquad (7.12)
which will be important as we define the weighted inner product in the Gram-Schmidt
orthogonalization. Eq. (7.9), which was
\bar U_k \bar D_k \bar U_k^T = Y_k \tilde D_k Y_k^T \qquad (7.13)
can now be rewritten as
\bar U_k \bar D_k \bar U_k^T = \left[Y_k T_k^T\right] \left[T_k^{-T} \tilde D_k T_k^{-1}\right] \left[Y_k T_k^T\right]^T \qquad (7.14)
We note that the matrix T_k^{-T} \tilde D_k T_k^{-1} is a diagonal matrix.
“All” that remains is to find T_k. This is where we harness the power of the modified
Gram-Schmidt orthogonalization process, which provides us with what we were after: \bar U_k
and \bar D_k.
for k = 1, \cdots, n do
    b_k = Y_k(k, :)   (initialize the b vectors with the rows of Y_k)
end for
for j = n, \cdots, 2 do
    f_j = \tilde D\, b_j \quad [0, n(n+m), 0]
    \bar D_{jj} = b_j^T f_j \quad [n(n+m), n(n+m), 0]
    f_j = f_j / \bar D_{jj} \quad [0, 0, (n+m)(n-1)]
    for i = 1, \cdots, j-1 do
        \bar U_{ij} = b_i^T f_j \quad [(n+m)\frac{(n^2-n)}{2}, (n+m)\frac{(n^2-n)}{2}, 0]
        b_i = b_i - \bar U_{ij}\, b_j \quad [(n+m)\frac{(n^2-n)}{2}, (n+m)\frac{(n^2-n)}{2}, 0]
    end for
end for
\bar U_{11} = 1
f_1 = \tilde D\, b_1
\bar D_{11} = b_1^T f_1
Thus, the algorithm not only provides the orthogonal basis vectors b_j, j = 1, \cdots, n,
but also provides the triangular matrix factors \bar U and \bar D.
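The weighted MGS loop above can be implemented compactly. The sketch below (Python, hypothetical names) propagates the factors per Eq. (7.9); the reconstruction test at the end is the natural unit-test for such an implementation:

```python
import numpy as np

def ud_time_update(Phi, U, d, G, q):
    """Weighted modified Gram-Schmidt time update: find Ubar, dbar with
    Ubar diag(dbar) Ubar^T = Phi U diag(d) U^T Phi^T + G diag(q) G^T."""
    n = U.shape[0]
    Y = np.hstack([Phi @ U, G])        # n x (n+m), Eq. (7.10)
    dtil = np.concatenate([d, q])      # diagonal of D-tilde, Eq. (7.12)
    b = Y.copy()                       # rows to be D-tilde-orthogonalized
    Ubar = np.eye(n)
    dbar = np.zeros(n)
    for j in range(n - 1, -1, -1):
        f = dtil * b[j]                # f_j = D-tilde b_j
        dbar[j] = b[j] @ f
        f = f / dbar[j]
        for i in range(j):
            Ubar[i, j] = b[i] @ f
            b[i] = b[i] - Ubar[i, j] * b[j]
    return Ubar, dbar
```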
Since we are also interested in the arithmetic operations, we find that there are [n_x^3 +
n_x^2 m_x] adds, [n_x(n_x+1)(n_x+m_x)] multiplies, and [(n_x+m_x)(n_x-1)] divides. For the case
when m_x = 0, i.e. no process noise, we have n_x^3 adds, [n_x^3 + n_x^2] multiplies, and [n_x^2 - n_x]
divides.
Finally, the entire covariance update algorithm, including the computation of Y_k, uses
[1.5 n_x^3 + 0.5 n_x^2 (2 m_x - 1)] adds, [0.5 n_x^2 (3 n_x + 1) + n_x m_x (n_x + 1)] multiplies, and
[(n_x + m_x)(n_x - 1)] divides.
The modified Gram-Schmidt orthogonalization process makes no assumptions regarding
the structure of \Phi. For a large number of states (say, n_x = 35 with process noise inputs,
m_x = 35), most of which might be biases (or Gauss-Markov processes), much of this
computation is wasted considering the sparseness of \Phi. In Appendix B.1, we show that we
use [n_x^3 + n_x^2 m_x] adds and [n_x(n_x+1)(n_x+m_x)] multiplies to obtain \bar U_k and \bar D_k. For
n_x = 35 and m_x = 35, we require 85,750 adds and 88,200 multiplies, quite a large number
of computations. We can stop here and all will be well, if we are willing to pay the heavy
computational price.
But we can do better! We can vastly reduce the number of computations by partitioning
the original state vector into ‘states’ and ‘parameters’, where the parameters will be modeled
as first-order Gauss-Markov processes. Unlike the parameters, the states can vary in any
manner. This motivates the next section.
7.3.2. An Improvement for the Case of Parameters. As stated earlier, for a large
number of states (say, n_x = 35), the UDU time update for the full covariance matrix
(à la Gram-Schmidt orthogonalization) is computationally expensive, requiring about
2n_x^3 + 2n_x^2 multiplies and additions. This is not competitive with the ‘standard’ \Phi P \Phi^T + Q
formulation (which uses 2n_x^3 multiplies). However, one might guess that an improvement
can be made. This is particularly significant because, normally, most of the states are biases
or ECRVs (Exponentially Correlated Random Variables), i.e., first-order Gauss-Markov
processes. In order to generalize the development, we assume ECRVs for the ‘parameter’
(or bias) states.
For most space-borne navigation applications, we can usually partition the states into
position, velocity, attitude (if applicable), and clock states, all of which we group together
and denote as x, and parameter states, which usually comprise the sensor biases, scale
factors, etc., which we denote as p. This means that the full state space is
X = \begin{bmatrix} x \\ p \end{bmatrix} \qquad (7.15)
The ‘states’ partition must comprise all those quantities whose time evolution cannot be
described as purely self-autocorrelated processes. With this in hand, we partition U and
D as
U = \begin{bmatrix} U_{xx} & U_{xp} \\ 0 & U_{pp} \end{bmatrix} \qquad D = \begin{bmatrix} D_{xx} & 0 \\ 0 & D_{pp} \end{bmatrix} \qquad (7.16)
Also, partition \Phi according to
\Phi = \begin{bmatrix} \Phi_{xx} & \Phi_{xp} \\ 0 & M \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & M \end{bmatrix} \begin{bmatrix} \Phi_{xx} & \Phi_{xp} \\ 0 & I \end{bmatrix} = \Phi_2 \Phi_1 \qquad (7.17)
where M is a diagonal matrix, representing ECRVs whose propagation for p_k is
\bar p_k = e^{-\Delta t/\tau}\, \bar p_{k-1} \qquad (7.18)
so that
M(i,i) = m_i = e^{-\Delta t/\tau_i} \qquad (7.19)
where \tau_i is the time constant of the i-th ECRV state, and Q is partitioned according to
Q = \begin{bmatrix} Q_{xx} & 0 \\ 0 & Q_{pp} \end{bmatrix} = \begin{bmatrix} Q_{xx} & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & Q_{pp} \end{bmatrix} = Q_1 + Q_2 \qquad (7.20)
Recall that the original propagation equation was
\bar U \bar D \bar U^T = \Phi U D U^T \Phi^T + Q \qquad (7.21)
Harnessing the development in Appendix B.1, \bar U \bar D \bar U^T becomes
\bar U \bar D \bar U^T = \Phi_2 \left[\Phi_1 U D U^T \Phi_1^T + Q_1\right] \Phi_2^T + Q_2 \qquad (7.22)
This suggests the following two-step process:
1) Find \tilde U and \tilde D from
\tilde U \tilde D \tilde U^T = \Phi_1 U D U^T \Phi_1^T + Q_1 \qquad (7.23)
2) Find \bar U and \bar D from
\bar U \bar D \bar U^T = \Phi_2 \tilde U \tilde D \tilde U^T \Phi_2^T + Q_2 \qquad (7.24)
7.3.2.1. The First Sub-Problem (\Phi_1 U D U^T \Phi_1^T + Q_1). Let us consider step 1). The
left-hand side of Eq. (7.23) is
\tilde U \tilde D \tilde U^T = \begin{bmatrix} \tilde U_{xx} \tilde D_{xx} \tilde U_{xx}^T + \tilde U_{xp} \tilde D_{pp} \tilde U_{xp}^T & \tilde U_{xp} \tilde D_{pp} \tilde U_{pp}^T \\ \tilde U_{pp} \tilde D_{pp} \tilde U_{xp}^T & \tilde U_{pp} \tilde D_{pp} \tilde U_{pp}^T \end{bmatrix} \qquad (7.25)
and
Q_2 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & q_b & 0 \\ 0 & 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & Q_c \end{bmatrix} = Q_b + Q_c \qquad (7.33)
As in the previous section, we note that \Phi_c^{-1} Q_b \Phi_c^{-T} = Q_b. So, now Eq. (7.24) becomes
\bar U \bar D \bar U^T = \Phi_c \left[\Phi_b \tilde U \tilde D \tilde U^T \Phi_b^T + Q_b\right] \Phi_c^T + Q_c \qquad (7.34)
The term in the square bracket in Eq. (7.34) is
\breve U \breve D \breve U^T = \Phi_b \tilde U \tilde D \tilde U^T \Phi_b^T + Q_b \qquad (7.35)
In Appendix B the above equation is expanded until we find that
\breve U_{ac} = \tilde U_{ac}, \qquad \breve D_{cc} = \tilde D_{cc}, \qquad \breve U_{cc} = \tilde U_{cc} \qquad (7.36)
Additionally,
\breve U_{bc} = m_b \tilde U_{bc} \qquad (7.37)
and
\breve d_b = m_b^2\, \tilde d_b + q_b \qquad (7.38)
and
\breve U_{ab} = m_b \frac{\tilde d_b}{\breve d_b}\, \tilde U_{ab} \qquad (7.39)
Finally, we find that
\breve U_{aa} \breve D_{aa} \breve U_{aa}^T = \tilde U_{aa} \tilde D_{aa} \tilde U_{aa}^T + \frac{\tilde d_b\, q_b}{\breve d_b}\, \tilde U_{ab} \tilde U_{ab}^T \qquad (7.40)
We note that \tilde U_{ab} is a column vector, so the second term in Eq. (7.40) is of rank 1; hence
Eq. (7.40) constitutes a ‘rank-one’ update. Since \breve d_b, \tilde d_b, and q_b are all positive (assuming
m_b is a positive quantity), we can use the Agee-Turner rank-one update [1]. It should be
pointed out that as the algorithm proceeds down the ‘list’ of parameters, the size of the
states a increases by one (and consequently the size of the parameters c decreases by one).
Hence \breve U_{aa} and \breve D_{aa} begin with a dimension of n_x and conclude with dimension n_x + n_p - 1.
Therefore, this is done recursively for all the (sensor) parameters p, which are of size n_p.
The algorithm can be expressed as follows (with the arithmetic operations (adds, multiplies,
divides) in square brackets per k):
for k = 1, \cdots, n_p do
    \breve D(n_x+k, n_x+k) = M(k,k)^2\, \tilde D(n_x+k, n_x+k) + Q_{pp}(k,k) \quad (Eq. (7.38)) [1, 2, 0]
    \alpha = M(k,k)\, \tilde D(n_x+k, n_x+k) / \breve D(n_x+k, n_x+k) \quad [0, 1, 1]
    for i = 1, \cdots, (n_x+k-1) do
        \breve U(i, n_x+k) = \alpha\, \tilde U(i, n_x+k) \quad (Eq. (7.39)) [0, n_x+k-1, 0]
    end for
    for j = n_x+k+1, \cdots, (n_x+n_p) do
        \breve U(n_x+k, j) = M(k,k)\, \tilde U(n_x+k, j) \quad (Eq. (7.37)) [0, n_p-k, 0]
    end for
    Solve for \breve U_{xx}^{(k)}, \breve D_{xx}^{(k)} using the Agee-Turner rank-one update \quad [(n_x+k)^2, (n_x+k)^2 + 3(n_x+k) + 2, 0]
end for
Thus, the arithmetic operations are as follows.
Adds:
\sum_{k=1}^{n_p} \left[(n_x+k)^2 + 1\right] = n_x^2 n_p + n_p + n_x(n_p+1)n_p + \sum_{k=1}^{n_p} k^2 \qquad (7.41)
Multiplies:
\sum_{k=1}^{n_p} \left[3 + (n_x+k-1) + (n_p-k) + (n_x+k)^2 + 3(n_x+k) + 2\right] = (5.5 + 5n_x + n_x^2)\, n_p + \frac{1}{2}(2n_x+5)\, n_p^2 + \sum_{k=1}^{n_p} k^2 \qquad (7.42)
and n_p divides.
Appendix B.3 contains the development of the Agee-Turner rank-one update, which is
the key to reducing the numerical operations in the UDU time update. Given U and D,
along with the scalar c and the vector x, we are interested in obtaining \tilde U and \tilde D along
the lines of
\tilde U \tilde D \tilde U^T = U D U^T + c\, x x^T \qquad (7.43)
The algorithm to compute \tilde u_{ij} and \tilde d_{ii} is:
C_n = c
for j = n, \cdots, 2 do
    \tilde D_{jj} = D_{jj} + C_j x_j^2 \quad [n-1, 2(n-1), 0]
    \tilde U_{jj} = 1
    \beta_j = C_j / \tilde D_{jj} \quad [0, 0, (n-1)]
    v_j = \beta_j x_j \quad [0, (n-1), 0]
    for i = 1, \cdots, j-1 do
        x_i := x_i - U_{ij} x_j \quad [\frac{1}{2}(n^2-n), \frac{1}{2}(n^2-n), 0]
        \tilde U_{ij} = U_{ij} + x_i v_j \quad [\frac{1}{2}(n^2-n), \frac{1}{2}(n^2-n), 0]
    end for
    C_{j-1} = \beta_j D_{jj} \quad [0, (n-1), 0]
end for
\tilde D_{11} = D_{11} + C_1 x_1^2 \quad [1, 2, 0]
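The Agee-Turner update above can be sketched directly in code (function name hypothetical). Note that c must be non-negative for the result to remain a valid factorization; the negative-c case is exactly the situation the Carlson measurement update addresses later:

```python
import numpy as np

def agee_turner(U, d, c, x):
    """Agee-Turner rank-one UDU update, Eq. (7.43): return (Ut, dt)
    such that Ut diag(dt) Ut^T = U diag(d) U^T + c x x^T."""
    n = len(d)
    U = U.copy()
    d = d.copy().astype(float)
    x = x.copy().astype(float)
    for j in range(n - 1, 0, -1):
        d_new = d[j] + c * x[j] ** 2
        beta = c / d_new
        v = beta * x[j]
        c = beta * d[j]          # C_{j-1} = beta_j D_{jj}
        d[j] = d_new
        for i in range(j):
            x[i] -= U[i, j] * x[j]
            U[i, j] += v * x[i]
    d[0] += c * x[0] ** 2
    return U, d
```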
7.3.3. Arithmetic Operations for the Time Update. For the time update of the covariance
matrix, we have (from the two preceding sections) the following arithmetic operations:
Adds: 1.5 n_x^3 + n_x^2 m_x + n_x^2 n_p - 0.5 n_x^2 + n_p + n_x (n_p+1) n_p + \sum_{k=1}^{n_p} k^2
Multiplies: 0.5 n_x^2 (3 n_x + 1) + n_x m_x (n_x+1) + (5.5 + 5 n_x + n_x^2)\, n_p + \frac{1}{2}(2 n_x + 5)\, n_p^2 + \sum_{k=1}^{n_p} k^2
For n_x = 9, m_x = 9, n_p = 26, we will utilize 16,407 adds, 19,338 multiplies, and 170
divides. In contrast, if we did the MGS on all 35 states (n_x = 35, m_x = 35, and n_p = 0),
we would use 85,750 adds, 88,200 multiplies, and 34 divides. Finally, if the covariance were
updated in the conventional manner from \Phi P \Phi^T (without any consideration given to the
structure of \Phi), with n_x = 35, m_x = 35, it would cost 84,525 adds, 85,750 multiplies, and
no divides†. Thus, a very strong case is made for using the UDU factorization and harnessing
the benefit of updating the sensor parameters using the Agee-Turner rank-one update.
The UDU time update that takes advantage of the fact that most of the states are sensor
parameters results in nearly five times fewer adds and multiplies (at the cost of 170 more
divides) than if we operated on the full covariance matrix.
7.4. The Measurement Update
The UDU factorization requires that we process the measurements one at a time [2].
This should not be construed as a weakness of the formulation: if the measurements are
correlated, they can be ‘decorrelated’ as in Appendix B.4.
So, the covariance update equation is
P = \bar P - K H \bar P \qquad (7.45)
where K is the Kalman gain matrix, H is the measurement partial, \bar P is the a priori
covariance, and P is the a posteriori covariance matrix. Using the covariance factors U and
D, we rewrite the above equation as
U D U^T = \bar U \bar D \bar U^T - K H \bar U \bar D \bar U^T \qquad (7.46)
We note that U and \bar U and D and \bar D are n \times n matrices, and because we are using the
paradigm of processing the measurements one at a time, H is a 1 \times n vector and K is an
n \times 1 vector. Recalling that K is defined as
K = \bar P H^T \left(H \bar P H^T + R\right)^{-1} = \frac{\bar U \bar D \bar U^T H^T}{H \bar U \bar D \bar U^T H^T + R} = \frac{\bar U \bar D \bar U^T H^T}{\alpha} \qquad (7.47)
where the scalar \alpha is defined to be
\alpha = H \bar U \bar D \bar U^T H^T + R \qquad (7.48)
†For matrices A and B of dimension n \times m and m \times p, respectively, the product
C = A B \qquad (7.44)
results in n(m-1)p adds, nmp multiplies, and no divides.
\tilde U \tilde D \tilde U^T = \bar D - \frac{1}{\alpha}\, v v^T \qquad (7.52)
Therefore,
U = \bar U \tilde U \qquad \text{and} \qquad D = \tilde D \qquad (7.53)
So, how do we proceed? This has all the marks of a rank-one update, for after all v is of
rank one. We can proceed by using the Agee-Turner rank-one update. Except for one thing
– that pesky minus sign in Eq. (7.52). That minus sign portends all sorts of numerical
issues because there is a strong possibility that we can lose numerical precision if the Agee-
Turner update is used blindly. It turns out that we can have ‘our cake and eat it too’, for
Neil Carlson developed a rank-one update to remedy precisely our issue. The mathematical
development of this algorithm is detailed in Appendix B.5.
For the normal Joseph Kalman filter update, for a scalar measurement, we find that
if we use efficient methods of calculating and storing quantities [2], we use 4.5n_x² + 3.5n_x
adds, 4n_x² + 4.5n_x multiplies, and 1 divide.
For the “Conventional” Kalman filter update (P = P̄ − K H P̄, Eq. (7.45)), for a scalar
measurement, we find that [2] we use 1.5n_x² + 1.5n_x adds, 1.5n_x² + 0.5n_x multiplies, and 1
divide.
Thus, for n_x = 35, the covariance update due to measurement processing with the UDU
factorization uses 1820 adds, 1890 multiplies, and 104 divides, compared with 5635 adds, 5058
multiplies, and 1 divide for the efficient Joseph update. The “Conventional” Kalman update
uses 1890 adds, 1855 multiplies, and 1 divide.
Hence there is almost a factor of 2.5 improvement in the adds and multiplies using the
triangular (UDU) update compared with the Joseph update, rivaling the efficiency of
the “conventional” Kalman filter update.
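The one-at-a-time UDU measurement update discussed above can be illustrated concretely. The following Python sketch implements Bierman's form of the rank-one measurement update; the function name, the dense-matrix storage, and the test problem are illustrative assumptions, not taken from this report (an onboard implementation would store U and d compactly):

```python
import numpy as np

def udu_measurement_update(x, U, d, h, R, y):
    """Scalar-measurement UDU update (Bierman's algorithm).
    P = U @ diag(d) @ U.T, with U unit upper triangular and d > 0."""
    U = U.copy(); d = d.copy()
    n = x.size
    a = U.T @ h                  # a = U^T H^T
    b = d * a                    # b = D a
    alpha = R + a[0] * b[0]      # running innovation variance
    gamma = 1.0 / alpha
    d[0] *= R * gamma
    K = np.zeros(n)
    K[0] = b[0]
    for j in range(1, n):
        beta = alpha
        alpha += a[j] * b[j]
        lam = -a[j] * gamma
        gamma = 1.0 / alpha
        d[j] *= beta * gamma
        for i in range(j):
            U_old = U[i, j]
            U[i, j] = U_old + lam * K[i]   # update column j of U
            K[i] += b[j] * U_old           # accumulate unscaled gain
        K[j] = b[j]
    gain = gamma * K                       # Kalman gain = P̄ H^T / alpha
    x = x + gain * (y - h @ x)
    return x, U, d
```

Note that the loops touch only the strict upper triangle of U and the diagonal d, which is where the operation-count advantage over the Joseph form comes from.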
7.5. Consider Covariance and Its Implementation in the UDU Filter
‘Consider’ Analysis was first introduced by S. F. Schmidt of NASA Ames in the mid
1960s as a means to account for errors in both the dynamic and measurement models due to
uncertain parameters [64]. The Consider Kalman Filter, also called the Schmidt-Kalman
Filter, resulted from this body of work. The consider approach is especially useful when
parameters have low observability.
We partition the state vector, x, into the ns “estimated states”, s, and the np “consider”
parameters, p, as
x = [s; p] (7.54)
so that
P = [Pss Psp; Pps Ppp],   H = [Hs Hp],   Kopt = [Ks,opt; Kp,opt] = [Pss Hs^T + Psp Hp^T; Pps Hs^T + Ppp Hp^T] W⁻¹
where W = H P H^T + R is the innovation covariance.
Here Kopt is the optimal Kalman gain computed for the full state, x. Therefore, if we
now choose Ks and Kp such that Ks = Ks,opt, the a posteriori covariance
matrix is
P⁺ = [ Pss − Ks W Ks^T               Psp − Ks H [Psp; Ppp]
       Pps − [Psp; Ppp]^T H^T Ks^T   Ppp − Kp W Kp^T       ]   (7.55)
This equation is valid for any value of Kp . Notice that there is no Kp in the correlation terms
of the covariance matrix. Therefore, what is remarkable about this equation is that
once the optimal Ks is chosen, the correlation between s and p is independent
of the choice of Kp .
In essence, the consider parameters are not updated; therefore, the Kalman gain
associated with the consider parameters, p, is zero, i.e. Kp = 0. However, several comments
are in order:
(1) When using the Schmidt-Kalman filter, the a priori and a posteriori covariance of
the parameters (Ppp) are the same.
(2) The a posteriori covariance matrix of the states and the correlation between the
states and the parameters are the same regardless of whether one uses the Schmidt-
Kalman filter or the optimal Kalman update.
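The two numbered properties can be verified numerically. A minimal Python sketch, assuming a scalar measurement and using the Joseph-form covariance update (valid for any suboptimal gain); the partition sizes and measurement model are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
ns, np_ = 3, 2                       # estimated states s, consider parameters p
n = ns + np_
A = rng.standard_normal((n, n))
P = A @ A.T + n * np.eye(n)          # a priori covariance, partitioned [s; p]
H = rng.standard_normal((1, n))      # scalar measurement partial [Hs Hp]
R = 0.5
W = float(H @ P @ H.T + R)           # innovation variance

K_opt = P @ H.T / W                  # optimal gain for the full state
K_sk = K_opt.copy()
K_sk[ns:] = 0.0                      # Schmidt-Kalman: Kp = 0, Ks = Ks,opt

def joseph(P, K, H, R):
    """Joseph-form update, valid for any (suboptimal) gain."""
    IKH = np.eye(P.shape[0]) - K @ H
    return IKH @ P @ IKH.T + R * (K @ K.T)

P_opt = joseph(P, K_opt, H, R)       # fully optimal update
P_sk = joseph(P, K_sk, H, R)         # Schmidt-Kalman (consider) update
```

The assertions below check that Ppp is unchanged by the consider update, and that the Pss and Psp blocks match the fully optimal update, as claimed above.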
7.6. Conclusions
Matrix factorization methods, particularly the UDU factorization, are very useful –
indeed essential – for onboard navigation algorithms. They are numerically stable and
computationally efficient, competitive with the classic Kalman filter implementation. In
addition, they allow the navigation designer to investigate the positive definiteness of the
covariance matrix for ‘free’, via the entries of the diagonal matrix D.
CHAPTER 8
Attitude Estimation
For actual applications, representations are 3 × 3 matrices that transform the representations
of vectors in one frame, i.e. their components along the basis vectors in that
frame, to their representations in a different frame. Thus attitude representations describe
a fixed physical vector in a rotated frame rather than a rotated vector. This is the passive
interpretation of a transformation, also known as the alias sense (from the Latin word for
“otherwise,” in the sense of “otherwise known as”) [67]. The alternative active interpre-
tation (also known as the alibi sense from the Latin word for “elsewhere”) considers the
representation in a fixed reference frame of a rotated physical vector. It is crucial to keep
this distinction in mind, because an active rotation in one direction corresponds to a pas-
sive rotation in the opposite direction. Overlooking this point has led to errors in flight
software.1
Now consider transforming the representation of a vector x in a frame F to its representation
in a frame G and then from frame G to frame H, or directly from frame F to frame
H, so
xH = AHG xG = AHG (AGF xF ) = AHF xF (8.1)
These transformations must be equivalent for any vector xF , so successive transformations
are accomplished by simple matrix multiplication:
AHF = AHG AGF (8.2)
This may appear to be an obvious result, but only one other attitude representation
has such a simple composition rule. Matrix multiplication is associative, meaning that
AHG (AGF AF E ) = (AHG AGF )AF E . Matrix multiplication is not commutative, however,
which means that AHG AGF ≠ AGF AHG in general. The non-commutativity of matrix
multiplication is at the heart of the problem of finding a suitable attitude representation.
Transforming from frame F to frame G and back to frame F is effected by the matrix
AF F = AF G AGF , which must be the identity matrix. Rotations must also preserve inner
products and norms of vectors, so
xF · yF = xG · yG = (AGF xF)^T (AGF yF) = xF^T AGF^T AGF yF (8.3)
These two observations mean that
AGF^T = AGF⁻¹ = AFG (8.4)
A matrix whose transpose is equal to its inverse is called an orthogonal matrix, and its
determinant must equal ±1. The attitude matrix must be a proper orthogonal matrix, i.e.
have determinant +1, in order to transform a right-handed coordinate frame to a right-
handed coordinate frame.
The nine-component attitude matrix is in some ways the ideal representation of a ve-
hicle’s attitude. It has a 1:1 correspondence with physical attitudes, it varies smoothly as
the physical attitude varies smoothly, its elements all have magnitudes less than or equal
to one, and it follows a simple rule for combining successive rotations. It is not an efficient
representation, though; only three of its nine parameters are independent because the or-
thogonality constraint is equivalent to six independent scalar constraints. This provides the
opportunity to specify an attitude or an attitude matrix using only three parameters, but
not, as was pointed out above, in a globally continuous and nonsingular fashion.
1One example is an incorrect sign for the velocity aberration correction for star tracker measurements
on the WMAP spacecraft, which fortunately was easily corrected.
8.2. Euler Axis/Angle Representation
where
ω^GF_G ≡ lim_{Δt→0} (Δϑ e)/Δt (8.12)
is the vector representation in frame G of the angular velocity of frame G with respect to
frame F. The angular velocity is known to be represented in frame G because the product
AGF(t + Δt) AFG(t) is a rotation from frame G at one time to frame G at a different time,
and these two frames coincide in the limit that Δt goes to zero. The angular velocity is
usually written simply as ω, with the frames understood. Its units are rad/sec, assuming
that time is measured in seconds, because radian measure was assumed in taking the small
angle limit of sin Δϑ.
Equation (8.11) is the fundamental equation of attitude kinematics. It does not dis-
tinguish between the situations where frame F or frame G or both frames are rotating in
an absolute sense; it only cares about the relative rotation between the two frames. This
equation can also be written as
ȦGF = −AGF AFG [ω^GF_G ×] AFG^T = −AGF [(AFG ω^GF_G)×] = −AGF [ω^GF_F ×] (8.13)
which expresses the kinematics in terms of the representation in frame F of ω^GF. The second
equality uses an identity that holds for any proper orthogonal matrix. These kinematic
equations, if integrated exactly, preserve the orthogonality of the attitude matrix.
The Euler axis/angle representation is fundamental for analysis, as demonstrated above,
but it has been entirely superseded for practical applications by a superior four-parameter
representation described in the next section.
8.3. Quaternion Representation
Substituting half-angle identities into Eq. (8.7) gives the attitude matrix as
A(q) = (q4² − ‖q1:3‖²) I3 − 2 q4 [q1:3×] + 2 q1:3 q1:3^T (8.14)
where the three-component vector q1:3 and the scalar q4 are defined by
q1:3 = e sin(ϑ/2), q4 = cos(ϑ/2) (8.15)
This representation has the advantage over the Euler axis/angle representation of requiring
no trigonometric function evaluations, and its four components are more economical than
the nine-component attitude matrix.
The four parameters of this representation were first considered by Euler but their full
significance was revealed by Rodrigues, so they are often referred to as the Euler symmet-
ric parameters or Euler-Rodrigues parameters. This representation is called the quaternion
representation and denoted A(q) because the four parameters can be regarded as the com-
ponents of a quaternion
q = [q1:3; q4] (8.16)
with vector part q1:3 and scalar part q4. A quaternion is basically a four-component vector with
some additional operations defined for it.2 The attitude quaternion
q(e, ϑ) = [e sin(ϑ/2); cos(ϑ/2)] (8.17)
is a unit quaternion, obeying the norm constraint ‖q‖ = 1, but not all quaternions are
unit quaternions. It is clear from Eq. (8.14) that q and −q give the same attitude matrix.
This 2:1 mapping of quaternions to rotations is a minor annoyance that cannot be removed
without introducing discontinuities in the representation.
The most important added quaternion operations are two different products of two
quaternions q and q̄. They can be implemented in matrix form similar to the matrix form
of the vector cross product:3
q ⊗ q̄ = [Ψ(q) q] q̄ (8.18a)
q ⊙ q̄ = [Ξ(q) q] q̄ (8.18b)
where Ψ(q) and Ξ(q) are the 4 × 3 matrices
Ψ(q) ≡ [q4 I3 − [q1:3×]; −q1:3^T] (8.19a)
Ξ(q) ≡ [q4 I3 + [q1:3×]; −q1:3^T] (8.19b)
Unlike the vector cross product, though, the norm of either product of two quaternions is
equal to the product of their norms.
Both quaternion products are associative but not commutative in general, in parallel
with matrix products. The two product definitions differ only in the signs of the cross
product matrices in Eqs. (8.19a) and (8.19b), from which it follows that
q ⊗ q̄ = q̄ ⊙ q (8.20)
The identity quaternion
Iq ≡ [0 0 0 1]^T (8.21)
acts in quaternion multiplication like the identity matrix in matrix multiplication. The
conjugate q* of a quaternion is obtained by changing the sign of its vector part:
q* ≡ [−q1:3; q4] (8.22)
Either product of a quaternion with its conjugate is equal to the square of its norm times
the identity quaternion.
The inverse of any quaternion having nonzero norm is defined by
q⁻¹ ≡ q*/‖q‖² (8.23)
2This is conceptually different from the quaternion introduced by Hamilton in 1844, before the introduction
of vector notation, as a hypercomplex extension q = q0 + iq1 + jq2 + kq3 of a complex number z = x + iy.
The scalar part of a quaternion is often labeled q0 and put at the top of the column vector. Care must be
taken to thoroughly understand the conventions embodied in any quaternion equation that one chooses to
reference.
3The notation q ⊗ q̄ was introduced in Ref. [43], and q ⊙ q̄ is a modification of notation introduced in
Ref. [57]. Hamilton’s product q̄q corresponds to q ⊙ q̄, but q ⊗ q̄ has proven to be more useful in attitude
analysis. The order of the quaternion products in Eqs. (8.27) and (8.28) would be reversed with the classical
definition of quaternion multiplication.
A unit quaternion, such as the attitude quaternion, always has an inverse, which is
identical with its conjugate. The conjugate of the product of two quaternions is equal to
the product of their conjugates in the opposite order: (q̄ ⊗ q)* = q* ⊗ q̄*. The same
relationship holds for the other product definition and with conjugates replaced by inverses.
Equation (8.14) can be compactly written as
A(q) = Ξ^T(q) Ψ(q) (8.24)
Now consider, for a unit quaternion q, the product
q ⊗ [x; 0] ⊗ q* = q* ⊙ (q ⊗ [x; 0]) = [Ξ(q*) q*] [Ψ(q) q] [x; 0]
= [Ξ^T(q); q^T] Ψ(q) x = [A(q) x; 0] (8.25)
Applying a transformation by a second quaternion q̄ gives
q̄ ⊗ (q ⊗ [x; 0] ⊗ q*) ⊗ q̄* = q̄ ⊗ [A(q) x; 0] ⊗ q̄* = [A(q̄) A(q) x; 0]
= (q̄ ⊗ q) ⊗ [x; 0] ⊗ (q̄ ⊗ q)* = [A(q̄ ⊗ q) x; 0] (8.26)
Because this must hold for any x, it shows that
A(q̄ ⊗ q) = A(q̄) A(q) (8.27)
This and Eq. (8.2) mean that the quaternion corresponding to successive rotations is just
the product
qHF = qHG ⊗ qGF (8.28)
A simple bilinear composition rule of this type holds only for the attitude matrix and
quaternion representations, a major reason for the popularity of quaternions.
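These operations translate directly into code. A minimal Python sketch, assuming scalar-last component ordering as in Eq. (8.16) (the function names are illustrative):

```python
import numpy as np

def cross(v):
    """Cross-product matrix [v×]."""
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

def Psi(q):
    """4x3 matrix Ψ(q) of Eq. (8.19a); q = [q1, q2, q3, q4], scalar last."""
    return np.vstack([q[3] * np.eye(3) - cross(q[:3]), -q[:3][np.newaxis, :]])

def Xi(q):
    """4x3 matrix Ξ(q) of Eq. (8.19b)."""
    return np.vstack([q[3] * np.eye(3) + cross(q[:3]), -q[:3][np.newaxis, :]])

def qmul(q, qbar):
    """Quaternion product q ⊗ qbar, Eq. (8.18a)."""
    return np.column_stack([Psi(q), q]) @ qbar

def att(q):
    """Attitude matrix A(q) = Ξᵀ(q)Ψ(q), Eq. (8.24)."""
    return Xi(q).T @ Psi(q)
```

For random unit quaternions, `att(qmul(qb, q))` matches `att(qb) @ att(q)`, which is the composition rule of Eq. (8.27), and the result is a proper orthogonal matrix.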
Representing the rotation between times t and t + Δt by an Euler axis e and angle Δϑ,
Eqs. (8.28), (8.17), (8.20), and (8.18b) give
q(t + Δt) = [e sin(Δϑ/2); cos(Δϑ/2)] ⊗ q(t) = cos(Δϑ/2) q(t) + sin(Δϑ/2) [e; 0] ⊗ q(t)
= cos(Δϑ/2) q(t) + sin(Δϑ/2) Ξ(q(t)) e (8.29)
This quaternion propagation equation has proven to be very useful. It preserves the unity
norm of the attitude quaternion. If the angular velocity is well approximated as constant
over the time interval, then Δϑ = ‖ω‖Δt and e = ω/‖ω‖. Alternatively, and particularly for
onboard applications, Δϑ e can be computed by differencing the outputs of rate-integrating
gyroscopes.
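Equation (8.29) can be sketched in a few lines of Python, assuming the angular velocity is constant over the step (names and tolerances are illustrative):

```python
import numpy as np

def Xi(q):
    """4x3 matrix Ξ(q) of Eq. (8.19b), scalar part last."""
    v, s = q[:3], q[3]
    vx = np.array([[0., -v[2], v[1]],
                   [v[2], 0., -v[0]],
                   [-v[1], v[0], 0.]])
    return np.vstack([s * np.eye(3) + vx, -v[np.newaxis, :]])

def propagate(q, w, dt):
    """Discrete quaternion propagation, Eq. (8.29), with angular
    velocity w (rad/s) held constant over the step dt."""
    wn = np.linalg.norm(w)
    if wn * dt < 1e-12:
        return q.copy()
    dth = wn * dt                 # rotation angle Δϑ
    e = w / wn                    # rotation axis e
    return np.cos(dth / 2) * q + np.sin(dth / 2) * (Xi(q) @ e)
```

As the text notes, this form preserves the unit norm of the quaternion exactly (up to roundoff), even over many steps.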
Using small angle approximations for the sine and cosine leads to the kinematic equation
for the quaternion:
q̇ ≡ lim_{Δt→0} [q(t + Δt) − q(t)]/Δt = (1/2) [ω; 0] ⊗ q = (1/2) Ξ(q) ω (8.30)
where ω is defined by Eq. (8.12) and several time arguments have been omitted. Exact
integration of this equation preserves the unit norm of the quaternion. The inverse of
Eq. (8.30) is often useful:
ω = 2 Ξ^T(q) q̇ (8.31)
by either an MRP vector with ‖p‖ ≤ 1 or an equivalent MRP vector in the shadow set of
MRPs with ‖p‖ ≥ 1.
The MRP vector corresponding to the quaternion product q̄ ⊗ q follows the composition
rule
p̄ ⊗ p ≡ [(1 − ‖p‖²) p̄ + (1 − ‖p̄‖²) p − 2 p̄ × p] / [1 + ‖p‖² ‖p̄‖² − 2 p̄ · p] (8.45)
This composition law is associative but not commutative in general, in parallel with matrix
and quaternion products. It cannot be represented as a matrix product.
The kinematic equation for the MRPs is
ṗ = [(1 + ‖p‖²)/4] ( I3 + 2 ([p×]² + [p×]) / (1 + ‖p‖²) ) ω (8.46)
The matrix in parentheses is orthogonal, so the inverse of Eq. (8.46) is
ω = [4/(1 + ‖p‖²)] ( I3 + 2 ([p×]² − [p×]) / (1 + ‖p‖²) ) ṗ (8.47)
The norm of an MRP vector can grow without limit during dynamic propagation or
attitude estimation; Eq. (8.41) shows that the norm is infinite for ϑ = ±2π. This singularity
can be avoided by switching from the MRP vector to its shadow. The norm can be restricted
to be less than or equal to unity in theory, but in practice it is best to allow the norm to
exceed unity by some amount before switching in order to avoid “chattering” between the
MRP and its shadow. An error Δp in an MRP vector corresponds to an error
Δp^S = (∂p^S/∂p) Δp = [(2 p p^T − ‖p‖² I3)/‖p‖⁴] Δp = [2 p^S (p^S)^T − ‖p^S‖² I3] Δp (8.48)
in its shadow. This relation is useful for mapping covariance matrices into and out of
the shadow set. The 2:1 four-component quaternion representation does not have these
complications because the two quaternions representing the same attitude both have unit
norm, so there is no need to switch between them.
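A Python sketch of the shadow switch with a deadband; the shadow relation p^S = −p/‖p‖² is the standard one (assumed here, as the defining equation falls outside this excerpt), and the 1.2 switching limit is an illustrative tuning choice:

```python
import numpy as np

def shadow(p):
    """Shadow-set MRP representing the same attitude: pS = -p/||p||^2
    (standard shadow relation, assumed)."""
    return -p / np.dot(p, p)

def switch_if_needed(p, limit=1.2):
    """Switch to the shadow set only once ||p|| exceeds a deadband
    limit > 1, avoiding 'chattering' near ||p|| = 1."""
    return shadow(p) if np.linalg.norm(p) > limit else p
```

The shadow map is an involution, and it sends a long MRP vector to a short one, which is exactly why switching bounds the representation during propagation.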
8.6. Rotation Vector Representation
It is convenient for analysis, but not for computations, to combine the Euler axis and
angle into the three-component rotation vector
ϑ ≡ ϑ e = 2 (cos⁻¹ q4) q1:3/‖q1:3‖ (8.49)
This leads to the very elegant expression
A(ϑ) = exp(−[ϑ×]) (8.50)
where exp(·) is the matrix exponential. This equation can be verified by expansion of it and
Eq. (8.7) as Taylor series in ϑ and repeated applications of the identity [e×]² = e e^T − I3.
The kinematic equation for the rotation vector is
ϑ̇ = ω + (1/2) ϑ × ω + (1/ϑ²) [1 − (ϑ/2) cot(ϑ/2)] ϑ × (ϑ × ω) (8.51)
The coefficient of ϑ × (ϑ × ω) goes to 1/12 as ϑ goes to zero, but it is singular for ϑ equal
to any nonzero multiple of 2π. The inverse of Eq. (8.51) is
ω = ϑ̇ − [(1 − cos ϑ)/ϑ²] ϑ × ϑ̇ + [(ϑ − sin ϑ)/ϑ³] ϑ × (ϑ × ϑ̇) (8.52)
The rotation vector is useful for the analysis of small rotations, but not for large rotations,
because of both the computational cost of evaluating the matrix exponential and the
kinematic singularity for ‖ϑ‖ = 2π. This singularity can be avoided, as for the MRPs, by
switching from the rotation vector to its shadow
ϑ^S ≡ (1 − 2π‖ϑ‖⁻¹) ϑ (8.53)
which represents the same attitude. This can restrict the norm of the rotation vector to ⇡
or less in theory, but in practice it is best to allow the norm to exceed ⇡ by some amount
before switching in order to avoid “chattering” between the rotation vector and its shadow.
The properties of the rotation vector are very similar to those of the MRPs, and it has
no obvious advantages over the MRPs. It has the disadvantage of requiring transcenden-
tal function evaluations to compute the attitude matrix, so it is rarely used in practical
applications.
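Equation (8.50) can be checked numerically. A Python sketch comparing the closed-form Euler axis/angle matrix (the standard form of Eq. (8.7) is assumed) with a truncated Taylor series for the matrix exponential; the naive series is illustrative only, and is exactly the computational cost the paragraph above warns about:

```python
import numpy as np

def cross(v):
    """Cross-product matrix [v×]."""
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

def att_from_rotvec(th):
    """Attitude matrix from a rotation vector via the closed-form
    Euler axis/angle formula (Eq. (8.7), standard form assumed)."""
    a = np.linalg.norm(th)
    if a < 1e-12:
        return np.eye(3)
    e = th / a
    return (np.cos(a) * np.eye(3) - np.sin(a) * cross(e)
            + (1 - np.cos(a)) * np.outer(e, e))

def expm_taylor(M, terms=40):
    """Matrix exponential by truncated Taylor series (illustrative only)."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out
```

For a moderate rotation vector, the two agree to machine precision, which is the content of Eq. (8.50).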
8.7. Euler Angles
An Euler angle representation parameterizes a rotation by the product of three rotations
about coordinate frame unit vectors:
A_ijk(φ, θ, ψ) = A(e_k, ψ) A(e_j, θ) A(e_i, φ) (8.54)
where e_i, e_j, and e_k are selected from the set
e_1 = [1 0 0]^T,  e_2 = [0 1 0]^T,  e_3 = [0 0 1]^T (8.55)
The possible choice of axes is constrained by the requirements i ≠ j and j ≠ k, leaving six
symmetric sets with ijk equal to 121, 131, 232, 212, 313, and 323 and six asymmetric sets
with ijk equal to 123, 132, 231, 213, 312, and 321. Symmetric Euler angle sets are used in
classical studies of rigid body motion [22, 28, 35, 49, 74].
The asymmetric sets of angles are called the Tait-Bryan angles or roll, pitch, and yaw
angles. The latter terminology originally described the motions of ships and then was
carried over into aircraft and spacecraft. Roll is a rotation about the vehicle body axis
that is closest to the vehicle’s usual direction of motion, and hence would be perceived as a
screwing motion. The roll axis is conventionally assigned index 1. Yaw is a rotation about
the vehicle body axis that is usually closest to the direction of local gravity, and hence
would be perceived as a motion that points the vehicle left or right. The yaw axis is
conventionally assigned index 3. Pitch is a rotation about the remaining vehicle body axis,
and hence would be perceived as a motion that points the vehicle up or down. The pitch
axis is conventionally assigned index 2. Note that Eq. (8.54) assigns the variables φ, θ, and ψ
based on the order of rotations in the sequence, making no definite association between
these variables and either the axis labels 1, 2, and 3 or the names roll, pitch, and yaw. Many
authors follow a different convention, denoting roll by φ, pitch by θ, and yaw by ψ. As
always, the reader consulting any source should be careful to understand the conventions
that it follows.
Using the product rule and Eq. (8.11) to compute the time derivative of Eq. (8.54) gives
[ω×] A_ijk(φ, θ, ψ) = { [(ψ̇ e_k)×] + A(e_k, ψ) [(θ̇ e_j)×] A^T(e_k, ψ)
+ A(e_k, ψ) A(e_j, θ) [(φ̇ e_i)×] A^T(e_j, θ) A^T(e_k, ψ) } A_ijk(φ, θ, ψ) (8.56)
The identity A[x×]A^T = [(Ax)×], which holds for any proper orthogonal A, gives
ω = ψ̇ e_k + θ̇ A(e_k, ψ) e_j + φ̇ A(e_k, ψ) A(e_j, θ) e_i = A(e_k, ψ) M [φ̇ θ̇ ψ̇]^T (8.57)
where
M = [A(e_j, θ) e_i   e_j   e_k] (8.58)
The second equality in Eq. (8.57) makes use of A(e_k, ψ) e_k = e_k. The Euler angle rates as
functions of the angular velocity are
[φ̇ θ̇ ψ̇]^T = M⁻¹ A^T(e_k, ψ) ω (8.59)
This kinematic equation is singular if the determinant of M is zero. Equation (8.7) and the
Euler axis requirement that e_i · e_j = e_j · e_k = 0 gives
det M = [A(e_j, θ) e_i] · (e_j × e_k) = cos θ [e_i · (e_j × e_k)] − sin θ (e_i · e_k) (8.60)
The triple vector product e_i · (e_j × e_k) is zero for the symmetric Euler angles, so the kinematic
equations of these representations are singular if sin θ = 0. The dot product e_i · e_k is zero
for the asymmetric Euler angles, so the kinematics of these representations are singular if
cos θ = 0. This singularity is known as gimbal lock and is caused by collinearity of the
physical rotation axis vectors of the first and third rotations in the sequence. Note that the
column vector representations of these rotation axes are always parallel for the symmetric
Euler angle sequences and always perpendicular for the asymmetric sequences, but this
neither causes nor prevents gimbal lock.
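The gimbal-lock condition can be demonstrated numerically from Eqs. (8.58) and (8.60). A Python sketch, with axis indices following the 1-2-3 labels above (function names are illustrative):

```python
import numpy as np

def axis_rot(e, ang):
    """Passive rotation about a unit axis (Euler formula, Eq. (8.7) assumed)."""
    ex = np.array([[0., -e[2], e[1]],
                   [e[2], 0., -e[0]],
                   [-e[1], e[0], 0.]])
    return (np.cos(ang) * np.eye(3) - np.sin(ang) * ex
            + (1 - np.cos(ang)) * np.outer(e, e))

def det_M(i, j, k, theta):
    """det M for the i-j-k Euler sequence, with M from Eq. (8.58)."""
    e = np.eye(3)
    M = np.column_stack([axis_rot(e[j-1], theta) @ e[i-1], e[j-1], e[k-1]])
    return np.linalg.det(M)
```

The asymmetric 321 sequence locks at cos θ = 0 and the symmetric 313 sequence at sin θ = 0, matching the text.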
Because Euler angles are discussed in many references on rotational motion and because
they are not widely used in navigation filters, they will not be discussed further here.
Kinematic equations and explicit forms of the attitude matrices for all twelve sets can be
found in Refs. [28, 35, 49, 74].
The constraints on the representations are satisfied because Atrue, A(δϑ), and Â are all proper
orthogonal matrices, and qtrue, δq, and q̂ all have unit norm. The MEKF avoids the attitude
restrictions or switching required by additive attitude EKFs because the error vector δϑ
is assumed to be small enough to completely avoid singularities in the parameterizations
A(δϑ) or δq(δϑ). In some sense, though, Eq. (8.64) incorporates a continuous switching
of the attitude reference.
Only the quaternion version of the MEKF, which is much more widely employed, is
presented here. Reference [43] reviews its history. Any three-component attitude representation
that is related to first order in δϑ to the quaternion by
δq ≈ [δϑ/2; 1] (8.65)
can be used as the error vector. Common choices are the rotation vector, as suggested by
the notation of Eq. (8.64), two times the vector part of the quaternion, two times the vector
of Rodrigues parameters, four times the vector of MRPs, or a vector of suitably indexed
roll, pitch, and yaw angles [49].
The order of the factors on the right side of Eq. (8.64) means that the attitude errors
are in the body reference frame. This leads to a major advantage of the MEKF, that
the covariance of the attitude error angles in the body frame has a transparent physical
interpretation. The covariance of estimators using other attitude representations has a
less obvious interpretation unless the attitude matrix is close to the identity matrix. It is
possible to reverse the order of the factors on the right side of Eq. (8.64) so the attitude
errors are in the reference frame [19]. The covariance can be mapped into the body frame
if desired.
The MEKF estimates δϑ and any other state variables of interest. This discussion
addresses only the attitude part, as the equations for the other components of the state
vector obey the usual EKF equations. The MEKF proceeds by iteration of three steps:
measurement update, state vector reset, and propagation to the next measurement time.
The measurement update step updates the local error state vector. The reset moves the
updated information from the local error state to the global attitude representation and
resets the components of the local error state to zero. The propagation step propagates the
global variables to the time of the next measurement. The local error state variables do not
need to be propagated because they are identically zero over the propagation step. These
three steps will be discussed in more detail.
8.9.1. Measurement Update The observation model is given in terms of the true
global state
y = h(qtrue) + v (8.66)
but the measurement sensitivity matrix is the partial derivative with respect to the local
error state:
H = (∂h/∂q) (∂qtrue/∂(δϑ)) (8.67)
Equations (8.64), (8.65), (8.18b), and (8.20) give, to first order in the error vector,
qtrue ≈ [δϑ/2; 1] ⊗ q̂ = q̂ + (1/2) Ξ(q̂) δϑ (8.68)
90 8. ATTITUDE ESTIMATION
Inserting the partial derivative of this with respect to δϑ into Eq. (8.67) then gives
H = (1/2) (∂h/∂q) Ξ(q̂) (8.69)
Simplifications are possible in some special cases.
The Kalman gain computation and covariance update have the standard Kalman filter
forms. The error state update employs the first-order Taylor expansion
E{h(qtrue)} ≈ h(q̂⁻) + H δϑ̂⁻ (8.70)
giving
δϑ̂⁺ = δϑ̂⁻ + K [y − E{h(qtrue)}] = (I − KH) δϑ̂⁻ + K [y − h(q̂⁻)] (8.71)
8.9.2. Reset The discrete measurement update assigns a finite post-update value to
δϑ̂⁺, but the global state still retains the value q̂⁻. A reset procedure is used to move the
update information to a post-update estimated global state vector q̂⁺, while simultaneously
resetting δϑ̂ to 0₃, the three-component vector with all zero components. The reset does
not change the overall estimate, so the reset must obey
q̂⁺ = δq(0₃) ⊗ q̂⁺ = δq(δϑ̂⁺) ⊗ q̂⁻ (8.72)
Thus the reset moves information from one part of the estimate to another part.
Every EKF includes an additive reset of the global state vector, but this is usually
implicit rather than explicit. The multiplicative quaternion reset is the special feature of
the MEKF. This reset has to preserve the quaternion norm, so an exact unit-norm expression
for the functional dependence of δq on δϑ must be used, not the linear approximation of
Eq. (8.65). Using the Rodrigues parameter vector has the practical advantage that the reset
operation for this parameterization is
q̂⁺ = δq(δϑ̂⁺) ⊗ q̂⁻ = [1/√(1 + ‖δϑ̂⁺/2‖²)] [δϑ̂⁺/2; 1] ⊗ q̂⁻ (8.73)
Using an argument similar to Eq. (8.68), this can be accomplished in two steps:
q′ = q̂⁻ + (1/2) Ξ(q̂⁻) δϑ̂⁺ (8.74)
followed by
q̂⁺ = q′/‖q′‖ (8.75)
The first step is just the usual linear Kalman update, and the second step is equivalent to
a brute force normalization of the updated quaternion. Thus the MEKF using Rodrigues
parameters for the error vector provides a theoretical justification for brute force renormal-
ization, with the added advantage of completely avoiding the accumulation of quaternion
norm errors after many updates. The Rodrigues parameters also have the conceptual ad-
vantage that they map the rotation group into three-dimensional Euclidean space, with the
largest possible 180° attitude errors mapped to points at infinity. Thus probability distri-
butions with infinitely long tails, such as Gaussian distributions, make sense in Rodrigues
parameter space.
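The two-step reset of Eqs. (8.74)-(8.75) is only a few lines of code. A Python sketch, with scalar-last quaternion ordering assumed:

```python
import numpy as np

def Xi(q):
    """4x3 matrix Ξ(q) of Eq. (8.19b), scalar part last."""
    v, s = q[:3], q[3]
    vx = np.array([[0., -v[2], v[1]],
                   [v[2], 0., -v[0]],
                   [-v[1], v[0], 0.]])
    return np.vstack([s * np.eye(3) + vx, -v[np.newaxis, :]])

def mekf_reset(q_minus, dtheta_plus):
    """MEKF quaternion reset, Eqs. (8.74)-(8.75): linear update of the
    global quaternion followed by brute-force renormalization."""
    q_prime = q_minus + 0.5 * Xi(q_minus) @ dtheta_plus   # Eq. (8.74)
    return q_prime / np.linalg.norm(q_prime)              # Eq. (8.75)
```

As the text explains, the normalization step makes this exactly the Rodrigues-parameter reset of Eq. (8.73), so no quaternion norm error accumulates over many updates.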
If a measurement update immediately follows a reset or propagation, the δϑ̂⁻ term on
the right side of Eq. (8.71) can be omitted because δϑ̂⁻ is zero. The reset is often delayed
for computational efficiency until all the updates for a set of simultaneous measurements
have been performed, though, in which case δϑ̂⁻ may have a finite value and all the terms
in Eq. (8.71) must be included. It is imperative to perform a reset before beginning the time
propagation, however, to avoid the necessity of propagating δϑ̂ between measurements.
There is some controversy over the question of whether the reset affects the covariance.
One argument holds that it does not, because the covariance depends not on the actual
measurements but on their assumed statistics. Measurement errors are assumed to have
zero mean, so the mean reset is zero. But the reset is very different from the measurement
update in that it changes the reference frame for the attitude covariance, which might be
expected to modify the covariance even though it adds no new information. The change
in the covariance of δϑ resulting from the effect of the actual update, rather than its zero
expectation, can be computed to be [47, 60]
P^reset_δϑδϑ = (I3 − [δϑ̂⁺×]/2) P⁺_δϑδϑ (I3 − [δϑ̂⁺×]/2)^T (8.76)
to first order in δϑ̂⁺. Comparison with Eq. (8.7) shows that the covariance reset looks to
this order like a rotation by δϑ̂⁺/2, but this equivalence does not hold in higher orders. Most
applications omit this covariance reset, but Reynolds has found that it speeds convergence
and adds robustness in the presence of large updates, and that omitting it can even lead to
filter divergence in some cases [60].
8.9.3. Propagation An EKF must propagate the expectation and covariance of the
state. The MEKF is unusual in propagating the expectation q̂ and the covariance of the
error-state vector. The propagation of the attitude error is found by differentiating
Eq. (8.64):
q̇true = δq̇ ⊗ q̂ + δq ⊗ dq̂/dt (8.77)
The true and estimated quaternions satisfy the kinematic equations
q̇true = (1/2) [ωtrue; 0] ⊗ qtrue (8.78a)
dq̂/dt = (1/2) [ω̂; 0] ⊗ q̂ (8.78b)
where ωtrue and ω̂ are the true and estimated angular rates, respectively. Substituting these
equations and Eq. (8.64) into Eq. (8.77), multiplying on the right by q̂⁻¹, and rearranging
terms gives [43]
δq̇ = (1/2) ([ωtrue; 0] ⊗ δq − δq ⊗ [ω̂; 0]) (8.79)
Substituting Eq. (8.65) and ωtrue = ω̂ + δω, where δω is the angular velocity error, and
multiplying by two leads to
[δϑ̇; 0] = [ω̂; 0] ⊗ [δϑ/2; 1] − [δϑ/2; 1] ⊗ [ω̂; 0] + [δω; 0] ⊗ [δϑ/2; 1] (8.80)
Ignoring products of the small terms δω and δϑ, in the spirit of the (linearized) EKF, the
first three components of Eq. (8.80) are
δϑ̇ = −ω̂ × δϑ + δω (8.81)
and the fourth component is 0 = 0. Equation (8.81) is the equation needed to propagate
the attitude error-angle covariance.
The expectation of Eq. (8.81) is
d(δϑ̂)/dt = −ω̂ × δϑ̂ (8.82)
because δω has zero expectation. This says that if δϑ̂ is zero at the beginning of a propagation
it will remain zero through the propagation, which is equivalent to saying that δq̂
will be equal to the identity quaternion throughout the propagation.
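Equation (8.81) gives the attitude error-state dynamics matrix F = −[ω̂×]. A minimal first-order discrete covariance propagation step in Python; the process-noise density Q is an illustrative assumption (e.g. from a gyro noise budget), not a value from this report:

```python
import numpy as np

def cross(v):
    """Cross-product matrix [v×]."""
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

def propagate_cov(P, w_hat, Q, dt):
    """One first-order discrete step of the attitude error-angle covariance
    from Eq. (8.81): Phi ≈ I - [ω̂×] dt, then P ← Phi P Phiᵀ + Q dt,
    where Q is an assumed process-noise spectral density."""
    Phi = np.eye(3) - cross(w_hat) * dt
    return Phi @ P @ Phi.T + Q * dt
```

Higher-fidelity implementations would use the exact transition matrix (a rotation by ω̂Δt) and an integrated process-noise matrix, but the structure is the same.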
CHAPTER 9
Usability Considerations
This chapter is a catch-all to address best practices for things like selective processing
of measurements, backup ephemeris, reinitializations and restarts, availability of ground-
commandable tuning parameters, etc.
9.1. Editing
Let r = y − h(x), where y is the observed measurement, h(x) is the value of the measurement
computed from the state x, y = h(x) + v, and v is the measurement noise, with E[v] = 0
and E[vv^T] = R. The quantity r is known as the innovation or sometimes the pre-fit residual.
The covariance of r is given by
W = H P H^T + R (9.1)
where P = E[ee^T], H = ∂h(x)/∂x, and e is the error in the estimate of x. The squared
innovation can then be compared against a multiple of W to decide whether to accept or
edit the measurement.
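A common form of the resulting edit test compares r² against a multiple of W. A Python sketch; the 3σ threshold is an illustrative tuning choice, not a value prescribed by this report:

```python
import numpy as np

def accept(r, H, P, R, nsig=3.0):
    """Accept a scalar measurement only if the squared innovation is
    within nsig^2 times its predicted variance W = H P Hᵀ + R (Eq. 9.1)."""
    W = float(H @ P @ H.T + R)
    return r * r <= nsig**2 * W
```

In practice the threshold nsig is one of the tuning parameters that should remain ground-commandable, for the reasons discussed below.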
Experience has shown that the fairly drastic step of completely re-initializing the
filter may not always be necessary or prudent. In particular, under conditions in which it
is reasonably clear that the filter has begun to edit measurements because its covariance
matrix has become overly optimistic, but there remains reason to believe that the state
estimate has not yet become corrupted, it may be beneficial to reinitialize the covariance
while retaining the current state estimate. It may also be desirable to retain flexibility
to retain only the position/velocity state components, while reinitializing the various bias
components.
If the filter has halted for some reason other than divergence (e.g. a flight computer
reset), or if the start of the divergence can be reliably determined, it may be useful to
“restart” the filter from a previously saved state and covariance, especially if there would
otherwise be a long time required for re-convergence. To enable such a restart capability,
the full state and covariance must have been periodically saved, and then they must be
propagated to the restart time.
Periodically saving the filter state also enables the capability to maintain a backup
ephemeris, which provides an additional comparison source for evaluating filter divergence.
For flight phases of limited duration, the backup ephemeris may be propagated inertially
without any measurement updates. For extended operations, it will usually be necessary
to re-seed the backup with a current filter state at periodic intervals. In some applications,
such as GPS filtering, an independent “point solution” may also serve as a useful comparison
source.
9.3. Ground System Considerations
While it may seem intuitive that all parameters affecting filter performance should be
available for re-tuning from the ground via mechanisms such as commands, table uploads,
etc., experience has shown that decisions about ground system design often limit the accessi-
bility of key tuning parameters. Adequate bandwidth in telemetry for full insight into filter
performance, including access to full covariance data, must be available, if only for limited
periods during commissioning and/or troubleshooting activities. The ground system must
be able to reproduce the onboard filter’s performance when provided with corresponding
input data via telemetry playback. And the ground system must also be able to form “best
estimated trajectories” for comparison to onboard filter performance, e.g. through the use
of smoothing algorithms.
CHAPTER 10
Smoothing
Since this work is primarily concerned with onboard navigation filters, one might ques-
tion the need for a chapter on best practices for smoothing. While the addition of a smoother
to an onboard navigation system has usually proved unnecessary, smoothing has nonethe-
less proved to be a useful ancillary capability for trajectory reconstruction by ground-based
analysts. Smoothed trajectories form the basis for our best proxies for truth, in the form of
"best estimated trajectories" (BET), and McReynolds's "filter-smoother consistency test,"
propagated by Jim Wright [77], has proven to be a useful aid to tuning a filter using flight
data. Also, sequential smoothing techniques can provide optimal estimates of the process
noise sequence, as Appendix X of Bierman’s text [2] shows. These estimates may prove
useful as part of the filter tuning process.
It is also worth mentioning the topic of "smoothability." As described in, for example,
Gelb [20], only states that are controllable from the process noise will be affected by
smoothing. So, for example, estimates of random constant measurement biases cannot be
improved by smoothing.
In point of fact there are three types of smoothing: fixed-interval smoothing, fixed-lag
smoothing, and fixed-point smoothing. The context described above is concerned with fixed-
interval smoothing. Maximum likelihood estimation (MLE) of states over a fixed interval has
been the subject of investigation ever since the advent of the Kalman filter [36] in 1961. In 1962,
Bryson and Frazier first approached the problem from a continuous time perspective [4] and
obtained the smoother equations as necessary conditions of an optimal control problem†. In
1965, Rauch, Tung and Striebel [59] (RTS) continued the development of the MLE filters, but
from a discrete time perspective. Their smoother, soon called the RTS smoother, was widely
used because of its ease of implementation. However, as Bierman [2] and others [51] pointed
out, numerical difficulties can sometimes arise in implementing the RTS smoother. A
short time later, Fraser and Potter [17,18] approached the problem a bit differently, looking
at smoothing as an optimal combination of two optimal linear filters, and obtained different,
yet equivalent, equations. Bierman's Square-Root Information Filter [2] (SRIF) also has
an accompanying smoother form, suitable for applications utilizing the SRIF. Since the
Fraser-Potter form avoids the numerical issues of the RTS form, and since it can be easily
adapted from existing onboard Kalman-type forward filtering algorithms, it is generally to
be preferred.
†The Bryson-Frazier smoother is a continuous time instantiation of the smoother. It will not be
considered here because we are interested in discrete-time smoothers. [4]
The boundary conditions for the Fraser-Potter (FP) smoother require the backward
covariance at the final time to be infinite, and the backward filter’s final state to be zero.
Fraser and Potter avoided the infinity by maintaining the backward filter covariance in
information form, so that both the information matrix and the information vector are
zero. As Brown points out [3], the backward filter may be retained in covariance form,
and the infinite boundary condition covariance replaced by either a covariance that is many
multiples of the forward filter’s initial covariance, or by the covariance and state from a short
batch least-squares solution using the final few measurements. Many practical smoother
implementations used by NASA have followed an approach of this sort.
Thus, a practical covariance form of the FP smoother results from running whatever
implementation of Algorithm 1.3 has proved suitable for the application at hand, but in
reverse time, and combining the backward filter results with the forward filter results at
each measurement time. Given the forward filter state and covariance, \hat{X}^+_{F_i} and P^+_{F_i}, which
include the measurement at t_i, and the backward filter state and covariance, \hat{X}_{B_i} and P_{B_i},
which do not include the measurement at t_i, the optimally smoothed state and covariance
at t_i are given by
\hat{X}^S_i = P^S_i \left[ (P^+_{F_i})^{-1} \hat{X}^+_{F_i} + (P_{B_i})^{-1} \hat{X}_{B_i} \right]  (10.1)
P^S_i = \left[ (P^+_{F_i})^{-1} + (P_{B_i})^{-1} \right]^{-1}  (10.2)
If covariance form is to be retained, the numerous matrix inverses apparent in Eqs. (10.1)
and (10.2) may be avoided as follows. Suppose we define the smoothed state as a linear
fusion of the forward and backward filter states:
\hat{X}^S_i = W_{F_i} \hat{X}^+_{F_i} + W_{B_i} \hat{X}_{B_i}  (10.3)
For \hat{X}^S_i to be unbiased, we must choose either W_{F_i} = I - W_{B_i} or W_{B_i} = I - W_{F_i}. Choosing
the latter, the smoothed state becomes
\hat{X}^S_i = W_{F_i} \hat{X}^+_{F_i} + (I - W_{F_i}) \hat{X}_{B_i}  (10.4)
Given the enforced lack of correlation between the forward and backward filters, the fused
(smoothed) covariance is given by
P^S_i = W_{F_i} P^+_{F_i} W_{F_i}^T + W_{B_i} P_{B_i} W_{B_i}^T  (10.5)
= W_{F_i} P^+_{F_i} W_{F_i}^T + (I - W_{F_i}) P_{B_i} (I - W_{F_i})^T  (10.6)
Choosing W_{F_i} to minimize the trace of P^S_i results in
W_{F_i} = P_{B_i} (P^+_{F_i} + P_{B_i})^{-1}  (10.7)
To see that Eq. (10.6), with Eq. (10.7), is equal to Eq. (10.2), expand Eq. (10.6),
substituting Eq. (10.7), and recall Woodbury's identity:
\left[ (P^+_{F_i})^{-1} + (P_{B_i})^{-1} \right]^{-1} = P_{B_i} - P_{B_i} (P^+_{F_i} + P_{B_i})^{-1} P_{B_i}  (10.8)
= P^+_{F_i} - P^+_{F_i} (P^+_{F_i} + P_{B_i})^{-1} P^+_{F_i}  (10.9)
To see that Eq. (10.4), with Eq. (10.7), is equal to Eq. (10.1), use Eq. (10.8) to show that
P^S_i (P_{B_i})^{-1} = W_{B_i}, and use Eq. (10.9) to show that P^S_i (P^+_{F_i})^{-1} = W_{F_i}.
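This equivalence is easy to check numerically. The following sketch is illustrative NumPy code, not from this report (the function name fp_smooth is invented here); it fuses a forward and a backward estimate using Eqs. (10.7), (10.4), and (10.6), the inverse-free forms:

```python
import numpy as np

def fp_smooth(xF, PF, xB, PB):
    """Fraser-Potter fusion sketch: combine the forward estimate (xF, PF),
    which includes the measurement at t_i, with the backward estimate
    (xB, PB), which does not, per Eqs. (10.7), (10.4), and (10.6)."""
    I = np.eye(xF.size)
    WF = PB @ np.linalg.inv(PF + PB)                   # Eq. (10.7)
    xS = WF @ xF + (I - WF) @ xB                       # Eq. (10.4)
    PS = WF @ PF @ WF.T + (I - WF) @ PB @ (I - WF).T   # Eq. (10.6)
    return xS, PS
```

For well-conditioned covariances the result agrees with the information-weighted forms of Eqs. (10.1) and (10.2) to machine precision, while requiring only a single matrix inverse.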
In a typical application, the forward filter has been running continuously onboard the
vehicle, and ground-based analysts will periodically wish to generate a BET over a particular
span of recently downlinked data. If the telemetry system has recorded and downlinked
the full state and covariance at each measurement time, along with the measurements,
the ground system need only run a “copy” of the forward filter backwards through the
measurements and fuse the data according to Eqs. (10.1) and (10.2). Care must be taken
that regeneration of the state transition matrices and process noise covariances is consistent
with the forward filter’s modeling, and with the negative flow of time.
For various reasons, it may be necessary or desirable to run the forward filter on the
ground as well, e.g. with higher-fidelity modeling than the onboard implementation permits.
If so, it is efficient to store the state transition matrices and process noise covariances
computed in the forward pass for use in the backward filter. In this case, Brown shows that
the backward covariance may be propagated using
P_{B_i} = \Phi^{-1}_{i+1,i} \left[ P^+_{B_{i+1}} + S_{i+1} \right] \Phi^{-T}_{i+1,i}  (10.10)
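A minimal sketch of this backward propagation step follows (illustrative code, not from the report; it assumes the transition matrix and accumulated process noise covariance were stored from the forward pass):

```python
import numpy as np

def backward_cov_step(PB_plus_next, Phi, S):
    """One step of Eq. (10.10): map the backward filter covariance at t_{i+1}
    back to t_i using the stored transition matrix Phi and process noise S."""
    Phi_inv = np.linalg.inv(Phi)
    # Note the + sign on S: the backward filter accumulates process noise
    # with the reversed flow of time, per Eq. (10.10).
    return Phi_inv @ (PB_plus_next + S) @ Phi_inv.T
```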
CHAPTER 11
Advanced Estimation Algorithms
This chapter will describe advanced estimation algorithms that have yet to achieve the
status of best practices, but which appear to the contributors to have good potential for
someday reaching such status.
The Sigma-Point Filter. In its most general form, the sigma-point filter performs
sequential estimation of the n-dimensional state, x, whose nonlinear dynamics over the
time interval [t_k, t_{k+1}] are given by
x_{k+1} = f(x_k, w_k)  (11.1)
The process noise input, w, consists of independent increments whose first two moments
are E[w_k] = 0 and E[w_k w_\ell^T] = Q_k \delta_{k\ell}, where \delta_{k\ell} is the Kronecker delta. Although the
second moment may be a function of the time index, this estimator assumes that all of the
samples of w arise from the same type of distribution, and this work further assumes that
this distribution is Gaussian, so that higher-order moments may be neglected.
The filter sequentially processes an ordered set of measurements, Y_k = [y_0, y_1, ..., y_k], of
the form
y_k = h(x_k, v_k)  (11.2)
where the measurement noise input, v, consists of independent and identically distributed
(again, in this work, Gaussian) increments whose first two moments are E[v_k] = 0 and
E[v_k v_\ell^T] = R_k \delta_{k\ell}. By contrast, the ADDSPF (additive divided difference sigma-point filter) utilizes models where the noise sources enter
additively:
x_{k+1} = f(x_k) + g(x_k) w_k  (11.3)
and
y_k = h(x_k) + v_k.  (11.4)
All sigma-point filters utilize a linear measurement update equation of the form
\hat{x}^+_k = \hat{x}^-_k + K_k \left( y_k - \hat{y}_k \right)  (11.5)
where the accented variables in Eq. 11.5 denote conditional expectations, as in the Kalman
filter:
\hat{x}^+_k = E[x_k | Y_k]  (11.6)
\hat{x}^-_k = E[x_k | Y_{k-1}]  (11.7)
\hat{y}_k = E[y_k | Y_{k-1}]  (11.8)
The gain matrix, K, is based on conditional covariances, as in the Kalman filter:
K_k = P_{xy_k} P_{yy_k}^{-1}  (11.9)
P^-_k = P^-_{xx_k} = E\left[ (x_k - \hat{x}^-_k)(x_k - \hat{x}^-_k)^T | Y_{k-1} \right]  (11.10)
P_{xy_k} = E\left[ (x_k - \hat{x}^-_k)(y_k - \hat{y}_k)^T | Y_{k-1} \right]  (11.11)
P_{yy_k} = E\left[ (y_k - \hat{y}_k)(y_k - \hat{y}_k)^T | Y_{k-1} \right]  (11.12)
and the covariance associated with state estimate \hat{x}^+_k is
P^+_k = P^+_{xx_k} = E\left[ (x_k - \hat{x}^+_k)(x_k - \hat{x}^+_k)^T | Y_k \right]  (11.14)
Hereafter, equations will suppress the time index if it is the same for all variables in the
equation.
Estimators such as the Kalman filter estimate these conditional expectations by approximating
the nonlinear functions f and h with first-order Taylor series truncations, e.g.:
f(x) \approx f(\hat{x}^-) + f'(\hat{x}^-)(x - \hat{x}^-)  (11.15)
where f' is an exact gradient. By contrast, the divided difference filter uses a second-order
truncation along with numerical differencing formulas for the derivatives:
f(x) \approx f(\hat{x}^-) + \tilde{D}^{(1)}_{\Delta x} f(\hat{x}^-) + \tilde{D}^{(2)}_{\Delta x} f(\hat{x}^-)  (11.16)
where the divided difference operators, \tilde{D}^{(i)}_{\Delta x} f(\hat{x}^-), approximate the coefficients of the
multidimensional Taylor series expansion using Stirling interpolations. These interpolators
difference perturbations of f(\hat{x}^-) across an interval, h, over a spanning basis set. Whether
they are first-order, such as the unscented filter, or second-order, sigma-point filters choose
the interval so as to better approximate the moments required for the gain calculation, and
choose as the spanning basis a set of sigma points, which are derived from \hat{x}^- and the
columns of the Cholesky factors of P^- as follows.
The ADDSPF. Let \hat{X} denote the array whose columns are a particular ordering of the
sigma points derived from \hat{x} and its corresponding covariance, P. Then
\hat{X} = \left[ \hat{x},\ \hat{x} + h\sqrt[c]{P}_{(:,1)},\ \hat{x} + h\sqrt[c]{P}_{(:,2)},\ \dots,\ \hat{x} - h\sqrt[c]{P}_{(:,1)},\ \hat{x} - h\sqrt[c]{P}_{(:,2)},\ \dots \right]  (11.17)
where the subscript (:, i) denotes column i of the corresponding array, and P = \sqrt[c]{P}\left(\sqrt[c]{P}\right)^T
denotes a Cholesky factorization. In the sequel, the shorthand notation \hat{x} \pm h\sqrt[c]{P} will denote
the array on the right-hand side of the equation above. Then, for the ADDSPF, Ref. 41
shows that as each new measurement becomes available, an array of sigma points generated
from the prior update should be propagated to the new measurement time:
\hat{X}^-_k = f(\hat{X}^+_{k-1})  (11.18)
These propagated sigma points are then merged to form the state estimate just prior to
incorporating the new measurement as follows:
\hat{x}^- = \mu_h(\hat{X}^-) = \frac{h^2 - n}{h^2} \hat{X}^-_{(:,1)} + \frac{1}{2h^2} \sum_{i=2}^{2n+1} \hat{X}^-_{(:,i)}  (11.19)
To form an associated covariance, the following divided differences are next computed:
\tilde{D}^{(1)}_{\Delta x} f(\hat{x}^-)_{(:,i)} = \frac{1}{2h} \left[ \hat{X}^-_{(:,i+1)} - \hat{X}^-_{(:,i+1+n)} \right]  (11.20)
\tilde{D}^{(2)}_{\Delta x} f(\hat{x}^-)_{(:,i)} = \frac{\sqrt{h^2 - 1}}{2h^2} \left[ \hat{X}^-_{(:,i+1)} + \hat{X}^-_{(:,i+1+n)} - 2\hat{X}^-_{(:,1)} \right]  (11.21)
Ref. 41 shows that the covariance may then be computed from
P^- = \left[ \tilde{D}^{(1)}_{\Delta x} f(\hat{x}^-),\ \tilde{D}^{(2)}_{\Delta x} f(\hat{x}^-),\ \sqrt[c]{Q_d} \right] \left[ \tilde{D}^{(1)}_{\Delta x} f(\hat{x}^-),\ \tilde{D}^{(2)}_{\Delta x} f(\hat{x}^-),\ \sqrt[c]{Q_d} \right]^T  (11.22)
One advantage of sigma-point filters is that the full covariance need not be maintained,
but rather only its Cholesky factor. Although the factors in square brackets in Eq. 11.22
are not Cholesky factors, since each is a full n \times 3n matrix, one may extract an n \times n
triangular factor from it using the so-called "thin" version [23] of the QR decomposition¹,
or alternatively using a Householder factorization [2]. Thus,
\begin{bmatrix} \left(\sqrt[c]{P^-}\right)^T \\ O_{2n\times n} \end{bmatrix} = M \left[ \tilde{D}^{(1)}_{\Delta x} f(\hat{x}^-),\ \tilde{D}^{(2)}_{\Delta x} f(\hat{x}^-),\ \sqrt[c]{Q_d} \right]^T  (11.23)
where M is a full 3n \times 3n orthonormal matrix, and O_{2n\times n} is a 2n \times n matrix of zeros.
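Equations (11.17) through (11.23) can be collected into a single time-update routine. The NumPy fragment below is an illustrative sketch only (names are invented here, not from the report); it takes h^2 = 3, the Gaussian choice of the interval parameter discussed later in this chapter:

```python
import numpy as np

def dd2_time_update(xhat, S, f, Sq, h=np.sqrt(3.0)):
    """Divided-difference (DD2) time update sketch, per Eqs. (11.17)-(11.23).
    xhat: prior mean (n,); S: Cholesky factor of its covariance;
    f: dynamics function; Sq: Cholesky factor of the process noise Qd."""
    n = xhat.size
    # Sigma points [xhat, xhat + h*S columns, xhat - h*S columns], Eq. (11.17)
    X = np.column_stack([xhat] + [xhat + h * S[:, i] for i in range(n)]
                        + [xhat - h * S[:, i] for i in range(n)])
    FX = np.column_stack([f(X[:, j]) for j in range(2 * n + 1)])  # Eq. (11.18)
    # Weighted merge of the propagated sigma points, Eq. (11.19)
    xbar = (h * h - n) / (h * h) * FX[:, 0] + FX[:, 1:].sum(axis=1) / (2 * h * h)
    # First- and second-order divided differences, Eqs. (11.20)-(11.21)
    D1 = (FX[:, 1:n + 1] - FX[:, n + 1:2 * n + 1]) / (2 * h)
    D2 = (np.sqrt(h * h - 1) / (2 * h * h)) * (
        FX[:, 1:n + 1] + FX[:, n + 1:2 * n + 1] - 2 * FX[:, [0]])
    # Thin QR of the compound array's transpose extracts the factor, Eq. (11.23)
    R = np.linalg.qr(np.hstack([D1, D2, Sq]).T, mode='r')
    return xbar, R.T  # R.T is a triangular Cholesky factor of the covariance
```

For a linear dynamics function the update reduces exactly to the familiar Kalman time update, which provides a convenient unit test.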
For the measurement update, a new array of sigma points must be generated from \hat{x}^-
and P^-; this array is denoted \hat{X}^*. These sigma points are used to generate a set of sigma
points representing the measurement:
\hat{Y} = h(\hat{X}^*)  (11.24)
In similar fashion to the time update, the sigma points of the measurement are then merged
to form the estimated measurement:
\hat{y} = \mu_h(\hat{Y})  (11.25)
¹For Matlab users, this may be accomplished in several ways, e.g. by passing the transpose of this
matrix to the qr function, then keeping the first n non-zero rows from the second output, and transposing
this result.
least one free parameter. In the unscented filter, the weights for combining the sigma points
involve three parameters whose physical interpretation is perhaps less clear than with the
single parameter in the divided difference algorithms, where the free parameter h is clearly
associated with the size of the perturbation in the numerical differencing formulae. Ref. 53
shows that h should be bounded below by h > 1, and that for symmetric distributions, h
should be equal to the kurtosis, which for a Gaussian distribution is three.³
³Some authors subtract three from the definition of kurtosis, so that Gaussian distributions have zero
kurtosis.
APPENDIX A
Random Variables
A continuous random variable is a function that maps the outcomes of random events to
the real line. Realizations of random variables are thus real numbers. A vector of n random
variables maps outcomes of random events to Rn . For our purposes, random variables
will always be associated with a probability density function that indicates the likelihood
that a realization occurs within a particular interval of the real line, or within a particular
subspace of Rn for the vector case. It is common to assume that this density is the normal
or Gaussian density. For the vector case, the normal probability density function is
p_x(x) = \frac{1}{\sqrt{|2\pi P|}} \exp\left[ -\frac{1}{2} (x - \mu)^T P^{-1} (x - \mu) \right]  (A.1)
where \mu is a vector of mean values for each component of x, and P is a matrix that contains
the variances of each component of x along its diagonal, and the covariances between
each component as its off-diagonal components. The covariances indicate the degree of
correlation between the random variables composing x. The matrix P is thus called the
variance-covariance matrix, which we will hereafter abbreviate to just "covariance matrix,"
or “covariance.” Since the normal density is completely characterized by its mean and co-
variance, we will use the following notation as a shorthand to describe drawing a realization
from a normally-distributed random vector:
x \sim N(\mu, P)  (A.2)
Thus, the model for realizations of a measurement noise vector is
v \sim N(0, R)  (A.3)
For the scalar case, or for the vector case when the covariance is diagonal, we may
directly generate realizations of a normally-distributed random vector from normal random
number generators available in most software libraries. If P has non-zero off-diagonal
elements, we must model the specified correlations when we generate realizations. If P is
strictly positive definite, we can factor it as follows:
P = \sqrt[c]{P} \left( \sqrt[c]{P} \right)^T  (A.4)
where \sqrt[c]{P} is a triangular matrix known as a Cholesky factor; this can be viewed as a "matrix
square root." The Cholesky factorization is available in many linear algebra libraries. We
can then use \sqrt[c]{P} to generate correlated realizations of x as follows. Let z be a realization
of a normally-distributed random vector of the same dimension as x, with zero mean and
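The sampling procedure this appendix builds toward, coloring uncorrelated unit-variance draws with the Cholesky factor, can be sketched as follows (illustrative NumPy code; the function name is invented here):

```python
import numpy as np

def correlated_samples(mu, P, m, seed=None):
    """Draw m realizations of x ~ N(mu, P) using the Cholesky factor of P."""
    rng = np.random.default_rng(seed)
    C = np.linalg.cholesky(P)               # lower triangular, P = C @ C.T
    Z = rng.standard_normal((mu.size, m))   # uncorrelated unit-variance draws z
    return mu[:, None] + C @ Z              # each column is one realization of x
```

The sample mean and sample covariance of a large batch of such draws recover mu and P to within statistical error.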
so Eq. (B.16) becomes
\breve{U}_{aa} \breve{D}_{aa} \breve{U}_{aa}^T = \tilde{U}_{aa} \tilde{D}_{aa} \tilde{U}_{aa}^T + \frac{\tilde{d}_b q_b}{\breve{d}_b} \tilde{U}_{ab} \tilde{U}_{ab}^T  (B.18)
and
\tilde{p}_{ii} = \sum_{k=i}^{n} \tilde{u}_{ik}^2 \tilde{d}_{kk} = \sum_{k=i}^{n} u_{ik}^2 d_{kk} + c\, x_i^2  (B.23)
We recall that \tilde{u}_{ii} = u_{ii} = 1 and thus, for an n \times n matrix, for j = n (i.e. the last column),
\tilde{d}_{nn} = d_{nn} + c\, x_n^2  (B.24)
\tilde{p}_{in} = \tilde{u}_{in} \tilde{d}_{nn} \tilde{u}_{nn} = d_{nn} u_{in} u_{nn} + c\, x_i x_n  (B.25)
so that
\tilde{u}_{in} = \frac{1}{\tilde{d}_{nn}} \left( d_{nn} u_{in} + c\, x_i x_n \right)  (B.26)
110 B. FACTORIZATION MATH
The second-to-last ((n-1)-th) column of U can now be operated on, by means
of the following decomposition of Eq. (B.22), as
\sum_{k=j}^{n-1} \tilde{u}_{ik} \tilde{d}_{kk} \tilde{u}_{jk} + \tilde{u}_{in} \tilde{d}_{nn} \tilde{u}_{jn} = \sum_{k=j}^{n-1} u_{ik} d_{kk} u_{jk} + u_{in} d_{nn} u_{jn} + c\, x_i x_j  (B.27)
which leads to
\sum_{k=j}^{n-1} \tilde{u}_{ik} \tilde{d}_{kk} \tilde{u}_{jk} = \sum_{k=j}^{n-1} u_{ik} d_{kk} u_{jk} + \Upsilon_n  (B.28)
If we work on the terms outside the two summations, using Eq. (B.26) for \tilde{u}_{in} and \tilde{u}_{jn}, \Upsilon_n
becomes
\Upsilon_n = -\tilde{u}_{in} \tilde{d}_{nn} \tilde{u}_{jn} + u_{in} d_{nn} u_{jn} + c\, x_i x_j
= -\frac{1}{\tilde{d}_{nn}} \left[ d_{nn} u_{in} + c\, x_i x_n \right] \left[ d_{nn} u_{jn} + c\, x_j x_n \right] + \frac{d_{nn} + c\, x_n^2}{\tilde{d}_{nn}} \left( u_{in} d_{nn} u_{jn} + c\, x_i x_j \right)
= \frac{1}{\tilde{d}_{nn}} \left[ -c\, d_{nn} u_{jn} x_n x_i - c\, d_{nn} u_{in} x_n x_j + c\, d_{nn} x_i x_j + c\, d_{nn} x_n^2 u_{in} u_{jn} \right]
= \frac{c\, d_{nn}}{\tilde{d}_{nn}} \left( x_i - u_{in} x_n \right) \left( x_j - u_{jn} x_n \right)  (B.29)
Therefore, Eq. (B.27) can be written as
\sum_{k=j}^{n-1} \tilde{u}_{ik} \tilde{d}_{kk} \tilde{u}_{jk} = \sum_{k=j}^{n-1} u_{ik} d_{kk} u_{jk} + \frac{c\, d_{nn}}{\tilde{d}_{nn}} \left( x_i - u_{in} x_n \right) \left( x_j - u_{jn} x_n \right)  (B.30)
Now, if we operate a bit more on the quantity \tilde{u}_{in}, we find from Eq. (B.26) that we get
\tilde{u}_{in} = \frac{d_{nn}}{\tilde{d}_{nn}} u_{in} + \frac{c}{\tilde{d}_{nn}} x_i x_n  (B.31)
= \frac{\tilde{d}_{nn} - c\, x_n^2}{\tilde{d}_{nn}} u_{in} + \frac{c}{\tilde{d}_{nn}} x_i x_n  (B.32)
= u_{in} + \frac{c\, x_n}{\tilde{d}_{nn}} \left( x_i - u_{in} x_n \right)  (B.33)
and if we define \alpha_i and v_n as
\alpha_i = (x_i - u_{in} x_n)  (B.34)
v_n = \frac{c\, x_n}{\tilde{d}_{nn}}  (B.35)
then \tilde{u}_{in} can be rewritten as
\tilde{u}_{in} = u_{in} + \alpha_i v_n  (B.36)
If we want to generalize this, we can write Eq. (B.30) as
\sum_{k=j}^{n-1} \tilde{u}_{ik} \tilde{d}_{kk} \tilde{u}_{jk} = \sum_{k=j}^{n-1} u_{ik} d_{kk} u_{jk} + C^n X_i^n X_j^n  (B.37)
B.3. THE AGEE-TURNER RANK-ONE UPDATE 111
where
C^n = \frac{c\, d_{nn}}{\tilde{d}_{nn}}  (B.38)
X_i^n = x_i - u_{in} x_n  (B.39)
which produces
\sum_{k=j}^{n-2} \tilde{u}_{ik} \tilde{d}_{kk} \tilde{u}_{jk} = \sum_{k=j}^{n-2} u_{ik} d_{kk} u_{jk} + \frac{C^n d_{n-1,n-1}}{\tilde{d}_{n-1,n-1}} \left[ X_i^n - u_{i,n-1} X_{n-1}^n \right] \left[ X_j^n - u_{j,n-1} X_{n-1}^n \right]  (B.41)
so that using the same machinery as above, we get
\sum_{k=j}^{n-2} \tilde{u}_{ik} \tilde{d}_{kk} \tilde{u}_{jk} = \sum_{k=j}^{n-2} u_{ik} d_{kk} u_{jk} + C^{n-1} X_i^{n-1} X_j^{n-1}  (B.42)
where
C^{n-1} = \frac{C^n d_{n-1,n-1}}{\tilde{d}_{n-1,n-1}}  (B.43)
X_i^{n-1} = \alpha_i^{n-1} = X_i^n - u_{i,n-1} X_{n-1}^n  (B.44)
v_{n-1} = \frac{C^n X_{n-1}^n}{\tilde{d}_{n-1,n-1}}  (B.45)
\tilde{u}_{i,n-1} = u_{i,n-1} + X_i^{n-1} v_{n-1}  (B.46)
\tilde{d}_{n-1,n-1} = d_{n-1,n-1} + C^n \left( X_{n-1}^n \right)^2  (B.47)
We also are reminded that \tilde{u}_{i,i} = 1.
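The column-by-column recursion above can be collected into one routine. The sketch below is illustrative (names invented here, not from the report) and implements the Agee-Turner rank-one update, producing a unit upper-triangular factor and diagonal such that the updated U-D factors reproduce U D U^T + c x x^T:

```python
import numpy as np

def agee_turner(U, d, c, x):
    """Agee-Turner rank-one update sketch:
    Ut*diag(dt)*Ut^T = U*diag(d)*U^T + c*x*x^T,
    with U unit upper triangular, d the diagonal of D, and c > 0."""
    n = len(d)
    Ut = np.array(U, dtype=float, copy=True)
    dt = np.array(d, dtype=float, copy=True)
    x = np.array(x, dtype=float, copy=True)
    C = c                                    # running coefficient, cf. C^j
    for j in range(n - 1, 0, -1):
        dt[j] = d[j] + C * x[j] ** 2         # cf. Eqs. (B.24), (B.47)
        v = C * x[j] / dt[j]                 # cf. Eqs. (B.35), (B.45)
        C = C * d[j] / dt[j]                 # cf. Eqs. (B.38), (B.43)
        for i in range(j):
            x[i] = x[i] - x[j] * U[i, j]     # cf. Eqs. (B.34), (B.44)
            Ut[i, j] = U[i, j] + x[i] * v    # cf. Eqs. (B.36), (B.46)
    dt[0] = d[0] + C * x[0] ** 2
    return Ut, dt
```

A direct reconstruction check against the full-matrix sum verifies the recursion.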
\tilde{U} \tilde{D} \tilde{U}^T = \bar{D} - \frac{1}{\alpha} v v^T  (B.61)
which can be rewritten as:
\tilde{U} \tilde{D} \tilde{U}^T = \bar{U} \bar{D} \bar{U}^T - \frac{1}{\alpha} v v^T  (B.62)
where \bar{U} = I. Following the reasoning in the description for the rank-one update earlier
in this appendix,
\tilde{p}_{ij} = \sum_{k=j}^{n} \tilde{u}_{ik} \tilde{d}_{kk} \tilde{u}_{jk} = -\frac{1}{\alpha} v_i v_j  (B.63)
and
\tilde{p}_{ii} = \sum_{k=i}^{n} \tilde{u}_{ik}^2 \tilde{d}_{kk} = d_{ii} - \frac{1}{\alpha} v_i^2  (B.64)
Substituting for \tilde{u}_{in} and \tilde{u}_{jn} from Eq. (B.68), we find
\sum_{k=j}^{n-1} \tilde{u}_{ik} \tilde{d}_{kk} \tilde{u}_{jk} = -\frac{1}{\alpha} \left[ 1 + \frac{1}{\alpha \tilde{d}_{nn}} v_n^2 \right] v_i v_j  (B.70)
But since
1 + \frac{1}{\alpha \tilde{d}_{nn}} v_n^2 = \frac{d_{nn}}{\tilde{d}_{nn}}  (B.71)
\tilde{u}_{n-1,n-1}^2 \tilde{d}_{n-1,n-1} + \tilde{u}_{n-1,n}^2 \tilde{d}_{n,n} = d_{n-1,n-1} - \frac{1}{\alpha} v_{n-1}^2  (B.74)
Recalling that \tilde{u}_{n-1,n-1} = 1 and that \tilde{u}_{n-1,n} was obtained (with i = n-1) in Eq. (B.68), we get
\tilde{d}_{n-1,n-1} = d_{n-1,n-1} - \frac{1}{\alpha} \left[ 1 + \frac{1}{\alpha \tilde{d}_{n,n}} v_n^2 \right] v_{n-1}^2  (B.75)
Knowing that
1 + \frac{1}{\alpha \tilde{d}_{n,n}} v_n^2 = \frac{d_{n,n}}{\tilde{d}_{n,n}}  (B.76)
Now we work on \tilde{u}_{i,n-1}. We recall from Eq. (B.63), with j = n-1, that
\tilde{u}_{i,n-1} \tilde{d}_{n-1,n-1} \tilde{u}_{n-1,n-1} + \tilde{u}_{i,n} \tilde{d}_{n,n} \tilde{u}_{n-1,n} = -\frac{1}{\alpha} v_i v_{n-1}  (B.78)
We substitute for \tilde{d}_{n-1,n-1} from Eq. (B.77), for \tilde{u}_{i,n} and \tilde{u}_{n-1,n} from Eq. (B.68), and noting
that \tilde{u}_{n-1,n-1} = 1, we get
\tilde{u}_{i,n-1} = -\frac{1}{\alpha \tilde{d}_{n-1,n-1}} \left[ 1 + \frac{1}{\alpha \tilde{d}_{n,n}} v_n^2 \right] v_i v_{n-1}  (B.79)
Using Eq. (B.76), \tilde{u}_{i,n-1} becomes
\tilde{u}_{i,n-1} = -\frac{1}{\alpha \tilde{d}_{n-1,n-1}} \left( \frac{d_{n,n}}{\tilde{d}_{n,n}} \right) v_i v_{n-1}  (B.80)
B.5. THE CARLSON RANK-ONE UPDATE 115
With this in mind, we are now prepared to work on Eq. (B.73), and we find that
\sum_{k=j}^{n-2} \tilde{u}_{ik} \tilde{d}_{kk} \tilde{u}_{jk} = -\frac{1}{\alpha} \frac{d_{n,n}}{\tilde{d}_{n,n}} \left[ 1 + \frac{d_{n,n}}{\alpha \tilde{d}_{n,n}} \frac{v_{n-1}^2}{\tilde{d}_{n-1,n-1}} \right] v_i v_j
= -\left( \frac{d_{n,n}}{\alpha \tilde{d}_{n,n}} \right) \left( \frac{d_{n-1,n-1}}{\tilde{d}_{n-1,n-1}} \right) v_i v_j  (B.81)
This has the same form as Eq. (B.69), so this suggests a recursion as follows.
With C^n = -1/\alpha, for j = n, \dots, 1:
\tilde{d}_{jj} = d_{jj} + C^j v_j^2  (B.82)
\tilde{u}_{ij} = C^j v_i v_j / \tilde{d}_{jj}, \quad i = 1, \dots, j-1  (B.83)
C^{j-1} = C^j d_{jj} / \tilde{d}_{jj}  (B.84)
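The recursion of Eqs. (B.82) through (B.84) can be exercised directly. The sketch below is illustrative (names invented here) and assumes the downdated matrix remains positive definite:

```python
import numpy as np

def rank_one_downdate(d, v, alpha):
    """Sketch of Eqs. (B.82)-(B.84): build unit upper-triangular Ut and
    diagonal dt such that Ut*diag(dt)*Ut^T = diag(d) - (1/alpha)*v*v^T."""
    n = len(d)
    Ut = np.eye(n)
    dt = np.array(d, dtype=float, copy=True)
    C = -1.0 / alpha                            # C^n = -1/alpha
    for j in range(n - 1, -1, -1):
        dt[j] = d[j] + C * v[j] ** 2            # Eq. (B.82)
        for i in range(j):
            Ut[i, j] = C * v[i] * v[j] / dt[j]  # Eq. (B.83)
        C = C * d[j] / dt[j]                    # Eq. (B.84)
    return Ut, dt
```

Reconstructing the product from the returned factors verifies the recursion against the defining rank-one downdate.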
From Eq. (B.82), we get
\tilde{d}_{jj} = d_{jj} + C^j v_j^2
and from Eq. (B.84), we find that
C^{j-1} = C^j \frac{d_{jj}}{\tilde{d}_{jj}} = \frac{C^j d_{jj}}{d_{jj} + C^j v_j^2} = \frac{d_{jj}}{\frac{d_{jj}}{C^j} + v_j^2}  (B.85)
This can be written as
\frac{1}{C^{j-1}} = \frac{1}{C^j} + \frac{v_j^2}{d_{jj}}  (B.86)
or
\frac{1}{C^j} = \frac{1}{C^{j-1}} - \frac{v_j^2}{d_{jj}}  (B.87)
Comparing Eqs. (B.60) and (B.87), we find that
\alpha_j = -\frac{1}{C^j}  (B.88)
Using this equation, we find that
\tilde{d}_{jj} = \left( \frac{\alpha_{j-1}}{\alpha_j} \right) d_{jj}  (B.89)
From Eq. (B.83), and using Eqs. (B.89) and (B.88), we can express \tilde{u}_{ij} as
\tilde{u}_{ij} = -\frac{v_i v_j}{d_{jj}\, \alpha_{j-1}}  (B.90)
Recalling that v_j = d_{jj} f_j,
\tilde{u}_{ij} = -\frac{v_i f_j}{\alpha_{j-1}}  (B.91)
If we define \lambda_j as
\lambda_j = -\frac{f_j}{\alpha_{j-1}}  (B.92)
then \tilde{u}_{ij} becomes
\tilde{u}_{ij} = \lambda_j v_i  (B.93)
and \tilde{U} has the structure
\tilde{U} = \begin{bmatrix} 1 & \lambda_2 v_1 & \lambda_3 v_1 & \lambda_4 v_1 & \cdots & \lambda_n v_1 \\ 0 & 1 & \lambda_3 v_2 & \lambda_4 v_2 & \cdots & \lambda_n v_2 \\ 0 & 0 & 1 & \lambda_4 v_3 & \cdots & \lambda_n v_3 \\ 0 & 0 & 0 & 1 & \cdots & \lambda_n v_4 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \end{bmatrix}  (B.94)
We can rewrite \tilde{U} as
\tilde{U} = I_n + \begin{bmatrix} 0_{n\times 1} & \lambda_2 v^{(1)} & \lambda_3 v^{(2)} & \lambda_4 v^{(3)} & \cdots & \lambda_n v^{(n-1)} \end{bmatrix}  (B.95)
where v^{(j)} is an n \times 1 vector defined as
v^{(j)} = \begin{bmatrix} v_1 & v_2 & v_3 & \cdots & v_j & 0 & \cdots & 0 \end{bmatrix}^T  (B.96)
We recall that
\bar{U} \bar{D} \bar{U}^T = U \left[ D - \frac{1}{\alpha} v v^T \right] U^T
and
\tilde{U} \tilde{D} \tilde{U}^T = D - \frac{1}{\alpha} v v^T
and
\bar{U} = U \tilde{U} \quad\text{and}\quad \bar{D} = \tilde{D}
With this in mind, \bar{U} is
\bar{U} = U + U \begin{bmatrix} 0_{n\times 1} & \lambda_2 v^{(1)} & \lambda_3 v^{(2)} & \lambda_4 v^{(3)} & \cdots & \lambda_n v^{(n-1)} \end{bmatrix}  (B.97)
If we denote U^{(j)} and \bar{U}^{(j)} as the jth columns of U and \bar{U}, respectively, we find that
\bar{U}^{(j)} = U^{(j)} + \lambda_j K_{j-1}  (B.98)
where
K_j = U v^{(j)} = K_{j-1} + v_j U^{(j)}, \quad\text{with } K_0 = 0_{n\times 1}  (B.99)
Finally,
K = \frac{1}{\alpha_n} K_n  (B.100)
APPENDIX C
Dual Inertial vs. Inertial/Relative Trade
This appendix describes a dual inertial-absolute state and dual inertial-relative state
navigation filter trade study performed for Orion. The formulation of each of these filters is
detailed, the advantages and disadvantages of each are discussed, and a recommendation to
use the dual-inertial formulation is made. This appendix is reproduced from CEV Flight
Dynamics Technical Brief Number FltDyn–CEV–07–141, dated December 21, 2007.
C.1. Introduction
Orion will need an efficient and well formulated relative navigation filter. Among the
many possibilities, two of the most promising will be discussed in this report. The two are
the dual inertial-absolute state navigation filter and the dual inertial-relative state naviga-
tion filter. The dual inertial-absolute state filter includes the absolute inertial state of both
vehicles (with respect to the center of mass of the central body). The dual inertial-relative
state navigation filter has as its states the absolute inertial state of the chaser (Orion)
vehicle and the relative inertial state of the target with respect to the chaser (x_{rel} = x_T - x_C).
¹The expectation operator E(·) for continuous random variables is defined as follows:
E(X) = \int_{-\infty}^{\infty} x\, p(x)\, dx
can be expressed as
\dot{x}_{C_{nom}} = f_C(x_{C_{nom}})  (C.3)
\dot{x}_{T_{nom}} = f_T(x_{T_{nom}})  (C.4)
Defining
\delta x_C = x_C - x_{C_{nom}}  (C.5)
\delta x_T = x_T - x_{T_{nom}}  (C.6)
and taking derivatives and expanding to first order, we get
\delta\dot{x}_C = A_C(x_{C_{nom}})\, \delta x_C + w_C  (C.7)
\delta\dot{x}_T = A_T(x_{T_{nom}})\, \delta x_T + w_T  (C.8)
where
A_C(x_{C_{nom}}) = \left( \frac{\partial f_C}{\partial x_C} \right)_{x_C = x_{C_{nom}}} \quad\text{and}\quad A_T(x_{T_{nom}}) = \left( \frac{\partial f_T}{\partial x_T} \right)_{x_T = x_{T_{nom}}}  (C.9)
Equivalently, we can express the filter errors as
\delta\hat{x}_C = \hat{x}_C - x_C  (C.10)
\delta\hat{x}_T = \hat{x}_T - x_T  (C.11)
where, with a bit of abuse of notation,²
\hat{x}_C = E(x_C) \quad\text{and}\quad \hat{x}_T = E(x_T)  (C.12)
so that the filter error dynamics evolve as
\delta\dot{\hat{x}}_C = A_C(x_C)\, \delta\hat{x}_C + w_C  (C.13)
\delta\dot{\hat{x}}_T = A_T(x_T)\, \delta\hat{x}_T + w_T  (C.14)
We can, therefore, write the inertial-absolute³ filter error dynamics (dropping the functional
dependence for compactness) as
\begin{bmatrix} \delta\dot{\hat{x}}_C \\ \delta\dot{\hat{x}}_T \end{bmatrix} = \begin{bmatrix} A_C & 0 \\ 0 & A_T \end{bmatrix} \begin{bmatrix} \delta\hat{x}_C \\ \delta\hat{x}_T \end{bmatrix} + \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} w_C \\ w_T \end{bmatrix}  (C.15)
Defining
P_{IA} = E\left\{ \begin{bmatrix} \delta\hat{x}_C \\ \delta\hat{x}_T \end{bmatrix} \begin{bmatrix} \delta\hat{x}_C^T & \delta\hat{x}_T^T \end{bmatrix} \right\} = \begin{bmatrix} E[\delta\hat{x}_C \delta\hat{x}_C^T] & E[\delta\hat{x}_C \delta\hat{x}_T^T] \\ E[\delta\hat{x}_T \delta\hat{x}_C^T] & E[\delta\hat{x}_T \delta\hat{x}_T^T] \end{bmatrix} = \begin{bmatrix} P_{C,C} & P_{C,T} \\ P_{T,C} & P_{T,T} \end{bmatrix}  (C.16)
where the subscript IA denotes that this is the covariance associated with the inertial-absolute
filter. The differential equation for the covariance (assuming that the plant/process
noise for the two vehicles are independent and are independent of the states of the two
vehicles) is
\dot{P}_{IA} = A_{IA} P_{IA} + P_{IA} A_{IA}^T + G_{IA} Q_{IA} G_{IA}^T  (C.17)
²To be precise, \hat{x}_C should be written in terms of the conditional expectation.
where
A_{IA} = \begin{bmatrix} A_C & 0 \\ 0 & A_T \end{bmatrix}, \quad G_{IA} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}, \quad Q_{IA} = \begin{bmatrix} Q_C & 0 \\ 0 & Q_T \end{bmatrix}  (C.18)
with the initial condition
P_{IA_0} = \begin{bmatrix} P_{C,C_0} & 0 \\ 0 & P_{T,T_0} \end{bmatrix}  (C.19)
where P_{C,C_0} and P_{T,T_0} are the initial covariances of the chaser and target inertial (absolute)
states, respectively. Finally, the covariance of the relative state is
P_{rel,rel} = E\left[ (\delta\hat{x}_C - \delta\hat{x}_T)(\delta\hat{x}_C - \delta\hat{x}_T)^T \right]  (C.20)
= P_{C,C} + P_{T,T} - P_{T,C} - P_{C,T} = P_{C,C} + P_{T,T} - P_{T,C} - P_{T,C}^T  (C.21)
C.2.2. The Dual Inertial-Relative Filter Dynamics. Consistent with the earlier
definitions, we define the inertial relative state as
x_{rel} = x_T - x_C  (C.22)
Taking derivatives of this equation and substituting from Eqs. (C.1) and (C.2) yields
\dot{x}_{rel} = f_T(x_T) - f_C(x_C) + w_T - w_C  (C.23)
Expanding this equation to first order yields
\delta\dot{\hat{x}}_{rel} = A_T(x_T)\, \delta\hat{x}_T - A_C(x_C)\, \delta\hat{x}_C + w_T - w_C  (C.24)
= A_T(x_T)(\delta\hat{x}_{rel} + \delta\hat{x}_C) - A_C(x_C)\, \delta\hat{x}_C + w_T - w_C  (C.25)
= (A_T - A_C)\, \delta\hat{x}_C + A_T\, \delta\hat{x}_{rel} + (w_T - w_C)  (C.26)
Therefore we write the inertial-relative⁴ filter error dynamics (once again dropping the
functional dependence for compactness) as
\begin{bmatrix} \delta\dot{\hat{x}}_C \\ \delta\dot{\hat{x}}_{rel} \end{bmatrix} = \begin{bmatrix} A_C & 0 \\ A_T - A_C & A_T \end{bmatrix} \begin{bmatrix} \delta\hat{x}_C \\ \delta\hat{x}_{rel} \end{bmatrix} + \begin{bmatrix} I & 0 \\ -I & I \end{bmatrix} \begin{bmatrix} w_C \\ w_T \end{bmatrix}  (C.27)
Defining, as before,
P_{IR} = E\left\{ \begin{bmatrix} \delta\hat{x}_C \\ \delta\hat{x}_{rel} \end{bmatrix} \begin{bmatrix} \delta\hat{x}_C^T & \delta\hat{x}_{rel}^T \end{bmatrix} \right\} = \begin{bmatrix} E[\delta\hat{x}_C \delta\hat{x}_C^T] & E[\delta\hat{x}_C \delta\hat{x}_{rel}^T] \\ E[\delta\hat{x}_{rel} \delta\hat{x}_C^T] & E[\delta\hat{x}_{rel} \delta\hat{x}_{rel}^T] \end{bmatrix}  (C.28)
= \begin{bmatrix} P_{C,C} & P_{C,rel} \\ P_{rel,C} & P_{rel,rel} \end{bmatrix}  (C.29)
where the subscript IR denotes that this is the covariance associated with the inertial-relative
filter. The differential equation for the covariance is
\dot{P}_{IR} = A_{IR} P_{IR} + P_{IR} A_{IR}^T + G_{IR} Q_{IR} G_{IR}^T  (C.30)
where
A_{IR} = \begin{bmatrix} A_C & 0 \\ A_T - A_C & A_T \end{bmatrix}, \quad G_{IR} = \begin{bmatrix} I & 0 \\ -I & I \end{bmatrix}, \quad Q_{IR} = \begin{bmatrix} Q_C & 0 \\ 0 & Q_T \end{bmatrix}  (C.31)
⁴We refer to this filter as the inertial-relative filter to distinguish it from the prior inertial-absolute filter.
In this formulation, the filter states consist of the inertial absolute chaser state and the inertial relative target
state.
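The IA and IR error states are related by a constant, invertible transformation, which is the sense in which the two formulations are mathematically equivalent. The following sketch (illustrative only, with random A_C and A_T standing in for the true gravity-gradient partials) verifies numerically that the IR system matrices of Eq. (C.31) are the similarity transforms of the IA matrices of Eq. (C.18):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A_C = rng.normal(size=(n, n))
A_T = rng.normal(size=(n, n))
I, Z = np.eye(n), np.zeros((n, n))

A_IA = np.block([[A_C, Z], [Z, A_T]])          # Eq. (C.18)
A_IR = np.block([[A_C, Z], [A_T - A_C, A_T]])  # Eq. (C.31)
G_IA = np.block([[I, Z], [Z, I]])
G_IR = np.block([[I, Z], [-I, I]])

# T maps the IA error state [dx_C; dx_T] into the IR error state [dx_C; dx_rel]
T = np.block([[I, Z], [-I, I]])

assert np.allclose(A_IR, T @ A_IA @ np.linalg.inv(T))
assert np.allclose(G_IR, T @ G_IA)
```

The equivalence in exact arithmetic is what makes the trade in this appendix purely a question of operation counts and finite-precision behavior.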
C.3.1.2. Range Measurements. For the case of range measurements (either from the RF
link or from the Lidar), the measurement equation can be written simply as
z^{IA}_{range} = \sqrt{(r_T - r_C) \cdot (r_T - r_C)} + b_{range} + \nu_{range}  (C.38)
with the range (measurement) noise statistics \nu_{range} \sim N(0, R_{range}). Since the target
state is a member of this filter's state-space, the measurement partials associated with this
⁵We assume that at the initial time, the chaser and target initial error covariance matrices are
uncorrelated.
C.3. INCORPORATION OF MEASUREMENTS 121
where T_{SB} is the transformation matrix from the body (IMU) frame to the sensor frame (with
q_{SB} being the quaternion associated with the transformation from the body frame to the sensor
frame) and T_{BI} is the transformation matrix from the inertial frame to the body (IMU)
frame (with q_{BI} being the quaternion associated with the transformation from the inertial
frame to the body frame).
With this in hand, the measurement partials can be obtained (after a bit of manipulation
[1]) to be
\frac{\partial\alpha}{\partial r_C} = -\frac{u_\alpha^T}{r} T_{SB}(q_{SB}) T_{BI}(q_{BI})  (C.47)
\frac{\partial\alpha}{\partial v_C} = 0_{1\times 3}  (C.48)
\frac{\partial\alpha}{\partial r_T} = \frac{u_\alpha^T}{r} T_{SB}(q_{SB}) T_{BI}(q_{BI})  (C.49)
\frac{\partial\alpha}{\partial v_T} = 0_{1\times 3}  (C.50)
and
\frac{\partial\beta}{\partial r_C} = -\frac{u_\beta^T}{r} T_{SB}(q_{SB}) T_{BI}(q_{BI})  (C.51)
\frac{\partial\beta}{\partial v_C} = 0_{1\times 3}  (C.52)
\frac{\partial\beta}{\partial r_T} = \frac{u_\beta^T}{r} T_{SB}(q_{SB}) T_{BI}(q_{BI})  (C.53)
\frac{\partial\beta}{\partial v_T} = 0_{1\times 3}  (C.54)
where
u_\alpha = \frac{1}{\cos\beta} \begin{bmatrix} -\sin\alpha \\ \cos\alpha \\ 0 \end{bmatrix}  (C.55)
u_\beta = \begin{bmatrix} -\cos\alpha \sin\beta \\ -\sin\alpha \sin\beta \\ \cos\beta \end{bmatrix}  (C.56)
C.3.2.2. Range Measurements. For the case of range measurements with the inertial-relative
filter, the measurement equation can be written simply as
z^{IR}_{range} = \sqrt{r_{rel}^T r_{rel}} + b_{range} + \nu_{range}  (C.59)
with, as before, the range (measurement) noise statistics \nu_{range} \sim N(0, R_{range}). Since the
relative state is a member of this filter's state-space, the measurement partials associated
with this measurement for the inertial-relative filter are
\frac{\partial z^{IR}_{range}}{\partial r_C} = 0_{1\times 3}  (C.60)
\frac{\partial z^{IR}_{range}}{\partial v_C} = 0_{1\times 3}  (C.61)
\frac{\partial z^{IR}_{range}}{\partial r_{rel}} = \frac{r_{rel}^T}{|r_{rel}|}  (C.62)
\frac{\partial z^{IR}_{range}}{\partial v_{rel}} = 0_{1\times 3}  (C.63)
the quantities in Eq. (C.64) are defined in Section C.3.1.3. The measurement partials are
expressed as
\frac{\partial\alpha}{\partial r_C} = 0_{1\times 3}  (C.65)
\frac{\partial\alpha}{\partial v_C} = 0_{1\times 3}  (C.66)
\frac{\partial\alpha}{\partial r_{rel}} = \frac{u_\alpha^T}{r} T_{SB}(q_{SB}) T_{BI}(q_{BI})  (C.67)
\frac{\partial\alpha}{\partial v_{rel}} = 0_{1\times 3}  (C.68)
and
\frac{\partial\beta}{\partial r_C} = 0_{1\times 3}  (C.69)
\frac{\partial\beta}{\partial v_C} = 0_{1\times 3}  (C.70)
\frac{\partial\beta}{\partial r_{rel}} = \frac{u_\beta^T}{r} T_{SB}(q_{SB}) T_{BI}(q_{BI})  (C.71)
\frac{\partial\beta}{\partial v_{rel}} = 0_{1\times 3}  (C.72)
where
u_\alpha = \frac{1}{\cos\beta} \begin{bmatrix} -\sin\alpha \\ \cos\alpha \\ 0 \end{bmatrix}  (C.73)
u_\beta = \begin{bmatrix} -\cos\alpha \sin\beta \\ -\sin\alpha \sin\beta \\ \cos\beta \end{bmatrix}  (C.74)
C.4. Analysis of the Merits of the Inertial-Absolute and Inertial-Relative
Filters
The Flight Day 1 rendezvous trajectory and models as described in [2] were used to
analyze the two filter formulations. Both formulations had the same driving dynamics
and measurement models; only the covariance propagation and covariance updates were
different. With this in mind, the comparison of the two filter formulations, as it relates to
covariance operations, was analyzed.
C.4.1. Covariance Propagation. First it must be pointed out that the propagation
of the filter dynamics is identical between the two filter parameterizations. That is to say,
in each filter, the inertial absolute states of both the chaser vehicle and the target vehicle
will be propagated. The difference arises in the propagation of the covariance matrices
associated with each of the filter parameterizations. In the IA filter, the covariances (and
cross-covariances) of the chaser inertial states and the target inertial states are computed. It
should be noted that the dynamics of the two vehicles' states are inherently uncorrelated
(see A_{IA} in Eq. (C.18)). In contrast, for the IR filter the covariances (and cross-covariances)
of the chaser inertial states and the inertial relative states (of the target with respect to
the chaser) are computed. It should be noted that the dynamics of this filter's states
are inherently correlated (see A_{IR} in Eq. (C.31)). Hence, there are inherently more non-zero
computations (both additions and multiplications) involved.⁶ Hence, there is more room for
round-off error in the covariance propagation in the IR filter. In order to see this, Table 1
contains an analysis of the propagation error as a function of the propagation interval. This
propagation is carried out without process noise on either the chaser or target states. In
addition, the chaser and the target states are uncorrelated at the initial time. Hence, since
there are no measurements which could correlate the two vehicles' states, the correlation
coefficients should remain zero (i.e. P_{C,T} = 0_{6\times 6}) throughout the interval. This was verified
to be the case. Because of the reduced number of computations inherent in the IA filter
formulation, it was assumed that the IA filter propagation was the 'truth' and the IR filter
was compared to it.
It is clear that the additional non-zero multiplications and additions for the IR filter
formulation compared to the IA formulation result in a build-up of round-off error. This has
⁶First, notice that in the IR filter, the term A_T - A_C will inherently cause a loss of precision. Second,
assuming only the gravity gradient term in A, which is symmetric, A_T - A_C involves 6 additions/subtractions.
The term (A_T - A_C)P_{C,C} in Eqs. (C.30) and (C.31) involves 18 multiplications and 12 additions/subtractions. The
term (A_T - A_C)P_{C,rel} (or P_{rel,C}(A_T - A_C)^T) involves 27 multiplications and 18 additions/subtractions. The
terms (A_T - A_C)P_{C,C} + A_T P_{rel,C} and (A_T - A_C)P_{C,rel} + A_T P_{rel,rel} each involve 9 additions/subtractions.
These total 45 multiplications and 54 additions/subtractions. This is doubled because of the symmetric
nature of the covariance matrix. Therefore, there are 90 additional multiplications and 108 additional
additions/subtractions per function evaluation in the IR filter formulation over the IA filter formulation.
For a fourth-order Runge-Kutta integration method, the IR filter formulation results in an additional 360
multiplications and 432 additions/subtractions per integration step. Each of these operations results in a
numerical loss of precision in finite-precision machines.
the e↵ect of reducing the propagation accuracy of the IR filter vis-à-vis the IA filter. The
additional operations, in concert with the accompanying loss of precision, make a strong
case for the use of the IA filter formulation.
C.4.2. Measurement Update

It should be apparent from comparing Eqs. (35) and (56) that, for the case of the target ground-update, there are 18 more multiplications (because of the identity matrix) in the IR formulation than in the IA formulation for each target ground-update.
For all other relative measurements, there are more operations for the IA filter formulation than for the IR formulation. In fact, for range measurements, there are 54 more multiplications and 54 more additions for the IA formulation than for the IR formulation for each range measurement update.
For bearing measurements, there are 108 more multiplications and 108 more additions for the IA formulation than for the IR formulation for each bearing measurement update.
So, for relative sensor measurements, clearly there are more multiplications and more
additions for the IA filter formulation than for the IR formulation.
C.5. Conclusions
While the inertial-absolute and inertial-relative filter formulations are mathematically equivalent, their implementation on finite-precision machines influences the choice between them.
With regard to propagation of the covariance matrices, there are more computations for
the IR filter than the (mathematically equivalent) IA filter. These additional computations,
in concert with the types of operations, result in a (significant) loss of precision with regard
to the propagation of the covariance matrices.
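The precision loss that comes from differencing two nearly equal quantities, as happens when the target and chaser gravity-gradient matrices are subtracted, can be seen even in a scalar toy example. The numbers below are hypothetical and purely illustrative; they are not drawn from either filter.

```python
# Toy illustration (hypothetical numbers) of catastrophic cancellation:
# two values that agree to ~12 significant digits are differenced, and
# the result keeps only the digits where rounding error already lives.
a_target = 1.000000000001   # e.g. an entry of the target's dynamics matrix
a_chaser = 1.000000000000   # nearly equal entry for the chaser

diff = a_target - a_chaser

# The subtraction itself is exact, but a_target was already rounded on
# entry, so diff inherits a relative error of roughly 1e-4: about
# twelve of the sixteen available decimal digits cancelled away.
rel_err = abs(diff - 1.0e-12) / 1.0e-12
print(f"diff = {diff:.6e}, relative error vs. exact 1e-12: {rel_err:.1e}")
```

The same effect, repeated over the many extra operations counted above, drives the round-off build-up in the IR covariance propagation.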
With regard to the measurement updates to the covariance matrices, for the case of relative navigation measurements, there are fewer non-zero operations for the IR filter than for the IA filter. This is one of the strengths of the IR formulation; if the computational cost and precision of the covariance propagation were the same for both, the IR filter would be advantageous in terms of computations and precision.
After considering all these factors, the Inertial-Absolute filter is recommended for use
on Orion.
Bibliography
[1] W. S. Agee and R. H. Turner. Triangular decomposition of a positive definite matrix plus a symmetric dyad with application to Kalman filtering. White Sands Missile Range Tech. Rep. No. 38, 1972.
[2] G. J. Bierman. Factorization Methods for Discrete Sequential Estimation. Academic Press, New York, 1977; Dover Publications, 2006.
[3] Robert G. Brown and Patrick Y.C. Hwang. Introduction to Random Signals and Applied Kalman Fil-
tering. John Wiley and Sons, Inc., New York, NY, 3rd edition, 1997.
[4] A. E. Bryson and M. Frazier. Smoothing for linear and nonlinear dynamic systems. Technical Report
TDR-63-119, Aeronautical Systems Division, Wright-Patterson Air Force Base, Ohio, Sept. 1962.
[5] N.A. Carlson. Fast triangular factorization of the square root filter. AIAA Journal, 11(9):1259–1265,
September 1973.
[6] J. R. Carpenter and K. T. Alfriend. Navigation accuracy guidelines for orbital formation flying. Journal
of the Astronautical Sciences, 53(2):207–219, 2006. Also appears as AIAA Paper 2003–5443.
[7] J. Russell Carpenter. Navigation filter best practices. https://mediaex-server.larc.nasa.gov/Academy/Catalog/Full/f1d0abb028d3491f8701da3fc64bcb2021, January 2015. Accessed: January 4, 2018.
[8] J. Russell Carpenter and Kevin Berry. Artificial damping for stable long-term orbital covariance propa-
gation. In Astrodynamics 2007, volume 129 of Advances in the Astronautical Sciences, pages 1697–1707.
Univelt, 2008.
[9] J. Russell Carpenter and Emil R. Schiesser. Semimajor axis knowledge and GPS orbit determination. NAVIGATION: Journal of The Institute of Navigation, 48(1):57–68, Spring 2001. Also appears as AAS Paper 99–190.
[10] Russell Carpenter and Taesul Lee. A stable clock error model using coupled first- and second-order Gauss-Markov processes. In AAS/AIAA 18th Spaceflight Mechanics Meeting, volume 130, pages 151–162. Univelt, 2008.
[11] W. H. Clohessy and R. S. Wiltshire. Terminal guidance for satellite rendezvous. Journal of the Aerospace
Sciences, 27(5):653–658, 674, 1960.
[12] W. F. Denham and S. Pines. Sequential estimation when measurement function nonlinearity is compa-
rable to measurement error. AIAA Journal, 4(6):1071–1076, June 1966.
[13] Cornelius J. Dennehy and J. Russell Carpenter. A summary of the rendezvous, proximity operations, docking, and undocking (RPODU) lessons learned from the Defense Advanced Research Projects Agency (DARPA) Orbital Express (OE) demonstration system mission. NASA Technical Memorandum 2011-217088, NASA Engineering and Safety Center, 2011.
[14] J. L. Farrell. Attitude determination by Kalman filtering. Automatica, 6:419–430, 1970.
[15] James L. Farrell. Attitude determination by Kalman filtering. Contractor Report NASA-CR-598, NASA
Goddard Space Flight Center, Washington, DC, Sept. 1964.
[16] P. Ferguson and J. How. Decentralized estimation algorithms for formation flying spacecraft. In AIAA
Guidance, Navigation and Control Conference, 2003.
[17] Donald C. Fraser. A New Technique for the Optimal Smoothing of Data. Sc.D. thesis, Massachusetts
Institute of Technology, Cambridge, Massachusetts, 1967.
[18] Donald C. Fraser and James E. Potter. The optimum smoother as a combination of two optimum linear filters. IEEE Transactions on Automatic Control, AC-14(4):387–390, Aug. 1969.
[19] Eliezer Gai, Kevin Daly, James Harrison, and Linda Lemos. Star-sensor-based attitude/attitude-rate estimator. Journal of Guidance, Control, and Dynamics, 8(5):560–565, Sept.-Oct. 1985.
[20] Arthur Gelb, editor. Applied Optimal Estimation. The MIT Press, Cambridge, MA, 1974.
[21] David K. Geller. Orbital rendezvous: When is autonomy required? Journal of Guidance, Control and
Dynamics, 30(4):974–981, July–August 2007.
[22] Herbert Goldstein. Classical Mechanics. Addison-Wesley Publishing Company, Reading, MA, 2nd edi-
tion, 1980.
[23] Gene Howard Golub and Charles F. Van Loan. Matrix Computations. JHU Press, 3rd edition, 1996.
[24] M. Grigoriu. Response of dynamic systems to Poisson white noise. Journal of Sound and Vibration, 195(3):375–389, 1996.
[25] C. F. Hanak. Reducing the effects of measurement ordering on the GKF algorithm via a hybrid linear/extended Kalman filter. Technical Report FltDyn-CEV-06-0107, NASA Johnson Space Center, August 2006.
[26] G. W. Hill. Researches in the lunar theory. American Journal of Mathematics, 1(1):5–26, 129–147,
245–260, 1878.
[27] Jonathan P. How, Louis S. Breger, Megan Mitchell, Kyle T. Alfriend, and Russell Carpenter. Differential semimajor axis estimation performance using carrier-phase differential Global Positioning System measurements. Journal of Guidance, Control, and Dynamics, 30(2):301–313, Mar.-Apr. 2007.
[28] Peter C. Hughes. Spacecraft Attitude Dynamics. Wiley, New York, NY, 1986.
[29] M. Idan. Estimation of Rodrigues parameters from vector observations. IEEE Transactions on Aerospace
and Electronic Systems, 32(2):578–586, April 1996.
[30] Andrew H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, Inc., and Dover
Publications, Inc., New York, NY, and Mineola, NY, 1970 and 2007.
[31] S. C. Jenkins and D. K. Geller. State estimation and targeting for autonomous rendezvous and proximity
operations. In Proceedings of the AAS/AIAA Astrodynamics Specialists Conference. Mackinac Island,
MI, August 19–23 2007.
[32] S. J. Julier and J. K. Uhlmann. A new extension of the Kalman filter to nonlinear systems. In Int. Symp. Aerospace/Defense Sensing, Simul. and Controls, Orlando, FL, 1997.
[33] Simon Julier and Jeffrey Uhlmann. Authors' reply. IEEE Transactions on Automatic Control, 47(8):1408–1409, August 2002.
[34] Simon Julier, Jeffrey Uhlmann, and Hugh F. Durrant-Whyte. A new method for the nonlinear transformation of means and covariances in filters and estimators. IEEE Transactions on Automatic Control, 45(3):477–482, March 2000.
[35] John L. Junkins and J. D. Turner. Optimal Spacecraft Rotational Maneuvers. Elsevier, New York, NY,
1986.
[36] R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME
– Journal of Basic Engineering, 82:35–45, 1960.
[37] Christopher D. Karlgaard and Hanspeter Schaub. Nonsingular attitude filtering using modified Ro-
drigues parameters. The Journal of the Astronautical Sciences, 57(4):777–791, Oct.-Dec. 2010.
[38] B. A. Kriegsman and Y. C. Tau. Shuttle navigation system for entry and landing mission phases. Journal
of Spacecraft, 12(4):213–219, April 1975.
[39] W. M. Lear. Multi-phase navigation program for the space shuttle orbiter. Internal Note No. 73-FM-132,
NASA Johnson Space Center, Houston, TX, 1973.
[40] William M. Lear. Kalman Filtering Techniques. Technical Report JSC-20688, NASA Johnson Space
Center, Houston, TX, 1985.
[41] Deok-Jin Lee and Kyle T. Alfriend. Additive divided difference filtering for attitude estimation using modified Rodrigues parameters. In John L. Crassidis et al., editors, Proceedings of the F. Landis Markley Astronautics Symposium, volume 132 (CD-ROM Supplement) of Advances in the Astronautical Sciences. American Astronautical Society, Univelt, 2008.
[42] Tine Lefebvre, Herman Bruyninckx, and Joris De Schutter. Comment on 'A new method for the nonlinear transformation of means and covariances in filters and estimators'. IEEE Transactions on Automatic Control, 47(8):1406–1408, August 2002.
[43] Eugene J. Lefferts, F. Landis Markley, and Malcolm D. Shuster. Kalman filtering for spacecraft attitude estimation. Journal of Guidance, Control, and Dynamics, 5(5):417–429, Sept.-Oct. 1982. doi:10.2514/3.56190.
[44] M. Mandic. Distributed estimation architectures and algorithms for formation flying spacecraft. Master’s
thesis, Massachusetts Institute of Technology, 2006.
[45] S. R. Marandi and V. J. Modi. A preferred coordinate system and the associated orientation represen-
tation in attitude dynamics. Acta Astronautica, 15(11):833–843, Nov. 1987.
[46] F. Landis Markley. Unit quaternion from rotation matrix. Journal of Guidance, Control, and Dynamics,
31(2):440–442, March-April 2008.
[47] F. Landis Markley. Lessons learned. The Journal of the Astronautical Sciences, 57(1 & 2):3–29, Jan.-
June 2009.
[48] F. Landis Markley and J. Russell Carpenter. Generalized linear covariance analysis. The Journal of the Astronautical Sciences, 57(1 & 2):233–260, Jan.-June 2009.
[49] F. Landis Markley and John L. Crassidis. Fundamentals of Spacecraft Attitude Determination and
Control, pages 46, 257–260, 263–269. Springer, New York, NY, 2014. doi:10.1007/978-1-4939-0802-8.
[50] Peter S. Maybeck. Stochastic Models, Estimation and Control, Vol. 1. Academic Press, New York, NY,
1979.
[51] Peter S. Maybeck. Stochastic Models, Estimation and Control, Vol. 2. Academic Press, New York, NY,
1979.
[52] Eugene S. Muller and Peter M. Kachmar. A new approach to on-board orbit navigation. Navigation:
Journal of the Institute of Navigation, 18(4):369–385, Winter 1971–72.
[53] Magnus Nørgaard, Niels K. Poulsen, and Ole Ravn. Advances in derivative-free state estimation for non-
linear systems. Technical Report IMM–REP–1998–15, Technical University of Denmark, 2800 Lyngby,
Denmark, April 7, 2000.
[54] Magnus Nørgaard, Niels K. Poulsen, and Ole Ravn. New developments in state estimation for nonlinear
estimation problems. Automatica, 36(11):1627–1638, November 2000.
[55] Young W. Park, Jack P. Brazzel, J. Russell Carpenter, Heather D. Hinkel, and James H. Newman. Flight test results from real-time relative GPS experiment on STS-69. In Spaceflight Mechanics 1996, volume 93 of Advances in the Astronautical Sciences, pages 1277–1296, San Diego, CA, 1996. Univelt.
[56] L. Perea, J. How, L. Breger, and P. Elosegui. Nonlinearity in sensor fusion: Divergence issues in EKF, modified truncated SOF, and UKF. In AIAA Guidance, Navigation and Control Conference and Exhibit, Hilton Head, SC, August 20–23, 2007.
[57] Mark E. Pittelkau. An analysis of the quaternion attitude determination filter. The Journal of the
Astronautical Sciences, 51(1), Jan.-March 2003.
[58] H. Plinval. Analysis of relative navigation architectures for formation flying spacecraft. Master’s thesis,
Massachusetts Institute of Technology, 2006.
[59] H. E. Rauch, F. Tung, and C. T. Striebel. Maximum likelihood estimates of linear dynamic systems.
AIAA Journal, 3(8):1445–1450, Aug. 1965.
[60] Reid G. Reynolds. Asymptotically optimal attitude filtering with guaranteed convergence. Journal of Guidance, Control, and Dynamics, 31(1):114–122, Jan.-Feb. 2008.
[61] Olinde Rodrigues. Des lois géométriques qui régissent les déplacements d’un système solide dans l’espace,
et de la variation des coordonnées provenant de ces déplacements considérés indépendamment des causes
qui peuvent les produire. Journal de Mathématiques Pures et Appliquées, 5:380–440, 1840.
[62] Hanspeter Schaub and John L. Junkins. Analytical Mechanics of Aerospace Systems. American Institute
of Aeronautics and Astronautics, Inc., New York, NY, 2nd edition, 2009.
[63] Emil Schiesser, Jack P. Brazzel, J. Russell Carpenter, and Heather D. Hinkel. Results of STS-80 relative GPS navigation flight experiment. In Spaceflight Mechanics 1998, volume 99 of Advances in the Astronautical Sciences, pages 1317–1334, San Diego, CA, 1998. Univelt.
[64] S. F. Schmidt. Application of state-space methods to navigation problems. In C. T. Leondes, editor,
Advances in Control Systems: Theory and Applications, volume 3, pages 293–340. Academic Press, New
York, 1966.
[65] Stanley F. Schmidt. The Kalman filter: its recognition and development for aerospace applications. Journal of Guidance, Control, and Dynamics, 4(1):4–7, 1981.
[66] John H. Seago, Jacob Griesback, James W. Woodburn, and David A. Vallado. Sequential orbit-
estimation with sparse tracking. In Space Flight Mechanics 2011, volume 140 of Advances in the Astro-
nautical Sciences, pages 281–299. Univelt, 2011.
[67] Malcolm D. Shuster. A survey of attitude representations. The Journal of the Astronautical Sciences,
41(4):439–517, Oct.-Dec. 1993.
[68] John Stuelpnagel. On the parametrization of the three-dimensional rotation group. SIAM Review,
6(4):422–430, Oct. 1964.
[69] Byron D. Tapley, Bob E. Schutz, and George R. Born. Statistical Orbit Determination. Academic Press,
2004.
[70] C. L. Thornton and G. J. Bierman. Gram-Schmidt algorithms for covariance propagation. IEEE Conference on Decision and Control, pages 489–498, 1975.
[71] C. L. Thornton and G. J. Bierman. Numerical comparison of discrete Kalman filtering algorithms: An orbit determination case study. JPL Technical Memorandum 33-771, June 1976.
[72] Oldrich Vasicek. An equilibrium characterization of the term structure. J. Financial Economics,
5(2):177–188, November 1977.
[73] M. C. Wang and G. E. Uhlenbeck. On the theory of Brownian motion II. In N. Wax, editor, Selected Papers on Noise and Stochastic Processes, pages 113–132. Dover, 1954.
[74] James R. Wertz, editor. Spacecraft Attitude Determination and Control. Kluwer Academic Publishers,
The Netherlands, 1978.
[75] Thomas F. Wiener. Theoretical Analysis of Gimballess Inertial Reference Equipment Using Delta-
Modulated Instruments. Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, MA,
1962.
[76] J. R. Wright. Sequential orbit determination with auto-correlated gravity modeling errors. Journal of
Guidance and Control, 4(3):304–309, May–June 1980.
[77] James R. Wright. Optimal orbit determination. In Kyle T. Alfriend et al., editors, Space Flight Mechanics 2002, volume 112 of Advances in the Astronautical Sciences, pages 1123–1134. Univelt, 2002.
[78] Renato Zanetti, Kyle J. DeMars, and Robert H. Bishop. Underweighting nonlinear measurements.
Journal of Guidance, Control and Dynamics, 33(5):1670–1675, September–October 2010.
[79] Renato Zanetti and Christopher D’Souza. Recursive implementations of the consider filter. Proceedings
of the 2012 AAS Jer-Nan Juang Symposium, 2012.
REPORT DOCUMENTATION PAGE (Standard Form 298, OMB No. 0704-0188)
1. REPORT DATE (DD-MM-YYYY): 18-04-2018
2. REPORT TYPE: Technical Publication
4. TITLE AND SUBTITLE: Navigation Filter Best Practices
14. ABSTRACT
This work identifies best practices for onboard navigation filtering algorithms. These best practices have been collected by
NASA practitioners of the art and science of navigation over the first 50 years of the Space Age. While this is a NASA
document concerned with space navigation, it is likely that many of the principles would apply equally to the wider
navigation community.
16. SECURITY CLASSIFICATION OF: a. REPORT: U; b. ABSTRACT: U; c. THIS PAGE: U
17. LIMITATION OF ABSTRACT: UU
18. NUMBER OF PAGES: 149
19a. NAME OF RESPONSIBLE PERSON: STI Information Desk ([email protected])
19b. TELEPHONE NUMBER (Include area code): (757) 864-9658
Standard Form 298 (Rev. 8/98), Prescribed by ANSI Std. Z39.18