AIAA SciTech 2019 Forum, 7-11 January 2019, San Diego, California. Paper AIAA 2019-1411.

Vision-Based Navigation For The NASA Mars Helicopter
David S. Bayard∗ Dylan T. Conway† Roland Brockers‡ Jeff Delaune§ Larry Matthies¶
A small helicopter has recently been approved by NASA as an addition to the Mars 2020 rover mis-
sion. The helicopter will be deployed by the rover after landing on Mars, and operate independently
thereafter. The main goal is to verify the feasibility of using helicopters for future Mars exploration
missions through a series of fully autonomous flight demonstrations. In addition to the sophisticated
dynamics and control functions needed to fly the helicopter in a thin Mars atmosphere, a key sup-
porting function is the capability for autonomous navigation. Specifically, the navigation system must
be reliable, fully self-contained, and operate without human intervention. This paper provides an
overview of the Mars Helicopter navigation system, architecture, sensors, vision processing and state
estimation algorithms. Special attention is given to the design choices to address unique constraints
arising when flying autonomously on Mars. Flight test results indicate navigation performance is
sufficient to support Mars flight operations.
I. Introduction
The use of helicopters promises to bridge a gap in current Mars exploration capabilities. Orbiters have provided
high-altitude aerial imagery of Mars, but with limited resolution. Rovers provide rich and detailed imagery of the
Martian surface, but move at a slow pace and are limited by traversability of the terrain and line-of-sight. Helicopters
can quickly traverse large distances without being hindered by terrain, while providing detailed imagery of the surface
from heights of a few meters to tens of meters above the surface. Paired with a rover, a helicopter can act as a scouting
platform, helping to identify promising science targets or to map the terrain ahead of the rover. Looking further
ahead, helicopters may one day carry their own science payloads to areas that are inaccessible to rovers. An overview
of the Mars Helicopter Technology Demonstrator is given in [2], and its guidance and control functions are discussed
in [3][4]. The current paper will discuss the Mars Helicopter navigation system.
A CAD drawing of the Mars Helicopter is shown in Figure 1. The Mars Helicopter has two counter-rotating rotors
that are 1.21 m in diameter. The vehicle stands approximately 80 cm in height and weighs 1.8 kg. Compared to an
Earth helicopter, the rotors are significantly oversized for the vehicle's weight, which allows it to fly in the Martian
atmosphere, whose density is only 1-2% that of Earth's. The helicopter carries a payload as part of its fuselage, which is a
cube-like structure containing the flight avionics, batteries, and sensors, all contained within a warm electronics box
that is insulated and heated to protect against low night-time temperatures. The batteries are sized to provide energy
for flights lasting over 90 s, while also supporting non-flight operations and night-time survival heating. A solar panel
at the very top of the vehicle is used to charge batteries between flights.
∗ Mars Helicopter Navigation Lead, Guidance & Control Section, JPL, AIAA Associate Fellow, [email protected]
† Guidance and Control Engineer, Guidance & Control Section, JPL, [email protected]
‡ Robotics Technologist, Robotics Section, JPL, [email protected]
§ Robotics Technologist, Robotics Section, JPL, [email protected]
¶ Senior Research Scientist, Robotics Section, JPL, [email protected]
k Mars Helicopter GNC & Aerodynamics Lead/Robotics Technologist, Guidance & Control Section, JPL, [email protected]
∗∗ Robotics Technologist, Robotics Section, JPL, [email protected]
†† Robotic Systems Engineer, Robotics Section, JPL, [email protected]
‡‡ Chief Engineer of Guidance & Control Section, JPL, [email protected]
High-rate data (IMU, altimeter, inclinometer) is read into the FPGA which communicates synchronously with the FC
and the sensor devices, and asynchronously with the NP. Camera images are read directly into the Nav Processor which
uses cell-phone technology and can directly ingest video image sequences. The dedicated Nav Processor allows the
CPU-intensive Vision Processing and State Estimation functions to be handled very efficiently on COTS cell-phone
hardware. The Nav Processor also serves to offload the FC and FPGA, allowing them to more reliably handle the
helicopter guidance and control functions. As shown in Figure 3, the Vision Processing and State Estimator functions
are each assigned a full core of the quad-core Nav Processor.
The FPGA samples the IMU, the altimeter, and the inclinometer. The IMU comprises a gyro and an accelerometer,
which are sampled at 3200 Hz and 1600 Hz, respectively. The IMU outputs in the body frame are first notch filtered
at the first six rotor harmonic frequencies and then smoothed with a 90 Hz anti-aliasing filter. Data is then sent to
the FC where coning and sculling corrections are applied at the high sampling rates, and then bias corrected (with a
fixed initial gyro and accelerometer bias estimate only), and mapped to an inertial frame where it is integrated down
to 500 Hz. During integration, the IMU data is time-aligned to a uniform 500 Hz grid, which is needed because the
original 1600 and 3200 Hz IMU sample rates are not integer multiples of 500 Hz. The resulting “cleaned-up” 500 Hz
delta-theta and delta-v IMU data and 50 Hz altimeter measurements are passed to the NP (via the FPGA) where they
are ingested by the State Estimator function. IMU propagation is intentionally made redundant between the Flight
Computer and Nav Processor in order to accommodate latencies and non-uniform packet arrival times associated with
using asynchronous communications between the FPGA and the NP. Furthermore, the FC is architected to be a free-
running integration of the IMU with updates that incorporate the latest bias and state information from the last good
NP packet. With this approach, the system is robust to NP dropout and/or failure because the FC always continues to
propagate the state from the last good navigation filter update.
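To make the preceding pipeline concrete, the following is a minimal offline sketch of this kind of IMU preprocessing, assuming SciPy-style filters. The function names, filter orders, and notch Q factors are illustrative assumptions, not the flight implementation (which runs causally on the FPGA and FC).

```python
import numpy as np
from scipy.signal import iirnotch, butter, filtfilt

def preprocess_gyro(gyro_3200hz, rotor_hz, fs=3200.0):
    """Illustrative IMU cleanup: notch the first six rotor harmonic
    frequencies, then apply a 90 Hz low-pass as an anti-aliasing stage.
    Filter orders and Q factors are placeholders, not flight values."""
    x = np.asarray(gyro_3200hz, dtype=float)    # shape (N, 3), body frame
    for k in range(1, 7):                       # first six rotor harmonics
        b, a = iirnotch(w0=k * rotor_hz, Q=30.0, fs=fs)
        x = filtfilt(b, a, x, axis=0)
    b, a = butter(2, 90.0, btype="low", fs=fs)  # 90 Hz anti-aliasing filter
    return filtfilt(b, a, x, axis=0)

def resample_to_500hz(x, fs_in):
    """Time-align filtered samples onto a uniform 500 Hz grid by linear
    interpolation, since the 1600/3200 Hz IMU rates are not integer
    multiples of 500 Hz."""
    t_in = np.arange(x.shape[0]) / fs_in
    t_out = np.arange(0.0, t_in[-1], 1.0 / 500.0)
    return np.stack([np.interp(t_out, t_in, x[:, i])
                     for i in range(x.shape[1])], axis=1)
```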
It is worth noting that the helicopter avionics are largely composed of commercial off-the-shelf (COTS) components.
This is very different from most spacecraft payloads, whose avionics must be hardened against the harsh
space environment and high radiation levels. While posing additional risk, this choice was essential for meeting tight
mission cost, mass, and power constraints. Moreover, all components undergo vibration, thermal, and radiation tests to
ensure that risks are consistent with a NASA Class D technology demonstration.
An alternative approach is to generate features on the fly and in real time as the vehicle maneuvers above the surface.
Most generally, this problem can be addressed using SLAM (Simultaneous Localization and Mapping) algorithms from
the literature [5][6]. While there has been some success applying SLAM to Earth-bound assets, full SLAM solutions
are challenging for real-time space applications due to their over sized filter state dimensions. For example, SLAM
augments the Kalman Filter with 3 states for each of the N features observed, so that retaining a memory of, say
N=100 features, would require a filter state whose dimension exceeds 300. Such high-order filters are only marginally
numerically stable and demand large amounts of on-board computation. While methods are being developed to perform
management and real-time pruning of the number of features, such methods are relatively new and raise separate
questions about reliability and adding complexity to the implementation.
The unavailability of mapped landmarks coupled with the challenges involved in flying a real-time high-order
SLAM implementation, has motivated an alternative approach to Mars Helicopter navigation based on velocimetry.
Here, the vision system is used to characterize relative motions of the vehicle from one image to the next, rather than
to determine its absolute position. Providing good velocity information is also one of the main goals of the navigation
system, since it is essential for supporting real-time control of a vehicle with complex dynamics. Of course, a
disadvantage of using velocimetry is that the navigation solution will drift with time, since one is effectively integrating
a "noisy" velocity measurement. However, this is offset by the short (approximately 90 second) flights of the Mars
Helicopter, and by the preference for simpler, lower-order navigation filters.
The velocimetry-based algorithm chosen for the Mars Helicopter application is MAVeN (Minimal Augmented state algorithm for Vision-based Navigation) [8][9].
B. Algorithm Description
Here, pS , vS , qS are the position, velocity and attitude quaternion states which comprise the Search state; ba , bg are
bias terms for the accelerometer and gyro; and pB , qB are the position and attitude quaternion which comprise the Base
state. Attitudes are represented using quaternions, denoted q, which have 4 elements, of which only 3 unconstrained
degrees of freedom are counted in the filter state size. Attitude is also represented equivalently using a direction cosine matrix
A, in which case the associated quaternion is notated q(A). The Base state is a cloned version of the position
and attitude part of the Search state, and is used for updating the EKF with information derived from simultaneously
processing two images taken at different times. Cloned states are often added to vision-based methods to help process
relative measurement information [10][11][12]. As explained in [8] for the MAVeN algorithm, Base states are copied
from Search states at the time instants tB when Base images are taken (i.e., pB (tB ) = pS (tB ), and qB (tB ) = qS (tB )),
and propagate with constant dynamics between Base images (i.e., ṗB = 0, q̇B = 0).
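As an illustration of the cloning mechanism described above, the sketch below shows a schematic MAVeN state container and the Base-state copy performed at each Base-image time. The data structure and function names are assumptions made for illustration, and the accompanying covariance-cloning bookkeeping is omitted.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class MavenState:
    """Illustrative MAVeN filter state (not the flight data structures):
    Search state (p_S, v_S, q_S), IMU biases (b_a, b_g), and the cloned
    Base state (p_B, q_B) that is frozen between Base images."""
    p_S: np.ndarray = field(default_factory=lambda: np.zeros(3))
    v_S: np.ndarray = field(default_factory=lambda: np.zeros(3))
    q_S: np.ndarray = field(default_factory=lambda: np.array([0., 0., 0., 1.]))
    b_a: np.ndarray = field(default_factory=lambda: np.zeros(3))
    b_g: np.ndarray = field(default_factory=lambda: np.zeros(3))
    p_B: np.ndarray = field(default_factory=lambda: np.zeros(3))
    q_B: np.ndarray = field(default_factory=lambda: np.array([0., 0., 0., 1.]))

def clone_base_state(x: MavenState) -> None:
    """At a Base-image time t_B, copy the position/attitude part of the
    Search state into the Base state: p_B(t_B) = p_S(t_B), q_B(t_B) = q_S(t_B).
    Between Base images the Base state has zero dynamics (p_B_dot = 0)."""
    x.p_B = x.p_S.copy()
    x.q_B = x.q_S.copy()
```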
MAVeN’s unique properties follow from the novel approach of projecting image features onto a shape model of the
ground surface to use as pseudo-landmarks for the next image. This process is briefly sketched here with the use of
Figure 4.
[4] Match Search image features to the pseudo-landmarks mapped from the most recent Base image. Assume that there
are m matches.
[5] Combine the m pseudo-landmark matches with current geometry to form a measurement that is a function of both
the current Base and Search states,
REMARK 1 In a motionless hover condition, MAVeN is capable of sitting on the same Base image indefinitely, which
leads to very stable behavior. This property is generally not achievable with non-SLAM approaches to velocimetry (cf.
[14][15]).
REMARK 2 Updating the MAVeN EKF with consecutive Search images and altimeter updates automatically relocates
the feature projections (e.g., f1 , f2 , f3 in Figure 4), on the planar surface model to be consistent with the updated state
estimate pB , qB , while holding bearing directions constant as observed in the previous Base image.
Using this notation, a noiseless camera measurement z = [u, v] of line-of-sight vector r can be written as,
$$ z = \begin{bmatrix} u \\ v \end{bmatrix} = \pi(r) = \begin{bmatrix} r_x/r_z \\ r_y/r_z \end{bmatrix} \tag{4} $$
The corresponding unit direction vector d associated with the decomposition r = dkrk can then be reconstructed from
its noiseless measurement z as
$$ d = \frac{\Pi(z)}{\|\Pi(z)\|} \tag{5} $$
where,
$$ \Pi[z] = \begin{bmatrix} z \\ 1 \end{bmatrix} = \begin{bmatrix} \pi(r) \\ 1 \end{bmatrix} = \begin{bmatrix} r_x/r_z \\ r_y/r_z \\ 1 \end{bmatrix} \tag{6} $$
Based on this discussion, a noiseless camera measurement z of a line-of-sight vector r can be taken equivalently as its
unit direction vector d in the decomposition $r = d\|r\|$. It will be convenient to apply this decomposition to the camera
line-of-sight vectors $r_{Bi}$ and $r_{Si}$, which will be split into their magnitude and direction parts as $r_{Bi} = d_{Bi}\|r_{Bi}\|$ and
$r_{Si} = d_{Si}\|r_{Si}\|$, respectively.
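The projection and reconstruction operations in (4)-(6) translate directly into code; the following short sketch is a straightforward transcription, with function names chosen only for illustration.

```python
import numpy as np

def project(r):
    """Pinhole projection pi(r) of a line-of-sight vector r = [rx, ry, rz]
    into normalized image coordinates z = [u, v] (Eq. 4)."""
    return np.array([r[0] / r[2], r[1] / r[2]])

def unit_direction(z):
    """Reconstruct the unit direction d from a noiseless measurement z by
    lifting it with Pi[z] = [z; 1] and normalizing (Eqs. 5-6)."""
    Pi = np.array([z[0], z[1], 1.0])
    return Pi / np.linalg.norm(Pi)
```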
The various vectors shown in Figure 5 are assumed to be resolved in the following frames:
$$ p_B,\ p_S,\ f_i,\ N \in F_G; \qquad r_{Bi},\ d_{Bi} \in F_B; \qquad r_{Si},\ d_{Si} \in F_S $$
Forming a triangle at Base time $t_B$ in Figure 5 and resolving all vectors in $F_G$ gives
$$ f_i = p_B + A_B^T r_{Bi} \tag{7} $$
Substituting the decomposition $r_{Bi} = d_{Bi}\|r_{Bi}\|$ into (7) gives
$$ f_i = p_B + A_B^T d_{Bi}\,\|r_{Bi}\| \tag{8} $$
Since N is normal to the ground plane in which $f_i$ lies, it follows that $N^T f_i = 0$, and the above equation can be
rearranged to solve for $\|r_{Bi}\|$ as
$$ \|r_{Bi}\| = \frac{-N^T p_B}{N^T A_B^T d_{Bi}} \tag{10} $$
Substituting (10) into (8) gives
$$ f_i = p_B - \frac{A_B^T d_{Bi} N^T}{N^T A_B^T d_{Bi}}\, p_B = \left( I - \frac{A_B^T d_{Bi} N^T}{N^T A_B^T d_{Bi}} \right) p_B \tag{11} $$
This is a useful formula showing that the feature vector $f_i$ is a function of the cloned part of the state $p_B$, $A_B$, and the
noiseless camera measurement $d_{Bi}$ of the i'th feature taken at Base time $t_B$.
Consider the camera measurement update at Search time $t_S$. Forming a triangle at Search time $t_S$ in Figure 5 and
resolving all vectors in $F_G$ gives
$$ r_{Si} = A_S f_i - A_S p_S \tag{12} $$
Unfortunately, the vector fi in this expression corresponds to an unmapped landmark and its coordinates are unknown.
As such, expression (12) cannot be used directly to generate a standard mapped landmark type measurement update.
Of course, one approach is to add $f_i$ to the filter state, as in SLAM. However, in the interest
of keeping the filter dimension low, the main insight of MAVeN is to instead use the expression (11) for $f_i$ derived from
the earlier Base image. Specifically, substituting $f_i$ from (11) into (12) gives
$$ r_{Si} = A_S \left( I - \frac{A_B^T d_{Bi} N^T}{N^T A_B^T d_{Bi}} \right) p_B - A_S p_S \tag{13} $$
It is worth noting that the formulation can progress from this point without any need to augment the state with fi . Here,
the formula (11) for $f_i$ can be interpreted as defining pseudo-landmarks that can be used as if they were real landmarks,
with the resulting camera observations entering the filter as noisy measurements of the form
$$ y_i = z_i + v_i \tag{16} $$
where $y_i$ is a noisy measurement and $v_i$ is the measurement noise. Fortunately, the additive noise form (16) remains
consistent with standard nonlinear filtering formulations.
At time tB the bearing measurement dBi will also be noisy. Since camera noise in dBi enters at Base time tB , it
becomes correlated with all subsequent Search frame measurement updates of the form (16). For simplicity, MAVeN
assumes that the camera noise at each Base Frame is sufficiently small so that this correlation can be neglected. Rig-
orously dealing with the issue of correlated measurement noise remains as an area for future investigation. This small
camera noise assumption at Base frame times, together with the planar ground assumption represent the only two
approximations required in formulating MAVeN as a nonlinear filtering problem.
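For illustration, the pseudo-landmark construction (11) and the predicted Search-frame line of sight (13) can be transcribed as follows; the function names are assumptions, and the EKF Jacobians and update machinery are omitted.

```python
import numpy as np

def pseudo_landmark(p_B, A_B, d_Bi, N):
    """Eq. (11): project the Base-frame bearing d_Bi onto the ground plane
    (with normal N passing through the ground-frame origin, so N.T @ f_i = 0)
    to form the pseudo-landmark f_i, using only the cloned state (p_B, A_B)."""
    A_B_T_d = A_B.T @ d_Bi
    return (np.eye(3) - np.outer(A_B_T_d, N) / (N @ A_B_T_d)) @ p_B

def predicted_search_los(p_B, A_B, d_Bi, N, p_S, A_S):
    """Eq. (13): predicted Search-frame line of sight to the pseudo-landmark."""
    f_i = pseudo_landmark(p_B, A_B, d_Bi, N)
    return A_S @ (f_i - p_S)

def predicted_measurement(p_B, A_B, d_Bi, N, p_S, A_S):
    """Noiseless predicted measurement z_i = pi(r_Si), cf. Eqs. (4) and (16);
    the EKF innovation compares this with the tracked Search-frame feature."""
    r_Si = predicted_search_los(p_B, A_B, d_Bi, N, p_S, A_S)
    return np.array([r_Si[0] / r_Si[2], r_Si[1] / r_Si[2]])
```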
E. Lessons Learned
1. Vertical Channel Tuning
In the vertical channel, there is a difference between inertial altitude (as measured by inertial sensors such as an IMU
or GPS), and altitude above ground level (AGL, as measured by the LRF and camera). These two quantities will be
the same only when traversing perfectly flat and level ground. When traversing non-flat or irregular terrain, these two
definitions of altitude conflict and can differ significantly. Without using GPS, and without using mapped landmarks
or a digital elevation map (DEM), MAVeN will be unable to completely separate these two altitude definitions.
Nevertheless, the navigation filter is intentionally detuned to try to produce an AGL estimate of vertical position,
while producing an inertial estimate of vertical velocity. This design is necessary because AGL height information is
critical for avoiding collisions with the ground, while inertial velocity information is critical for properly controlling
helicopter dynamics. Roughly speaking, filter tuning is performed by increasing LRF and camera weighting in the
vertical position, and IMU weighting in the vertical velocity. Since detuning represents a compromise, the filter still
generates noticeable systematic errors in vertical velocity estimates when flying over rough terrain. Fortunately, the
terrain for the Mars Helicopter mission is expected to be fairly flat. However, for stress-testing purposes, such
systematic velocity errors are intentionally demonstrated in Section VI.
Visual feature tracking inevitably produces occasional outlier measurements (i.e. those with very large centroiding
error). Outlier measurements can corrupt the filter state estimate if they are not down-weighted or discarded. One
challenge in designing an algorithm for feature suppression is the possibility of lockout: the current state estimate
is in error which causes good measurements to be incorrectly suppressed and therefore prevents a state update. The
navigation system has a two-tiered approach to deal with this. The first tier is a base-to-search homography RANSAC
algorithm to identify and discard outliers. This algorithm finds the largest set of base and search features that are
consistent with a homography that maps base features to search features. By making the first tier independent of the
navigation state, the lockout problem is avoided.
The first tier can fail if there are many outliers that are mutually consistent. This can happen when many features are
detected on the shadow of the helicopter. To provide robustness to this failure mode, a second tier of outlier suppression
is implemented inside of the MAVeN feature update. The magnitude of the innovation is used to assign a measurement
weight on a per-feature basis. The weighting function is the Huber loss function [17] which assigns a weight between 0
and 1. A weight of 1 is assigned if the residual is less than a threshold. Beyond this threshold, the weight monotonically
decreases. Down-weighting, as compared to a hard accept-reject rule, reduces the risk of lockout because a persistent
update signal will still pass through to the estimated state.
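A minimal sketch of this second-tier down-weighting is given below, using the standard Huber weight (unit weight below the threshold, decreasing as threshold/|r| above it). The covariance-inflation helper is an illustrative assumption, not the flight code.

```python
import numpy as np

def huber_weight(residual_norm, threshold):
    """Huber-style weight for a feature innovation: 1 below the threshold,
    monotonically decreasing above it (sketch of the second-tier outlier
    suppression described above)."""
    r = abs(residual_norm)
    return 1.0 if r <= threshold else threshold / r

def inflate_covariance(R_i, residual_norm, threshold):
    """One common way to apply the weight: scale the feature's measurement
    noise covariance by 1/w, so inconsistent features are down-weighted
    rather than hard-rejected, reducing the risk of lockout."""
    w = huber_weight(residual_norm, threshold)
    return R_i / max(w, 1e-6)
```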
B. Feature Detection
To provide the template-based feature tracker with a set of distinct pixel positions (visual features) that maximize the
potential of being easily trackable in subsequent frames [18], we deploy a feature detection step at each Base frame to
identify pixels with significant brightness changes in their local vicinity. To optimize performance and run-time of the
feature detector, we use a modified FAST corner detector [19][20] to detect candidate feature points.
FAST explores the differences in brightness between an evaluated center pixel and neighboring pixels that are
located on a circle around the center pixel. If neighboring pixels on a continuous arc around the center pixel (FAST
arc) have consistently higher (or lower) brightness values than the center pixel, the center pixel is classified as a FAST
feature. The detection result can be influenced by choosing a brightness threshold that defines the minimum brightness
deviation between neighboring pixels and the center pixel, the radius of the circle in which neighboring pixels are
evaluated, and the minimum required length of the FAST arc. Common values for these parameters are, e.g., a radius
of 5 pixels, which yields a FAST window of 11x11 pixels and defines a circle around the center pixel containing 16
neighbors, and a minimum arc length of 9 pixels. Common FAST brightness thresholds (referred to as the FAST threshold
in the rest of the paper) are usually within 5% of the maximum brightness range of the image.
FAST achieves a significant speed-up by initially evaluating only a few selected pixel positions on the neighbor
arc to discard candidate features early on if the continuous arc condition is violated - eliminating the need to examine
all neighbors for each pixel. While this accounts for significant speed-up in the majority of images, it introduces a
run-time dependence on the texture content of the image. To limit the execution time of feature detection, we deploy
a scanning scheme that uses a row stride length to go through the image during feature evaluation. Feature detection
stops either when the whole image has been scanned or when a maximum number of candidate features has been found. This
spreads the feature candidates over the image while bounding the maximum execution time.
To reduce the number of features around strong image brightness changes, we perform a non-maximum suppression
step on all candidate features. Features are assigned a FAST score based on the brightness differences of the center
pixel and the pixels on the FAST arc, and then tested for maximum FAST score in a 3x3 local neighborhood. If another
feature with a higher score is located next to the tested feature, the tested feature is eliminated.
All surviving features pass through a final sorting step to only select the strongest features for ingestion into the
navigation filter. To maximize the distribution of features over the image, the image is divided into a 3x3 grid of tiles,
and features in each tile are sorted based on their FAST score. Finally, the n strongest features in each tile are accepted
as Base frame features.
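The sketch below illustrates this detection-and-selection chain using OpenCV's FAST detector with non-maximum suppression, followed by a 3x3 grid selection of the strongest features per tile. It is a simplified stand-in for the modified detector described above: the row-stride scan and candidate cap are not modeled, and the parameter defaults are assumptions.

```python
import cv2
import numpy as np

def detect_base_features(gray, fast_threshold=3, grid=3, n_per_tile=10):
    """Illustrative Base-frame feature detection: FAST corners with
    non-maximum suppression, then a grid-based selection that keeps the
    n strongest features per tile to spread features over the image."""
    fast = cv2.FastFeatureDetector_create(threshold=fast_threshold,
                                          nonmaxSuppression=True)
    keypoints = fast.detect(gray, None)

    h, w = gray.shape
    tiles = [[[] for _ in range(grid)] for _ in range(grid)]
    for kp in keypoints:
        col = min(int(kp.pt[0] * grid / w), grid - 1)
        row = min(int(kp.pt[1] * grid / h), grid - 1)
        tiles[row][col].append(kp)

    selected = []
    for row in range(grid):
        for col in range(grid):
            # Sort each tile by FAST score and keep only the strongest n.
            tiles[row][col].sort(key=lambda kp: kp.response, reverse=True)
            selected.extend(tiles[row][col][:n_per_tile])
    return selected
```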
C. Feature Tracking
For each new image, we deploy a feature tracking step that matches feature positions in the previous frame to the new
frame. This is done independently of Base frame generation to ensure continuous tracking.
To track features, we use a Kanade-Lucas-Tomasi (KLT) tracking framework. KLT uses an iterative gradient-
descent search algorithm based on pixel differences in a local window (template) around a feature position [21] to
estimate a new feature position in the current frame. To maximize robustness and minimize execution time, we deploy
KLT on an image pyramid comparable to [22] with a fixed number of 3 levels and a template window size of 11x11 pix-
els. Additionally, feature positions are initialized prior to tracking through a derotation step, which integrates delta gyro
measurements between frames to predict future feature locations through large rotations (see Section F below).
We further limit the maximum number of KLT iterations and discard any feature that did not converge during the
iteration.
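A minimal sketch of this tracking step using OpenCV's pyramidal KLT is shown below, with three pyramid levels, an 11x11 template window, and derotation-predicted initial feature positions supplied through the initial-flow option. The function name and the iteration/epsilon settings are illustrative assumptions.

```python
import cv2
import numpy as np

def track_features(prev_img, curr_img, prev_pts, predicted_pts):
    """Illustrative KLT tracking: 3 pyramid levels (maxLevel=2), an 11x11
    window, and feature positions initialized from a gyro-based derotation
    prediction; features that do not converge are discarded.
    prev_pts and predicted_pts are (N, 2) float arrays."""
    curr_guess = predicted_pts.astype(np.float32).reshape(-1, 1, 2).copy()
    curr_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_img, curr_img,
        prev_pts.astype(np.float32).reshape(-1, 1, 2),
        curr_guess,
        winSize=(11, 11),
        maxLevel=2,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03),
        flags=cv2.OPTFLOW_USE_INITIAL_FLOW)
    good = status.ravel() == 1
    return prev_pts[good], curr_pts.reshape(-1, 2)[good]
```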
D. Outlier Rejection
Since KLT might get caught in a local minimum during its gradient-descent-based search, a small number of
false feature matches might survive the tracking step. To eliminate potential false matches, we apply an outlier rejection
step to all newly matched features that uses a homography-based RANSAC algorithm to identify the largest feature
inlier set between the most recent Base frame and the current Search frame. Using a homography constraint serves the
additional purpose of helping to enforce the ground plane assumption of the navigation filter. Specifically, only features
that are located on a common ground plane between a Base frame and a Search frame survive the outlier rejection which
naturally better conditions the selected features for the MAVeN state estimator. As an additional benefit, this scheme
guarantees that all features detected on non-static texture (e.g., the helicopter shadow) are also eliminated by
the outlier rejection step under vehicle translation (but not pure rotation). Figure 6 shows an example of a Base frame
and a subsequent Search frame with feature tracks that are continuously tracked over 10 frames.
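A compact sketch of the homography-RANSAC test between Base and Search features is shown below, using OpenCV's findHomography; the reprojection threshold is an assumed illustrative value.

```python
import cv2
import numpy as np

def reject_outliers(base_pts, search_pts, reproj_threshold=2.0):
    """Illustrative homography-RANSAC outlier rejection between the most
    recent Base frame and the current Search frame: features inconsistent
    with a common ground-plane homography are discarded."""
    if len(base_pts) < 4:
        return np.zeros(len(base_pts), dtype=bool)
    H, inlier_mask = cv2.findHomography(base_pts.astype(np.float32),
                                        search_pts.astype(np.float32),
                                        method=cv2.RANSAC,
                                        ransacReprojThreshold=reproj_threshold)
    if H is None:
        return np.zeros(len(base_pts), dtype=bool)
    return inlier_mask.ravel().astype(bool)
```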
G. Computational Constraints
In order to achieve the required image processing frame rate of 30 Hz, all image processing has to be executed within
the 33 ms time limit between two successive frames. In a worst case scenario, the same image will need to be processed
as both a Search frame and a Base frame. Specifically, this happens when feature tracking is performed on a Search
frame and then a new Base frame is triggered because of insufficient feature tracks. This means that all components
of the vision processing software combined have to fit within one frame period. Execution times for all algorithm
steps on the target system are illustrated in Table 1. Results indicate a maximum execution time of 21.6 ms under this
worst-case scenario, providing a reasonable margin against the 33 ms limit.
Component | min. [ms] | max. [ms] | avg. [ms]
Base Frame | 4.3 | 8.8 | 4.7

Table 1. Runtimes on a single Krait 400 core of the Snapdragon 801 using well-textured images from simulation. Averages are calculated over 470 images.
H. Lessons Learned
1. FAST Threshold
As described in Section B, the number of derived FAST features for a given image depends on the chosen FAST
threshold that establishes a minimum brightness difference between a candidate feature and its neighboring pixels on
the FAST arc. Features with large brightness changes usually mark unique image content, increasing the likelihood
of being tracked in subsequent frames. But if the FAST threshold is chosen too high, fewer features are found and a
minimum number of desired features might not be reached. This problem is commonly solved by applying the FAST
detector to the same image with decreasing FAST thresholds, until the desired number of features is reached. Since
this dramatically increases the number of runs through the image and the required execution time, we chose to set the
FAST threshold to a low value and limit the number of extracted candidate features while applying the described stride
scheme to ensure good feature distribution in the image. This has the advantage that only one run through the image is
needed. The non-sequential memory access of the stride scheme does increase execution time, but we
found this effect to be marginal for VGA-size images, given the capability of modern CPUs to store significant
amounts of data in cache memory (the Snapdragon 801 has 2 MB of on-chip L2 cache).
Figure 7 illustrates the dependence of the number of detected features on the texture content of an image. Here, we
apply the FAST detector with various score thresholds to a stress case environment of very low-textured terrain. As
can be seen, a larger FAST threshold limits the number of features detected in the image.
To guarantee a sufficient number of features in such scenes, a FAST threshold of 3 was selected for our feature
tracking framework.
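The row-stride scan with early termination might look like the sketch below; the interleaved row ordering and the is_fast_corner callback are assumptions used only to illustrate how early stopping can still leave candidates spread across the image.

```python
def stride_scan(gray, row_stride, is_fast_corner, max_candidates):
    """Illustrative row-stride scan: visit rows 0, s, 2s, ..., then 1, s+1, ...,
    so that stopping early at max_candidates still leaves candidates spread
    over the whole image, while bounding execution time.
    is_fast_corner(img, row, col) is an assumed user-supplied test."""
    h, w = gray.shape
    candidates = []
    for offset in range(row_stride):
        for row in range(offset, h, row_stride):
            for col in range(w):
                if is_fast_corner(gray, row, col):
                    candidates.append((row, col))
                    if len(candidates) >= max_candidates:
                        return candidates
    return candidates
```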
2. COTS Hardware
The Mars Helicopter would not have been possible without the low size, weight and power functionality offered by
COTS hardware. The Snapdragon processor’s four cores were assigned such that an entire core was dedicated to vision
processing and an entire core to the navigation filter. This turned out to be essential for achieving the 30 Hz image frame
rate needed to track features through wind-gust-induced vehicle rates of up to 80 deg/sec. There were however several
major challenges that had to be overcome. Performance of the Snapdragon processor is temperature dependent, and it
throttles back when it gets hot. Fortunately, thermal issues are less of a problem in the relatively short Mars Helicopter
mission flights. Nevertheless, alternative approaches may be needed on future missions requiring long duration flights.
Another problem is that the COTS Linux OS (Linaro) is not a real-time operating system, and its latencies are load-
dependent and increase under heavy computation. Load-dependent latencies caused significant challenges for software
repeatability and developing reliable real-time estimation and control functions. Finally, proprietary closed source
software drivers for camera control and internal image processing made it difficult to change behavior of internal
functions. Autoexposure, contrast settings, and internal image sharpening filters were all largely black box functions
that had to be optimized using tedious manual cut-and-try methods based on dedicated flight tests.
3. Autoexposure
The OV7251 camera sensor operates with manually commanded exposure-time and analog-gain setpoints, and therefore
requires a companion software autoexposure and autogain algorithm to cope with terrain of changing albedo.
We use a vendor-supplied algorithm that adjusts both of these parameters based on image statistics in an attempt to
optimize a data-driven function predicting image quality for feature tracking. The algorithm is tuned to reduce analog
gain in order to minimize image noise, while relying on the expected radiance at the image sensor when viewing typical
scenes on Mars to keep the exposure times low enough to eliminate motion blur.
In addition to adjusting the exposure time and analog gain as described above, the maximum allowable exposure
change between subsequent frames is clamped to a specific value. The value is chosen such that the KLT tracker,
which incorporates a constant brightness assumption at the core of the gradient-descent algorithm, is able to reliably
track through exposure adjustments. In order to speed initial convergence of the autoexposure algorithm, the frame-
to-frame clamp is removed during initialization of the navigation algorithm on the ground, and then reapplied during
flight.
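Since the vendor autoexposure algorithm is proprietary, only the frame-to-frame clamping idea is sketched below; the clamp ratio and function name are illustrative assumptions.

```python
def clamp_exposure_update(current_exposure, requested_exposure,
                          max_ratio=1.2, clamp_enabled=True):
    """Illustrative frame-to-frame exposure clamp: limit the relative
    exposure change between consecutive frames so the constant-brightness
    assumption of the KLT tracker approximately holds. The clamp is
    disabled during ground initialization to speed convergence, then
    re-enabled in flight."""
    if not clamp_enabled:
        return requested_exposure
    upper = current_exposure * max_ratio
    lower = current_exposure / max_ratio
    return min(max(requested_exposure, lower), upper)
```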
Simple methods could not be found to turn it off. A binary routine was eventually obtained from Qualcomm that could
turn it off, but it came too late to implement since all major testing and validation had already been performed.
As expected, drifts accumulate with time over the 200 seconds. The worst-case error norms are given by 0.6 m
in position and 32 cm/sec in velocity. The 32 cm/sec is dominated by ground truth errors as seen in the top right
velocity plot of Figure 8. This performance provides a good demonstration of MAVeN’s stable hover properties, and is
consistent with accuracies needed to support the Mars Helicopter mission.
ambiguity, resulting in reduced computational cost and improved numerical conditioning. Furthermore, unlike some
loosely coupled architectures [13], MAVeN immediately benefits from feature data once it begins receiving images,
without requiring a dedicated initialization step of the vision subsystem. Finally, unlike many other methods [14],
MAVeN does not require vehicle motion to maintain observability during hover, which makes it ideal for hovering
helicopter missions.
The two main disadvantages of MAVeN are sensitivity to rough terrain, due to the ground-plane assumption, and
long-term drift in position and heading. For the Mars Helicopter technology demonstration, this is an acceptable
tradeoff, because accuracy degradation is graceful and the algorithm has proven to be highly robust in both simulation
and experiments.
VIII. Conclusions
A navigation system has been developed for the NASA Mars Helicopter. The design is driven by the need for
completely autonomous operations and high reliability. The MAVeN navigation filter was deemed most suitable because
it has a relatively low filter order (21 states) and the unique property of remaining stable in hover. The novelty
and intuition behind MAVeN's relative measurement update were outlined and discussed. Special vision-processing
algorithms were discussed that address feature tracking over Mars-like terrain, under various stress conditions, and at
high vehicle rotation rates. Also included are summaries of lessons learned from the development of the navigation
system and from working with COTS hardware. Performance results indicate navigation accuracies of 1-3 m in position and
10-50 cm/sec in velocity over a flight envelope that includes forward flight velocities of 1-5 m/s, hover
durations of 200 sec, and 400 meters total distance traversed. These results are consistent with accuracies needed to
support the Mars Helicopter mission.
Acknowledgments
This work was performed at the Jet Propulsion Laboratory, California Institute of Technology, under contract with
the National Aeronautics and Space Administration. This paper makes use of flight test data products acquired under a
separate JPL internally funded research & technology development program (joint with CalTech), focused on studying
the applicability of COTS hardware for future space missions [27]. The authors would like to thank all members of the
COTS study team who implemented the ECM3 avionics flight test article and helped with its flight testing. In addition
to authors, the JPL team included Theodore Tzanetos, Fernando Mier-Hicks, Gerik Kubiak, Lucas Leach, Russell
Smith, and the CalTech team included Prof. Soon-Jo Chung, Thayjes Srivas, Amarbold Batzorig, Kai Matsuka, Alexei
Harvard, Reza Nemovi, and Elaine Lowinger.
Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 6, 2007.
[6] Montiel, J. M., Civera, J., and Davison, A. J., “Unified inverse depth parametrization for monocular SLAM,”
Robotics: Science and Systems, 2006.
[7] R. Smith, M. Self, and P. Cheeseman, “Estimating uncertain spatial relationships in robotics,” arXiv preprint
arXiv:1304.3111, 2013.
[8] A. Miguel San Martin, David S. Bayard, Dylan T. Conway, Milan Mandic, Erik S. Bailey. “A Minimal State
Augmentation Algorithm for Vision-Based Navigation without Using Mapped Landmarks,” 10th International
ESA Conference on Guidance, Navigation & Control Systems (GNC 2017), Salzburg, Austria, 29 May - 2 June
2017.
[9] A. Miguel San Martin, David S. Bayard, Dylan T. Conway, Milan Mandic, Erik S. Bailey, “A Minimal State Aug-
mentation Algorithm for Vision-Based Navigation Without Using Mapped Landmarks (MAVeN),” New Technol-
ogy Report, NTR # 50296, Software Copyright of Invention NPO 50296-CP, October 21, 2016.
[10] S. I. Roumeliotis and J. W. Burdick, “Stochastic cloning: A generalized framework for processing relative state
measurements,” Proc. IEEE Int. Conf. Robot. Autom., Washington, DC, pp. 1788-1795, 2002.
[11] D.S. Bayard and P.B. Brugarolas, “An On-Board Vision-Based Spacecraft Estimation Algorithm for Small Body
Exploration,” IEEE Transactions on Aerospace and Electronic Systems., Vol. 44, No. 1, pp. 243-260, January
2008.
[12] D.S. Bayard, “Reduced-Order Kalman Filtering with Relative Measurements,” AIAA J. Guidance, Control and
Dynamics, Vol. 32, No. 2, pp. 679-686, March-April, 2009.
[13] Klein, G., and Murray, D., “Parallel tracking and mapping for small AR workspaces,” 6th IEEE and ACM Inter-
national Symposium on Mixed and Augmented Reality, ISMAR 2007, pp. 225-234, 2007.
[14] D.G. Kottas, K.J. Wu, and S.I. Roumeliotis, “Detecting and dealing with hovering maneuvers in vision-aided
inertial navigation systems,” Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference,
pp. 3172-3179, 2013.
[15] A.I. Mourikis and S.I. Roumeliotis, “A multi-state constraint Kalman filter for vision-aided inertial navigation,”
Robotics and Automation, 2007 IEEE International Conference, pp. 3565-3572, 2007.
[16] D. Nister, “An efficient solution to the five-point relative pose problem,” IEEE Trans. on Pattern Analysis and
Machine Intelligence, Vol. 26, No. 6, June 2004.
[19] E. Rosten, R. Porter, and T. Drummond, ”Faster and better: A machine learning approach to corner detection,”
IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 32, No. 1, January 2010.
[20] E. Rosten and T. Drummond. ”Machine learning for high-speed corner detection,” European Conference on Com-
puter Vision, pp. 430-443. Springer, Berlin, Heidelberg, 2006.
[21] B. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” Proceed-
ings of the International Joint Conference on Artificial Intelligence, pages 674-679, 1981.
[22] Jean-Yves Bouguet, ”Pyramidal implementation of the affine Lucas Kanade feature tracker description of the
algorithm,” Intel Corporation 5, no. 1-10, pp. 4, 2001.
[23] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. (2nd ed.). Cambridge University Press,
2003.
[24] S. Weiss, M.W. Achtelik, S. Lynen, M. Chli and R. Siegwart, “Real-time onboard visual-inertial state estimation
and self-calibration of mavs in unknown environments,” Robotics and Automation (ICRA), 2012 IEEE Interna-
tional Conference, 957-964, 2012.
[25] M. Li and A.I. Mourikis “Vision-aided inertial navigation for resource-constrained systems,” Intelligent Robots
and Systems (IROS), 2012 IEEE/RSJ International Conference, pp. 1057-1063, 2012.
[26] B. Williams, N. Hudson, B. Tweddle, R. Brockers and L. Matthies, “Feature and pose constrained visual aided
inertial navigation for computationally constrained aerial vehicles,” pp. 431-438, 2011.
[27] D. S. Bayard, D. Conway, R. Brockers, J. Delaune, H. Grip, F. Mier-Hicks, G. Kubiak, G. Merewether, L. Leach,
R. Smith, L. Matthies, and T. Tzanetos, “6-DOF visual inertial navigation technology on SWAP constrained
COTS platforms: Hexacopter Project Final Report,” Tech. Rep., NASA JPL, October 18, 2018.