

Satellite Pose Estimation Challenge: Dataset, Competition Design and Results

Mate Kisantal, Sumant Sharma, Member, IEEE, Tae Ha Park, Student Member, IEEE, Dario Izzo, Marcus Märtens, and Simone D'Amico

Abstract—Reliable pose estimation of uncooperative satellites is a key technology for enabling future on-orbit servicing and debris removal missions. The Kelvins Satellite Pose Estimation Challenge aims at evaluating and comparing monocular vision-based approaches and pushing the state-of-the-art on this problem. This work is based on the Satellite Pose Estimation Dataset, the first publicly available machine learning set of synthetic and real spacecraft imageries. The choice of dataset reflects one of the unique challenges associated with spaceborne computer vision tasks, namely the lack of spaceborne images to train and validate the developed algorithms. This work briefly reviews the basic properties and the collection process of the dataset, which was made publicly available. The competition design, including the definition of performance metrics and the adopted testbed, is also discussed. The main contribution of this paper is the analysis of the submissions of the 48 competitors, which compares the performance of different approaches and uncovers what factors make the satellite pose estimation problem especially challenging.

Index Terms—Satellites, pose estimation, computer vision, machine learning, convolutional neural networks, feature detection

I. INTRODUCTION

In recent years, mission concepts such as debris removal and on-orbit servicing have gained increasing attention from academia and industry in order to address the congestion in Earth orbits and extend the lifetime of geostationary satellites. These include the RemoveDEBRIS mission by Surrey Space Centre [1], the Phoenix program by DARPA [2], the Restore-L mission by NASA [3], and the on-orbit servicing programs proposed by startup companies such as Infinite Orbits1 and Effective Space2. A key to performing these tasks is the availability of the target spacecraft's position and attitude relative to the servicer spacecraft (i.e., pose). However, the targets of interest, including defunct satellites and debris pieces, are uncooperative and thus incapable of providing the servicer the information on their state. Moreover, the servicer cannot rely on the availability of known fiduciary markers on these targets. Overall, the servicer must be able to estimate and predict the target's relative pose on-board without a human-in-the-loop. It is especially attractive to perform pose estimation using a vision-based sensor such as a camera due to its small mass and power requirements compared to other active sensors such as Light Detection and Ranging (LIDAR) or Radio Detection and Ranging (RADAR). Moreover, monocular cameras are favored over stereo systems due to their relative simplicity and the fact that spacecraft, especially emerging small spacecraft such as CubeSats, do not allow for a large enough baseline to make stereovision effective. In order to enable autonomous pose estimation, the servicer then must harness fast and robust computer vision algorithms to compute the relative position and attitude of the target from a single or a set of monocular images. Cassinis et al. [4] provide a comprehensive survey of different approaches for pose estimation of uncooperative spacecraft.

Starting with the success of AlexNet [5] in the ILSVRC challenge [6] in 2012, deep learning models have been outperforming traditional approaches on a number of computer vision problems. However, deep learning relies on large annotated datasets. While there is a plethora of large-scale datasets for various terrestrial applications of computer vision and pose estimation that allow training state-of-the-art machine learning models, there is a lack of such datasets for spacecraft pose estimation. The main reason arises from the difficulty of acquiring thousands of spaceborne images of the desired target spacecraft with accurately annotated pose labels. Moreover, a lack of common datasets makes it impossible to systematically evaluate and compare the performance of different pose estimation algorithms. In order to address these difficulties, the Satellite Pose Estimation Challenge (SPEC) was organized by the Space Rendezvous Laboratory (SLAB) at Stanford University and the Advanced Concepts Team (ACT) of the European Space Agency (ESA). The challenge was hosted on the ACT's Kelvins competition website3, a platform hosting a number of space-related competitions. The primary aim of the SPEC was to provide a common benchmark for satellite pose estimation algorithms, identify the state-of-the-art, and show where further improvements can be made. Furthermore, such dedicated challenges have the potential to raise awareness of the problems of satellite pose estimation in the wider scientific community, bringing in new ideas and researchers to this field.

M. Kisantal was with the Advanced Concepts Team of the European Space Agency, Noordwijk, The Netherlands (email: [email protected]).
S. Sharma is a Computer Vision Engineer at Wisk Aero LLC, 2700 Broderick Way, Mountain View, CA 94043 USA (e-mail: [email protected]).
T. Park and S. D'Amico are with the Department of Aeronautics and Astronautics, Stanford University, 496 Lomita Mall, Stanford, CA 94305 USA (email: {tpark94, damicos}@stanford.edu).
D. Izzo and M. Märtens are with the Advanced Concepts Team of the European Space Agency, Noordwijk, The Netherlands (email: {dario.izzo, marcus.maertens}@esa.int).
1 https://www.infiniteorbits.io/
2 https://www.effective.space/
3 https://kelvins.esa.int/



The dataset for the SPEC, named Spacecraft Pose Estimation Dataset (SPEED) [7], [8], mostly consists of synthetic images, and the submissions were solely ranked by their accuracy as evaluated on these images. The dataset also includes a smaller amount of real images, which were collected using a realistic satellite mockup and the Testbed for Rendezvous and Optical Navigation (TRON) facility of SLAB [9], [10]. Even though domain adaptation was not the main focus of the competition, evaluating the submissions on these images provides an indication of the generalization capability of the proposed algorithms.

The main contribution of this work is the analysis of the SPEC results. On the one hand, samples of the dataset are ranked based on the performance of the submitted algorithms to uncover which factors contribute the most to the difficulty of the pose estimation task. Target distance and background were found to be the main challenges. On the other hand, an analysis of the submissions and a comparison of the efficacy of different approaches are presented based on a survey conducted among the participants. Perspective-n-Point (PnP) solver-based approaches were found to be significantly more accurate compared to direct pose estimation approaches. Including a separate detection step was also found to be an important element of high-performing pose estimation pipelines. It allows cropping the relevant part of the images and zooming in on the satellite, which brings significant benefits in terms of orientation accuracy.

After a review of the related pose estimation research in Section II, Section III discusses the creation of the dataset, and Section IV briefly discusses the competition design considerations. This is followed by an in-depth analysis of the final submissions in Section V. Finally, recommendations for further improvements are given in Section VI.

II. RELATED WORK

The classical approach to monocular-based pose estimation of a target spacecraft [11]–[17] would first extract hand-crafted features of the target from a 2D image. These features include Harris corners [18], Canny edges [19], lines via Hough transform [20], or scale-invariant features such as SIFT [21], SURF [22], and ORB features [23]. Upon successful extraction of said features, iterative algorithms are required to predict the best pose solution that minimizes a certain error criterion in the presence of outliers and unknown feature correspondences. The process is crucial in providing a good initial pose estimate to the on-board vision-based navigation system [24], [25].

Earlier works on initial pose estimation tended to rely on coarse a priori knowledge of the target's pose [11], [13], [14] or assumed the availability of active fiduciary markers or sensors on the target [12]. Without making any such assumptions, D'Amico et al. [16] were one of the first to publish pose estimation results using the Hough transform and the Canny edge detector on spaceborne images captured during the rendezvous phase of the PRISMA mission [16], [26]. By grouping edge features into a geometrically meaningful shape, they were able to reduce the size of the feature correspondence search space. The work was followed by Sharma et al. [27], who additionally introduced the Weak Gradient Elimination (WGE) technique to essentially separate the spacecraft's edge features from the weak edge features of the background. While the proposed architecture showed improved performance on the spaceborne images from the PRISMA mission, the method was affected by low availability of high-confidence solutions.

On the other hand, recent years have seen a significant breakthrough in computer vision with the advent of Deep Neural Networks (DNN). It was made possible by increased computational resources, represented by Graphical Processing Units (GPU), and the availability of large-scale datasets to train the DNN, such as ImageNet for classification [5], MS COCO for object detection [28], and LINEMOD for pose estimation [29] of ordinary household objects. While various DNN-based approaches have been proposed to perform pose estimation [30]–[40], current state-of-the-art methods employ Convolutional Neural Networks (CNN) that either directly predict the 6D pose or an intermediate representation that can be used to compute the 6D pose, notably a set of keypoints defined a priori. For example, PoseCNN [37] directly regresses a 3D translation vector and a unit quaternion representing the relative attitude of the target, whereas SPN [7], [9] poses attitude prediction as a classification problem by discretizing the viewpoint space into a finite number of bins. Most recently, architectures like KPD [39] and PVNet [40] have been proposed to predict the locations of 2D keypoints on the target's surface. Given the corresponding 3D coordinates of the keypoints from available models, one can solve the PnP problem [41] to compute the relative position and attitude. It is noteworthy to mention that terrestrial applications of object pose estimation are not typically subject to navigation and computation requirements as strict as those for satellite on-orbit servicing.

III. DATASET

This section provides a high-level description of SPEED, which comprises the training and test images of this challenge. SPEED [8] represents the first publicly available machine learning data set for spacecraft pose estimation, initially released in February 2019. The images of the Tango spacecraft from the PRISMA mission [16], [26] are generated from two different sources, referred to as synthetic and real images in the following. Both sets of images are created using the same camera model. Specifically, the real images are captured using the Point Grey Grasshopper 3 camera with a Xenoplan 1.9/17 mm lens, while the synthetic images are created using the same camera properties. The ground-truth pose labels, consisting of the translation vector and a unit quaternion describing the relative orientation of the Tango spacecraft with respect to the camera, are released along with the associated training images. The readers are encouraged to read [7] and [9] for more details on the dataset.

Fig. 1. Examples of synthetic training images from SPEED.

Fig. 2. Cropped versions of (a) the flight imagery captured during the PRISMA mission [26], (b) synthetic imagery in Beierle and D'Amico [43], (c) SPEED synthetic imagery, and (d) histogram comparison of image pixel intensities of the three images. They are cropped from the downscaled 224 × 224 pixel images.

Fig. 3. Left: TRON facility at SLAB. Right: Two examples of real training images from SPEED.

A. Creation of the synthetic dataset

The synthetic images of the Tango spacecraft are created using the camera emulator software of the Optical Stimulator [42], [43]. The software uses an OpenGL-based image rendering pipeline to generate photo-realistic images of the Tango spacecraft with desired ground-truth poses (examples are shown in Fig. 1). Random Earth images captured by the Himawari-8 geostationary weather satellite4 are inserted into the background of half of the synthetic images. For these images, the illumination conditions are created to best match those of the background Earth images. Finally, all images are processed with Gaussian blurring (σ = 1) and zero-mean Gaussian white noise (σ² = 0.0022) using MATLAB's imgaussfilt and imnoise commands, respectively.

From Fig. 2, it is clear that the synthetic imagery of SPEED can closely emulate the illumination conditions captured in the actual flight imagery, as indicated by the overlapping histogram curves of the image pixel intensities of both imageries. This demonstrates a significant improvement of SPEED's image rendering pipeline over the previous work by Beierle and D'Amico [43] and its capability of generating photorealistic images of any desired spacecraft with specified pose labels.
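To make this post-processing step concrete, a rough Python equivalent of the MATLAB imgaussfilt/imnoise calls might look as follows. The use of OpenCV and NumPy, and the function name, are illustrative assumptions; this is a sketch of the described operations, not the dataset's actual generation code.

```python
import cv2
import numpy as np

def postprocess_synthetic(image_u8, blur_sigma=1.0, noise_var=0.0022, seed=None):
    """Sketch of the blur-plus-noise post-processing described above.

    image_u8: grayscale uint8 render; blur sigma and noise variance follow
    the values reported for MATLAB's imgaussfilt/imnoise (assumed to act
    on a [0, 1] intensity scale, as imnoise does).
    """
    rng = np.random.default_rng(seed)
    # Gaussian blur with sigma = 1; ksize=(0, 0) lets OpenCV derive the kernel.
    blurred = cv2.GaussianBlur(image_u8, ksize=(0, 0), sigmaX=blur_sigma)
    # Add zero-mean Gaussian noise with the given variance, then clip back.
    img = blurred.astype(np.float64) / 255.0
    img += rng.normal(0.0, np.sqrt(noise_var), size=img.shape)
    return (np.clip(img, 0.0, 1.0) * 255.0).astype(np.uint8)
```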
B. Collecting real images with TRON

The real images of the Tango spacecraft are captured using the TRON facility of SLAB [9], [43], as shown in Fig. 3. At the time of image generation, the facility consisted of a 1:1 mockup model of the Tango spacecraft and a ceiling-mounted seven degrees-of-freedom robotic arm, which holds the camera at its end-effector. The facility also includes custom Light-Emitting Diode (LED) wall panels, which can simulate the diffused illumination conditions due to Earth albedo, and a xenon short-arc lamp to simulate collimated sunlight in various orbit regimes. The ground-truth pose labels for the real images are acquired using ten Vicon cameras [44] that track infrared (IR) markers on the satellite mockup and the test camera. Careful calibration processes outlined in [9] are performed to remove any biases in the estimated target and camera reference frames. Overall, the independent pose measurement of the calibrated Vicon system provides the pose labels with degree-level and centimeter-level accuracy [9]. Work is currently underway to improve the accuracy of the ground-truth poses by one order of magnitude by fusing Vicon camera and robot measurements concurrently.

Fig. 4 provides a qualitative comparison of synthetic and real images of SPEED. Note that while both images share identical ground-truth poses and general direction of Earth albedo, one can readily observe a number of discrepancies in the image properties, such as the spacecraft's texture, illumination, and eclipse of certain spacecraft features.

Fig. 4. Left: SPEED synthetic imagery. Right: SPEED real imagery.

C. Basic Dataset Properties

The released dataset contains almost 15000 synthetic and 300 real images and is partitioned into training and test sets according to Table I. Note that while the synthetic images are partitioned in an 8:2 ratio, only five real images are provided with labels for training. This represents a common situation in spaceborne applications, in which images of an orbiting satellite are scarce and difficult to obtain. All images are grayscale with high resolution (1920 × 1200 pixels).

TABLE I
NUMBER OF IMAGES IN DIFFERENT PARTITIONS OF THE DATASET

                Synthetic   Real
Training set      12000        5
Test set           2998      300

Fig. 5 graphically describes the spacecraft body and camera reference frames to visualize the position and orientation distributions of the dataset. Specifically, zC is aligned with the camera boresight in the camera reference frame, while zB is perpendicular to the solar panel in the Tango's body reference frame. (xC, yC) and (xB, yB) then form planes perpendicular to zC and zB, respectively, as shown in Fig. 5.

Fig. 6 shows the range of relative position distributions in the dataset in the camera frame. The distance of the satellite in the synthetic images is between 3 and 40.5 meters. Due to physical limitations of the TRON facility in combination with the size of the satellite mockup, the distance distribution of the real images is much more constrained, ranging from 2.8 to 4.7 meters.

Fig. 7 visualizes the relative orientation and position distributions for real images in the satellite body frame. For synthetic images, Fig. 8 visualizes the relative position distribution in the satellite body frame. It especially visualizes the fact that for synthetic images, the relative orientations are well distributed across the 3D space. However, in the case of real images, the diversity of orientations and distances is restricted due to physical limitations.

4 https://himawari8.nict.go.jp/
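To illustrate the label format described in this section, a minimal parsing sketch is shown below. The JSON layout and the key names ("filename", "q", "r") are assumptions for illustration only; the actual field names should be checked against the released speed-utils tools.

```python
import json
import numpy as np

def load_pose_labels(json_path):
    """Sketch: read per-image pose labels (unit quaternion + translation).

    Assumes one JSON list with a filename, a quaternion, and a translation
    vector per image; key names are hypothetical, not the official schema.
    """
    with open(json_path, "r") as f:
        entries = json.load(f)
    labels = {}
    for e in entries:
        q = np.asarray(e["q"], dtype=np.float64)  # relative orientation
        t = np.asarray(e["r"], dtype=np.float64)  # relative position [m], camera frame
        labels[e["filename"]] = (q / np.linalg.norm(q), t)
    return labels
```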

Fig. 5. Definition of spacecraft body reference frame (B), camera reference frame (C), relative position (tBC), and relative orientation (RBC).

IV. COMPETITION DESIGN

In an open scientific competition such as SPEC and other Kelvins competitions, scientific problems are turned into well-formulated mathematical problems that are solved by engaging the broader scientific community and citizen scientists. Therefore, there are two key factors that are considered in setting up the competition:

• Community engagement: The participants and the effort they put into solving the problems are our main resource. Therefore, a broad audience has to be reached to attract many individuals and teams. Then, the barrier to entry into the competition has to be as low as possible. Finally, engagement of the participants has to be maintained. This last point involves making sure that the problem can be solved based on the released dataset (e.g., images contain sufficient information for pose estimation, samples are well distributed in the pose space, etc.), that solutions are quickly evaluated and added to a live leaderboard, and in general that the competition is fair (e.g., by keeping the test set private).

• Competition metric: The creation of the competition metric is the process in which the scientific problem of interest is turned into an optimization problem. Care should be taken in designing the competition metric, as it has to directly reflect the important aspects of the problem. Otherwise a solution to the optimization problem might not be relevant to the original scientific problem. In case the metric can be cheated, participants may focus on specific solutions that might lead to good scores but are of less practical value.

SPEC particularly aimed to focus community efforts on the problem of estimating the pose of uncooperative satellites. The following sections describe the competition setup and the baseline solutions provided to the participants, and introduce the competition metric.

A. Competition setup - the Kelvins competition platform

Kelvins, the platform which hosts SPEC and many other satellite-related challenges, was designed to provide a seamless experience for the participants. It features a live leaderboard that is key for maintaining community engagement over longer intervals. Teams have direct information about how their latest submission compares to their peers, the limits are constantly pushed further, and the competitive aspect brings more motivation for teams to put in effort. Another important feature is the automated evaluation of submissions. This allows for keeping the test set private, which helps ensure a fair competition. During the competition, only 20% of the test set was used for evaluation and placement in the leaderboard in order to prevent the participants from overfitting on the entire test set.

B. Competition Metric

The competition metric has to faithfully reflect the underlying scientific problem in order to ensure that the high-scoring solutions are meaningful also outside the context of the competition. While it is not uncommon to have separate orientation and position metrics [32], a single scalar score was used instead to rank the submissions on the leaderboard.

To evaluate the submitted pose solutions, separate position (e_t) and orientation (e_q) errors are computed. Fig. 5 graphically describes the relevant reference frames used to compute the errors. The position error, e_t, is defined as

    e_t = \| \mathbf{t}_{BC} - \hat{\mathbf{t}}_{BC} \|_2 ,    (1)

the magnitude (2-norm) of the difference between the ground-truth (t_BC) and estimated (t̂_BC) position vectors from the origin of the camera reference frame C to that of the target body frame B. The normalized position error, ē_t, is also defined as

    \bar{e}_t = \frac{e_t}{\| \mathbf{t}_{BC} \|_2} ,    (2)

which penalizes the position errors more heavily when the target satellite is closer.

The orientation error e_q is calculated as the angular distance between the predicted, q̂ = q(R̂_BC), and true, q = q(R_BC), unit quaternions, i.e., the magnitude of the rotation that aligns the target body frame B with the camera reference frame C,

    e_q = 2 \cdot \arccos\left( \left| \langle \hat{q}, q \rangle \right| \right) ,    (3)

where ⟨q̂, q⟩ denotes the inner product of the two unit quaternions.

The pose error e_pose for a single image is the sum (1-norm) of the orientation error and the normalized position error,

    e_{pose} = e_q + \bar{e}_t .    (4)

Fig. 6. Position distributions of the pose labels across the dataset in the camera frame (C), for synthetic (left) and real (right) samples.

Finally, the total error E is the average of the pose errors over all N images of the test set,

    E = \frac{1}{N} \sum_{i=1}^{N} e_{pose}^{(i)} .    (5)

A main concern during the creation of the competition metric was to balance its sensitivity to position and orientation errors and avoid situations where one factor dominates the metric while neglecting the other. Note that since the position error is dependent on the target distance, the balance between the two contributions also depends on the particular distance distribution of the test set.

In order to check the balance of the sensitivities, the total error E was calculated over the test set for two cases: introducing 1° of orientation error in the first case, and adding 0.1 m of translation error in the second case. It was shown that 0.1 m of translation error is, on average, equivalent to 0.7094° of orientation error for the particular distribution of poses in the test set; likewise, 1° of orientation error was shown to be equivalent to 0.141 m of translation error. Such behavior is expected due to the underlying perspective equations which drive image formation. This suggested that the contributions of each error type are reasonably balanced; thus the total score combines both errors without the introduction of additional scaling factors.

Two alternative metrics were also considered. The reprojection error is the average distance between projected keypoints measured in 2D on the image plane [45]. The average distance error is the 3D distance between the ground truth and predicted keypoints (usually referred to as the ADD metric [29]). Both have the disadvantage that the orientation and position sensitivity is dependent on the choice of keypoints, since the slope of the orientation error is proportional to the distance of the keypoints from the origin of the target's body frame. Furthermore, the reprojection error is numerically unstable in the case when predicted keypoints lie very close to the image plane.

C. Baseline solutions

Two different example solutions are provided to the participants in Python using two popular deep learning frameworks, Keras and PyTorch5. The main reason for providing these baseline solutions is to lower the barriers to entering the competition. While the performance of these baselines is intentionally rather weak, they still allow competitors to submit their first result within an hour. Along with the example solutions, the competition platform provides useful tools that facilitate working with the dataset, such as functions to visualize samples and corresponding pose labels, or data loaders for the two deep learning frameworks.

The baseline solutions rely on pre-trained ResNet models [46] where the last layer is replaced with a layer containing seven linear outputs for the pose variables. The models are fed with downscaled 224 × 224 pixel images and trained with a simple Mean-Squared Error (MSE) loss for 20 epochs. These baselines leave quite some room for improvement. For instance, the outputs are not normalized, and the predicted distance along the camera boresight is typically one order of magnitude larger than all the other output variables. Using the MSE loss, errors in this direction dominate the loss. Furthermore, the MSE loss does not account for the periodicity of orientation.

Keeping the baseline solutions intentionally simple and weak helped to engage the participants in the competition. These baselines allow for incremental improvements, such as replacing the loss function or training on larger input images. Additionally, a stronger third baseline solution, also based on a CNN, was developed during the competition by SLAB and is used for comparison purposes.

5 https://gitlab.com/EuropeanSpaceAgency/speed-utils
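For orientation, a minimal PyTorch sketch in the spirit of these baselines is shown below: a pre-trained ResNet whose final layer is swapped for seven linear outputs (translation plus quaternion), trained with an MSE loss. The model choice, optimizer, and names are illustrative assumptions rather than the exact speed-utils code.

```python
import torch
import torch.nn as nn
from torchvision import models

# Pre-trained ResNet with its final fully connected layer replaced by seven
# linear outputs: 3 for the translation vector, 4 for the unit quaternion.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 7)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(images, pose_labels):
    """images: (B, 3, 224, 224) tensors (grayscale stacked to 3 channels);
    pose_labels: (B, 7) tensors [tx, ty, tz, q0, q1, q2, q3]."""
    optimizer.zero_grad()
    loss = criterion(model(images), pose_labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```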

Fig. 7. Camera poses for real images in the Tango’s body frame (B) from two views. The simplified wireframe model of the satellite is plotted in green,
camera poses are plotted in red and black for test and training samples, respectively.

Fig. 8. Distribution of the camera's relative positions of the synthetic images in the Tango's body frame (B) from two views. The satellite is at the origin; camera poses are plotted in red and black for test and training samples, respectively.

V. COMPETITION RESULTS

During the competition, 48 teams participated and submitted results. 20 teams filled in a post-competition questionnaire, and 11 provided detailed descriptions of their approaches. This section analyzes and compares their submissions, provides a brief description of the top 4 approaches, evaluates the performance of the different approaches, and identifies difficult samples to show the current limits of this technology.

A. Final results

Fig. 9 illustrates the final scores. The first 20 teams significantly outperformed the initial baseline, with the top teams achieving a two orders of magnitude improvement over the baseline solutions.6

While the primary competition ranking criterion was the score on the synthetic test set, submissions were also evaluated on the real test set. Results on real images are weaker compared to those on synthetic images for most teams, except for three of the solutions. Machine learning models are generally expected to perform worse when evaluated on data with a statistical distribution that significantly differs from their training set. It is possible that the reason those three teams achieved better results on real imagery is related to its limited pose distribution.

The results of the top ten teams are collected in Table II and compared to the baseline network developed by SLAB [49] during the course of the competition. While team UniAdelaide [47] won the competition by achieving the highest score on the synthetic test set, EPFL_cvlab7 achieved the highest accuracy on real images. pedro_fairspace [48] submitted the best submission that did not rely on PnP solvers, finishing in third place. These top three solutions were the only submissions to outperform the SLAB baseline. Before the competition, the best published result on SPEED was the Spacecraft Pose Network (SPN) by Sharma and D'Amico [7], [9]. SPN was also the first published result on the SPEED benchmark prior to its public release, and its reported performances in terms of the mean orientation and position error are e_q = 8.43° and e_t = 0.783 m.

B. Approaches of the top 4 competitors

Here, the technical details of the top 4 competitors (i.e., the top 3 performers and the SLAB baseline) are briefly covered for completeness. For detailed descriptions of each method, the readers are encouraged to refer to the relevant materials.

1) UniAdelaide [47]: The team UniAdelaide first recovers the 3D coordinates of 11 arbitrarily chosen landmark points, or keypoints, on the Tango satellite via multi-view triangulation. They also regress the 2D bounding box around the satellite using an object detection CNN, which is used to crop the satellite from an original image. The bounding box labels for the training images are obtained by projecting the recovered landmarks onto the 2D image plane using the provided ground-truth pose labels. Then, the team trains a landmark regression CNN on cropped images to obtain the 2D locations of all 11 landmarks. The team uses HRNet [50] to regress the heatmaps associated with each landmark instead of directly regressing the 2D coordinates. Finally, given predicted 2D-3D correspondences of the landmarks, they perform a robust nonlinear optimization to compute the pose estimates.

2) EPFL_cvlab8: Similar to the approach of the team UniAdelaide, the team EPFL_cvlab regresses 8 landmarks corresponding to the 8 corners of the satellite's cubic body. Their approach is based on a segmentation-driven CNN [51], which divides the image into S × S grids and has each grid predict the presence of the object (segmentation) and the 2D locations of the 8 keypoints along with their confidence values. Then, out of all keypoint candidates, the n most confident keypoints are used to compute the pose estimates using a RANSAC-based PnP solver [52].

3) pedro_fairspace [48]: The team adopts a ResNet-based architecture [46] to directly regress the 3D position and unit quaternion vectors. Such a mechanism allows for directly optimizing the competition pose error as defined in Eq. (4). Instead of a norm-based loss of unit quaternions, the team proposes to formulate the orientation regression as soft classification based on a Gaussian mixture model to handle the attitude ambiguity.

4) SLAB Baseline [49]: The team SLAB Baseline exploits a pose estimation architecture similar to that of the team UniAdelaide, incorporating the recovery of 11 keypoints and a separate object detection CNN to crop the most relevant areas of the images. They use YOLO-based CNN architectures [53], [54] for both the object detection and keypoint regression tasks. However, the architectures exploit depth-wise separable convolution operations [55] to significantly reduce the number of parameters associated with each network. They use EPnP [41] to compute the pose estimates.

C. Survey on methods

Shortly after the competition, all participants were asked to answer a short survey questionnaire regarding their backgrounds, the approaches they used, and how they dealt with certain aspects of the problem. 20 teams, including the top 13 competitors, answered the survey. Most of the teams (except for three) consisted of a single individual contributor, affiliated with academic institutions (35%) or industry (30%). It is noteworthy that only half of the teams were involved with space-related research, and 65% were not working on pose estimation problems at all.

Deep learning approaches dominated the submissions, as all teams used deep learning either in an end-to-end fashion or as an intermediate step of their pipelines.

6 Final leaderboard: https://kelvins.esa.int/satellite-pose-estimation-challenge/results/
7 https://www.epfl.ch/labs/cvlab/
8 https://indico.esa.int/event/319/attachments/3561/4754/pose gerard segmentation.pdf

Fig. 9. Final results on the synthetic and real test sets. (Pose score on a logarithmic scale versus final rank, for the synthetic and real test sets, with the synthetic and real baseline scores shown for reference.)

TABLE II
DETAILED RESULTS OF THE TOP TEN SUBMISSIONS COMPARED TO THE SLAB'S BASELINE PERFORMANCE

Team                       E_syn    E_real   e_q [°]          e_t [m]          PnP
1. UniAdelaide [47]        0.0094   0.3752   0.41 ± 1.50      0.032 ± 0.095    Yes
2. EPFL_cvlab              0.0215   0.1140   0.91 ± 1.29      0.073 ± 0.587    Yes
3. pedro_fairspace [48]    0.0571   0.1555   2.49 ± 3.02      0.145 ± 0.239    No
SLAB Baseline [49]         0.0626   0.3951   2.62 ± 2.90      0.209 ± 1.133    Yes
4. Team Platypus           0.0703   1.7201   3.11 ± 4.31      0.221 ± 0.530    No
5. motokimura1             0.0758   0.6011   3.28 ± 3.56      0.259 ± 0.598    No
6. Magpies                 0.1393   1.2659   6.25 ± 13.21     0.314 ± 0.568    No
7. GabrielA                0.2423   2.6209   12.03 ± 12.87    0.318 ± 0.323    No
8. stainsby                0.3711   5.0004   17.75 ± 22.01    0.714 ± 1.012    No
9. VSI Feeney              0.4658   1.5993   23.42 ± 33.57    0.734 ± 1.273    No
10. jblumenkamp            0.8859   4.2418   35.92 ± 49.72    2.656 ± 2.149    Yes

Best results for each metric are highlighted with bold fonts. The mean and the standard deviation of the orientation errors (e_q) as in (3) and position errors (e_t) as in (1) are measured on the synthetic test set.

The teams addressed the pose estimation problem as a regression task, except for one team that framed orientation prediction as a soft classification problem. Various architectures were used, from well-known pre-trained models, such as ResNets [46], Inception v3 [56], and YOLO [53], [54], to custom models trained from scratch. 18 of the 20 teams made use of data augmentation techniques to maximize their performance, such as geometric transformations (e.g., rotation around the camera axis, zooming and cropping) and pixel intensity changes (e.g., adding noise, changing brightness).

SPEED consists of high resolution images that are not suitable as direct inputs to a neural network due to memory limitations of GPUs. Therefore, all teams performed downscaling of the given images to a variety of sizes, ranging from 224 × 224 to 960 × 640 pixels. Some teams cropped the input image, either taking a sufficiently large central crop or localizing the satellite first and then cropping the relevant part of the image. Specifically, a number of top-scoring teams used a separate CNN to perform localization before cropping in order to prevent any loss of information due to downscaling. 60% of the teams used ImageNet pre-trained models that expect three-channel RGB input images. Since the dataset consists of single-channel grayscale images, this provided additional freedom for teams in constructing their input. While most teams simply stacked the same input channel to form an RGB input, two teams included masked or filtered versions of the input on the extra channels.

Since the 3D model of the satellite was not released as part of the competition, some teams chose to reconstruct the satellite model in order to use a keypoints-based architecture. Specifically, seven teams reconstructed the 3D coordinate locations of 8 to 11 keypoints using 10 to 20 hand-selected images and the provided pose labels. The keypoints generally correspond to the corners of the satellite body and the tips of the antennae. The method of reconstruction ranged from manually aligning the vertices to triangulation or reprojection-based optimization. The resulting models were used for generating bounding box or segmentation ground truth from the available pose labels, and in some cases directly in the pose estimation process with PnP solvers.
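As a sketch of this label-generation step, the reconstructed 3D keypoints can be transformed with the ground-truth pose and projected with a pinhole camera model to obtain 2D keypoint and bounding-box labels. The function names are illustrative, and the camera intrinsics K are assumed to come from the released camera parameters; the exact frame conventions should be checked against the dataset documentation.

```python
import numpy as np

def project_keypoints(X_body, R_bc, t_bc, K):
    """Project 3D keypoints (N, 3) in the target body frame into pixels.

    R_bc, t_bc: rotation matrix and translation derived from the provided
    ground-truth pose labels (body frame to camera frame, assumed);
    K: 3x3 camera intrinsics matrix.
    """
    X_cam = (R_bc @ X_body.T).T + t_bc          # body frame -> camera frame
    uv = (K @ X_cam.T).T                        # pinhole projection
    return uv[:, :2] / uv[:, 2:3]               # perspective divide -> pixels

def bounding_box(uv, margin=0.0):
    """Tight 2D box around projected keypoints, the way several teams
    generated detection labels from the pose ground truth."""
    (u_min, v_min), (u_max, v_max) = uv.min(axis=0), uv.max(axis=0)
    return u_min - margin, v_min - margin, u_max + margin, v_max + margin
```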

Fig. 10. Position (left) and orientation (right) error distributions for direct and PnP solver based methods.

Fig. 11. Position (left) and orientation (right) error distributions highlighting the effect of a localization step prior to orientation estimation.

D. Comparing approaches

This section provides an analysis of the survey results and submissions together to compare design decisions in light of the final results. In particular, it discusses how keypoint matching techniques compare to pure deep learning approaches and what the effect of a separate localization step is in the pose estimation pipeline.

1) Keypoint matching techniques: Most teams designed an architecture that predicts the target's pose in an end-to-end fashion. However, four teams designed an architecture that first predicts a set of pre-defined keypoints using a neural network. Then, they use a keypoint matching technique such as a PnP solver to align a known model of the satellite (e.g., reconstructed 3D keypoint coordinates) with the detected keypoints. While the PnP optimization is prone to local minima, it allows for explicitly incorporating the geometric constraints in the pose estimation process.

Fig. 10 illustrates the error distributions for the solutions based on PnP and direct pose estimation, separately for position and orientation error. Specifically, the performances of the top 10 teams were analyzed to compare the PnP solutions and strong direct pose estimation submissions. In the submissions, PnP-based solutions significantly outperform direct pose estimation both in terms of position and orientation performance, ranking in the first, second and fourth places. The average orientation errors and their deviations are 9.76° ± 18.51° and 1.31° ± 2.24° for direct and PnP methods, respectively, while the relative position errors are 0.0328 ± 0.0430 m and 0.0083 ± 0.0269 m.
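The keypoint-plus-PnP stage these teams used can be sketched with OpenCV as follows, assuming the reconstructed 3D keypoint model, the CNN-detected 2D keypoints, and the camera intrinsics are given. This mirrors the RANSAC-based PnP step [52] in spirit, not any team's exact code; the RANSAC threshold is an assumed value.

```python
import cv2
import numpy as np

def solve_pose_pnp(keypoints_3d, keypoints_2d, K, dist_coeffs=None):
    """Recover the relative pose from 2D-3D keypoint correspondences.

    keypoints_3d: (N, 3) model points in the target body frame;
    keypoints_2d: (N, 2) detections in pixels; K: 3x3 intrinsics.
    Returns a rotation matrix, a translation vector, and inlier indices.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        keypoints_3d.astype(np.float32),
        keypoints_2d.astype(np.float32),
        K, dist_coeffs,
        flags=cv2.SOLVEPNP_EPNP,      # EPnP [41], as used by the SLAB baseline
        reprojectionError=5.0,        # RANSAC inlier threshold in pixels (assumed)
    )
    if not ok:
        raise RuntimeError("PnP failed to find a pose")
    R, _ = cv2.Rodrigues(rvec)        # axis-angle -> rotation matrix
    return R, tvec.reshape(3), inliers
```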

2) The effect of separate localization: Another recurring technique across the participants is the use of a separate localization step. In this case, the first step is the detection of the satellite, either by segmenting its contour or by identifying a tight-fitting bounding box around it. This step separates the position and orientation estimation tasks and allows training separate models. The main advantage is that an intermediate detection result allows for cropping the original high resolution image, so that only the relevant part of the image is used downstream. The disadvantages of this approach are the added complexity and the need for segmentation/bounding box annotation via a separate model reconstruction step.

Fig. 11 compares the error distributions of the top 8 teams that use direct pose estimation methods (i.e., no PnP solver). Specifically, half of the selected teams use an independent localization step in their direct pose estimation approach, whereas the other half use a combined architecture that performs localization and pose estimation simultaneously. Interestingly, the position error distributions are nearly identical, while separate localization significantly outperforms the combined approach in terms of orientation. This suggests that localization does not bring any benefits in terms of detecting the position; having it predicted simultaneously with the orientation of the satellite is just as accurate. However, the capability to crop irrelevant parts and zoom in on the important part of the image makes a significant difference in orientation estimation. Specifically, the mean orientation error and deviation is 29.66° ± 46.10°, as opposed to 48.03° ± 49.38° for the combined approach.

E. Difficulty of samples

In order to uncover which factors contribute the most to the difficulty of the satellite pose estimation task, the best prediction from all submissions is selected for each image of the test set. This 'super pose estimator' is used as a proxy of how difficult the pose estimation task is on a certain sample. The resulting score distribution is plotted in Fig. 12 along with a number of selected images. Except for a few outliers, the error distribution is flat, with pose errors well below 0.05. In fact, the average orientation error and its standard deviation is 0.34° ± 0.38°, while the average position error is 0.09 ± 0.09 m.9

The general trend is that the images with a black background, representing the case of an under-exposed star field, are easier compared to the samples with an Earth background. A black background makes the detection of the satellite a straightforward task, given the sharp contrast of the satellite to its background. Having a cluttered Earth background makes the pose estimation more difficult.

The most challenging samples are the images with an Earth background and a small target due to large inter-spacecraft distance. In this situation, the apparent size of the satellite can be comparable with features on the background image, and in some cases the contrast of the satellite to the background is minimal. This makes pose estimation particularly challenging. In fact, just spotting the satellite in these images is a demanding task for humans as well (see the first four images in Fig. 12).

Fig. 13 also highlights the importance of the inter-spacecraft distance. The figure plots the distribution of minimum pose errors with respect to the inter-spacecraft distance, with the mean and standard deviation of minimum pose errors calculated over 1 meter wide distance bins. The distribution of scores is correlated with the target distance, i.e., it is harder to estimate the pose of satellites that are farther away. This is expected, since a larger target distance results in a smaller apparent size of the satellite, corresponding to fewer pixels associated with the spacecraft.

VI. CONCLUSION AND FUTURE WORK

The aim of organizing the Satellite Pose Estimation Challenge (SPEC) was to draw more attention to the satellite pose estimation problem and to provide a benchmark to gauge different approaches. Nearly 50 teams participated during the 5-month-long duration of the competition. This paper summarizes the creation of the dataset and the considerations put into designing this competition. Based on the submissions and a survey conducted amongst the top performing participants, an analysis of the different approaches to the problem is presented. The top performing participants managed to significantly outperform the previous state-of-the-art and push the boundaries of vision-based satellite pose estimation further.

The analysis of the submissions discovered that target distance and cluttered backgrounds are the most significant factors contributing to the difficulty of samples. A general trend in computer vision also observed in this competition is the domination of deep learning approaches. Virtually all teams relied on Deep Neural Networks (DNN), at least in some steps of their pose estimation pipeline. However, while DNNs proved to be indispensable in solving the problem of perception, they are still not the best choice throughout all steps of a pose estimation pipeline. Perspective-n-Point (PnP)-based keypoint matching techniques that used keypoints detected by DNNs won the first two places. Another finding was that, with high resolution images available and Graphical Processing Unit (GPU) memory limiting the input resolution, a separate localization step can bring significant improvements in pose accuracy, as it allows for cropping the irrelevant parts of the image.

Overall, the scores of the top submissions indicate that various DNN architectures are able to perform good pose estimation of an uncooperative spacecraft, provided the servicer has access to the target's 3D model or 3D keypoint coordinates as designed by the mission operators. However, the performances of the same architectures on real images are relatively poor, as the real images have different statistical distributions from the synthetic images that were used to train the DNNs. As any DNNs deployed in future space missions will undoubtedly utilize synthetic images as a main source of training, future SPEC must design the datasets and competition metrics that better reflect the significance of the domain gap. Ultimately, to support debris removal and other representative mission scenarios, future SPEC must address the issue of estimating the pose of an unknown resident space object.

9 In comparison, the winning team UniAdelaide achieved 0.41° ± 1.50° orientation error and 0.13 ± 0.09 m relative position error.

Fig. 12. Top: Test images ranked by difficulty, measured as minimum pose error across all submissions (legend: Earth background, black background). Bottom: Nine example images from different parts of the distribution (img007739.jpg, img008398.jpg, img009630.jpg, img010668.jpg, img011888.jpg, img003930.jpg, img006869.jpg, img004629.jpg, img001729.jpg). Images are shown with scaled colors to maximize contrast.

Fig. 13. Distribution of the minimum pose error in logarithmic scale with respect to the inter-spacecraft distance. The minimum pose error is computed across all submissions. Mean and standard deviation are calculated over one meter wide distance bins.

ACKNOWLEDGMENT

The authors would like to thank OHB Sweden for the 3D model of the Tango spacecraft used to create the images used in this work and for the flight images collected during the PRISMA extended mission.

REFERENCES

[1] J. L. Forshaw, G. S. Aglietti, N. Navarathinam, H. Kadhem, T. Salmon, A. Pisseloup, E. Joffre, T. Chabot, I. Retat, R. Axthelm, et al., "RemoveDEBRIS: An in-orbit active debris removal demonstration mission," Acta Astronautica, vol. 127, pp. 448–463, 2016.
[2] B. Sullivan, D. Barnhart, L. Hill, P. Oppenheimer, B. L. Benedict, G. V. Ommering, L. Chappell, J. Ratti, and P. Will, "DARPA Phoenix Payload Orbital Delivery System (PODs): FedEx to GEO," AIAA SPACE 2013 Conference and Exposition, 2013.
[3] B. B. Reed, R. C. Smith, B. J. Naasz, J. F. Pellegrino, and C. E. Bacon, "The Restore-L servicing mission," AIAA SPACE 2016, 2016.
[4] L. Pasqualetto Cassinis, R. Fonod, and E. Gill, "Review of the robustness and applicability of monocular pose estimation systems for relative navigation with an uncooperative spacecraft," Progress in Aerospace Sciences, 06 2019.
[5] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems (NIPS), 2012, pp. 1106–1114.
[6] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009.
[7] S. Sharma and S. D'Amico, "Pose estimation for non-cooperative rendezvous using neural networks," in 2019 AAS/AIAA Astrodynamics Specialist Conference, Ka'anapali, Maui, HI, January 13-17 2019.
[8] S. Sharma, T. H. Park, and S. D'Amico, "Spacecraft pose estimation dataset (SPEED)," Stanford Digital Repository. Available at: https://doi.org/10.25740/dz692fn7184, 2019.
[9] S. Sharma, "Pose estimation of uncooperative spacecraft using monocular vision and deep learning," Ph.D. dissertation, Stanford University, Department of Aeronautics & Astronautics, Aug 2019.
[10] C. Beierle, "High fidelity validation of vision-based sensors and algorithms for spaceborne navigation," Ph.D. dissertation, Stanford University, Department of Aeronautics & Astronautics, Mar 2019.
[11] A. Cropp and P. Palmer, "Pose estimation and relative orbit determination of a nearby target microsatellite using passive imagery," in 5th Cranfield Conference on Dynamics and Control of Systems and Structures in Space 2002, 2002, pp. 389–395.
[12] M. R. Leinz, C.-T. Chen, M. W. Beaven, T. P. Weismuller, D. L. Caballero, W. B. Gaumer, P. W. Sabasteanski, P. A. Scott, and M. A. Lundgren, "Orbital Express Autonomous Rendezvous and Capture Sensor System (ARCSS) flight test results," in Sensors and Systems for Space Applications II, R. T. Howard and P. Motaghedi, Eds., vol. 6958, International Society for Optics and Photonics. SPIE, 2008, pp. 62–74. [Online]. Available: https://doi.org/10.1117/12.779595
[13] S. Zhang and X. Cao, "Closed-form solution of monocular vision-based relative pose determination for RVD spacecrafts," Aircraft Engineering and Aerospace Technology, vol. 77, pp. 192–198, 06 2005.
[14] A. Petit, E. Marchand, and K. Kanani, "Vision-based space autonomous rendezvous: A case study," in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sep. 2011, pp. 619–624.
[15] A. A. Grompone, "Vision-based 3D motion estimation for on-orbit proximity satellite tracking and navigation," 2015. [Online]. Available: https://calhoun.nps.edu/handle/10945/45863
[16] S. D'Amico, M. Benn, and J. L. Jørgensen, "Pose estimation of an uncooperative spacecraft from actual space imagery," International Journal of Space Science and Engineering, vol. 2, no. 2, p. 171, 2014.
[17] K. Kanani, A. Petit, E. Marchand, T. Chabot, and B. Gerber, "Vision based navigation for debris removal missions," Proceedings of the International Astronautical Congress, IAC, vol. 4, 01 2012.
[18] C. Harris and M. Stephens, "A combined corner and edge detector," in Proc. of Fourth Alvey Vision Conference, 1988, pp. 147–151.
[19] J. F. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 8, no. 6, pp. 679–698, Jun. 1986. [Online]. Available: https://doi.org/10.1109/TPAMI.1986.4767851
[20] D. Ballard, "Generalizing the Hough transform to detect arbitrary shapes," Pattern Recognition, vol. 13, no. 2, pp. 111–122, 1981.
[21] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[22] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Comput. Vis. Image Underst., vol. 110, no. 3, pp. 346–359, Jun. 2008. [Online]. Available: http://dx.doi.org/10.1016/j.cviu.2007.09.014
[23] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, "ORB: An efficient alternative to SIFT or SURF," in Proceedings of the 2011 International Conference on Computer Vision, ser. ICCV '11. Washington, DC, USA: IEEE Computer Society, 2011, pp. 2564–2571. [Online]. Available: http://dx.doi.org/10.1109/ICCV.2011.6126544
[24] S. Sharma and S. D'Amico, "Reduced-dynamics pose estimation for non-cooperative spacecraft rendezvous using monocular vision," in 38th AAS Guidance and Control Conference, Breckenridge, Colorado, 2017.
[25] S.-G. Kim, J. Crassidis, Y. Cheng, A. Fosbury, and J. Junkins, "Kalman filtering for relative spacecraft attitude and position estimation," Journal of Guidance, Control, and Dynamics, vol. 30, 01 2007.
[26] S. D'Amico, P. Bodin, M. Delpech, and R. Noteborn, "PRISMA," in Distributed Space Missions for Earth System Monitoring, Space Technology Library, M. D'Errico, Ed., 2013, vol. 31, ch. 21, pp. 599–637.
[27] S. Sharma, J. Ventura, and S. D'Amico, "Robust model-based monocular pose initialization for noncooperative spacecraft rendezvous," Journal of Spacecraft and Rockets, vol. 55, no. 6, pp. 1414–1429, 2018.
[28] T.-Y. Lin, M. Maire, S. J. Belongie, L. D. Bourdev, R. B. Girshick, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in ECCV, 2014.
[29] S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. Bradski, K. Konolige, and N. Navab, "Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes," Computer Vision – ACCV 2012, Lecture Notes in Computer Science, pp. 548–562, 2013.
[30] S. Tulsiani and J. Malik, "Viewpoints and keypoints," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[31] H. Su, C. R. Qi, Y. Li, and L. J. Guibas, "Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views," 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
[32] A. Kendall, M. Grimes, and R. Cipolla, "PoseNet: A convolutional network for real-time 6-DOF camera relocalization," 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
[33] M. Rad and V. Lepetit, "BB8: A scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth," 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
[34] W. Kehl, F. Manhardt, F. Tombari, S. Ilic, and N. Navab, "SSD-6D: Making RGB-based 3D detection and 6D pose estimation great again," 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
[35] M. Sundermeyer, Z.-C. Marton, M. Durner, M. Brucker, and R. Triebel, "Implicit 3D orientation learning for 6D object detection from RGB images," in The European Conference on Computer Vision (ECCV), September 2018.
[36] S. Mahendran, H. Ali, and R. Vidal, "3D pose regression using convolutional neural networks," 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017.
[37] Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox, "PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes," Robotics: Science and Systems XIV, 2018.
[38] B. Tekin, S. N. Sinha, and P. Fua, "Real-time seamless single shot 6D object pose prediction," CVPR, 2018.
[39] Z. Zhao, G. Peng, H. Wang, H. Fang, C. Li, and C. Lu, "Estimating 6D pose from localizing designated surface keypoints," ArXiv, vol. abs/1812.01387, 2018.
[40] S. Peng, Y. Liu, Q. Huang, X. Zhou, and H. Bao, "PVNet: Pixel-wise voting network for 6DoF pose estimation," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Oral, 2019.
[41] V. Lepetit, F. Moreno-Noguer, and P. Fua, "EPnP: An accurate O(n) solution to the PnP problem," International Journal of Computer Vision, vol. 81, no. 2, pp. 155–166, 2008.
[42] S. Sharma, C. Beierle, and S. D'Amico, "Pose estimation for non-cooperative spacecraft rendezvous using convolutional neural networks," in 2018 IEEE Aerospace Conference, March 2018, pp. 1–12.
[43] C. Beierle and S. D'Amico, "Variable-magnification optical stimulator for training and validation of spaceborne vision-based navigation," Journal of Spacecraft and Rockets, vol. 56, pp. 1–13, 02 2019.
[44] "Vero family." [Online]. Available: https://www.vicon.com/hardware/cameras/vero/
[48] P. F. Proença and Y. Gao, "Deep learning for spacecraft pose estimation from photorealistic rendering," 2019. [Online]. Available: http://arxiv.org/abs/1907.04298
[49] T. H. Park, S. Sharma, and S. D'Amico, "Towards robust learning-based pose estimation of noncooperative spacecraft," in 2019 AAS/AIAA Astrodynamics Specialist Conference, Portland, Maine, August 11-15 2019.
[50] K. Sun, Y. Zhao, B. Jiang, T. Cheng, B. Xiao, D. Liu, Y. Mu, X. Wang, W. Liu, and J. Wang, "High-resolution representations for labeling pixels and regions," arXiv preprint arXiv:1904.04514, 2019.
[51] Y. Hu, J. Hugonot, P. Fua, and M. Salzmann, "Segmentation-driven 6D object pose estimation," in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, 2019, pp. 3385–3394.
[52] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Readings in Computer Vision, pp. 726–740, 1987.
[53] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[54] ——, "YOLOv3: An incremental improvement," CoRR, vol. abs/1804.02767, 2018.
[55] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2018, pp. 4510–4520.
[56] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818–2826, 2016.

Mate Kisantal received a B.Sc. degree in Vehicle Engineering from the Budapest University of Technology and Economics and an M.Sc. degree from Delft University of Technology, where he did research in the Micro Air Vehicle Laboratory. In 2018 he joined the European Space Agency as a Young Graduate Trainee in Artificial Intelligence, and worked in the Advanced Concepts Team. His research interest is in the intersection of mobile robotics, autonomy, computer vision, and machine learning.

Sumant Sharma received the B.S. degree from the Georgia Institute of Technology (2013), M.S. degree (2015), and Ph.D. degree (2019) from the Department of Aeronautics and Astronautics at Stanford University. As a Ph.D. student in the Space Rendezvous Laboratory at Stanford University, his research focused on monocular computer vision algorithms to enable navigation systems for on-orbit servicing and rendezvous missions requiring close proximity. During 2016 and 2017, he was the Co-chairman of the Nominations Commission of the Associated Students of Stanford University, overseeing the appointment of Stanford students to university committees. From 2017 to 2018, he worked as a Systems Engineer at NASA Ames Research Center comparing the performance of machine
[45] E. Brachmann, F. Michel, A. Krull, M. Y. Yang, S. Gumhold, and
learning-based methods against conventional feature-based methods for on-
C. Rother, “Uncertainty-driven 6D pose estimation of objects and scenes
orbit servicing applications. Dr. Sharma is currently a computer vision engineer
from a single RGB image,” 2016 IEEE Conference on Computer Vision
at Wisk Aero, a company based in Mountain View, California, working on
and Pattern Recognition (CVPR), pp. 3364–3372, 2016.
navigation algorithms for electric-powered air transportation. Dr. Sharma
[46] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image is currently a peer reviewer for the IEEE Transactions on Aerospace and
recognition,” arXiv preprint arXiv:1512.03385, 2015. Electronic Systems and IEEE Access.
[47] B. Chen, J. Cao, A. Parra, and T.-J. Chin, “Satellite pose estimation
with deep landmark regression and nonlinear pose refinement,” 2019.
[Online]. Available: http://arxiv.org/abs/1908.11542
Tae Ha Park is a Ph.D. candidate in the Space Rendezvous Laboratory, Stanford University. He graduated from Harvey Mudd College with a Bachelor of Science degree (2017) in engineering. His research interest is in the development of machine learning techniques and GN&C algorithms for spaceborne computer vision tasks, specifically the robust and accurate determination of the relative position and attitude of arbitrary resident space objects using monocular vision. Potential applications include space debris removal and refueling of defunct geostationary satellites with unprecedented autonomy and safety measures.

Dario Izzo graduated as a Doctor of Aeronautical Engineering from the University Sapienza of Rome (Italy). He then completed a second Master's degree in Satellite Platforms at Cranfield University in the United Kingdom and a Ph.D. in Mathematical Modelling at the University Sapienza of Rome, where he lectured on classical mechanics and space flight mechanics.
Dario Izzo later joined the European Space Agency and became the scientific coordinator of its Advanced Concepts Team. He devised and managed the Global Trajectory Optimization Competition (GTOC) events, the ESA Summer of Code in Space, and the Kelvins innovation and competition platform. He has published more than 170 papers in international journals and conferences, making key contributions to the understanding of flight mechanics and spacecraft control and pioneering techniques based on evolutionary and machine learning approaches.
Dario Izzo received the Humies Gold Medal and led the team that won the 8th edition of the Global Trajectory Optimization Competition.

Marcus Märtens graduated from the University of Paderborn (Germany) with a Master's degree in computer science. He joined the European Space Agency as a Young Graduate Trainee in artificial intelligence, where he worked on multi-objective optimization of spacecraft trajectories. He was part of the winning team of the 8th edition of the Global Trajectory Optimization Competition (GTOC) and received a HUMIES gold medal for developing algorithms achieving human-competitive results in trajectory design. The Delft University of Technology awarded him a Ph.D. for his thesis on information propagation in complex networks. After his time at the network architectures and services group in Delft (Netherlands), Marcus rejoined the European Space Agency, where he works as a research fellow in the Advanced Concepts Team. While his main focus is on applied artificial intelligence and evolutionary optimization, Marcus has worked together with experts from different fields and authored works related to neuroscience, cyber-security, and gaming.

Simone D'Amico received the B.S. and M.S. degrees from Politecnico di Milano (2003) and the Ph.D. degree from Delft University of Technology (2010). From 2003 to 2014, he was research scientist and team leader at the German Aerospace Center (DLR). There, he gave key contributions to the design, development, and operations of spacecraft formation-flying and rendezvous missions such as GRACE (United States/Germany), TanDEM-X (Germany), PRISMA (Sweden/Germany/France), and PROBA-3 (ESA). Since 2014, he has been Assistant Professor of Aeronautics and Astronautics at Stanford University, Founding Director of the Space Rendezvous Laboratory (SLAB), and Satellite Advisor of the Student Space Initiative (SSSI), Stanford's largest undergraduate organization. He has over 150 scientific publications and 2500 Google Scholar citations, including conference proceedings, peer-reviewed journal articles, and book chapters. D'Amico's research aims at enabling future miniature distributed space systems for unprecedented science and exploration. His efforts lie at the intersection of advanced astrodynamics, GN&C, and space system engineering to meet the tight requirements posed by these novel space architectures. The most recent mission concepts developed by Dr. D'Amico are a miniaturized distributed occulter/telescope (mDOT) system for direct imaging of exozodiacal dust and exoplanets and the Autonomous Nanosatellite Swarming (ANS) mission for characterization of small celestial bodies. He is Chairman of NASA's Starshade Science and Technology Working Group (TSWG) and Fellow of the NAE's US FOE Symposium. D'Amico's research is supported by NASA, NSF, AFRL, AFOSR, KACST, and industry. He is a member of the advisory board of space startup companies and VC edge funds. He is a member of the Space-Flight Mechanics Technical Committee of the AAS, Associate Fellow of AIAA, and Associate Editor of the AIAA Journal of Guidance, Control, and Dynamics and the IEEE Transactions on Aerospace and Electronic Systems. Dr. D'Amico was the recipient of the Leonardo 500 Award by the Leonardo Da Vinci Society and ISSNAF (2019), the Stanford Introductory Seminar Excellence Award (2019), the FAI/NAA's Group Diploma of Honor (2018), the Exemplary Systems Engineering Doctoral Dissertation Award by the International Honor Society for Systems Engineering OAA (2016), the DLR's Sabbatical/Forschungssemester in honor of scientific achievements (2012), the DLR's Wissenschaft Preis in honor of scientific achievements (2006), and NASA's Group Achievement Award for the Gravity Recovery and Climate Experiment, GRACE (2004).