PhoCaL - A Multi-Modal Dataset For Category-Level Object Pose Estimation With Photometrically Challenging Objects
Pengyuan Wang∗1, HyunJun Jung∗1, Yitong Li1, Siyuan Shen1, Rahul Parthasarathy Srikanth1,
Lorenzo Garattoni2, Sven Meier2, Nassir Navab1, Benjamin Busam1
∗ Equal Contribution   1 Technical University of Munich   2 Toyota Motor Europe
[email protected]   [email protected]   [email protected]
Figure 1. PhoCaL comprises 60 high quality 3D models of household objects in 8 categories with different photometric complexity. The selected objects include challenging texture-less, occluded, symmetric, reflective and transparent objects. Our robot-supported pose annotation pipeline provides highly accurate 6D pose labels even for objects that are hard to capture with modern RGBD sensors. The figure shows RGB images, 3D bounding boxes and rendered Normalized Object Coordinate Space (NOCS) maps for 4 example scenes.
Abstract

Object pose estimation is crucial for robotic applications and augmented reality. Beyond instance-level 6D object pose estimation methods, estimating category-level pose and shape has become a promising trend. As such, a new research field needs to be supported by well-designed datasets. To provide a benchmark with high-quality ground truth annotations to the community, we introduce a multimodal dataset for category-level object pose estimation with photometrically challenging objects termed PhoCaL. PhoCaL comprises 60 high quality 3D models of household objects over 8 categories including highly reflective, transparent and symmetric objects. We developed a novel robot-supported multi-modal (RGB, depth, polarisation) data acquisition and annotation process. It ensures sub-millimeter accuracy of the pose for opaque textured, shiny and transparent objects, no motion blur and perfect camera synchronisation. To set a benchmark for our dataset, state-of-the-art RGB-D and monocular RGB methods are evaluated on the challenging scenes of PhoCaL.

1. Introduction

Vision systems interacting with their environment need to estimate the position and orientation of objects in space, which highlights why 6D object pose estimation is an important task for robotic applications. Even though there have been great advances in the field [6, 42], instance-level 6D pose methods require pre-scanned object models and support only a limited number of objects. Category-level object pose estimation [40] scales better to the needs of real operating environments. However, photometrically challenging objects such as shiny, e.g. metallic, and transparent, e.g. glass, objects are very common in our daily life and little work has been done to estimate their 6D poses with practical accuracy on a category level. The difficulty arises from two aspects: first, it is difficult to annotate 6D pose ground truth for photometrically challenging objects since no texture can be used to determine key points; second, commonly used depth sensors fail to return the correct depth information, as structured light and stereo methods often fail to correctly interpret reflection and refraction artefacts. As a consequence, RGB-D methods [25, 40] do not work reliably with photometrically challenging objects. We introduce PhoCaL, a class-level dataset of photometrically challenging objects with high-quality ground-truth annotations. The dataset provides multi-modal data such as RGB, depth and polarization, which enables investigation into object surface reflectance properties.

Figure 2. Our dataset comprises 60 household objects among 8 object categories. The training and test split is depicted here.

We obtain highly accurate ground truth poses with a novel method using a collaborative robot arm in gravity compensated mode and a calibrated mechanical tip. In order to annotate the 6D pose of transparent and non-textured objects, a specially designed tip is mounted on the robot arm. With the calibrated tip, the positions of pre-defined points on the object surface are acquired on the real object and matched to a scan thereof. Using this method, the object pose can be determined with an order of magnitude more accuracy than previous methods. For transparent and textureless objects, topographic key points are used instead of textural ones. The points gathered in this way are then matched to the object model in a final ICP [2] step to yield an accurate fit.

The camera to robot end-effector transformation is needed to obtain the object poses in camera coordinates. Typically, hand-eye calibration approaches solve this by visually estimating the marker position and optimizing for the transformation between camera and end-effector. To minimize the error propagation and obtain highly accurate ground truth labels, we instead used the end-effector tip of the arm in gravity-compensated mode to measure the position of 12 points on a ChArUco [1] board. This allows us to use the robot's accurate positioning system to obtain both object poses and camera poses for image sequences.

Beyond photometrically challenging categories and high-quality annotations, multi-modal input is another highlight of PhoCaL. As active depth sensors fail on metallic and transparent surfaces, we include an additional passive sensor modality in the form of a polarization camera. It provides valuable information on object surfaces [22]. In our setup, we designed and 3D printed a rig that holds multiple cameras, each mounted on it and carefully calibrated. During recording, a pre-defined trajectory is repeated by the robot arm. The robot arm stops when capturing images from all cameras, which avoids motion blur and diminishes effects of imperfect synchronization.

In summary, our main contributions are:

1. We propose PhoCaL, a multi-modal (RGBD + RGBP) dataset for category-level object pose estimation. The dataset comprises 60 high-quality 3D models of household objects including symmetric, transparent and reflective objects in 8 categories with 24 sequences featuring occlusion, partial visibility and clutter.

2. We introduce a new and highly accurate pose annotation method using a robotic manipulator that allows for sub-millimeter precision 6D pose annotations of photometrically challenging objects even with reflective or transparent surfaces.

2. Related Work & Current Challenges

Standardized datasets are used in the field of object pose and shape estimation to quantify and compare contributions and advances in the field. These datasets generally fall into two domains: instance-level datasets, where the 3D model of the object is known a priori, and category-level datasets, where the exact CAD model is unknown. Tab. 1 provides an overview of related datasets in both domains.

2.1. Instance-level 6D Object Pose Dataset

One of the earliest and most widely used publicly available datasets for instance-level pose estimation is LineMOD [19] and its occlusion extension LM-Occlusion [5].
Table 1. Overview of pose estimation datasets (columns: Dataset, Real, RGB, Depth, Polarisation, Robotic GT, Multi-View, Reflective, Transparent, Symmetry, Occlusion, Categories, Objects, Sequences, License). The upper part shows instance-level datasets while the lower part includes category-level setups. PhoCaL is the only dataset that includes both photometrically challenging objects with high quality (robotic) pose annotations and all three modalities: RGB, depth, and polarisation.
Their data was acquired using a PrimeSense RGB-D Carmine sensor, and a marker board was used to keep track of the relative sensor pose. While undoubtedly pioneering this field, the 3D model quality is now outdated and the leaderboards on these datasets have become saturated. HomebrewedDB [23] accounts for the latter shortcoming by providing high quality 3D models scanned with a structured light sensor. Including three models from LineMOD, it adds 30 more toy, household and industrial objects. Different illumination conditions and occlusions make the scenes more challenging. Other datasets also include household objects [13, 21, 34, 37] or focus on industrial parts [14, 20] with low texture, for which it is also possible to manually design or retrieve accurate CAD models [20]. The BOP 6D pose benchmark [21] includes a summary of these datasets with standardized metrics in a common format.

While the datasets mentioned so far provide individual frames, the YCB-Video dataset [41] also includes video sequences of 21 household objects. While YCB uses LabelFusion [31] for semi-manual frame annotation and pose propagation through the sequence, Garon et al. [16] leverage tiny markers on the object to estimate the poses in their videos directly at the cost of synthetic data cleaning afterwards. The advent of photorealistic rendering further enables a branch of works that leverages training on purely synthetic data [12, 38]. Although this circumvents the cumbersome pose labelling process, it introduces a domain gap between synthetic data for evaluations and real-world appearances faced in the final applications.

2.2. Category-level Object Poses and Datasets

In real-world applications, a 3D model is not always available, but pose information is still required. Detection of such objects under these conditions has classically been tackled using 3D geometric primitives [3, 4, 9].

While these methods consider outdoor scenes, for which KITTI [18] provides 3D bounding box annotations, they lack object shape comparison and the information is often too inaccurate for robotic grasping tasks. The pioneering work of NOCS [40] was the first category-level method that could detect object pose and shape in indoor environments. Further investigations consider correspondence-free methods [10] where a deep generative model learns a canonical shape space from RGBD, and a method to estimate pose and shape for fully unseen objects is also proposed [32], albeit this method requires a reference image for latent code generation. CPS [28] demonstrates how to estimate pose and metric shape at category level using only a monocular view. The extension CPS++ [29] further utilizes synthetic data and a domain transfer approach using self-supervised refinement with a differentiable renderer from RGBD data without annotations. SGPA [11] explores shape priors to estimate the object pose. DualPoseNet [25] leverages spherical fusion for better encoding of the object information.
Figure 3. Limitations of RGBD sensors. The depth for photometrically challenging objects is difficult to measure with a commodity depth sensor. The Intel RealSense L515 LiDAR ToF sensor used here is affected by reflections that lead to invalid (1) or incorrect (2) distance estimates. Moreover, the glassware becomes invisible to the sensor (3) and causes noise (4).

Figure 4. Annotation quality for poses in the datasets LineMOD [19] (projected green silhouette, left) and YCB [8] (rendered overlay, right) together with its correction [7] (right).

We leverage the standard RGBD method NOCS and the strong state-of-the-art RGB method CPS to set the baselines on our new dataset. While task-specific datasets for general object detection exist for robot grasping [15, 30], methods for category-level pose estimation are typically tested on NOCS [40] data. The NOCS objects comprise various categories, but do not contain photometric challenges often present in everyday objects such as reflectance and transparency.

2.3. Photometric Challenges and Multimodalities

While texture-less objects [20] were initially challenging for pose estimation, transparency presents an even bigger hurdle. While the problem is not new, previous methods have addressed it using RGB stereo without a 3D model to identify grasping points only [36]. Rotational object symmetry can be leveraged by contour fitting for transparent object reconstruction [33] using template matching. ClearGrasp [35] proposes a method for geometry estimation of transparent objects based on RGBD. However, this method passes over the transparent regions from the depth map and predicts depth from RGB in these areas instead. Liu et al. [27] investigate instance- and category-level pose estimation from stereo imagery. Since their depth sensing fails on transparent objects, they use an opaque object twin as proxy to establish ground truth depth. More recently, StereOBJ-1M [26] proposed a large dataset including transparent and translucent objects with specular reflections and symmetry. However, at the time of this writing it is not yet available for download.

For 2D object detection, information from multiple orthogonal sensor modalities such as polarisation (RGBP) can help for transparent object segmentation [22]. This modality can provide information in regions where depth sensors fail. Its inherent connection with surface normals [43] can also make it attractive for pose estimation of photometrically challenging objects.

2.4. Ground Truth Pose Annotation

Manual annotation of 6D pose is difficult and extremely time-consuming. Therefore, most datasets rely on semi-manual processes for ground truth annotation. The data from a depth sensor, if available, is often used to register the 3D model, and manual adjustments are applied to visually refine the pose for a single frame. Relative camera motion is typically calculated using visual markers [19, 23] to propagate the pose information through a sequence of images. The use of depth sensors for ICP-based alignment of pose labels reduces labour and improves fully-manual annotation quality. However, depth maps from RGBD sensors are erroneous or invalid for photometrically challenging objects with high reflectance and translucent or transparent surfaces [26]. An example is shown in Fig. 3.

Ensuring high quality of pose labels over a series of images is difficult, and errors accumulate, as the examples in Fig. 4 show. This equally affects depth-based refinement strategies of 6D pose pipelines [21, 24]. We propose a mechanical measurement process using a robotic manipulator to circumvent this issue and allow for high precision labels that omits the error propagation of relative camera pose retrieval from images.

3. Dataset Acquisition Pipeline

Our dataset features multiple object classes, including photometrically challenging classes such as objects with reflective surfaces or transparent material. It also provides multi-modal sensor data with highly accurate 6D pose annotation. This section describes our dataset acquisition pipeline as shown in Fig. 5.

3.1. Object Model Acquisition

To represent a cross section of common household objects, we selected eight common categories for our category-level 6D object pose dataset: bottle, box, can, cup, remote, teapot, cutlery, glassware. All object models are scanned using an EinScan-SP 3D Scanner (SHINING 3D Tech. Co., Ltd., Hangzhou, China). The scanner is a structured light stereo system with a single shot accuracy of ≤ 0.05 mm in a scanning volume of 1200×1200×1200 mm³.
Figure 5. Overview of the dataset acquisition pipeline. (a): 3D models are extracted with a structured light scanner. (b): Pivot calibration calibrates a tipping tool to robot coordinates. (c): 6D poses are annotated using the tool and manual movements of the robot. (d): The camera trajectory is saved. (e): The dataset is recorded automatically following the planned trajectory.
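Because the object poses are annotated once in the robot base frame and every camera pose comes from the replayed robot trajectory plus the hand-eye calibration, per-frame labels are obtained by pure transform composition. The snippet below is a minimal sketch of this composition, assuming 4x4 homogeneous matrices and the frame names given in the docstring; the function name is illustrative and not part of any released PhoCaL tooling.

```python
import numpy as np

def object_pose_in_camera(T_base_ee, T_ee_cam, T_base_obj):
    """Per-frame ground-truth pose by transform composition.

    T_base_ee  : 4x4 end-effector pose in the robot base frame (forward kinematics)
    T_ee_cam   : 4x4 camera pose in the end-effector frame (hand-eye calibration)
    T_base_obj : 4x4 annotated object pose in the robot base frame
    Returns the 4x4 object pose expressed in the camera frame.
    """
    T_base_cam = T_base_ee @ T_ee_cam             # camera pose in the base frame
    return np.linalg.inv(T_base_cam) @ T_base_obj
```

No image-based tracking enters this chain, which is why the label quality is bounded by the robot repeatability and the two calibrations rather than by visual marker detection.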
Figure 6. Overview of the hand-eye calibration and its evaluation. (a) shows the marker-to-robot calibration, (b) illustrates the camera-to-robot hand-eye calibration and (c) depicts our accuracy evaluation.
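The marker-to-robot step in Fig. 6 (a) amounts to fitting a rigid transform between the board points measured with the calibrated tip (in robot base coordinates) and their known positions in the ChArUco board frame. A standard closed-form solution is the SVD-based Kabsch alignment sketched below; this is a generic least-squares fit under those assumptions, not the authors' calibration code, and fit_rigid_transform is a hypothetical helper name.

```python
import numpy as np

def fit_rigid_transform(points_src, points_dst):
    """Least-squares rigid transform (R, t) with R @ p_src + t ≈ p_dst.

    points_src, points_dst : (N, 3) arrays of corresponding 3D points,
    e.g. board corners probed with the robot tip (base frame) and the
    same corners in the board coordinate frame.
    """
    src_c = points_src.mean(axis=0)
    dst_c = points_dst.mean(axis=0)
    H = (points_src - src_c).T @ (points_dst - dst_c)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])       # guard against reflections
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t
```

With correspondences measured mechanically, the residual of this fit directly reflects the tip and robot accuracy instead of image-detection noise.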
The models from the first six categories are provided as textured obj files. Since the cutlery and glassware objects are photometrically challenging with their highly reflective and transparent surfaces, we apply a self-vanishing 3D scanning spray (AESUB Blue, Aesub, Recklinghausen, Germany) to make the objects temporarily opaque for scanning. We scan the object and provide an obj file without texture. The spray sublimes after approx. 4 h.

3.2. Scene Acquisition Setup

For each scene, 5-8 objects are placed on the table with a random background. We use a KUKA LBR iiwa 7 R800 (KUKA Roboter GmbH, Augsburg, Germany) 7 DoF robotic arm that guarantees a positional reproducibility of ±0.1 mm. The vision system comprises a Phoenix 5.0 MP polarization camera (IMX264MZR/MYR) with a Sony IMX264MYR CMOS (Color) Polarsens sensor (i.e. PHX050S1-QC; LUCID Vision Labs, Inc., Richmond B.C., Canada) and a Universe Compact C-Mount 5 MP 2/3" 6 mm f/2.0 lens (Universe, New York, USA). As depth camera, the Time-of-Flight (ToF) sensor Intel® RealSense™ LiDAR L515 is used, which acquires depth images at a resolution of 1024x768 pixels in an operating range between 25 cm and 9 m with a field-of-view of 70° x 55° and an accuracy of 5 ± 2.5 mm at 1 m distance up to 14 ± 15.5 mm at 9 m distance.

3.3. Tip Calibration

We use a rigid, pointy metallic tip to obtain the coordinate position of selected points on the object. Tip calibration is therefore essential to ensure the accuracy of the system. The rig attached to the robot's end-effector consists of a custom 3D printed mount which holds the tool-tip rigidly. The pivot calibration is performed as shown in Fig. 8 (left), where the tip point is placed in a fixed position while only the robot end-effector position is changed. We collect data from N such tip positions with corresponding end-effector poses ${}^{i}T_e^b$, which contain the rotation ${}^{i}R_e^b$ and translation ${}^{i}t_e^b$. Since the tip rests at the same pivot point for every pose, each pair of poses satisfies $({}^{i}R_e^b - {}^{j}R_e^b)\, t_t^e = {}^{j}t_e^b - {}^{i}t_e^b$. Stacking these constraints, the final translation $t_t^e$ of the tip with respect to the end-effector is calculated as follows:

\[
t_t^e =
\begin{bmatrix}
{}^{1}R_e^b - {}^{2}R_e^b \\
{}^{2}R_e^b - {}^{3}R_e^b \\
\vdots \\
{}^{n}R_e^b - {}^{1}R_e^b
\end{bmatrix}^{\dagger}
\cdot
\begin{bmatrix}
{}^{2}t_e^b - {}^{1}t_e^b \\
{}^{3}t_e^b - {}^{2}t_e^b \\
\vdots \\
{}^{1}t_e^b - {}^{n}t_e^b
\end{bmatrix}
\qquad (1)
\]

where † denotes the pseudo-inverse. We evaluate the tip calibration by calculating the variance of each tip location at the pivot point. The variance of the tip location in our setup is ε = 0.057 mm.

3.4. 6D Pose Annotation

Annotating the precise 6D pose of the objects is a challenging task as mentioned in Sec. 2.4. Here, we utilize
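Returning to the tip calibration of Sec. 3.3: the stacked least-squares problem of Eq. (1) is small enough to solve directly with a pseudo-inverse. The following is a minimal numpy sketch under the stated conventions (4x4 homogeneous end-effector poses ${}^{i}T_e^b$); pivot_calibration is an illustrative name, not released PhoCaL code.

```python
import numpy as np

def pivot_calibration(T_base_ee_list):
    """Tip offset t_t^e from end-effector poses that keep the tip at one fixed pivot.

    T_base_ee_list : sequence of 4x4 homogeneous end-effector poses ^iT_e^b.
    Returns the 3-vector translation of the tip in the end-effector frame.
    """
    R = np.stack([T[:3, :3] for T in T_base_ee_list])    # (N, 3, 3) rotations ^iR_e^b
    t = np.stack([T[:3, 3] for T in T_base_ee_list])     # (N, 3)    translations ^it_e^b
    # Pair pose i with pose i+1 (wrapping n back to 1), exactly as stacked in Eq. (1).
    A = np.concatenate(R - np.roll(R, -1, axis=0))       # rows (^iR_e^b - ^jR_e^b), shape (3N, 3)
    b = np.concatenate(np.roll(t, -1, axis=0) - t)       # rows (^jt_e^b - ^it_e^b), shape (3N,)
    tip, *_ = np.linalg.lstsq(A, b, rcond=None)          # pseudo-inverse solution
    return tip
```

The spread of the recovered pivot positions ${}^{i}R_e^b\, t_t^e + {}^{i}t_e^b$ over the N poses then serves as a consistency check in the spirit of the ε = 0.057 mm reported above.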
Figure 7. Example of annotation quality before and after ICP based refinement on the textureless object. (a) Initial pose of mesh overlaid
with measured surface points (red dots) shows error in initial pose (red arrow). (b) After the ICP, refined pose matches with the surface
points properly (blue arrow). (c) Shows improvement in 6D pose annotation. Rendering of the mesh with initial pose (d) and refined pose
(e) shows a significant difference in quality.
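The refinement illustrated in Fig. 7 matches the tip-measured surface points to the scanned model with an ICP step in the spirit of [2]. Below is a minimal point-to-point variant, assuming the model is available as a point sample; it is a sketch of the technique, not the annotation tool itself.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_refine(measured_pts, model_pts, R_init, t_init, iters=50):
    """Refine an object pose so that R @ model + t matches the measured tip points.

    measured_pts : (M, 3) surface points probed with the calibrated tip (scene frame)
    model_pts    : (K, 3) points sampled from the scanned 3D model (object frame)
    R_init, t_init : initial rotation (3x3) and translation (3,) of the model in the scene
    """
    R, t = R_init.copy(), t_init.copy()
    for _ in range(iters):
        tree = cKDTree(model_pts @ R.T + t)              # model under the current pose
        _, idx = tree.query(measured_pts)                # closest model point per measurement
        matched = model_pts[idx]
        # Closed-form update: rigid transform mapping matched model points to measurements.
        mc, pc = matched.mean(axis=0), measured_pts.mean(axis=0)
        H = (matched - mc).T @ (measured_pts - pc)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        t = pc - R @ mc
    return R, t
```

Because the measured points come from the mechanical tip rather than from a depth map, this refinement also works for the transparent and reflective objects shown in Fig. 7.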
Figure 9. Measuring the marker points for the calibration on the scene (left) and the detected marker from one of the cameras (right).

The trajectory of all joints of the robot is recorded by manually moving the end-effector while the robot arm is in gravity compensated mode. Thereafter, we record the images of the scene by replaying the joint trajectory while stopping the robot every 5-7 joint positions to capture the images and the robot pose (approx. 10-15 fps). This ensures that no motion blur and no camera synchronization artefacts are recorded while reproducing the original hand-held camera trajectory.

3.7. Evaluation of Overall Annotation Quality

We evaluate the overall annotation quality of our dataset by running a simulated data acquisition with two measured error statistics: the ICP error (Sec. 3.4) and the hand-eye calibration error (Sec. 3.5). For both the RGBD and the polarization camera, the setup from one of the scenes is used, including the objects and the trajectories. The acquisition is simulated twice, with and without the aforementioned errors. In the end, the RMSE is calculated pointwise in mm between the two acquisitions. We average the error per object and per frame along the trajectories.

The RMSE for the RGBD camera is 0.84 mm and for the polarization camera 0.76 mm. A detailed description of this procedure is attached in the supplementary material. The annotation quality in comparison with other dataset acquisition principles is shown in Tab. 2.

Dataset: RGBD Dataset | TOD [27] | StereOBJ [26] | Ours
3D Labeling: Depth Map | Multi-View | Multi-View | Robot
Point RMSE: ≥ 17 mm | 3.4 mm | 2.3 mm | 0.80 mm

Table 2. Comparison of pose annotation quality for different dataset setups. The error for RGBD is exemplified with the standard deviation of the Microsoft Azure Kinect [26].

4. Benchmarks and Experiments

Both monocular (CPS) and RGB-D based (NOCS) category-level methods are considered for the baseline evaluation on the PhoCaL dataset. For the evaluation of NOCS, the normalized object coordinate space maps are rendered for each training image and will be published together with the dataset. With the predicted normalized object shape from the NOCS map, the depth information is used to lift the 2D detection to 3D space using ICP. Considering the artifacts in the depth data from metallic and transparent objects in the dataset, along with the occlusion, the test sequences are very challenging for RGBD methods.

Similar to NOCS, CPS first detects 2D bounding boxes. Then lifting modules for each class transform 2D image features to 6D pose and scales. Simultaneously, the method also estimates the point cloud shape for the respective object class. CPS is trained on approximately 1000 object instance models for each category to learn a deep point cloud encoding of each class. The 2D detection and lifting modules are trained together for 100k steps with a learning rate of 1e-4, decaying to 1e-5 at 60k steps.

4.1. Evaluation Pipeline

Our dataset consists of 24 image sequences in total with a training and testing split in each sequence. In our evaluation pipeline, the training split of the first 12 sequences is used to train the network. To have an evaluation on both known and novel objects in each category, two experiments are designed. To evaluate on seen objects first, the network is trained on the training split of the first 12 sequences and tested on the testing split of the same sequences. To further evaluate the generalization ability of NOCS and CPS to novel objects in the same category, the same training split of the first 12 sequences is used, but we evaluate the result on the testing split of the latter 12 sequences, where objects are mostly unseen. In this way, the generalization ability of the methods to novel objects in the category is emphasized, which is a common issue in real operating environments. The evaluation metric is the intersection over union (IoU) with thresholds of 25% and 50%.

4.2. Evaluation Result

The 3D IoU evaluations of NOCS at 25% and 50% for the first experiment setup are shown in Tab. 3. The mean average precision (mAP) for 3D IoU at 25% is 43.34%. It is observed in the experiment that even if the segmentation and normalized object coordinate map predictions are accurate, the lifting from the NOCS map to 6D space is sensitive to artifacts in the depth maps. Since the objects are highly occluded in the PhoCaL dataset, and depth measurements are inaccurate for the cutlery and glassware categories, the method does not perform well on the dataset, which indicates the drawbacks of RGBD methods in these photometrically challenging cases. The average precision of each category with respect to the 3D IoU threshold is plotted in Fig. 10a. Note that the results for the cutlery and glassware categories are among the worst three categories.

For comparison, the result of CPS is also listed in Tab. 3.
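For context on the NOCS baseline above: lifting a predicted NOCS map plus back-projected depth to a metric pose and scale can be posed as a similarity alignment between two corresponding point sets. The sketch below uses the standard closed-form Umeyama estimate under that assumption; it is illustrative and not the original NOCS implementation.

```python
import numpy as np

def lift_nocs_to_pose(nocs_pts, cam_pts):
    """Estimate scale s, rotation R and translation t with  s * R @ nocs + t ≈ cam.

    nocs_pts : (N, 3) predicted NOCS coordinates (inside the unit cube) for one detection
    cam_pts  : (N, 3) corresponding 3D points back-projected from the depth map
    """
    mu_n, mu_c = nocs_pts.mean(axis=0), cam_pts.mean(axis=0)
    xn, xc = nocs_pts - mu_n, cam_pts - mu_c
    H = xn.T @ xc / len(nocs_pts)                         # cross-covariance
    U, S, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    var_n = (xn ** 2).sum() / len(nocs_pts)
    s = np.trace(np.diag(S) @ D) / var_n                  # Umeyama scale estimate
    t = mu_c - s * R @ mu_n
    return s, R, t
```

When the depth values for transparent or metallic surfaces are wrong, cam_pts is corrupted and the recovered scale and pose degrade, which is consistent with the NOCS results discussed above.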
3D25 / 3D50 Bottle Box Can Cup Remote Teapot Cutlery Glassware Mean
NOCS [40] 91.17 / 0.65 16.10 / 0.01 85.44 / 23.01 51.83 / 1.48 93.26 / 86.05 0.00 / 0.00 4.89 / 0.01 4.00 / 0.06 43.34 / 13.91
CPS [28] 80.08 / 40.30 31.68 / 28.18 68.96 / 6.69 81.60 / 70.24 86.30 / 37.08 67.43 / 4.31 44.00 / 24.95 30.33 / 17.74 61.30 / 28.69
Table 3. Class-wise evaluation of 3D IoU for NOCS [40] and CPS [28] on test split of known objects.
3D25 / 3D50 Bottle Box Can Cup Remote Teapot Cutlery Glassware Mean
Experiment 1 91.17 / 0.65 16.10 / 0.01 85.44 / 23.01 51.83 / 1.48 93.26 / 86.05 0.00 / 0.00 4.89 / 0.01 4.00 / 0.06 43.3 / 13.91
Experiment 2 13.70 / 1.28 27.74 / 0.00 48.17 / 0.00 61.77 / 0.00 8.35 / 0.00 4.90 / 0.00 16.10 / 0.00 0.83 / 0.00 22.70 / 0.17
Table 4. Class-wise evaluation of 3D IoU for NOCS [40] on seen (Experiment 1) and mostly unseen (Experiment 2) objects.
(a) NOCS result in the first experiment (b) CPS result in the first experiment (c) NOCS result in the second experiment
Figure 10. Plots of average precision (AP) with respect to 3D IoU thresholds for each category.
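For reference, the 3D25/3D50 numbers in Tab. 3 and Tab. 4 threshold a 3D IoU at 0.25 and 0.5, and the AP curves in Fig. 10 sweep this threshold. Below is a simplified sketch restricted to axis-aligned boxes; the benchmark itself compares oriented 3D bounding boxes, so this only illustrates the thresholding.

```python
import numpy as np

def iou_3d_axis_aligned(box_a, box_b):
    """3D IoU of two axis-aligned boxes, each given as a (min_xyz, max_xyz) pair."""
    min_a, max_a = box_a
    min_b, max_b = box_b
    inter = np.clip(np.minimum(max_a, max_b) - np.maximum(min_a, min_b), 0.0, None)
    inter_vol = inter.prod()
    vol_a = (max_a - min_a).prod()
    vol_b = (max_b - min_b).prod()
    return inter_vol / (vol_a + vol_b - inter_vol)

# A prediction counts as correct at the 3D25 (3D50) level when IoU >= 0.25 (0.5).
```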
As can be seen from Tab. 3, CPS has a higher precision for the cutlery and glassware categories. Monocular methods are not affected by artifacts in depth images, which explains this result. CPS also reaches a higher mAP of 61.30%, which means RGB has an advantage in dealing with photometrically challenging objects. The detailed APs for each category are plotted in Fig. 10b.

In addition, the NOCS evaluations on both experiments are compared in Tab. 4. The evaluation result for the second experiment has a lower mAP for 3D IoU at 25% and 50% as expected, as most of the test objects are novel in the second experiment. Fig. 10c plots the NOCS APs in the second experiment. In comparison to NOCS, the CPS result drops significantly in the second experiment and the 3D IoU at 25% is 4.3%. The result shows that pretraining with a large amount of synthetic images is necessary for monocular methods to learn the correct lifting from 2D detection to 3D space without the help of depth images.

4.3. Limitations

Even though the proposed pipeline for annotating the 6D pose ground truth is accurate, annotating objects with a deformable surface, such as empty boxes, poses a challenge during the surface measurement step in the workflow: their slight deformation could deteriorate the quality of both the initial pose and the ICP based refinement. Moreover, the limited workspace of the robot constrains the view angles in the image sequences, which is an issue PhoCaL shares with other robotic acquisition setups. The hand-eye calibration of the camera plays a key role for the annotation quality. If the camera resolution is low, a good calibration result requires significantly more input images from different angles.

5. Conclusion

In this paper we introduce the PhoCaL dataset, which contains photometrically challenging categories. High-quality 6D pose annotations are provided for all categories and multiple camera modalities, namely RGBD and RGBP. With our manipulator-driven annotation pipeline, we reach pose accuracy levels that are one order of magnitude more precise than previous vision-sensor-only pipelines, even for photometrically complex objects. Moreover, baselines are provided for future work on category-level 6D pose on our dataset by evaluating both monocular and RGB-D methods. The evaluation shows the difficulty level of the dataset, in particular for objects with reflective and transparent surfaces. PhoCaL therefore constitutes a challenging dataset with accurate ground truth that can pave the way for future pose pipelines that are applicable to more realistic scenarios with everyday objects.
References

[1] Gwon Hwan An, Siyeong Lee, Min-Woo Seo, Kugjin Yun, Won-Sik Cheong, and Suk-Ju Kang. Charuco board-based omnidirectional camera calibration method. Electronics, 7(12):421, 2018. 2
[2] Paul J Besl and Neil D McKay. Method for registration of 3-d shapes. In Sensor Fusion IV: Control Paradigms and Data Structures, volume 1611, pages 586–606. International Society for Optics and Photonics, 1992. 2
[3] Tolga Birdal, Benjamin Busam, Nassir Navab, Slobodan Ilic, and Peter Sturm. A minimalist approach to type-agnostic detection of quadrics in point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3530–3540, 2018. 3
[4] Tolga Birdal, Benjamin Busam, Nassir Navab, Slobodan Ilic, and Peter Sturm. Generic primitive detection in point clouds using novel minimal quadric fits. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(6):1333–1347, 2019. 3
[5] Eric Brachmann, Alexander Krull, Frank Michel, Stefan Gumhold, Jamie Shotton, and Carsten Rother. Learning 6d object pose estimation using 3d object coordinates. In Proceedings of the European Conference on Computer Vision, pages 536–551. Springer, 2014. 2, 3
[6] Yannick Bukschat and Marcus Vetter. Efficientpose: An efficient, accurate and scalable end-to-end 6d multi object pose estimation approach. arXiv preprint arXiv:2011.04307, 2020. 1
[7] Benjamin Busam, Hyun Jun Jung, and Nassir Navab. I like to move it: 6d pose estimation as an action decision process. arXiv preprint arXiv:2009.12678, 2020. 4
[8] Berk Calli, Aaron Walsman, Arjun Singh, Siddhartha Srinivasa, Pieter Abbeel, and Aaron M Dollar. Benchmarking in manipulation research: The ycb object and model set and benchmarking protocols. arXiv preprint arXiv:1502.03143, 2015. 3, 4
[9] Peter Carr, Yaser Sheikh, and Iain Matthews. Monocular object detection using 3d geometric primitives. In Proceedings of the European Conference on Computer Vision, pages 864–878. Springer, 2012. 3
[10] Dengsheng Chen, Jun Li, Zheng Wang, and Kai Xu. Learning canonical shape space for category-level 6d object pose and size estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 11973–11982, 2020. 3
[11] Kai Chen and Qi Dou. Sgpa: Structure-guided prior adaptation for category-level 6d object pose estimation. In Proceedings of the IEEE International Conference on Computer Vision, pages 2773–2782, 2021. 3
[12] Maximilian Denninger, Martin Sundermeyer, Dominik Winkelbauer, Youssef Zidan, Dmitry Olefir, Mohamad Elbadrawy, Ahsan Lodhi, and Harinandan Katam. Blenderproc. arXiv preprint arXiv:1911.01911, 2019. 3
[13] Andreas Doumanoglou, Rigas Kouskouridas, Sotiris Malassiotis, and Tae-Kyun Kim. Recovering 6d object pose and predicting next-best-view in the crowd. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3583–3592, 2016. 3
[14] Bertram Drost, Markus Ulrich, Paul Bergmann, Philipp Hartinger, and Carsten Steger. Introducing mvtec itodd - a dataset for 3d object recognition in industry. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Oct 2017. 3
[15] Hao-Shu Fang, Chenxi Wang, Minghao Gou, and Cewu Lu. Graspnet-1billion: A large-scale benchmark for general object grasping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 11444–11453, 2020. 3, 4
[16] Mathieu Garon, Denis Laurendeau, and Jean-François Lalonde. A framework for evaluating 6-dof object trackers. In Proceedings of the European Conference on Computer Vision, pages 582–597, 2018. 3
[17] Mathieu Garon, Denis Laurendeau, and Jean-François Lalonde. A framework for evaluating 6-DOF object trackers. In Proceedings of the European Conference on Computer Vision, 2018. 6
[18] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3354–3361. IEEE, 2012. 3
[19] Stefan Hinterstoisser, Stefan Holzer, Cedric Cagniart, Slobodan Ilic, Kurt Konolige, Nassir Navab, and Vincent Lepetit. Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes. In Proceedings of the IEEE International Conference on Computer Vision, pages 858–865. IEEE, 2011. 2, 3, 4
[20] Tomáš Hodan, Pavel Haluza, Štěpán Obdržálek, Jiri Matas, Manolis Lourakis, and Xenophon Zabulis. T-less: An rgb-d dataset for 6d pose estimation of texture-less objects. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 880–888. IEEE, 2017. 3, 4
[21] Tomas Hodan, Frank Michel, Eric Brachmann, Wadim Kehl, Anders GlentBuch, Dirk Kraft, Bertram Drost, Joel Vidal, Stephan Ihrke, Xenophon Zabulis, et al. Bop: Benchmark for 6d object pose estimation. In Proceedings of the European Conference on Computer Vision, pages 19–34, 2018. 3, 4
[22] Agastya Kalra, Vage Taamazyan, Supreeth Krishna Rao, Kartik Venkataraman, Ramesh Raskar, and Achuta Kadambi. Deep polarization cues for transparent object segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8602–8611, 2020. 2, 4
[23] Roman Kaskman, Sergey Zakharov, Ivan Shugurov, and Slobodan Ilic. Homebreweddb: Rgb-d dataset for 6d pose estimation of 3d objects. Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019. 3, 4
[24] Wadim Kehl, Fabian Manhardt, Federico Tombari, Slobodan Ilic, and Nassir Navab. Ssd-6d: Making rgb-based 3d detection and 6d pose estimation great again. In Proceedings of the IEEE International Conference on Computer Vision, pages 1521–1529, 2017. 4
[25] Jiehong Lin, Zewei Wei, Zhihao Li, Songcen Xu, Kui Jia, and Yuanqing Li. Dualposenet: Category-level 6d object pose and size estimation using dual pose network with refined learning of pose consistency. arXiv preprint arXiv:2103.06526, 2021. 1, 3
[26] Xingyu Liu, Shun Iwase, and Kris M Kitani. Stereobj-1m: Large-scale stereo image dataset for 6d object pose estimation. In Proceedings of the IEEE International Conference on Computer Vision, pages 10870–10879, 2021. 3, 4, 7
[27] Xingyu Liu, Rico Jonschkowski, Anelia Angelova, and Kurt Konolige. Keypose: Multi-view 3d labeling and keypoint estimation for transparent objects. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 11602–11610, 2020. 3, 4, 7
[28] Fabian Manhardt, Manuel Nickel, Sven Meier, Luca Minciullo, and Nassir Navab. Cps: Class-level 6d pose and shape estimation from monocular images. arXiv preprint arXiv:2003.05848, 2020. 3, 8
[29] Fabian Manhardt, Gu Wang, Benjamin Busam, Manuel Nickel, Sven Meier, Luca Minciullo, Xiangyang Ji, and Nassir Navab. Cps++: Improving class-level 6d pose and shape estimation from monocular images with self-supervised learning. arXiv preprint arXiv:2003.05848, 2020. 3
[30] Lucas Manuelli, Wei Gao, Peter Florence, and Russ Tedrake. kpam: Keypoint affordances for category-level robotic manipulation. arXiv preprint arXiv:1903.06684, 2019. 3, 4
[31] Pat Marion, Peter R Florence, Lucas Manuelli, and Russ Tedrake. Label fusion: A pipeline for generating ground truth labels for real rgbd data of cluttered scenes. In IEEE International Conference on Robotics and Automation, pages 3235–3242. IEEE, 2018. 3
[32] Keunhong Park, Arsalan Mousavian, Yu Xiang, and Dieter Fox. Latentfusion: End-to-end differentiable reconstruction and rendering for unseen object pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 10710–10719, 2020. 3
[33] Cody J Phillips, Matthieu Lecce, and Kostas Daniilidis. Seeing glassware: from edge detection to pose estimation and shape recovery. In Robotics: Science and Systems, volume 3, 2016. 4
[34] Colin Rennie, Rahul Shome, Kostas E Bekris, and Alberto F De Souza. A dataset for improved rgbd-based object detection and pose estimation for warehouse pick-and-place. IEEE Robotics and Automation Letters, 1(2):1179–1185, 2016. 3
[35] Shreeyak Sajjan, Matthew Moore, Mike Pan, Ganesh Nagaraja, Johnny Lee, Andy Zeng, and Shuran Song. Clear grasp: 3d shape estimation of transparent objects for manipulation. In IEEE International Conference on Robotics and Automation, pages 3634–3642. IEEE, 2020. 4
[36] Ashutosh Saxena, Justin Driemeyer, and Andrew Y Ng. Robotic grasping of novel objects using vision. The International Journal of Robotics Research, 27(2):157–173, 2008. 4
[37] Alykhan Tejani, Danhang Tang, Rigas Kouskouridas, and Tae-Kyun Kim. Latent-class hough forests for 3D object detection and pose estimation. In Proceedings of the European Conference on Computer Vision, pages 462–477. Springer, 2014. 3
[38] Jonathan Tremblay, Thang To, and Stan Birchfield. Falling things: A synthetic dataset for 3d object detection and pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 2038–2041, 2018. 3
[39] Roger Y Tsai, Reimar K Lenz, et al. A new technique for fully autonomous and efficient 3d robotics hand/eye calibration. IEEE Transactions on Robotics and Automation, 5(3):345–358, 1989. 6
[40] He Wang, Srinath Sridhar, Jingwei Huang, Julien Valentin, Shuran Song, and Leonidas J Guibas. Normalized object coordinate space for category-level 6d object pose and size estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2642–2651, 2019. 1, 3, 4, 8
[41] Yu Xiang, Tanner Schmidt, Venkatraman Narayanan, and Dieter Fox. Posecnn: A convolutional neural network for 6d object pose estimation in cluttered scenes. Robotics: Science and Systems, 2018. 3
[42] Sergey Zakharov, Ivan Shugurov, and Slobodan Ilic. Dpod: 6d pose object detector and refiner. In Proceedings of the IEEE International Conference on Computer Vision, pages 1941–1950, 2019. 1
[43] Shihao Zou, Xinxin Zuo, Yiming Qian, Sen Wang, Chi Xu, Minglun Gong, and Li Cheng. 3d human shape reconstruction from a polarization image. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, pages 351–368. Springer, 2020. 4