Martin Sundermeyer, M. Durner, E. Y. Puang, Z.-C. Marton, N. Vaskevicius, K. Arras, R. Triebel
CVPR 2020
If you find Multi-Path Encoders useful for your research, please consider citing:
@InProceedings{Sundermeyer_2020_CVPR,
    author = {Sundermeyer, Martin and Durner, Maximilian and Puang, En Yen and Marton, Zoltan-Csaba and Vaskevicius, Narunas and Arras, Kai O. and Triebel, Rudolph},
    title = {Multi-Path Learning for Object Pose Estimation Across Domains},
    booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2020}
}
We introduce a scalable approach for object pose estimation trained on simulated RGB views of multiple 3D models together. We learn an encoding of object views that not only describes an implicit orientation of all objects seen during training, but can also relate views of untrained objects. Our single-encoder-multi-decoder network is trained using a technique we denote "multi-path learning": while the encoder is shared by all objects, each decoder only reconstructs views of a single object. Consequently, views of different instances do not have to be separated in the latent space and can share common features. The resulting encoder generalizes well from synthetic to real data and across various instances, categories, model types and datasets. We systematically investigate the learned encodings, their generalization, and iterative refinement strategies on the ModelNet40 and T-LESS datasets. Despite training jointly on multiple objects, our 6D object detection pipeline achieves state-of-the-art results on T-LESS at much lower runtimes than competing approaches.
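To make "multi-path learning" concrete, here is a minimal TensorFlow 2 sketch of the objective, not the repo's actual training code: a shared encoder feeds one small decoder per training object, and each sample contributes a reconstruction loss only through the decoder of its own object. All sizes are illustrative; the real implementation uses a deeper architecture and a bootstrapped L2 loss, both set via the training config.

import tensorflow as tf

NUM_OBJECTS = 4          # illustrative; the paper trains on many more objects
LATENT_DIM = 128
IMG_SHAPE = (128, 128, 3)

def make_encoder():
    # Shared encoder: augmented input view -> latent code.
    return tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=IMG_SHAPE),
        tf.keras.layers.Conv2D(32, 5, strides=2, padding='same', activation='relu'),
        tf.keras.layers.Conv2D(64, 5, strides=2, padding='same', activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(LATENT_DIM),
    ])

def make_decoder():
    # Per-object decoder: latent code -> clean, unaugmented view.
    return tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(LATENT_DIM,)),
        tf.keras.layers.Dense(32 * 32 * 64, activation='relu'),
        tf.keras.layers.Reshape((32, 32, 64)),
        tf.keras.layers.Conv2DTranspose(32, 5, strides=2, padding='same', activation='relu'),
        tf.keras.layers.Conv2DTranspose(3, 5, strides=2, padding='same', activation='sigmoid'),
    ])

encoder = make_encoder()
decoders = [make_decoder() for _ in range(NUM_OBJECTS)]

def multipath_loss(aug_views, target_views, obj_ids):
    # Each sample is decoded only by the decoder of its own object, so the
    # shared latent space never has to separate different instances.
    z = encoder(aug_views)
    loss = 0.0
    for i, decoder in enumerate(decoders):
        mask = tf.equal(obj_ids, i)
        if not tf.reduce_any(mask):
            continue
        recon = decoder(tf.boolean_mask(z, mask))
        target = tf.boolean_mask(target_views, mask)
        loss += tf.reduce_mean(tf.square(recon - target))
    return loss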
Nvidia GPU with >4GB memory, RAM >= 16GB
For multi-object training: multiple GPUs with >=8GB memory each, RAM >= 64GB
Linux, Python 2.7 / Python 3
GLFW for OpenGL:
sudo apt-get install libglfw3-dev libglfw3
Assimp:
sudo apt-get install libassimp-dev
Tensorflow >=1.13
OpenCV >= 3.1
pip install --user --pre --upgrade PyOpenGL PyOpenGL_accelerate
pip install --user cython
pip install --user cyglfw3
pip install --user pyassimp==3.3
pip install --user imgaug
pip install --user progressbar
Please note that we use the GLFW context by default, which does not support headless rendering. To allow for both onscreen rendering and headless rendering on a remote server, set the context to EGL:
export PYOPENGL_PLATFORM='egl'
In order to make the EGL context work, you might need to patch PyOpenGL as described here.
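If you set the platform from inside a Python script instead, note that PYOPENGL_PLATFORM is read when PyOpenGL is first imported, so it must be set beforehand:

import os
os.environ['PYOPENGL_PLATFORM'] = 'egl'  # must happen before the first OpenGL import
import OpenGL.GL                          # now uses the EGL platform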
The code now also supports TF 2.6 with Python 3. Instead of the pip installs above, you can also use the provided conda environment:
conda env create -f aae_py37_tf26.yml
In the activated environment proceed with the preparatory steps.
1. Pip installation
pip install --user .
2. Set the workspace path; consider adding this line to your bash profile, as it will always be required
export AE_WORKSPACE_PATH=/path/to/autoencoder_ws
3. Create and initialize the workspace (if installed locally, make sure .local/bin/ is in your PATH)
mkdir $AE_WORKSPACE_PATH
cd $AE_WORKSPACE_PATH
ae_init_workspace
1. Create the training config file. Insert the paths to your 3D models and background images, and select the number of GPUs and the batch size per GPU.
mkdir $AE_WORKSPACE_PATH/cfg/exp_group
cp $AE_WORKSPACE_PATH/cfg/my_mpencoder.cfg $AE_WORKSPACE_PATH/cfg/exp_group/my_mpencoder.cfg
gedit $AE_WORKSPACE_PATH/cfg/exp_group/my_mpencoder.cfg
2. Generate and check the training data. The object views should be strongly augmented but still identifiable; the middle part shows the reconstruction. (Press ESC to close the window.)
ae_train exp_group/my_mpencoder -d
3. Train the model (see the Headless Rendering section if you want to train directly on a server without a display)
ae_train exp_group/my_mpencoder
$AE_WORKSPACE_PATH/experiments/exp_group/my_mpencoder/train_figures
Figures in this folder are updated during training. The middle part should start showing reconstructions of the input object (if it stays all black, set TARGET_BG_COLOR: [0,255,0] in the training config under [Dataset]).
4. For trained and untrained objects, create codebooks with
ae_embed_multi exp_group/my_mpencoder --model_path '/path/to/ply_or_off/file'
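Conceptually, a codebook stores the latent codes of densely rendered views of one object together with their ground-truth rotations; at test time, the orientation of a query crop is read off the nearest neighbor in latent space. Below is a minimal numpy sketch of this lookup; the array shapes and number of views are illustrative assumptions, not the repo's on-disk format.

import numpy as np

def estimate_rotation(test_code, codebook_z, codebook_R):
    # codebook_z: (N, D) L2-normalized latent codes of the rendered views
    # codebook_R: (N, 3, 3) rotation matrices of those views
    test_code = test_code / np.linalg.norm(test_code)
    cos_sim = codebook_z @ test_code         # cosine similarity to every view
    return codebook_R[np.argmax(cos_sim)]    # rotation of the most similar view

# Usage with random placeholder data:
codebook_z = np.random.randn(2562, 128)
codebook_z /= np.linalg.norm(codebook_z, axis=1, keepdims=True)
codebook_R = np.tile(np.eye(3), (2562, 1, 1))
R_est = estimate_rotation(np.random.randn(128), codebook_z, codebook_R)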
For the evaluation you will also need https://github.com/thodan/sixd_toolkit + our extensions, see sixd_toolkit_extension/help.txt
Here is an MP-Encoder model trained on the first 18 objects of the T-LESS dataset with codebooks of all 30 objects (paper results):
Extract it to $AE_WORKSPACE_PATH/experiments/exp_group/obj1_18_v2
Set gt_masks = True, estimate_bbs = False and estimate_masks = False in the evaluation config. Set external to the path with the ground-truth masks in npy format. You can download the T-LESS gt masks here:
To evaluate a specific object and visualize the predictions, execute:
ae_eval exp_group/my_mpencoder name_of_evaluation --eval_cfg eval_group/eval_template.cfg
e.g.
ae_eval exp_group/obj1_18_v2 test_eval --eval_cfg eval_template.cfg --model_path '/path/to/obj_05.ply'
Set gt_masks = False, estimate_bbs = True and estimate_masks = True in the evaluation config. Set external to the path with the predicted masks in npy format. You can download our T-LESS MaskRCNN predictions here:
To evaluate a specific object and visualize the predictions, execute:
ae_eval exp_group/my_mpencoder name_of_evaluation --eval_cfg eval_group/eval_template.cfg
e.g.
ae_eval exp_group/obj1_18_v2 test_eval --eval_cfg eval_template.cfg --model_path '/path/to/obj_05.ply'
We trained a MaskRCNN on the T-LESS training set pasted randomly onto COCO images, using https://github.com/facebookresearch/maskrcnn-benchmark
Here is an MP-Encoder model trained on 80 objects from BOP datasets with codebooks of all 108 objects (BOP Challenge 2020 results):
Extract it to $AE_WORKSPACE_PATH/experiments/multi_object/bop_except_itodd
Also get precomputed MaskRCNN predictions for all BOP datasets:
Open the bop20 evaluation configs, e.g. auto_pose/ae/cfg_m3vision/m3_config_lmo_mp.cfg, and point the path_to_masks parameter to the downloaded MaskRCNN predictions.
You can visualize (-vis option) and reproduce the BOP results by running:
python auto_pose/m3_interface/compute_bop_results_m3.py auto_pose/ae/cfg_m3vision/m3_config_lmo_mp.cfg \
    --eval_name test --dataset_name=lmo --datasets_path=/path/to/bop/datasets \
    --result_folder /folder/to/results -vis
Note: You will need the bop_toolkit. I created a package bop_toolkit_lib from it, but you can also just add the required files to sys.path.
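For example, assuming the toolkit was cloned to /path/to/bop_toolkit (a placeholder path), a script can make it importable like this:

import sys
sys.path.append('/path/to/bop_toolkit')  # directory containing bop_toolkit_lib/
from bop_toolkit_lib import inout        # e.g. for reading/writing BOP-format results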
After creating the mp_encoder codebooks, adapt the parameters in an m3 config file, e.g. auto_pose/ae/cfg_m3vision/m3_template.cfg.
auto_pose/m3_interface/test_m3.py shows an example of how to use the API for 6-DoF pose estimation. Insert your own detector / bounding boxes and then run:
python auto_pose/m3_interface/test_m3.py --m3_config_path=/path/to/cfg_m3vision/m3_template.cfg \
    --img_path=/path/to/an/img.png -vis
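Besides the rotation retrieved from the codebook, the 6-DoF estimate needs a translation, which is derived from the detected bounding box via projective distance estimation. The following is a simplified numpy sketch of that step; the variable names are mine, and the repo applies further corrections on top.

import numpy as np

def estimate_translation(bb_center_uv, bb_diag_px, K_test,
                         render_diag_px, render_dist, K_syn):
    # Depth: the object appears smaller the farther away it is, so t_z scales
    # with the ratio of the synthetic bounding-box diagonal to the detected
    # one, corrected for the two cameras' focal lengths.
    t_z = render_dist * (render_diag_px / bb_diag_px) * (K_test[0, 0] / K_syn[0, 0])
    # Lateral offset: back-project the detected bounding-box center at depth t_z.
    uv1 = np.array([bb_center_uv[0], bb_center_uv[1], 1.0])
    return t_z * (np.linalg.inv(K_test) @ uv1)

# Usage with placeholder values:
K = np.array([[1075.0, 0.0, 360.0], [0.0, 1075.0, 270.0], [0.0, 0.0, 1.0]])
t_est = estimate_translation((400.0, 300.0), 80.0, K, 120.0, 700.0, K)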
- Document ModelNet Evaluation