A Photogrammetry-based Framework to Facilitate Image-based Modeling and Automatic Camera Tracking
Sebastian Bullinger, Christoph Bodensteiner and Michael Arens
Department of Object Recognition, Fraunhofer IOSB, Ettlingen, Germany
Keywords: Image-based Modeling, Camera Tracking, Photogrammetry, Structure from Motion, Multi-view Stereo,
Blender.
Abstract: We propose a framework that extends Blender to exploit Structure from Motion (SfM) and Multi-View Stereo
(MVS) techniques for image-based modeling tasks such as sculpting or camera and motion tracking. Apply-
ing SfM allows us to determine camera motions without manually defining feature tracks or calibrating the
cameras used to capture the image data. With MVS we are able to automatically compute dense scene models,
which is not feasible with the built-in tools of Blender. Currently, our framework supports several state-of-the-
art SfM and MVS pipelines. The modular system design enables us to integrate further approaches without
additional effort. The framework is publicly available as an open source software package.
1 INTRODUCTION
Figure 2: Building blocks of state-of-the-art incremental Structure from Motion and Multi-view Stereo pipelines. The input images are part of the Sceaux Castle dataset (Moulon, 2012).
tion functionalities, which are not present in other editors such as Meshlab (Cignoni et al., 2008) or CloudCompare (Daniel Girardeau-Montaut, 2020).

1.2 Related Work

SfM is a photogrammetric technique that estimates, for a given set of (unordered) input images, the corresponding three-dimensional camera poses and scene structure. There are two categories of SfM approaches: incremental and global SfM. Incremental SfM is currently the prevalent state-of-the-art method (Schönberger and Frahm, 2016). In order to manage the problem complexity of reconstructing real-world scenes, incremental SfM decomposes the reconstruction process into more controllable subproblems. The corresponding tasks can be categorized into correspondence search (including feature detection, feature matching and geometric verification) and sparse reconstruction (consisting of image registration, point triangulation, bundle adjustment and outlier filtering). During bundle adjustment, SfM minimizes the reprojection error of the reconstructed three-dimensional points for each view.

MVS uses the camera poses and the sparse point cloud obtained in the SfM step to compute a dense point cloud or a (textured) model reflecting the geometry of the input scene. Similarly to SfM, MVS divides the reconstruction task into multiple subproblems. The multi-view stereo step computes a depth map for each registered image that potentially includes surface normal vectors. Multi-view fusion merges the depth maps into a unified dense reconstruction, from which a watertight surface model is computed in the surface reconstruction step. Fig. 2 shows an overview of essential SfM and MVS subtasks and their dependencies.

Currently, there are several state-of-the-art photogrammetry libraries that provide full SfM and MVS pipelines, such as Colmap (Schönberger, 2020), Meshroom (AliceVision, 2020a), Multi-View Environment (Fuhrmann et al., 2014), OpenMVG (Moulon et al., 2013) & OpenMVS (Cernea, 2020) as well as Regard3D (Hiestand, 2020). For a quantitative evaluation of state-of-the-art SfM and MVS pipelines on outdoor and indoor scenes see Knapitsch et al. (2017), which provides a benchmark dataset using laser scans as ground truth.

While the usage of the reconstructed (textured) models is widely supported by modern modeling tools, the integration of camera-specific information such as intrinsic and extrinsic parameters is oftentimes neglected. There are only a few software packages available that allow importing camera-specific information into modeling programs. The majority of these packages address specific proprietary reconstruction or modeling tools such as AliceVision (2020b), SideEffects (2020) or Uhlík (2020). The tool most similar to the proposed framework is presumably Attenborrow (2020), which also provides options to import SfM and MVS formats into Blender. However, the following capabilities of our framework are missing in Attenborrow (2020): visualization of colored point clouds, representation of source images as image planes and creation of point clouds from depth maps. Further, Attenborrow (2020) supports fewer SfM and MVS libraries and provides fewer options to configure the input data.

1.3 Contribution

The core contributions of this work are as follows.
(1) The proposed framework allows leveraging image-based reconstructions (e.g. automatic calibration of intrinsic camera parameters, computation of three-dimensional camera poses and reconstruction of scene structures) for different tasks in Blender such as sculpting or creating visual effects.
Table 1: Overview of photogrammetry pipelines that are supported by the proposed framework. For each pipeline the table shows the corresponding methods for the different reconstruction steps, including Structure from Motion (SfM), Multi-view Stereo (MVS), surface reconstruction (Surface Rec.) and texturing. In many cases the pipelines allow substituting specific pipeline steps with alternative implementations. This table shows the default or recommended pipeline configurations.

Pipeline       Colmap                          Meshroom
SfM            Schönberger and Frahm (2016)    Moulon et al. (2012)
MVS            Schönberger et al. (2016)       Hirschmuller (2005)
Surface Rec.   Kazhdan and Hoppe (2013)        Jancosek and Pajdla (2014)
Texturing      -                               Burt and Adelson (1983)
(2) We use available data structures in Blender to represent the integrated reconstruction results, which allows the framework to compute automatic camera animations (including extrinsic and intrinsic camera parameters), to represent the reconstructed point clouds as particle systems or to attach the source images to the registered camera poses. Using the available data structures in Blender ensures that the integrated results can be further utilized.
(3) This framework provides (together with Blender's built-in tools) different visualization and animation capabilities for image-based reconstructions that are superior to the tools offered by common photogrammetry-specific software packages.
(4) The framework already supports many state-of-the-art open source SfM and MVS pipelines and is (because of its modular design) easily extensible.
(5) The source code of the framework is publicly available at https://github.com/SBCV/Blender-Addon-Photogrammetry-Importer.

2 FRAMEWORK

2.1 Overview

The proposed framework allows us to import the reconstructed scene geometry represented as a point cloud or as a (textured) mesh, the reconstructed cameras (including intrinsic and extrinsic parameters), point clouds corresponding to the depth maps of the cameras and an animated camera representing the camera motion. The supported libraries include the following photogrammetry pipelines: Colmap (Schönberger, 2020), Multi-View Environment (Fuhrmann et al., 2014), OpenMVG (Moulon et al., 2013) & OpenMVS (Cernea, 2020) and Meshroom (AliceVision, 2020a) as well as VisualSfM (Wu, 2011). Table 1 contains an overview of each pipeline with the corresponding reconstruction steps. An example reconstruction result of Colmap is shown in Fig. 3a.

In addition to the SfM and MVS libraries mentioned above, the system supports the integration of camera poses or scene structures captured with RGB-D sensors using Zhou et al. (2018) as well as point clouds provided in common laser scanning data formats.

2.2 Architecture

We followed a modular design approach in order to simplify the extensibility of the framework. An overview of the different components and their dependencies is shown in Fig. 4. Each supported library requires the implementation of a corresponding FileHandler and ImportOperator. The FileHandler parses library-specific file formats or directory structures and returns library-agnostic information about cameras, points and meshes. The ImportOperator may use different classes provided by the framework such as the CameraImporter, PointImporter and MeshImporter to define the required import options and to import the reconstruction information extracted by the FileHandler.
Figure 3: Reconstructed geometry in the 3D view of Blender. (a) Reconstructed geometry represented as mesh with vertex colors in the 3D view of Blender. (b) Reconstructed point cloud in the 3D view of Blender from the perspective of one of the reconstructed cameras. The background image shows the corresponding input image used to register the camera.
Figure 4: Integration of the proposed framework in Blender, illustrated with the Colmap importer. The class bpy.types.Operator provided by Blender allows defining custom operators that can be registered with bpy.utils. In order to support additional SfM and MVS libraries it is sufficient to implement the corresponding import operators and file handlers. To simplify the figure, only the relevant classes and methods are shown.
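To make the mechanism of Fig. 4 concrete, the following minimal sketch outlines what an import operator and file handler pair for a new pipeline could look like. The concrete names (ExampleFileHandler, parse_reconstruction, ImportExampleOperator) are illustrative assumptions and do not reflect the add-on's actual API.

import bpy


class ExampleFileHandler:
    """Parses a library-specific file format into library-agnostic cameras, points and meshes."""

    @staticmethod
    def parse_reconstruction(path):
        cameras, points, meshes = [], [], []
        # ... read the reconstruction files of the corresponding SfM/MVS library here ...
        return cameras, points, meshes


class ImportExampleOperator(bpy.types.Operator):
    """Import operator that forwards the parsed data to the framework's importer classes."""

    bl_idname = "import_scene.example_reconstruction"
    bl_label = "Import Example Reconstruction"

    filepath: bpy.props.StringProperty(subtype='FILE_PATH')

    def execute(self, context):
        cameras, points, meshes = ExampleFileHandler.parse_reconstruction(self.filepath)
        # Here the parsed data would be handed to the framework's CameraImporter,
        # PointImporter and MeshImporter classes to create the Blender objects.
        return {'FINISHED'}


def register():
    bpy.utils.register_class(ImportExampleOperator)

Once registered with bpy.utils, such an operator can be invoked like any other Blender operator; no other part of the framework has to be modified.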
2.3 Camera and Image Representation

The supported SfM and MVS libraries use different camera models to represent the intrinsic camera parameters during the reconstruction process. Further, the conventions describing the extrinsic camera parameters are also inconsistent. We convert the different formats into a unified representation that can be directly mapped to Blender's camera objects.
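As a rough illustration of this mapping, the following sketch creates a Blender camera from a reconstructed pinhole camera; the function name and the assumed 36 mm sensor width are illustrative choices, not the add-on's actual implementation.

import bpy
from mathutils import Matrix


def add_reconstructed_camera(name, focal_length_px, image_width_px, camera_to_world):
    # Intrinsic parameters: convert the focal length from pixels to millimeters
    # relative to an assumed sensor width.
    cam_data = bpy.data.cameras.new(name)
    cam_data.sensor_width = 36.0
    cam_data.lens = focal_length_px * cam_data.sensor_width / image_width_px
    # Extrinsic parameters: camera_to_world is assumed to be a 4x4 matrix already
    # converted to Blender's convention (camera looking along -Z, y up).
    cam_obj = bpy.data.objects.new(name, cam_data)
    cam_obj.matrix_world = Matrix(camera_to_world)
    bpy.context.collection.objects.link(cam_obj)
    return cam_obj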
In addition to the integration of the geometric properties of the reconstructed cameras, the framework provides an option to add each input image as a background image for the corresponding camera object. Viewing the scene from the perspective of a specific camera allows assessing the consistency of virtual objects and the corresponding source images, which is especially useful for sculpting tasks. It also offers convenient capabilities to visualize and inspect the reconstructed point clouds and meshes, which is not feasible with other photogrammetry-specific tools such as CloudCompare and Meshlab. Fig. 3b shows, for example, a comparison of the projected point cloud and the color information of the corresponding input image.

To further enhance the visualization, the system provides an option to add the original input images as separate image planes as shown in Fig. 1.

In order to ease the usage of the reconstruction for animation and visual effect tasks, the framework offers an option to create an animated camera using the reconstructed camera poses as well as the corresponding intrinsic parameters such as the focal length and the principal point.
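A possible keyframing sketch for such an animated camera is shown below; it assumes a Blender camera object and reconstructed poses as input and is meant as an illustration rather than the add-on's actual code.

def keyframe_camera(cam_obj, frame, location, rotation_quat, focal_length_mm):
    # Set the reconstructed pose and intrinsics for this frame ...
    cam_obj.rotation_mode = 'QUATERNION'
    cam_obj.location = location
    cam_obj.rotation_quaternion = rotation_quat
    cam_obj.data.lens = focal_length_mm
    # ... and store them as keyframes, which creates the corresponding f-curves.
    cam_obj.keyframe_insert(data_path="location", frame=frame)
    cam_obj.keyframe_insert(data_path="rotation_quaternion", frame=frame)
    cam_obj.data.keyframe_insert(data_path="lens", frame=frame)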
Figure 5: Example of a camera animation using 11 images of the Sceaux Castle dataset (Moulon, 2012). By interpolating the poses of the reconstructed cameras, we obtain a smooth trajectory for the animated camera. (a) Reconstruction result in Blender's 3D view. The reconstructed cameras are shown in black and the animated camera in orange, respectively. (b) Interpolation values of the translation and the rotation corresponding to the animated camera in the left image. The black dots denote the values of the reconstructed camera poses and the vertical blue line indicates the interpolated values at the position of the camera in the left image.
Figure 6: Node configuration created by the framework to define the colors of the particles in the particle system. The light blue nodes use the particle index to compute the texture coordinate with the corresponding color; the value in the divide node represents the total number of particles. The Principled BSDF node has been cropped to keep the figure compact.
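A hedged sketch of how such a node configuration can be assembled via Blender's Python API is given below; the node identifiers follow Blender's API, while the function name and the surrounding logic are illustrative assumptions rather than the add-on's actual code.

import bpy


def create_particle_color_material(color_image, particle_count):
    # Material with the default Principled BSDF node created by use_nodes.
    material = bpy.data.materials.new("ParticleColors")
    material.use_nodes = True
    nodes = material.node_tree.nodes
    links = material.node_tree.links

    # Particle index divided by the total particle count yields the texture
    # coordinate of the color assigned to this particle (cf. Fig. 6).
    particle_info = nodes.new("ShaderNodeParticleInfo")
    divide = nodes.new("ShaderNodeMath")
    divide.operation = 'DIVIDE'
    divide.inputs[1].default_value = particle_count
    combine = nodes.new("ShaderNodeCombineXYZ")
    color_texture = nodes.new("ShaderNodeTexImage")
    color_texture.image = color_image
    principled = nodes["Principled BSDF"]

    links.new(particle_info.outputs["Index"], divide.inputs[0])
    links.new(divide.outputs["Value"], combine.inputs["X"])
    links.new(combine.outputs["Vector"], color_texture.inputs["Vector"])
    links.new(color_texture.outputs["Color"], principled.inputs["Base Color"])
    return material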
All parameters are animated using Blender's built-in f-curves, which allows post-processing the result with Blender's animation tools. The camera properties between two reconstructed camera poses are interpolated to enable the creation of smooth camera trajectories. Fig. 5 shows an example of the animated camera and the corresponding interpolated properties. The f-curves use quaternions to define the camera rotations of the animated camera. We normalize the quaternions representing the rotations of the reconstructed cameras to avoid unexpected interpolation results caused by quaternions with different signs, i.e. we ensure that the quaternions have consistent signs, which minimizes the distance between two consecutive quaternions.
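A minimal sketch of this sign alignment (not necessarily the add-on's exact implementation) is given below: since q and -q encode the same rotation, a quaternion is flipped whenever its dot product with its predecessor is negative.

from mathutils import Quaternion


def align_quaternion_signs(quaternions):
    # Flip quaternions so that consecutive rotation keyframes lie close together,
    # which prevents the f-curve interpolation from taking the long way around.
    aligned = [quaternions[0].normalized()]
    for quaternion in quaternions[1:]:
        quaternion = quaternion.normalized()
        if aligned[-1].dot(quaternion) < 0.0:
            quaternion = Quaternion((-quaternion.w, -quaternion.x, -quaternion.y, -quaternion.z))
        aligned.append(quaternion)
    return aligned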
2.4 Representation of Scene Geometry

Photogrammetry-based reconstruction techniques frequently use two types of entities to represent the reconstructed scene structure: point clouds and (textured) meshes. While meshes provide a more holistic representation of the scene geometry, point clouds are typically more accurate, since the reconstructed points correspond to image correspondences.

Currently, there is no Blender entity that permits directly representing colored point clouds. Our framework circumvents this problem by providing the following two point cloud representations: point clouds represented with Blender's particle system and point clouds visualized with OpenGL (Woo et al., 1999).

The particle system allows us to represent each 3D point with a single particle, which enables us to post-process and render the reconstructed result. We define the particle colors with Blender's node system. The proposed framework uses the ID of each particle to determine a texture coordinate of a single texture containing all particle colors. The corresponding nodes are shown in Fig. 6.

In contrast to Blender's particle system, drawing point clouds with OpenGL is computationally less expensive and makes it possible to visualize larger point clouds. Thus, it is better suited for the visualization of large numbers of points, such as point clouds representing the depth maps of multiple input images. Fig. 7 shows an example.
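The following sketch shows how such a point cloud could be drawn with Blender's gpu module, which wraps OpenGL drawing in Blender 2.8 and later; the example coordinates and the chosen builtin shader name are illustrative assumptions and may differ from the add-on's actual implementation.

import bpy
import gpu
from gpu_extras.batch import batch_for_shader

# Example data: one position and one RGBA color per reconstructed point.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
colors = [(1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 1.0), (0.0, 0.0, 1.0, 1.0)]

shader = gpu.shader.from_builtin('3D_FLAT_COLOR')
batch = batch_for_shader(shader, 'POINTS', {"pos": coords, "color": colors})


def draw_points():
    # Called on every viewport redraw; the whole point cloud is drawn in one call.
    batch.draw(shader)


handle = bpy.types.SpaceView3D.draw_handler_add(draw_points, (), 'WINDOW', 'POST_VIEW')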
Figure 7: Representation of a depth map using OpenGL. Both images show the triangulated points corresponding to the depth map of the same camera. Using Blender's 3D view allows us to assess the consistency of the depth map w.r.t. the point cloud (see Fig. 7a) as well as the consistency of the depth w.r.t. visual cues in the source image (see Fig. 7b). (a) Reconstructed point cloud and depth map of the rightmost camera, highlighted in orange. (b) Depth map from the perspective of one of the reconstructed cameras. The background image shows the input image used to register the camera.
REFERENCES

Fuhrmann, S., Langguth, F., and Goesele, M. (2014). MVE: A multi-view reconstruction environment. In Proceedings of the Eurographics Workshop on Graphics and Cultural Heritage, GCH '14, pages 11–18, Goslar, DEU. Eurographics Association.

Furukawa, Y. and Ponce, J. (2010). Accurate, dense, and robust multi-view stereopsis. IEEE Trans. on Pattern Analysis and Machine Intelligence, 32(8):1362–1376.

Goesele, M., Snavely, N., Curless, B., Hoppe, H., and Seitz, S. M. (2007). Multi-view stereo for community photo collections. In 2007 IEEE 11th International Conference on Computer Vision, pages 1–8.

Hiestand, R. (2020). Regard3D: A free and open source structure-from-motion program. [Accessed October 2020].

Hirschmuller, H. (2005). Accurate and efficient stereo processing by semi-global matching and mutual information. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 2, pages 807–814.

Jancosek, M. and Pajdla, T. (2014). Exploiting visibility information in surface reconstruction to preserve weakly supported surfaces. International Scholarly Research Notices, 2014:1–20.

Kazhdan, M. and Hoppe, H. (2013). Screened Poisson surface reconstruction. ACM Transactions on Graphics (TOG).

Knapitsch, A., Park, J., Zhou, Q.-Y., and Koltun, V. (2017). Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics, 36(4).

Langguth, F., Sunkavalli, K., Hadap, S., and Goesele, M. (2016). Shading-aware multi-view stereo. In Proceedings of the European Conference on Computer Vision (ECCV).

Moulon, P. (2012). Sceaux castle dataset. https://github.com/openMVG/ImageDataset_SceauxCastle. [Accessed October 2020].

Moulon, P., Monasse, P., and Marlet, R. (2012). Adaptive structure from motion with a contrario model estimation. In Asian Conference on Computer Vision (ACCV).

Moulon, P., Monasse, P., Marlet, R., and Others (2013). OpenMVG: An open multiple view geometry library. https://github.com/openMVG/openMVG/. [Accessed October 2020].

Schönberger, J. L. (2020). COLMAP: A general-purpose Structure-from-Motion and Multi-View Stereo pipeline. https://github.com/colmap/colmap. [Accessed October 2020].

Schönberger, J. L. and Frahm, J.-M. (2016). Structure-from-motion revisited. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Schönberger, J. L., Zheng, E., Pollefeys, M., and Frahm, J.-M. (2016). Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision (ECCV).

SideEffects (2020). Game Development Toolset for Houdini. https://github.com/sideeffects/GameDevelopmentToolset. [Accessed October 2020].

Uhlík, J. (2020). Agisoft Photoscan Importer for Blender. https://github.com/uhlik/bpy. [Accessed October 2020].

Ummenhofer, B. and Brox, T. (2017). Global, dense multiscale reconstruction for a billion points. International Journal of Computer Vision, pages 1–13.

Waechter, M., Moehrle, N., and Goesele, M. (2014). Let there be color! Large-scale texturing of 3D reconstructions. In Fleet, D., Pajdla, T., Schiele, B., and Tuytelaars, T., editors, Computer Vision – ECCV 2014, pages 836–850, Cham. Springer International Publishing.

Woo, M., Neider, J., Davis, T., and Shreiner, D. (1999). OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2. Addison-Wesley Longman Publishing Co., Inc.

Wu, C. (2011). VisualSFM: A visual structure from motion system. http://ccwu.me/vsfm/.

Zhou, Q.-Y., Park, J., and Koltun, V. (2018). Open3D: A modern library for 3D data processing. arXiv:1801.09847.