Python Photogrammetry Toolbox
Abstract
The modern techniques of Structure from Motion (SfM) and Image-Based Modelling (IBM) open new perspectives in the field of archaeological documentation, providing a simple and accurate way to record three-dimensional data. In the last edition of the workshop, the presentation “Computer Vision and Structure From Motion, new methodologies in archaeological three-dimensional documentation. An open source approach” showed the advantages of this new methodology (low cost, portability, versatility…), but it also identified some problems: the use of the closed-source SIFT feature detector and the need to simplify the workflow.
The software Python Photogrammetry Toolbox (PPT) is a possible solution to these problems. It is composed of Python scripts that automate the different steps of the workflow. The entire process is reduced to two commands: calibration and dense reconstruction. The user can run them from a graphical interface or from the command line. Calibration is performed with Bundler, while dense reconstruction is done through CMVS/PMVS. Despite the automation, the user can control the final result by choosing two initial parameters: the image size and the feature detector. Reducing the image size shortens computation time but decreases the density of the point cloud. The choice of feature detector also influences the final result: PPT can work both with SIFT (patented by the University of British Columbia; freely usable only for research purposes) and with VLFEAT (released under the GPL v.2 license). The use of VLFEAT ensures a more accurate result, though it increases computation time.
Python Photogrammetry Toolbox, released under the GPL v.3 license, is a classic example of a FLOSS project in which tools and knowledge are shared. The community works on the development of the software, sharing code modifications, feedback and bug reports.
1. Introduction
A 3D digital copy can be produced by various technologies: laser scanning (ground), lidar (aerial), structured light, photogrammetry… Each has its pros and cons. Laser scanning and lidar are accurate (millimetre precision) but expensive, even when rented, and their use requires training. Photogrammetry is more and more accessible thanks to recent progress in electronics that makes compact digital cameras cheaper, though it is less precise (centimetre precision). Photogrammetry with a consumer camera does not reach the same performance as a laser scanner, but it is accessible to anybody. Today, Computer Vision algorithms are mature enough to be used by non-technical users. It is a very active research domain and a lot of progress has been made in the last decade. Such progress is visible in web services like the Microsoft Photosynth project 1.
Our objective is to provide a tool-chain that makes 3D digital copying easy. This tool-chain should be free, open-source and cross-platform in order to be accessible without constraint. Our pipeline draws largely from existing solutions that have proven to be functional and adequate.
2. Related Work
data (pictures).
Thanks to the emergence of Open-Source frameworks (Bundler 12, CMVS/PMVS 13) that perform multi-view calibration and dense 3D point cloud computation, we aim to develop a free and easy-to-use pipeline that makes 3D digital copying easy. We provide the user with a self-contained solution that gives them control over the whole data flow. As the pictures are the property of the user, we chose a user-side pipeline. The main drawback of our approach is that computation speed depends on the user's computer. This can be a limitation for large scenes or large images, but a compromise between performance and quality can be made by reducing the image size dynamically in the toolbox.
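As an illustration of this compromise, the sketch below downscales a set of images before processing. It is only a minimal example, assuming Pillow is installed; it is not the resizing code actually used inside PPT.

    import glob
    import os
    from PIL import Image  # Pillow

    def downscale_images(src_dir, dst_dir, max_dim=1200):
        """Downscale every JPEG so its longest side is at most max_dim pixels.
        Smaller images mean faster matching but a sparser point cloud."""
        os.makedirs(dst_dir, exist_ok=True)
        for path in glob.glob(os.path.join(src_dir, "*.jpg")):
            img = Image.open(path)
            img.thumbnail((max_dim, max_dim))  # in place, preserves aspect ratio
            img.save(os.path.join(dst_dir, os.path.basename(path)), quality=95)

    downscale_images("photos", "photos_small", max_dim=1200)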
Image matching identifies pictures that can be used to compute the relative orientation of two cameras and thus to calibrate a network of images. This matching is performed in three steps: detection of salient features in each image, matching of their descriptors between image pairs, and geometric filtering of the putative correspondences.
Once the image matching between all possible pairs is performed, a geometric graph is built (fig. 2): an edge is added if a geometric connection exists between two pictures.
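As an illustration of these three steps, the following sketch uses OpenCV as a stand-in for the SIFT/VLFEAT binaries that PPT actually drives; the function names and thresholds are our assumptions, not PPT code.

    import itertools
    import cv2

    def geometric_edge(img1, img2, min_inliers=20):
        """Decide whether two images are geometrically connected:
        (1) detect/describe features, (2) match descriptors,
        (3) filter the putative matches with a RANSAC geometric model."""
        sift = cv2.SIFT_create()
        k1, d1 = sift.detectAndCompute(img1, None)   # step 1
        k2, d2 = sift.detectAndCompute(img2, None)
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        pairs = matcher.knnMatch(d1, d2, k=2)        # step 2
        good = [m for m, n in (p for p in pairs if len(p) == 2)
                if m.distance < 0.8 * n.distance]    # Lowe's ratio test
        if len(good) < min_inliers:
            return False
        pts1 = cv2.KeyPoint_convert([k1[m.queryIdx] for m in good])
        pts2 = cv2.KeyPoint_convert([k2[m.trainIdx] for m in good])
        _, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)  # step 3
        return mask is not None and int(mask.sum()) >= min_inliers

    def build_graph(images):
        """One graph edge per geometrically coherent image pair (fig. 2)."""
        return [(i, j) for i, j in itertools.combinations(range(len(images)), 2)
                if geometric_edge(images[i], images[j])]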
Camera pose estimation finds the camera positions by solving the relative pose estimation problem. Relative pose estimation consists in estimating a rigid motion between two cameras, a rotation R and a translation T (fig. 3). This relative geometry between two views is faithfully described by an “Essential” matrix 14.
12 SNAVELY 2008
13 FURUKAWA 2010
This 3×3 Essential matrix relates corresponding points in stereo images, assuming that the cameras satisfy the pinhole camera model. The E matrix can be computed from 8 points with a linear method, or from 5 points 15. The 5-point method is preferred because it is the minimal case: it adds more constraints on the estimated matrix and therefore provides more accurate results. The image matching step is thus crucial: the more common points we find between pictures, the more accurately we can estimate the 3D positions.
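For reference, the constraint the Essential matrix encodes can be written explicitly (standard pinhole-model notation, added here for clarity): for corresponding normalized image points $x$ and $x'$,

    $x'^{\top} E \, x = 0, \qquad E = [T]_{\times}\, R,$

where $[T]_{\times}$ is the skew-symmetric cross-product matrix of the translation $T$.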
The position of a camera can also be computed from correspondences between 3D points and their projections in the image plane. This 3D–2D correspondence problem is known as resection (fig. 4). It consists in estimating Pi (the rotation, translation and internal parameters of the camera) from a ray-constraint geometry: it finds the Pi configuration that minimizes the re-projection errors between the rays passing through the optical camera center and the 3D points, and the 2D image plane coordinates. Once two cameras are related by an Essential matrix and 3D points X are built, we can incrementally add new cameras to the scene by successive resections.
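Stated as an optimization problem (our notation, added for clarity), resection seeks the camera $P_i$ that minimizes the total re-projection error over the known 3D points $X_j$ and their 2D observations $x_{ij}$:

    $P_i^{*} = \arg\min_{P_i} \sum_{j} \lVert x_{ij} - \pi(P_i X_j) \rVert^{2},$

where $\pi$ denotes perspective projection onto the image plane. Bundle adjustment 16 minimizes the same cost jointly over all cameras and points.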
Based on these computations (Essential matrix estimation and resection) we can perform Incremental Structure from Motion. This is the algorithm implemented in the 3D calibration software we use (Bundler). A minimal sketch of both building blocks follows the input/output summary below.
Input:
• image network of geometrically coherent pictures,
• internal camera parameters (Focal length, CCD sensor size).
Output:
• camera positions,
• sparse point cloud.
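The sketch below shows the two geometric building blocks, written with OpenCV purely for illustration; Bundler has its own internal implementation, so none of these calls reflect its actual code.

    import cv2

    def initialize_pair(pts1, pts2, K):
        """Bootstrap the reconstruction from one image pair: 5-point
        Essential matrix estimation (NISTER 2004) inside RANSAC.
        pts1/pts2: Nx2 float arrays of matched points; K: 3x3 intrinsics."""
        E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
        _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
        return R, t  # relative rotation and translation of the second camera

    def add_camera_by_resection(pts3d, pts2d, K):
        """Resection: pose of a new camera from 3D-2D correspondences.
        pts3d: Nx3 reconstructed points; pts2d: their Nx2 projections."""
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
        R, _ = cv2.Rodrigues(rvec)  # rotation vector -> rotation matrix
        return R, tvec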
Bundler suffers from some shortcomings: its code is not very clean and sometimes the 3D reconstruction fails due to drift error. But it has the advantage of being nearly the only viable Open-Source solution available on the internet with such performance. Recent community initiatives like the libmv project 17 are a prelude to cleaner implementations of “Bundler clones”; these building blocks could be replaced in the tool-chain in the near future.
14 http://en.wikipedia.org/wiki/Essential_matrix
15 NISTER 2004
16 http://en.wikipedia.org/wiki/Bundle_adjustment
17 http://code.google.com/p/libmv/
Multiple View Stereovision (MVS) consists in mapping image pixels to 3D points, given the camera poses and the images. This dense representation can be a dense point cloud or a dense mesh. In order to find a 3D position for each corresponding pixel of the image sequence, MVS uses multiple images to reduce ambiguities and estimate accurate content (fig. 6).
One of the most interesting state-of-the-art methods is the patch-based approach called PMVS (Patch-based Multi-View Stereo) 18. It relies on a seed-growing strategy: it finds corresponding patches between images and locally expands the region through iterative expansion and filtering steps that remove bad correspondences (fig. 8). Such an approach finds additional correspondences that were rejected, or not found, at the image matching step.
Figure 9 shows the benefit of using PMVS (empty 3D zones correspond to poorly textured or overly ambiguous image zones).
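To make the expand/filter idea concrete, here is a toy 2D analogue of the seed-growing loop (our illustration only; the real PMVS works on oriented 3D patches scored by photometric consistency across views):

    import heapq
    import numpy as np

    def grow_patches(score, seeds, expand_thr=0.7):
        """Grow from seed cells into neighbouring cells whose score is good
        enough: best-first expansion with a filtering threshold."""
        h, w = score.shape
        accepted = np.zeros(score.shape, dtype=bool)
        heap = [(-score[y, x], y, x) for y, x in seeds]
        heapq.heapify(heap)
        while heap:
            _, y, x = heapq.heappop(heap)
            if accepted[y, x]:
                continue
            accepted[y, x] = True                    # expansion step
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and not accepted[ny, nx]
                        and score[ny, nx] >= expand_thr):  # filtering step
                    heapq.heappush(heap, (-score[ny, nx], ny, nx))
        return accepted

    # Usage: scores in [0, 1]; seeds are (row, col) cells of confident matches.
    # accepted = grow_patches(np.random.rand(100, 100), [(50, 50)])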
PPT provides a tool-chain that is easier to maintain and use than the previous approaches. It defines a clear pipeline to handle 3D reconstruction. This pipeline is designed as Python modules with a high-level API in order to be extensible in the future. The result is a 3-level application: interface, Python modules and software.
A graphical wrapper has been developed to hide the command-line calls required to use the chain through the Python modules. It provides a 2-step reconstruction workflow.
The multi-level application makes maintenance easier. Each bottom module can be updated as long as it respects the designed high-level API, which makes the interface easily extensible. For example, the Python wrapper uses an interface design pattern in order to support various feature detection/description algorithms for the image matching step: the user can choose the David Lowe SIFT 23 or the open-source implementation VLFEAT 24.
18 FURUKAWA 2010
19 Source code is accessible from http://code.google.com/p/osm-bundler/
20 http://www.python.org/
21 http://www.cmake.org/
22 https://github.com/TheFrenchLeaf
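A minimal sketch of such an interface follows. The class names and binary invocations are assumptions for illustration (Lowe's reference sift binary reads a PGM image on stdin and writes a key file on stdout; VLFeat ships a sift command-line tool); this is not PPT's actual wrapper code.

    import abc
    import subprocess

    class FeatureExtractor(abc.ABC):
        """Common interface: produce a .key file of descriptors for one image."""
        @abc.abstractmethod
        def extract(self, pgm_path: str, key_path: str) -> None: ...

    class LoweSift(FeatureExtractor):
        def extract(self, pgm_path, key_path):
            # Lowe's binary reads the PGM on stdin, writes keypoints on stdout.
            with open(pgm_path, "rb") as src, open(key_path, "w") as dst:
                subprocess.run(["sift"], stdin=src, stdout=dst, check=True)

    class VlfeatSift(FeatureExtractor):
        def extract(self, pgm_path, key_path):
            # VLFeat's command-line tool; '-o' selects the output file.
            subprocess.run(["sift", pgm_path, "-o", key_path], check=True)

Swapping detectors then only means passing a different FeatureExtractor instance to the matching step, which is the point of the design pattern.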
The data workflow is organized in a temporary directory created at the beginning of the process. All the data required for the 3D reconstruction is located in this directory. The data is updated by the different elements of the tool-chain and shown to the user at the end via a directory pop-up. The main workflow is illustrated in figure 10. It is worth taking a closer look at the 2-step process workflow (RunBundler and RunCMVS) to better see the job of the Python scripts.
RunBundler (fig. 11) performs the camera calibration step. It computes the 3D camera poses from a set of images, using the corresponding “camera model”/“CCD width” pairs embedded in an SQLite database. In figure 11, the orange coloured items (bottom squares) are the created files. We recognize the image matching tools (sift, matchFull) and the 3D pose estimation software (Bundler).
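The CCD width matters because Bundler needs an initial focal length expressed in pixels, which can be derived from the EXIF focal length in millimetres. The conversion below is the standard one; the snippet is our illustration, not PPT's exact code.

    def focal_in_pixels(focal_mm, ccd_width_mm, image_width_px):
        """Convert an EXIF focal length (mm) to pixels using the CCD width
        looked up in the camera database."""
        return image_width_px * focal_mm / ccd_width_mm

    # Example: a 7.4 mm lens on a 5.75 mm-wide CCD, image 3072 px wide:
    print(focal_in_pixels(7.4, 5.75, 3072))  # ~3953 pixels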
RunCMVS (fig. 12) takes the image collection and the camera poses as input and performs the dense 3D point cloud computation. Data conversion from the Bundler format to the CMVS/PMVS format is done with Bundle2PMVS and RadialUndistort. The dense computation is done by PMVS, optionally preceded by CMVS, which divides the input scene into many smaller instances to make the dense reconstruction faster.
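End to end, the whole pipeline therefore boils down to two commands. The invocation below is schematic only: the actual script names and arguments should be checked against the osm-bundler/PPT sources.

    import subprocess

    # Step 1 - calibration: sparse reconstruction with Bundler.
    subprocess.run(["python", "RunBundler.py", "my_photos/"], check=True)

    # Step 2 - densification: CMVS clustering followed by PMVS.
    subprocess.run(["python", "RunCMVS.py", "bundler_output/"], check=True)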
PPT-Gui (fig. 7) is the graphical interface used to interact easily with the photogrammetry toolbox 25. The GUI part is powered by PyQt4 26, a multi-platform GUI toolkit. The interface is organized in two parts: a main window composed of numbered panels, which lets the user understand the steps to perform, and a terminal window in which the process runs. The GUI is deliberately simple and is built for people who are not familiar with command-line scripts. The four panels lead the user to the end of the process through only two steps: Run Bundler (panel 1) and Run CMVS/PMVS (panel 2). Running CMVS before PMVS is highly recommended but not strictly necessary: it is also possible to use PMVS directly (panel 3). Panel 4 provides a fast way to populate the SQL database with the CCD width (mm) of the camera, without using external software.
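What panel 4 does can be pictured with a few lines of SQL; the table and column names below are assumptions for illustration, not PPT's actual schema, and the CCD value is an example figure.

    import sqlite3

    # Open (or create) the camera database used by the calibration step.
    conn = sqlite3.connect("camera_ccd.db")
    conn.execute("CREATE TABLE IF NOT EXISTS ccd_width "
                 "(camera_model TEXT PRIMARY KEY, ccd_width_mm REAL)")
    # Register one camera model with its CCD width in millimetres.
    conn.execute("INSERT OR REPLACE INTO ccd_width VALUES (?, ?)",
                 ("Canon PowerShot A70", 5.27))
    conn.commit()
    conn.close()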
5. Application
Archaeological fieldwork is mainly a working process which ends, in most cases, with the complete destruction of the site: a ground layer is usually excavated to investigate the underlying level. In the absence of particularly expensive equipment (laser scanner, calibrated camera) or software (photogrammetric applications), field documentation consists of pictures (digital or film), manual drawings, total station measurements and two-dimensional photo-mosaics. At best, all the data are connected together inside a Geographical Information System (GIS).
The progress of Computer Vision in recent years opens new perspectives, giving everybody the possibility to record three-dimensional data. Among the benefits of this technique:
23 http://www.cs.ubc.ca/~lowe/keypoints/
24 http://www.vlfeat.org/
25 Source code is accessible from https://github.com/archeos/ppt-gui/
26 http://wiki.python.org/moin/PyQt4
• the flexibility of this technique facilitates the documentation of a wide range of
situations.
The next sections introduce some examples of application at different scales: from macro (layers, structures) to micro (finds).
5.1. Layers
Archaeological finds can be documented “in situ”, taking pictures while moving around the object (fig. 17), or in the laboratory using a turntable and a black background (fig. 18). In the latter case the position of the camera is fixed during data acquisition, but the reconstruction treats the shots as separate poses around the object (fig. 19). Good results were reached taking a picture every 10 degrees, i.e. 36 photos for a complete revolution of the object. It is also possible to use macro lenses to obtain models of small artefacts.
One of the most interesting aspects of SfM is the ability to extract three-dimensional information from old or even historical photographs taken by amateurs for other purposes. The critical point of this application is reaching the minimum number of images needed to start the reconstruction process (three). Finding appropriate photographic documentation has become much easier since digital cameras became a widespread phenomenon (figg. 20-21). If a monument is no longer in its original state of conservation, or has even been completely destroyed (e.g. the Bamiyan Buddhas), Computer Vision is a valid method to recover its original morphology.
6. Conclusion
Acknowledgements
This work has been made possible by many individual Open-Source initiatives. We particularly thank Noah Snavely for the Bundler sources, Yasutaka Furukawa for the CMVS/PMVS sources and Vladimir Elistratov for the osm-bundler initiative.
Bibliography
AGARWAL 2009
S. Agarwal – N. Snavely – I. Simon – S. M. Seitz – R. Szeliski, Building Rome in a day, in ICCV 2009, 72-79.
BAY 2008
H. Bay – A. Ess – T. Tuytelaars – L. Van Gool, SURF: Speeded Up Robust Features, in CVIU 2008, 346-359.
BEZZI 2006
A. Bezzi – L. Bezzi – D. Francisci – R. Gietl, L'utilizzo di voxel in campo archeologico, in Geomatic Workbooks,
6, 2006.
FRAHM 2010
J.-M. Frahm – P. Georgel – D. Gallup – T. Johnson – R. Raguram – C. Wu – Y.-H. Jen – E. Dunn – B. Clipp – S. Lazebnik, Building Rome on a Cloudless Day, in ECCV 2010, 368-381.
FURUKAWA 2010
Y. Furukawa – B. Curless – S. M. Seitz – R. Szeliski, Towards Internet-scale multi-view stereo, in CVPR 2010,
1434-1441.
LOWE 2004
D. G. Lowe, Distinctive image features from scale-invariant keypoints, in IJCV 2004, 91-110.
NISTER 2004
D. Nister, An Efficient Solution to the Five-Point Relative Pose Problem, in IEEE Trans. Pattern Anal. Mach.
Intell. 2004, 756-777.
SNAVELY 2008
N. Snavely – S. M. Seitz – R. Szeliski, Modeling the World from Internet Photo Collections, in IJCV 2008, 189-
210.
Figures
archeologici della Provincia Autonoma di Trento – N. Pisu).
Affiliation