Article
MoReLab: A Software for User-Assisted 3D Reconstruction
Arslan Siddique 1,2, Francesco Banterle 2,*, Massimiliano Corsini 2, Paolo Cignoni 2,
Daniel Sommerville 3 and Chris Joffe 3
1 Department of Computer Science, Pisa University, Largo Bruno Pontecorvo 3, 56127 Pisa, Italy;
[email protected]
2 Visual Computing Laboratory, ISTI-CNR, Via G. Moruzzi 1, 56124 Pisa, Italy;
[email protected] (M.C.); [email protected] (P.C.)
3 EPRI, 3420 Hillview Avenue, Palo Alto, CA 94304, USA; [email protected] (D.S.); [email protected] (C.J.)
* Correspondence: [email protected]
1. Introduction
Structure from Motion (SfM) has become a popular choice to create 3D models due to its low cost and simplicity, and it is a very well-studied research problem. In early research
works, Pollefeys et al. [3] developed a complete system to build a sparse 3D model of
the scene from uncalibrated image sequences captured using a hand-held camera. At the
time of writing, there is a plethora of choices for SfM software packages, each with its
unique features and capabilities. Some are open-source software, such as COLMAP [4],
MicMac [5], OpenMVS [6], and so on, while some others are commercial software packages,
such as Metashape (https://www.agisoft.com (accessed on 23 May 2023)), RealityCapture
(https://www.capturingreality.com (accessed on 23 May 2023)), etc. They rely on automatic
keypoint detection and matching algorithms to estimate 3D structures. The input to such SfM software is only a collection of digital photographs, generally captured by the same
camera. However, these fully automatic tools usually require suitable lighting conditions and high-quality photographs to generate high-quality 3D models. These conditions are very difficult to fulfill in industrial environments because there may be low lighting (which exacerbates blurring), and utility companies may have legacy video cameras capturing videos at low resolution. These legacy cameras are designed for the visual inspection of plants and for enduring chemical, temperature, and radiation stresses.
The mentioned issues may become more severe in video-based SfM because video
frames have motion blur and are aggressively compressed, leading to strong compression
artifacts (e.g., ringing, blocking, etc.). Most modern cameras capture videos at 30 fps, so a
few minutes of video produces a high number of frames, e.g., 10 min of footage is already
18,000 frames. Such a high number of frames not only increases computational time significantly but also yields low-quality 3D output due to insufficient camera motion between consecutive frames. If we pass such featureless images (e.g., see Figure 1) as inputs to SfM software, the number of accurately detected features and correspondences will be very low, leading to a low-quality 3D output. In this context, we have developed Movie Reconstruction Labora-
tory (MoReLab) (https://github.com/cnr-isti-vclab/MoReLab (accessed on 23 May 2023)),
which is a software tool to perform user-assisted reconstruction on uncalibrated camera
videos. MoReLab addresses the problem of SfM in the case of featureless and poor-quality videos by exploiting user indications about the structure to be reconstructed. A small amount of manual assistance can produce accurate models even in these difficult settings.
User-assisted 3D reconstruction can significantly decrease the computational burden and
also reduce the number of input images required for 3D reconstruction.
In contrast to automatic feature detection and matching-based SfM systems, the main contribution of MoReLab is a user-friendly interactive workflow that allows the user to provide topology information prior to reconstruction. This design allows MoReLab to achieve better results on featureless videos by leveraging the user's knowledge of visibility and under-
standing of the video across frames. Once the user has added features and correspondences
manually on 2D images, a bundle adjustment algorithm [7] is utilized to estimate camera
poses and a sparse 3D point cloud corresponding to these features. MoReLab achieves accurate sparse 3D point estimation from features added on as few as two or three images.
The estimated 3D point cloud is overlaid on manually added 2D feature points to give a
visual indication of the accuracy of estimated 3D points. Then, MoReLab provides several
primitives such as rectangles, cylinders, curved cylinders, etc., to model parts of the scene.
Based on a visual understanding of the shape of the desired object, the user selects the
appropriate primitive and marks vertices or feature points to define it in a specific location.
This approach gives the user control to extract specific shapes and objects in the scene. By exploiting inputs from the user at several stages, it is possible to obtain a 3D reconstruction even from poor-quality videos. Additionally, the overall computational burden compared with a fully automatic pipeline is significantly reduced.
Figure 1. Examples of frames from videos captured in industrial environments. These videos are not
suitable for automatic SfM tools due to issues such as low resolution, aggressive compression, strong
and moving directional lighting (e.g., a torchlight mounted on the camera), motion blur, featureless
surfaces, liquid turbulence, low lighting, etc.
2. Related Work
There have been several research works in the field of user-assisted reconstruction
from unordered and multi-view photographs. Early research works include VideoTrace [8],
which is an interface to generate realistic 3D models from video. Initially, automatic feature
detection-based SfM is applied to video frames, and a sparse 3D point cloud is overlaid
on the video frame. Then, the user traces out the desired boundary lines, and a closed set
of line segments generates an object face. Sinha et al. [9] modeled architecture as a combination of piecewise-planar 3D models. Their system also computes sparse 3D data, extracting lines and estimating vanishing points in the scene.
After this automatic preprocessing, the user draws outlines on 2D photographs. Piecewise
planar 3D models are estimated by combining user-provided 2D outlines and automatically
computed sparse 3D points. A few such user interactions can create a realistic 3D model of
the scene quickly. Xu et al. [10] developed an interface for creating accurate 3D models
of complex mechanical objects and equipment. First, sparse 3D points are estimated from
multi-view images and are overlaid on 2D images. Second, stroke-based sweep modeling
creates 3D parts, which are also overlaid on the image. Third, the motion structure of the
equipment is recovered. For this purpose, a video clip recording of the working mechanism
of the equipment is provided, and a stochastic optimization algorithm recovers motion
parameters. Rasmuson et al. [11] employ COLMAP [4] as a preprocessing stage to calibrate
images. Their interface allows users to mark image points and place quads on top of images.
The complete 3D model is obtained by applying global optimization on all quad patches.
By exploiting user-provided information about topology and visibility, they are able to
model complex objects as a combination of a large number of quads.
Some researchers developed interfaces where users can paint desired foreground re-
gions using brush strokes. Such an interface was developed by Habbecke and Kobbelt [12].
Their interface consists of a 2D image viewer and a 3D object viewer. The user paints the
2D image in a 2D image viewer with the help of a stroke. The system computes an optimal
mesh corresponding to the user-painted region of input images. During the modeling session, the system incrementally builds 3D surface patches and guides the surface
reconstruction algorithm. Similarly, in the interface developed by Baldacci et al. [13], the
user indicates foreground and background regions with different brush strokes. Their inter-
face allows the user to provide localized hints about the curvature of a surface. These hints
are utilized as constraints for the reconstruction of smooth surfaces from multiple views.
Doron et al. [14] require stroke-based user annotations on calibrated images to guide
multi-view stereo algorithms. These annotations are added into a variational optimization
framework in the form of smoothness, discontinuity, and depth ordering constraints. They
show that their user-directed multi-view stereo algorithm improves the accuracy of the
reconstructed depth map in challenging situations.
Another direction in which user interfaces are being developed is single-view reconstruction, which is complicated without prior knowledge or manual assistance because epipolar geometry cannot be established. Töppe et al. [15] introduced
convex shape optimization to minimize weighted surface area for a fixed user-specified
volume in single-view 3D reconstruction. Their method relies on implicit surface represen-
tation to generate high-quality 3D models by utilizing a few user-provided strokes on the
image. 3-Sweep [16] is an interactive and easy-to-use tool for extracting 3D models from a
single photo. When a photo is loaded into the tool, it estimates the boundary contour. Once
the boundary contour is defined, the user selects the model shape and creates an outline of
the desired object using three painting brush strokes, one in each dimension of the image.
By applying the foreground texture segmentation, the interface quickly creates an editable
3D mesh object which can be scaled, rotated, or translated.
Recently, researchers have made significant progress in the area of 3D reconstruction
using deep learning approaches. The breakthrough work by Mildenhall et al. [17] intro-
duced NeRF, which synthesizes novel views of a scene using a small set of input views.
A NeRF is a fully connected deep neural network whose input is a single 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the emitted radiance and volume density. To the best of our knowledge, a NeRF-like method that tackles, at the same time, all the conditions of low-quality videos (blurred frames, low resolution, turbulence caused by liquids, etc.) has not been presented yet [18]. A GAN-based work, Pi-GAN [19], is a
promising generative model-based architecture for 3D-aware image synthesis. However, it focuses mainly on faces and cars, so applying it in our context would require building a specific dataset for re-training (e.g., a dataset of industrial equipment, 3D man-made objects, and so on). Tu et al. [20] presented a self-supervised reconstruction model to estimate texture, shape, pose, and camera viewpoint using a single RGB input and a trainable 2D keypoint estimator. Although this method may be seminal for more general 3D reconstruction, it currently focuses on human hands.
Low-quality industrial videos, which are typically captured by utility companies, pose several challenges for existing research works. First, most works [8–11,14] in user-assisted reconstruction still require high-quality images because they use automatic SfM pipelines as their initial step. Our focus is on low-quality videos in industrial scenarios, where SfM generates an extremely sparse point cloud, making subsequent 3D operations extremely difficult. Second, these research works lack sufficient functionalities to model a variety of industrial equipment. Third, these research works are not available as open source, limiting their usage for non-technical users. Hence, our research contributions are as follows:
• A graphical user interface for the user to add feature points and correspondences
manually to model featureless videos;
• Several primitive shapes to model the most common industrial components.
In MoReLab, there is no feature detection and matching stage. Instead, the user needs to add features manually based on their visual understanding of the scene. We have implemented several user-friendly functionalities to speed up this tedious process for the user. MoReLab is open-source software targeted at modeling industrial scenarios and is available to everyone for non-commercial applications.
3. Method
In this section, we describe the pipeline, the graphical user interface, and the primitive
tools of MoReLab. We designed the software to be user-friendly and easy to use for new users. Nevertheless, understanding the tools and design of the software will enable the user to achieve optimal results with MoReLab.
Figure 2. The graphical user interface of MoReLab. The toolbar at the top allows the user to switch
between different tools.
3.2. Pipeline
Figure 3 presents the pipeline of our software. This pipeline consists of the
following steps:
pixel coordinates on other keyframes. Each feature location can be adjusted by dragging it
to the correct location.
where $b_{ij}$ denotes a binary variable that equals 1 if the feature $i$ is visible on the image $j$ and 0 otherwise, $f(X_i, C_j)$ is the projection of the $i$-th 3D point on the $j$-th image, and $d(f(X_i, C_j), x_{ij})$ indicates the Euclidean distance between the projected point and $x_{ij}$. After this minimization, we obtain optimal camera parameters and the locations of the 3D points in the world coordinate frame.
where $N$ is the normal, $B$ is the bi-normal, and $T$ is the tangent. $\lVert b \rVert$ is the radius of the cylinder, and the base of the cylinder lies in the plane formed by the $T$ and $B$ axes. The height of the cylinder is calculated by projecting the vector $P_4 - P_1$ on $N$.
• Base Cylinder Tool: This tool allows users to create a cylinder in which the initial
three selected points lie on the base of the cylinder. The fourth point determines the
height of the cylinder. This is useful for most industrial scenarios because, in most
cases, we can only see the surface of the cylindrical equipment, and the base center is
not visible. As in other tools, the user needs to select the points by clicking on them.
The point can be either a 2D feature or an area containing a 3D primitive. For 2D
features, we get the corresponding 3D sparse point computed from bundle adjustment.
Similar to the center cylinder tool, we first need to calculate a new local axis system, i.e., T, B, and N. In the new local system, the first point is considered to be at the origin, while the second and third 3D points are projected on B and T to obtain their 2D locations in
the plane formed by B and T. Given these three 2D points, we find the circle passing through them (see the circle-fit sketch after this list). If the three points are collinear, the circle cannot be estimated because its radius would be infinite. Once we know the center and radius of this circle, we calculate the base and top points, similar to the center cylinder tool.
• Curved Cylinder Tool: This tool models curved pipes and curved cylindrical equipment. The user clicks on four points at any part of the image. Then, the user clicks on a sparse 3D point obtained from bundle adjustment; this last point assigns an approximate depth to the curve just defined. To do this, we first estimate the plane containing this 3D point, denoted as P. Typically, a plane is defined as:
ax + by + cz + d = 0, (3)
where the coefficients a, b, and c can be obtained from the z-vector of the camera projection matrix M, and d is obtained as the dot product of the z-vector and P. Assume that s represents the 2D point clicked by the user at (x, y) coordinates on the image and X represents the unknown 3D point corresponding to s.
$$ M = \begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ M_4 \end{bmatrix}, \qquad x = \frac{M_1 X}{M_3 X}, \qquad y = \frac{M_2 X}{M_3 X}, \qquad [a \; b \; c] \cdot X + d = 0. \qquad (4) $$
Equation (4) can be rearranged into the form of a linear system AX = b, and a linear solver finds X (see the plane-unprojection sketch after this list). Through this procedure, four 3D points are obtained corresponding to the clicked points on the frame. These four 3D points act as control points to estimate a Bézier curve [25] on the frame. Similarly, the user can define the same curve from a different viewpoint. The curves defined at different viewpoints are optimized to obtain the final curve in 3D space. This optimization minimizes the sum of the Euclidean distances between control points across frames and the Euclidean distances between the projected control points and the locations of the 2D features in each frame containing the curve.
Assume that $m$ frames contain curves. Let $x_{ij}$ denote the $i$-th feature location on the $j$-th image, $CP_{ij}$ the $i$-th control point on the $j$-th frame, $X_i$ the corresponding $i$-th 3D point, and $C_j$ the camera parameters corresponding to the $j$-th image. The objective function for the optimization of the curves is then defined as:

$$ \underset{CP_{ij}}{\arg\min} \; \sum_{j=1}^{m-1} \left\lVert CP_j - CP_{j+1} \right\rVert + \sum_{j=1}^{m} \sum_{i=1}^{4} d\big( f(CP_{ij}, C_j), x_{ij} \big), \qquad (5) $$

where $f(CP_{ij}, C_j)$ is the projection of the $i$-th control point on the $j$-th image, and $d(f(CP_{ij}, C_j), x_{ij})$ is the Euclidean distance between the projected point and $x_{ij}$.
The optimal control points obtained from the optimization define the final Bézier curve, and the cylinder is built around this curve. To define the radius of this curved cylinder, the user clicks on a 3D point, and a series of cylinders is computed around the final curve (see the curve-optimization sketch after this list).
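The following sketches illustrate, in Python, three of the computations described above; they are minimal illustrations under stated assumptions, not MoReLab's actual implementation, and all function names (circle_from_three_points, unproject_on_plane, bezier_point, curve_residuals, project_fn) are hypothetical.

First, the circle fit referenced in the base cylinder tool, computed in the 2D plane spanned by T and B via the standard circumcenter formula:

```python
import numpy as np

def circle_from_three_points(p1, p2, p3):
    # Circle through three 2D points expressed in the plane spanned by
    # T and B. Returns None for (near-)collinear points, where the
    # radius would be infinite.
    (ax, ay), (bx, by), (cx, cy) = p1, p2, p3
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    center = np.array([ux, uy])
    return center, float(np.linalg.norm(center - np.array([ax, ay])))
```

Second, the plane unprojection of Equation (4), rearranged into the linear system AX = b, assuming a 3×4 projection matrix whose rows M1–M3 are the ones appearing in the equation:

```python
import numpy as np

def unproject_on_plane(M, x, y, plane):
    # Solve Equation (4) for the 3D point X that projects to pixel (x, y)
    # under the 3x4 projection matrix M and lies on a*X + b*Y + c*Z + d = 0.
    a, b, c, d = plane
    # x = (M1 . Xh)/(M3 . Xh)  =>  (M1 - x*M3) . Xh = 0, with Xh = [X, 1].
    r1 = M[0] - x * M[2]
    r2 = M[1] - y * M[2]
    A = np.array([r1[:3], r2[:3], [a, b, c]])
    rhs = np.array([-r1[3], -r2[3], -d])
    return np.linalg.solve(A, rhs)
```

Finally, the residuals behind the curve optimization of Equation (5), together with a cubic Bézier evaluator for the four control points; project_fn(X, C_j) stands in for f(CP_ij, C_j), and the residual function could be fed to a solver such as SciPy's least_squares:

```python
import numpy as np

def bezier_point(cp, t):
    # Cubic Bézier point at t in [0, 1] from four control points, shape (4, 3).
    s = 1.0 - t
    return (s**3 * cp[0] + 3 * s**2 * t * cp[1]
            + 3 * s * t**2 * cp[2] + t**3 * cp[3])

def curve_residuals(cp_flat, project_fn, cameras, features):
    # Equation (5): coherence between the control points of consecutive
    # frames plus the reprojection error of every control point.
    # cp_flat: flattened (m, 4, 3) control points CP_ij;
    # features: (m, 4, 2) clicked 2D locations x_ij.
    m = features.shape[0]
    cp = cp_flat.reshape(m, 4, 3)
    res = [(cp[j] - cp[j + 1]).ravel() for j in range(m - 1)]  # CP_j - CP_{j+1}
    for j in range(m):                                          # f(CP_ij, C_j) - x_ij
        for i in range(4):
            res.append(project_fn(cp[j, i], cameras[j]) - features[j, i])
    return np.concatenate(res)
```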
by a utility company in the energy sector. Ground-truth measurements have also been
provided for two videos of this dataset for quantitative testing purposes. The second
dataset was captured in our research institute to provide some additional results.
Agisoft Metashape is a popular high-quality commercial SfM software package, which we applied to our datasets. Such software extracts features automatically, matches them, calibrates cameras, densely reconstructs the scene, and generates a final mesh. The output mesh can be visualized in 3D mesh processing software such as MeshLab [26]. The results obtained with SfM software motivate the need to model these videos with user-assisted tools, e.g., see Figure 7b. 3-Sweep is an example of software for user-assisted 3D reconstruc-
tion from a single image. It requires the user to have an understanding of the shapes of the
components. Initially, the border detection stage uses edge detectors to estimate the outline
of different components. The user selects a particular primitive shape, and three strokes
generate a 3D component that snaps to the object outline. Such a user-interactive interface
combines the cognitive abilities of humans with fast image processing algorithms. We perform a visual comparison of modeling different objects with an SfM software package, 3-Sweep, and our software. Table 1 presents a qualitative comparison of the functionalities
of software packages being used in our experiments. The measuring tool in MeshLab
performs measurements on models exported from Metashape and 3-Sweep.
Figure 4. Modeling a cuboid with Metashape, 3-Sweep, and MoReLab: (a) A frame of input video;
(b) Cuboid modeling with Metashape; (c) Paint strokes snapped to cuboid outline; (d) Cuboid
modeling with 3-Sweep; (e) Modeling with rectangle tool; (f) MeshLab visualization of estimated
surfaces of the cuboid.
The jet pump beam has also been modeled with MoReLab in Figure 5e. The quadrilateral tool has been used to estimate the surface of the jet pump beam. The output mesh is formed by joining piecewise quadrilaterals on the surface of the jet pump beam. Quadrilaterals on the upper part of the jet pump are aligned very well, but some misalignment can be observed on the surfaces at the side of the jet pump beam. The resulting mesh has a smooth surface and reflects the original shape of the jet pump beam. Hence, this result is better than the meshes in Figure 5b,d.
Figure 5. The jet pump beam modeled with the software programs under consideration:
(a) Metashape reconstruction output; (b) Another view of (a); (c) Paint strokes snapped to jet pump
beam outline; (d) Output obtained by modeling jet pump beam with 3-Sweep; (e) Estimation of jet
pump beam surface using quadrilateral tool in MoReLab; (f) Output obtained by modeling jet pump
beam with MoReLab.
Figure 6. An example of modeling a cylinder with Metashape, 3-Sweep, and MoReLab: (a) A frame
of input video; (b) MeshLab visualization of a cylinder created using Metashape; (c) Paint strokes
snapped to cylinder outline in 3-Sweep; (d) MeshLab visualization of a cylinder modeled using
3-Sweep; (e) Modeling a cylinder using base cylinder tool in MoReLab; (f) MeshLab visualization of
a cylinder mesh obtained from MoReLab.
Figure 7. An example of modeling a curved pipe: (a) A frame of input video; (b) Modeling curved
pipes in Metashape; (c) Paint strokes snapped to curved cylinder outlines; (d) Estimation of curved
pipes using 3-Sweep visualized in MeshLab; (e) Bézier curve is drawn on a frame; (f) Bézier curve
is drawn on another frame; (g) Curves on multiple frames are optimized to obtain the final Bézier
curve, shown in red; (h) A cylinder around the curve is created; (i) A copy of the first cylinder is
placed on the second pipe; (j) Estimated curved cylinders are visualized in MeshLab.
Figure 7e–j show the steps of modeling two curved cylinders in MoReLab. The
results are quite good, even if there is a small misalignment between the pipes and the
underlying frame.
Figure 8. Modeling cuboids and a curved pipe with tested software programs: (a) Metashape
reconstruction output visualized in MeshLab; (b) A different view of the Metashape reconstruction
visualized in MeshLab; (c) Paint strokes snapped to desired object outlines; (d) 3-Sweep output
visualized in MeshLab; (e) Estimation of desired objects in MoReLab; (f) Estimated objects are
visualized in MeshLab.
Figure 9 shows the result of modeling another video. The Metashape output (see Figure 9a) shows a high level of approximation. The red rectangular region marks the curved pipe in the frame, and Figure 9b shows a zoom-in of this region. The lack of a smooth surface reduces the recognizability of the pipe and introduces inaccuracies in the measurements. Figure 9d shows gaps in the 3D output model of the curved pipe. In contrast, the outputs obtained with MoReLab are more accurate and better represent the underlying objects.
Figure 9. Modeling cuboids and a curved pipe with Metashape, 3-Sweep, and MoReLab: (a) Metashape reconstruction output; (b) Zoom-in of the rectangular region in (a); (c) Paint strokes snapped to desired object outlines; (d) 3-Sweep output; (e) Estimation of desired objects in MoReLab; (f) MeshLab visualization of modeled objects.
4.6. Discussion
The results obtained with SfM packages (e.g., see Figures 4b, 6b, 7b, 8b, and 9a) elicit the need to identify features manually and motivate the development of software for user-assisted reconstruction. The low-quality output models obtained with 3-Sweep can be attributed to poor border detection, caused by the dark lighting conditions in these low-resolution images. The authors of 3-Sweep reported high-quality results on high-resolution images in their paper. However, our experiments indicate that 3-Sweep is not suitable for the low-resolution images and industrial scenarios shown in Figure 1. In these difficult scenarios, 3-Sweep suffers from low robustness and irregularity in the shapes of its meshes.
MoReLab does not rely on the boundary detection stage and hence generates
more robust results. After computing sparse 3D points on the user-provided features,
our software provides tools to the user to quickly model objects of different shapes.
Figures 4f, 5e, 6e, 7i, 8e, and 9e demonstrate the effectiveness of our tools by showing the results obtained with them.
where Mg is the ground-truth measurement, and Me is the corresponding length measured on the estimated 3D model.
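To illustrate the computation, here is a small sketch assuming the standard relative-error definition |Mg − Me| / Mg and hypothetical helper names; the scale is calibrated from the single reference distance (e.g., the 22.454 cm between features 31 and 32 in Figure 10) and the other lengths are derived from it:

```python
import numpy as np

def calibrate_scale(model_pts, ref_pair, ref_len_cm):
    # Scale factor in cm per model unit, from one known distance.
    i, j = ref_pair
    return ref_len_cm / np.linalg.norm(model_pts[i] - model_pts[j])

def measure(model_pts, pair, scale):
    # Length between two reconstructed feature points, in cm.
    i, j = pair
    return scale * np.linalg.norm(model_pts[i] - model_pts[j])

def relative_error(Mg, Me):
    # |Mg - Me| / Mg, with Mg the ground truth and Me the estimate.
    return abs(Mg - Me) / Mg
```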
Figure 10. Measurements computed in MoReLab. The distance of 22.454 cm between features 31 and
32 is the measurement provided for calibration. The other distances are calculated according to this
reference distance.
The measurements were selected according to the available ground-truth measurements from the equipment diagrams. Table 2 also presents a comparison of
relative errors with these three software packages. Among the five measurements under
consideration, MoReLab achieves the lowest errors in three measurements and the lowest
average relative error.
Table 3 reports measurements obtained with Metashape, 3-Sweep, and MoReLab
on another video of the first dataset, and Figure 11 shows these measurements taken in
MoReLab. Given the availability of a CAD model for the jet pump, we take meaningful
measurements between corners in a CAD file and use these measurements as ground truths.
Table 3 also presents a comparison of relative errors with these three software packages.
Among the five measurements under consideration, MoReLab achieves the lowest errors
in three measurements and the lowest average relative error.
Table 2. Results of comparing MoReLab against Metashape and 3-Sweep in terms of relative error in
measurements on the first video (see Figure 10).
Table 4 reports measurements and calculations for a video of the second dataset,
and Figure 12 illustrates these measurements in MoReLab. We took some meaningful
measurements to be used as ground truth for measurements with Metashape, 3-Sweep,
and MoReLab. Relative errors are also calculated for these measurements and reported
in Table 4. All software programs achieved more accurate measurements on this video than on the videos of the first dataset. This can be due to more favorable lighting conditions and high-resolution frames containing a higher number of recognizable features.
Similar to Tables 2 and 3, five measurements have been considered and MoReLab achieves
the lowest relative errors in three measurements and the lowest average relative error in
comparison to other software programs.
Figure 11. Measurements computed in MoReLab. The distance of 7.630 cm between features 28 and
29 is the measurement provided for calibration, and other distances are calculated.
Table 3. Results of comparing MoReLab against Metashape and 3-Sweep in terms of relative error in
measurements on the second video (see Figure 11).
Figure 12. Measurements computed in MoReLab. The distance of 50 cm between features 25 and 31
is the measurement provided for calibration, and other distances are calculated.
Table 4. Results of comparing MoReLab against Metashape and 3-Sweep in terms of relative error in
measurements on the third video (see Figure 12).
Figure 13. Measurements computed in MoReLab. The distance of 55.8 cm between features 39 and 40 is provided for calibration, and the other distances are calculated.
Table 5. Results of comparing MoReLab against Metashape and 3-Sweep in terms of relative error in
measurements on the fourth video (see Figure 13).
Table 6. Results of comparing MoReLab against Metashape and 3-Sweep in terms of relative error in
measurements on the first video seen in Figure 14. The 1-measurement calibration table corresponding
to this one is Table 2.
4.7.3. Limitations
From our evaluation, we have shown that our method performs better than other approaches in our scenario of industrial plants. However, users need to be accurate and precise when adding feature points and must use a high-quality methodology when performing measurements. Overall, all image-based 3D reconstruction methods, including ours, cannot achieve a precision of millimeters (at our scale) or less, due to many factors (e.g., sensor resolution). Therefore, if an object has a small scale, the error introduced by the tolerance is lower than the reconstruction error.
Figure 14. Measurements computed in MoReLab. The distances of 22.454, 14.046, and 12.395 cm are
provided for calibration, and other distances are calculated.
Table 7. Results of comparing MoReLab against Metashape and 3-Sweep in terms of relative error
in measurements on the second video seen in Figure 15. The 1-measurement calibration table
corresponding to this one is Table 3.
Figure 15. Measurements computed in MoReLab. The distances of 7.63, 3.355, and 3.216 cm are
provided for calibration, and other distances are calculated.
Table 8. Results of comparing MoReLab against Metashape and 3-Sweep in terms of relative er-
ror in measurements on the third video seen in Figure 16. The 1-measurement calibration table
corresponding to this one is Table 4.
Figure 16. Measurements computed in MoReLab. The distances of 50, 35, and 7 cm are provided for
calibration, and other distances are calculated.
Table 9. Results of comparing MoReLab against Metashape and 3-Sweep in terms of relative er-
ror in measurements on the fourth video seen in Figure 17. The 1-measurement calibration table
corresponding to this one is Table 5.
Figure 17. Measurements computed in MoReLab. The distances of 55.8, 24, and 17.5 cm are provided
for calibration, and other distances are calculated.
5. Conclusions
We have developed a user-interactive 3D reconstruction tool for modeling low-quality
videos. MoReLab can handle long videos and is well-suited to model featureless objects
in videos. It allows the user to load a video, extract frames, mark features, estimate
the 3D structure of the video, add primitives (e.g., quads, cylinders, etc.), calibrate, and
perform measurements. These functionalities lay the foundations of the software and
present a general picture of its use. MoReLab allows users to estimate shapes that are
typical of industrial equipment (e.g., cylinders, curved cylinders, etc.) and measure them.
We evaluated our tool on several scenes and compared the results against the automatic SfM software program Metashape and the modeling software 3-Sweep [16]. These comparisons show that MoReLab can generate 3D reconstructions from low-quality videos with less relative error than these state-of-the-art approaches. This is fundamental in the industrial context, where there is a need to obtain measurements of objects in difficult scenarios, e.g., in areas with chemical and radiation hazards.
In future work, we plan to extend MoReLab tools for modeling more complex indus-
trial equipment and to show that we are not only more effective than other state-of-the-art
approaches in terms of measurement errors but also more efficient in terms of the time that
the user needs to spend to achieve an actual reconstruction.
Author Contributions: Conceptualization, A.S., F.B., M.C. and P.C.; methodology, A.S., F.B., M.C.
and P.C.; experiments, A.S.; validation, A.S., F.B. and M.C.; formal analysis, A.S., F.B. and M.C.;
investigation, A.S., F.B. and M.C.; resources, F.B., M.C., D.S. and C.J.; data curation, A.S., F.B.,
P.C., D.S. and C.J.; writing—original draft preparation, A.S.; writing—review and editing, F.B.,
M.C., D.S. and C.J.; supervision, F.B., M.C. and P.C.; project administration, F.B., P.C., D.S. and C.J.;
funding acquisition, P.C., D.S. and C.J. All authors have read and agreed to the published version of
the manuscript.
Funding: This research has been supported by European Union’s Horizon 2020 Programme EVOCA-
TION: Advanced Visual and Geometric Computing for 3D Capture, Display, and Fabrication (Grant
Agreement No. 813170) and a research contract with Electric Power Research Institute.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The first dataset presented in this study is not available, but the second dataset can be provided on request.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
References
1. Vacca, G. 3D Survey with Apple LiDAR Sensor—Test and Assessment for Architectural and Cultural Heritage. Heritage
2023, 6, 1476–1501. [CrossRef]
2. Rocchini, C.; Cignoni, P.; Montani, C.; Pingi, P.; Scopigno, R. A low cost 3D scanner based on structured light. In Computer
Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2001; Volume 20, pp. 299–308.
3. Pollefeys, M.; Koch, R.; Vergauwen, M.; Van Gool, L. Hand-Held Acquisition of 3D Models with a Video Camera. In Proceedings
of the Second International Conference on 3-D Digital Imaging and Modeling, Ottawa, ON, Canada, 8 October 1999; pp. 14–23.
[CrossRef]
4. Schönberger, J.L.; Frahm, J.M. Structure-from-Motion Revisited. In Proceedings of the Conference on Computer Vision and
Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
5. Rupnik, E.; Daakir, M.; Pierrot Deseilligny, M. MicMac—A free, open-source solution for photogrammetry. Open Geospat. Data
Softw. Stand. 2017, 2, 14. [CrossRef]
6. Cernea, D. OpenMVS: Multi-View Stereo Reconstruction Library. City 2020, 5, 7.
7. Triggs, B.; McLauchlan, P.F.; Hartley, R.I.; Fitzgibbon, A.W. Bundle adjustment—A modern synthesis. In Proceedings of the
International Workshop on Vision Algorithms, Corfu, Greece, 21–22 September 1999; Springer: Berlin/Heidelberg, Germany,
1999; pp. 298–372.
8. Van Den Hengel, A.; Dick, A.; Thormählen, T.; Ward, B.; Torr, P.H. Videotrace: Rapid interactive scene modelling from video.
ACM Trans. Graph. (ToG) 2007, 26, 86-es.
9. Sinha, S.N.; Steedly, D.; Szeliski, R.; Agrawala, M.; Pollefeys, M. Interactive 3D architectural modeling from unordered photo
collections. ACM Trans. Graph. (TOG) 2008, 27, 1–10. [CrossRef]
10. Xu, M.; Li, M.; Xu, W.; Deng, Z.; Yang, Y.; Zhou, K. Interactive mechanism modeling from multi-view images. ACM Trans. Graph.
(TOG) 2016, 35, 1–13. [CrossRef]
11. Rasmuson, S.; Sintorn, E.; Assarsson, U. User-guided 3D reconstruction using multi-view stereo. In Proceedings of the
Symposium on Interactive 3D Graphics and Games, San Francisco, CA, USA, 5–7 May 2020; pp. 1–9.
12. Habbecke, M.; Kobbelt, L. An Intuitive Interface for Interactive High Quality Image-Based Modeling. In Computer Graphics
Forum; Wiley Online Library: Hoboken, NJ, USA, 2009; Volume 28, pp. 1765–1772.
13. Baldacci, A.; Bernabei, D.; Corsini, M.; Ganovelli, F.; Scopigno, R. 3D reconstruction for featureless scenes with curvature hints.
Vis. Comput. 2016, 32, 1605–1620. [CrossRef]
14. Doron, Y.; Campbell, N.D.; Starck, J.; Kautz, J. User directed multi-view-stereo. In Proceedings of the Computer Vision-ACCV
2014 Workshops, Singapore, 1–2 November 2014; Revised Selected Papers, Part II 12; Springer: Berlin/Heidelberg, Germany,
2015; pp. 299–313.
15. Töppe, E.; Oswald, M.R.; Cremers, D.; Rother, C. Image-based 3d modeling via cheeger sets. In Proceedings of the Computer
Vision—ACCV 2010: 10th Asian Conference on Computer Vision, Queenstown, New Zealand, 8–12 November 2010; Revised
Selected Papers, Part I 10; Springer: Berlin/Heidelberg, Germany, 2011; pp. 53–64.
16. Chen, T.; Zhu, Z.; Shamir, A.; Hu, S.M.; Cohen-Or, D. 3-sweep: Extracting editable objects from a single photo. ACM Trans. Graph.
(TOG) 2013, 32, 1–10. [CrossRef]
17. Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing Scenes as Neural Radiance
Fields for View Synthesis. Commun. ACM 2021, 65, 99–106. [CrossRef]
18. Tewari, A.; Thies, J.; Mildenhall, B.; Srinivasan, P.P.; Tretschk, E.; Wang, Y.; Lassner, C.; Sitzmann, V.; Martin-Brualla, R.; Lombardi,
S.; et al. Advances in Neural Rendering. Comput. Graph. Forum 2022, 41, 703–735. [CrossRef]
19. Chan, E.R.; Monteiro, M.; Kellnhofer, P.; Wu, J.; Wetzstein, G. pi-gan: Periodic implicit generative adversarial networks for
3d-aware image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville,
TN, USA, 20–25 June 2021; pp. 5799–5809.
20. Tu, Z.; Huang, Z.; Chen, Y.; Kang, D.; Bao, L.; Yang, B.; Yuan, J. Consistent 3d hand reconstruction in video via self-supervised
learning. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 9469–9485. [CrossRef] [PubMed]
21. Longuet-Higgins, H.C. A computer algorithm for reconstructing a scene from two projections. Nature 1981, 293, 133–135.
[CrossRef]
22. Banterle, F.; Gong, R.; Corsini, M.; Ganovelli, F.; Gool, L.V.; Cignoni, P. A Deep Learning Method for Frame Selection in Videos
for Structure from Motion Pipelines. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP),
Anchorage, AK, USA, 19–22 September 2021; pp. 3667–3671. [CrossRef]
23. Nocerino, E.; Lago, F.; Morabito, D.; Remondino, F.; Porzi, L.; Poiesi, F.; Rota Bulo, S.; Chippendale, P.; Locher, A.; Havlena,
M.; et al. A Smartphone-Based 3D Pipeline for the Creative Industry the Replicate Eu Project. Int. Arch. Photogramm. Remote Sens.
Spat. Inf. Sci. 2017, XLII-2/W3, 535–541. [CrossRef]
24. Branch, M.A.; Coleman, T.F.; Li, Y. A Subspace, Interior, and Conjugate Gradient Method for Large-Scale Bound-Constrained
Minimization Problems. SIAM J. Sci. Comput. 1999, 21, 1–23. [CrossRef]
25. Gordon, W.J.; Riesenfeld, R.F. Bernstein-Bézier Methods for the Computer-Aided Design of Free-Form Curves and Surfaces.
J. ACM 1974, 21, 293–310. [CrossRef]
26. Cignoni, P.; Callieri, M.; Corsini, M.; Dellepiane, M.; Ganovelli, F.; Ranzuglia, G. Meshlab: An open-source mesh processing tool.
In Proceedings of the Eurographics Italian Chapter Conference, Salerno, Italy, 2–4 July 2008; Volume 2008, pp. 129–136.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.