
Remote Sensing

Article
Sharp Feature-Preserving 3D Mesh Reconstruction from Point
Clouds Based on Primitive Detection
Qi Liu 1, Shibiao Xu 2, Jun Xiao 1,* and Ying Wang 1

1 School of Artificial Intelligence, University of Chinese Academy of Sciences, No. 19 Yuquan Road,
Shijingshan District, Beijing 100049, China
2 School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
* Correspondence: [email protected]; Tel.: +86-10-8825-6566

Abstract: High-fidelity mesh reconstruction from point clouds has long been a fundamental research topic in computer vision and computer graphics. Traditional methods require dense triangle meshes to achieve high fidelity, but excessively dense triangles may lead to unnecessary storage and computational burdens, while also struggling to capture clear, sharp, and continuous edges. This paper argues that the key to high-fidelity reconstruction lies in preserving sharp features. Therefore, we introduce a novel sharp-feature-preserving reconstruction framework based on primitive detection. It includes an improved deep-learning-based primitive detection module and two novel mesh splitting and selection modules that we propose. Our framework can accurately and reasonably segment primitive patches, fit meshes in each patch, and split overlapping meshes at the triangle level to ensure true sharpness while obtaining lightweight mesh models. Quantitative and visual experimental results demonstrate that our framework outperforms both state-of-the-art learning-based primitive detection methods and traditional reconstruction methods. Moreover, our designed modules are plug-and-play: they not only apply to learning-based primitive detectors but can also be combined with other point cloud processing tasks, such as edge extraction or random sample consensus (RANSAC), to achieve high-fidelity results.

Keywords: mesh reconstruction; point clouds; sharp feature; primitive detection

Citation: Liu, Q.; Xu, S.; Xiao, J.; Wang, Y. Sharp Feature-Preserving 3D Mesh Reconstruction from Point Clouds Based on Primitive Detection. Remote Sens. 2023, 15, 3155. https://doi.org/10.3390/rs15123155

Academic Editor: Filiberto Chiabrando

Received: 28 April 2023; Revised: 24 May 2023; Accepted: 15 June 2023; Published: 16 June 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Reconstructing 3D mesh surfaces from point clouds is a key research topic in computer vision and computer graphics, as it enables subsequent computer applications such as calculation, rendering, lighting, deformation, and physical simulation. In particular, achieving high-fidelity results has been a major focus in this field. Mesh reconstruction has significant value in various domains, such as reverse engineering, game animation, virtual and augmented reality, and robotics vision.

Traditional reconstruction algorithms [1,2] achieve high fidelity by producing dense, high-resolution triangle meshes, but this approach can create unnecessary storage or computational burden due to the large number of triangles.

For common manmade 3D objects, such as CAD models, urban buildings, and indoor furniture, the key to achieving high-fidelity reconstruction is to preserve their sharp edge features during reconstruction. Therefore, reconstruction methods based on geometric primitive detection may be the best way to achieve high fidelity [3]. This is because manmade objects can be represented as a combination of primitive surface patches such as planes, spheres, cylinders, and cones, among others.

Despite the many related studies in this area, there is currently no complete framework that allows users to easily obtain high-fidelity mesh models from point clouds. For example, traditional methods use algorithms such as random sample consensus (RANSAC) [4,5] or cluster segmentation [6,7] to segment point clouds and obtain primitive patches, but these methods are limited by the tedious tuning required for each model. Recent methods [8–10]




leverage deep learning to learn semantics from large amounts of data, achieving better
generalization performance and higher extraction accuracy. However, as shown in Figure 1,
the recent state-of-the-art methods such as Parsenet [9] and HPNet [10] extract primitives
by training neural networks in a supervised manner, but they are more focused on the
accuracy of extraction and prediction and have not considered reconstruction results in
detail. For a more detailed introduction to related work, please refer to Section 2.

Figure 1. Comparison results between our method and the previous state-of-the-art methods (ParSeNet [9] and HPNet [10]) for primitive detection and reconstruction. Top: input points and the segments produced by ParSeNet, HPNet, and our method; bottom: ground truth and the corresponding reconstructions. Our method aims to preserve sharp boundary features and ultimately produce high-fidelity mesh models that are close to the ground truth.

In this paper, we propose a full-process reconstruction framework based on primitive detection. Our goal is to provide a complete and user-friendly solution that can reconstruct high-fidelity mesh models from point clouds with sharp features preserved. It includes (1) an improved primitive extraction module based on deep learning, equipped with a necessary post-processing refine submodule, achieving the best performance to date in extracting primitive patches from point clouds; (2) an efficient mesh fitting and splitting module used to preserve sharp features; and (3) a brand-new selection module based on linear optimization that ensures a watertight final result. For technical details of each module, please refer to Section 3.
Visual and quantitative experimental results show that our reconstruction framework
has the following advantages: (1) High fidelity. As shown in Figure 1, our framework
produces reconstruction models that are visually close to ground truth, with clear and
sharp boundaries. (2) Lightweight representation. Unlike previous dense mesh reconstruc-
tion methods, our approach requires fewer triangles to capture object boundary details,
resulting in lightweight models. (3) Flexibility. Our proposed modules are versatile and
can be combined with other tasks, such as automatic or semiautomatic edge extraction,
to produce high-quality reconstruction models. Moreover, our method can generate sharp
and lightweight mesh models from real-scanned point clouds, as shown in Figure 2, which
is the result of our modules combined with RANSAC applied to point clouds of urban
buildings. For more experimental details, please refer to Section 4.
In conclusion, this work presents the following contributions:
• We propose a novel complete framework for reconstructing meshes from point clouds
based on primitive detection. Our framework can accurately preserve sharp and clear
boundary features and generate high-fidelity reconstruction models.
• The framework includes an improved learning-based primitive detection module.
Experiments show that it outperforms previous methods, with Seg-IoU and Type-
IoU scores improving from 85.24/91.04 to 88.42/92.85. In addition, we specifically designed a refine submodule to optimize the detected segmentation, obtaining more reasonable segmentation patches.
• The framework also includes an efficient module for mesh splitting that can separate
overlapping meshes at the triangle level, producing clear and continuous segmentation
blocks. This module helps our framework reconstruct high-quality sharp edges, and it
can be well parallelized.
• Our framework also features a novel optimization selection module, which treats the
reconstruction task as a minimum subset selection problem. In our framework, this
module is responsible for selecting the optimal subset from the already split mesh
collection, to obtain the optimal reconstruction result. The design of this module
considers both the local and global information of the input model.

Figure 2. Results of our method combined with RANSAC [5] applied to real-scanned urban building point clouds. Our proposed modules can be used as plug-and-play post-processing modules to produce sharp and lightweight high-quality mesh models. (a) Input points, (b) RANSAC segments, (c) our refine submodule outputs, (d) our selection module outputs, (e) reconstructed mesh model.

Next, in Section 2, we introduce related work, and in Section 3, we detail our framework and each module's specifics. Then, in Section 4, we present visualization and quantitative experimental results, and explore the flexibility and robustness of our proposed framework for transfer to other tasks through a series of exploratory experiments.

2. Related Work
The technologies most relevant to this article are surface reconstruction and primitive detection from point clouds, which we briefly describe in this section. For a more comprehensive discussion, please refer to the recent surveys [1–3].

2.1. Surface Reconstruction


2.1.1. Non-Learning-Based Methods
Reconstructing the mesh surface from point clouds is an ill-posed problem in the absence of reasonable priors [1,2]. Early classic reconstruction methods employed smoothness as a prior to infer the implicit surface from the point cloud globally or locally, after which isosurface extraction algorithms such as marching cubes [11] are used to obtain the explicit mesh. Representative works include methods based on moving least squares (MLS) [12,13], radial basis functions (RBFs) or their Hermite variant (HRBF) [14–16], and the currently most commonly used Poisson reconstruction [17,18]. The reconstruction quality of these methods is limited by the resolution of the triangles, often requiring dense triangles to ensure high fidelity, especially for sharp edges, which can cause unnecessary computational and storage pressures.
Therefore, some subsequent methods use piecewise smoothness as a prior to preserve sharp edges while improving the reconstruction performance. For example, robust MLS (RMLS) [19] fits feature boundaries through two different smoothing components, and points on different smoothing components can be regarded as outliers of each other. Robust implicit MLS (RIMLS) [20] assumes that the points on two different smooth components can be regarded as outliers in both the space and the gradient domains; however, they are only processed locally and are prone to jagged edges at the border. Edge-aware resampling (EAR) [21] uses a bilateral mechanism to smooth normals and separates the sharp feature boundaries from the smooth area. The locally optimal projector (LOP) [22] smooths and resamples the area far from feature boundaries, whereas interpolation projection upsamples and enhances sharp feature areas. Though these methods preserve or enhance sharp edges, they still produce jagged edges or artifacts at the boundaries and do not obtain complete and continuous boundaries compared to our method.

2.1.2. Learning-Based Methods


Recently, deep-learning-based reconstruction techniques have gained popularity.
These methods propose the use of neural networks to learn the continuous surface of
objects, also known as neural implicit representation, and were first introduced in pioneering
works such as IM-Net [23], DeepSDF [24], and Occupancy Networks [25]. Unlike com-
monly used explicit representations such as point clouds, voxel grids, and meshes, neural
implicit representation trains a discriminator to output the occupancy or the distance to the
surface (signed distance function, SDF) based on the query points in space during training.
During inference, a discrete grid is utilized, with each grid cell being queried to obtain the
occupancy or SDF value, and marching cubes [11] are used to obtain an explicit mesh.
Subsequent follow-up research proposed several important observations. The first
is that the network should pay more attention to local information, which is conducive
to learning better generalization. Points2Surf [26] proposes to sample two patches, local
and global, based on query points in the input point cloud, and learn two encoders for
each patch as two separate branches, combining local and global information to learn
the discriminator. Local Implicit Grid (LIG) [27] proposes to not directly learn the entire
object, but rather to learn local parts. During reconstruction, already learned patches are
combined using an optimization algorithm to obtain the optimal combination. This is
similar to our approach, but we learn primitive patches with geometric meaning, while
they randomly cut patches. The second observation is that convolutional networks have
stronger representation abilities than the fully-connected networks used in the early stages,
which can bring better reconstruction results. ConvONet [28] projects features extracted
from the point cloud onto a feature grid, further aggregates features using 3D CNN on the
feature grid, and finally hands them over to a discriminator to generate the model. Overall,
existing methods based on neural implicit representation have not paid special attention to
boundaries and high fidelity, and therefore cannot obtain high-quality mesh models similar
to our method.
In Section 4.2, we provide a detailed comparison of our method with representative
works mentioned above, and we give a more detailed analysis.

2.2. Primitive Detection


In recent decades, higher-level priors, such as semantic and geometric priors,
have been emphasized in reconstruction research. Primitive-based approaches assume that
objects can be represented by a combination of simple and standard geometric shapes (such
as planes, spheres, cylinders, cones, etc.).

2.2.1. Non-Learning-Based Methods


Popular non-learning primitive detection methods are mainly based on RANdom
SAmple Consensus (RANSAC) [4] and its variants [5,29], which are widely used due to
their robustness to outliers. Among them, Schnabel et al. [5] first proposed extracting prim-
itives such as planes, spheres, cylinders, and cones from point clouds based on RANSAC,
and reconstructing models through their combination. Schnabel et al. [29] extrapolated all
detected primitives by RANSAC and calculated the intersection points between them. The
extrapolation of the primitives is interpreted as a graph cut problem.
A limitation of RANSAC-based methods is that if some parts of the point cloud model
cannot be well represented by the defined primitives, the reconstruction result cannot be
obtained. Lafarge et al. [30] attempted to solve this problem through a hybrid approach.
Part of the regular structure is represented by planes and their combinations, while the
area without primitives is reconstructed using the graph cut method based on Delaunay
triangulation. Nan et al. [31] used planes to approximate all surfaces; they obtained a set of candidate faces through the intersection of planes detected by RANSAC, then selected the optimal subset through binary linear optimization to generate lightweight reconstruction results. However, this approach is only suitable for objects containing only planes.
In general, non-learning primitive-based methods suffer from the trouble of parameter
tuning, which requires manual adjustments for each model and primitive. However,
the idea of combining the primitives to generate models by selection and optimization
inspired our work.

2.2.2. Learning-Based Methods


Recent works extract features from point clouds by supervised training of a neural
network to detect primitives. CSGNet [32] and CAPRI-Net [33] reconstruct objects by
combining the detected quadric surface primitives via constructive solid geometry (CSG)
operations. However, they require a labeled hierarchical structure for primitives, which
may not always be readily available.
The most relevant works to our framework are SPFN [8], ParSeNet [9], and HPNet [10].
SPFN is the pioneering work among them, which proposes predicting pointwise properties
by segmentation labels, type labels, and normals, and fitting the primitive parameters
through a differential model estimation module. ParSeNet adds the prediction of B-spline
patches on the basis of SPFN, which increases the expressive ability of surface models.
Both SPFN and ParSeNet only use high-dimensional semantic supervision information for
prediction. HPNet improves the prediction accuracy by introducing additional geometric
features (sharp edges).
Learning-based methods avoid tedious manual tuning of parameters, but the supervised methods still suffer from the problem of data dependence. It is foreseeable that as larger models and larger datasets are continuously proposed, learning-based methods will have more room for improvement.

3. Method
The input of our framework is a 3D point cloud P = {p_i | 1 ≤ i ≤ N}, where each point p_i contains a position in R^3 and a normal n_i ∈ R^3; therefore, p_i ∈ R^6. Our goal is to finally obtain a high-fidelity watertight mesh model.
Figure 3 shows the pipeline of our framework; this section details the implementation
of primitive detection module, mesh fitting and splitting module, and selection module. More
experiments are detailed in Section 4.

Figure 3. The pipeline of our framework: primitive detection (detection network + refine clustering), mesh fitting and splitting, and selection.

3.1. Primitive Detection Module


3.1.1. Coarse Primitive Detection Based on Supervised Learning
As shown in Figure 3, our primitive detection module adopts a two-step strategy, which
first applies an improved supervised-learning-based coarse primitive detector. Here, we
first briefly introduce the previous state-of-the-art primitive detector HPNet [10], and then
describe our improvements based on it.
HPNet employs a two-stage hybrid model that combines neural networks with geometric constraints. The trainable part employs a three-layer DGCNN [34] as the backbone encoder, which encodes each point p_i ∈ P into an R^256 feature space and then outputs, through multiple classification heads: an embedding descriptor e_i ∈ R^128, a binary primitive type prediction vector t_i ∈ {0,1}^6 (corresponding to six primitive types: plane, sphere, cylinder, cone, B-spline-open, and B-spline-closed), and a shape parameter prediction vector s_i ∈ R^22. With regard to the nontrainable post-processing part, HPNet constructs two constraints, named geometric consistency and smoothness consistency, according to the primitive parameters and the known normals. According to these two constraints, the embedding descriptors e ∈ R^(N×128) are then clustered by mean-shift [35] to obtain the final K patches P_k (1 ≤ k ≤ K), with P = P_1 ∪ · · · ∪ P_K.
HPNet achieves the best primitive detection performance due to its use of geometric
constraints, but the main limitation comes from the DGCNN backbone, which makes the
throughput during training significantly lower than other learning-based point cloud pro-
cessing methods, resulting in higher training costs. In addition, recent work PointNeXt [36]
proves that even the most classic and widely used backbone PointNet++ [37], after simply
adopting improved training strategies, can outperform some recent complex designed
backbones (such as PointMLP [38] and Point Transformer [39]).
Therefore, as per Figure 4, we keep the same two-stage design as HPNet and make the following enhancements: (1) we replaced DGCNN with PointNeXt-b (a classic PointNet++ equipped with the bottleneck structure [40]), resulting in a significant improvement in throughput and detection performance; (2) we switched from the Adam optimizer to AdamW; (3) we implemented cosine learning rate decay; and (4) we incorporated label smoothing.
As a result, our primitive detection network achieved higher segmentation mean IoU and
primitive type mean IoU scores of 88.42/92.85 on the ABCParts [9] benchmarks, compared
to the original scores of 85.24/91.04. Additionally, we improved throughput from 8 ins/sec
to 28 ins/sec. For more detailed experimental information, please refer to Section 4.
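To make enhancements (2)–(4) concrete, the following minimal PyTorch sketch shows how an AdamW optimizer, cosine learning rate decay, and label smoothing can be wired into a per-point classification loop. The tiny backbone, loss weighting, and hyperparameter values are illustrative placeholders, not our released training code.

```python
import torch
import torch.nn as nn

# Stand-in for the PointNeXt-b backbone: any per-point classifier fits here.
class TinyPointBackbone(nn.Module):
    def __init__(self, in_ch=6, n_types=6):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_ch, 64), nn.ReLU(), nn.Linear(64, n_types))

    def forward(self, x):      # x: (B, N, 6) point positions + normals
        return self.mlp(x)     # (B, N, 6) per-point primitive-type logits

model = TinyPointBackbone()

# (2) AdamW instead of Adam (decoupled weight decay).
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
# (3) Cosine learning rate decay over the 150 training epochs.
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=150)
# (4) Label smoothing on the primitive-type classification loss.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

points = torch.randn(2, 7000, 6)            # dummy batch: B = 2, N = 7000
labels = torch.randint(0, 6, (2, 7000))     # dummy per-point type labels
for epoch in range(150):
    logits = model(points)
    loss = criterion(logits.permute(0, 2, 1), labels)  # CE expects (B, C, N)
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()
```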

Figure 4. Primitive detection network. A learnable encoder predicts embedding descriptors, primitive type prediction vectors, and shape parameter prediction vectors from input point clouds. Then the mean-shift module clusters embedding descriptors through geometric consistency and smoothness consistency to produce primitive segments.

3.1.2. Refine Clustering via Normal Angle


The accuracy of the segmentation results obtained from the primitive detection network depends heavily on the quality of labels used in its supervised training. For instance, the ABCParts [9] benchmark dataset is automatically labeled and oversegmented, which can negatively impact the segmentation results. To address this issue, we introduced a refine clustering submodule that can optionally be used to alleviate the oversegmentation problem by enforcing normal angle constraints. This submodule can be skipped if the upstream task already produces accurate segmentation results.
Specifically, our submodule first eliminates patches with fewer than N_min points and computes the convex hull for the remaining patches. Next, taking two adjacent patches, P_p and P_q, as an example, we detect their adjacency relationship and determine whether to merge them. Here, we refer to the clustering approach of P-linkage [41] to automatically cluster each patch into a collection of smaller slices, resulting in cluster sets C_p = {c_p^i | 1 ≤ i ≤ N_p} and C_q = {c_q^j | 1 ≤ j ≤ N_q}. Then, we apply principal component analysis (PCA) [42] locally to each c_p and c_q to compute their normals, representing each patch as multiple slices with normals. Finally, if more than N_c slice pairs satisfy the following Equation (1), the two patches P_p and P_q have a good adjacent smooth transition, and they will be merged:

$$\arccos\left|\vec{n}(c_p)^{\top} \cdot \vec{n}(c_q)\right| \leq \theta_t \quad (1)$$

where θ_t is the angle threshold that determines the curvature of the surface to be merged.
In summary, our refine clustering submodule effectively removes disturbances caused
by trivial patches and merges oversegmented patches.
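A minimal sketch of the merging test in Equation (1) is given below, assuming the P-linkage slicing and the adjacency detection have already produced the slice sets; the threshold values for θ_t and N_c are illustrative, and each slice is a small (k, 3) array of points.

```python
import numpy as np

def slice_normal(points):
    """Normal of a point slice via local PCA: the eigenvector of the local
    covariance matrix associated with the smallest eigenvalue."""
    centered = points - points.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(centered.T @ centered)  # ascending order
    return eigvecs[:, 0]

def should_merge(slices_p, slices_q, theta_t=np.deg2rad(10.0), n_c=3):
    """Equation (1): merge two adjacent patches P_p and P_q if more than n_c
    slice pairs have an (unsigned) normal angle below theta_t."""
    count = 0
    for cp in slices_p:
        n_p = slice_normal(cp)
        for cq in slices_q:
            n_q = slice_normal(cq)
            cos = abs(float(n_p @ n_q))
            if np.arccos(np.clip(cos, 0.0, 1.0)) <= theta_t:
                count += 1
    return count > n_c
```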

3.2. Mesh Fitting and Splitting Module


After extracting the point cloud patches from the primitive detection module, this section introduces our mesh fitting and splitting module, which is used to fit meshes to each patch and split intersecting meshes. The module consists of four steps: mesh fitting, intersection line detection, pairwise splitting, and partitioning triangles in nonintersecting areas.

For mesh fitting, we use the method of Huang et al. [16] (other meshing methods can also be used, such as screened Poisson reconstruction [18], which handles most cases faster but may not be able to fill holes) to produce triangular meshes S = {S_i | 1 ≤ i ≤ N_s}; this method can handle cases with holes or missing data. Note that a slightly wider grid or scale should be set to ensure overlapping and intersection between meshes (see Figure 3).
For intersection line detection, we iterate over all surface pairs and check each pair for intersections. For a pair of surfaces S_a and S_b in S, we first calculate the axis-aligned bounding boxes [43] of all triangles on each surface separately. Collision detection on the bounding boxes determines all the intersecting triangle pairs on the two surfaces, that is, where the intersection lines are. This is more efficient than traversing all the triangles to calculate the intersection line.
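The sketch below illustrates this bounding-box filtering step under the assumption that each surface is given as a (T, 3, 3) array of triangle vertices; it brute-forces the box overlap test for clarity, whereas a BVH or sweep-and-prune structure would replace the outer loop in practice.

```python
import numpy as np

def triangle_aabbs(tris):
    """Per-triangle axis-aligned bounding boxes for a (T, 3, 3) vertex array."""
    return tris.min(axis=1), tris.max(axis=1)   # (T, 3) mins and maxs

def candidate_pairs(tris_a, tris_b):
    """Collect triangle pairs whose AABBs overlap; only these candidates need
    an exact triangle-triangle intersection test afterwards."""
    min_a, max_a = triangle_aabbs(tris_a)
    min_b, max_b = triangle_aabbs(tris_b)
    pairs = []
    for i in range(len(tris_a)):
        # Two boxes collide iff their intervals overlap on every axis.
        hit = np.all(min_a[i] <= max_b, axis=1) & np.all(max_a[i] >= min_b, axis=1)
        pairs.extend((i, j) for j in np.nonzero(hit)[0])
    return pairs
```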
For the pairwise splitting, the intersection line detection has already identified pairs of intersecting triangles. Take ∆a ∈ S_a and ∆b ∈ S_b in Figure 5 as an example; we fit triangle ∆b to a plane b̄ and divide triangle ∆a according to the relationship between the intersection points and the vectors as follows:

$$\vec{v} = \overrightarrow{p_1 p_2} \times \vec{u} \quad (2)$$

$$\Delta p_1 a_3 p_2 \in \begin{cases} S_a(A), & \overrightarrow{p_1 a_3} \cdot \vec{v} \geq 0 \\ S_a(B), & \overrightarrow{p_1 a_3} \cdot \vec{v} < 0 \end{cases} \quad (3)$$

where p_1, p_2 are the intersection points of plane b̄ and triangle ∆a, and $\vec{u}$ is the known normal vector pointing either inward or outward of the model. Vector $\vec{v}$ lies on the plane of triangle ∆a and is perpendicular to vector $\overrightarrow{p_1 p_2}$ (Equation (2)). If $\overrightarrow{p_1 a_3} \cdot \vec{v} \geq 0$, triangle ∆p_1a_3p_2 is included in set S_a(A) (marked in red in the figure), and triangles ∆p_1p_2a_2 and ∆p_1a_2a_1 are included in set S_a(B) (marked in green in the figure); otherwise, triangle ∆p_1a_3p_2 is included in set S_a(B), and triangles ∆p_1p_2a_2 and ∆p_1a_1a_2 are included in set S_a(A). Note that the vectors $\overrightarrow{p_1 p_2}$ of adjacent triangles should point in the same direction.

Figure 5. Pairwise splitting. (a) A pair of intersecting triangles ∆a and ∆b. (b) Fit ∆b to a plane b̄. (c) Split ∆a into three small triangles (∆p_1a_3p_2, ∆p_1p_2a_2, and ∆p_1a_2a_1) according to intersection points p_1 and p_2, then divide them according to vectors $\vec{v}$ and $\vec{u}$. (d) Process the triangles on the intersection line.
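The classification rule of Equations (2) and (3) reduces to one cross product and one dot product per split triangle, as in the following sketch; the inputs are the intersection points p_1, p_2, the vertex a_3, and the consistent normal u, all as length-3 NumPy arrays.

```python
import numpy as np

def classify_split(p1, p2, a3, u):
    """Equations (2)-(3): decide on which side of the intersection segment
    p1p2 the sub-triangle containing vertex a3 lies. u is the known normal
    pointing consistently inward or outward of the model."""
    v = np.cross(p2 - p1, u)           # Eq. (2): v lies in the plane of the
                                       # split triangle, perpendicular to p1p2
    side = float(np.dot(a3 - p1, v))   # Eq. (3): sign of p1a3 . v
    return "A" if side >= 0 else "B"   # triangle(p1, a3, p2) joins S_a(A) or
                                       # S_a(B); the other two sub-triangles
                                       # join the opposite set
```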

For partitioning triangles in a nonintersecting area, such as ∆o in Figure 6, we locate the intersection point p on the intersection line closest to ∆o, then categorize ∆o into either S_a(A) or S_a(B) based on the already divided triangles near p, according to the following equation:

$$\sum_{i=1}^{k} \mathrm{dis}(o, a_i) - \sum_{i=1}^{k} \mathrm{dis}(p, a_i) \leq \sum_{i=1}^{k} \mathrm{dis}(o, b_i) - \sum_{i=1}^{k} \mathrm{dis}(p, b_i) \quad (4)$$

where k is the size of the K-neighborhood of the two divided triangle sets S_a(A) and S_a(B) taken at point p, and dis(a, b) represents the Euclidean distance from ∆a to ∆b. In this way, we split S_a into two subsurfaces, S_a = S_a(A) ∪ S_a(B).

Figure 6. Partition triangles in nonintersecting area.
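A sketch of the assignment rule in Equation (4) follows, with triangles represented by their centroids; near_a and near_b hold the centroids of the k already-divided triangles of S_a(A) and S_a(B) around the closest intersection point p.

```python
import numpy as np

def assign_nonintersecting(o, p, near_a, near_b):
    """Equation (4): assign a triangle with centroid o that is far from the
    intersection line, by comparing its distance pattern around p against the
    two already-divided sets."""
    def score(neigh):                  # sum dis(o, .) - sum dis(p, .)
        d_o = np.linalg.norm(neigh - o, axis=1).sum()
        d_p = np.linalg.norm(neigh - p, axis=1).sum()
        return d_o - d_p
    return "A" if score(near_a) <= score(near_b) else "B"
```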

We apply these operations to all surfaces, splitting them into a set of candidate faces
Scandi that are segmented along intersecting lines. This approach is crucial in producing
high-quality boundaries in the final model. Note that this meshing and splitting algorithm
operates discretely on each surface and triangle and can be well parallelized. In practice,
the C++ implementation of the parallelized version has an efficiency improvement of
approximately 10×, as discussed in Section 4.

3.3. Selection Module


We treat the reconstruction problem as an optimal subset selection problem, and for this we introduce the selection module to choose appropriate meshes from the candidate surfaces S_candi and combine them into a reasonable reconstruction model. We define the following data fitting and 3D structural similarity energy terms to constitute the optimization objective function.

3.3.1. Energy Terms


The set S_candi = {s_i | 1 ≤ i ≤ N_sc} of N_sc candidate surfaces is known from the mesh fitting and splitting module; we define the binary variable x_i to represent whether s_i is chosen (i.e., x_i = 1) or not (i.e., x_i = 0).

Data fitting. This term assesses the degree of alignment between the generated surfaces and the point sets while taking into consideration the points' confidence [31,44]. It is defined as follows:

$$E_f = 1 - \frac{1}{N} \sum_{i=1}^{N_{sc}} x_i \cdot \mathrm{supp}(s_i) \quad (5)$$

where N represents the total number of points in the point cloud P, while supp(s_i) factors in the distance between the point and the surface, the point distribution, and the local sampling uniformity, as follows:

$$\mathrm{supp}(s) = \sum_{p,s \,|\, \mathrm{dist}(p,s) < \epsilon} \left(1 - \frac{\mathrm{dist}(p,s)}{\epsilon}\right) \cdot \mathrm{conf}(p) \quad (6)$$

$$\mathrm{dist}(p,s) = \min\left\{\mathrm{dist}(p, f_j) \,\middle|\, f \in s,\ 1 \leq j \leq N_f\right\} \quad (7)$$

$$\mathrm{conf}(p) = \frac{1}{3} \sum_{i=1}^{3} \left(1 - \frac{3\lambda_1^i}{\lambda_1^i + \lambda_2^i + \lambda_3^i}\right) \cdot \frac{\lambda_2^i}{\lambda_3^i} \quad (8)$$

where dist(p, s) in Equation (7) represents the Euclidean distance from a point p ∈ P to the candidate surface s. Only points whose distance from the surface is less than ε are considered. Mesh s contains N_f faces, denoted by f. In Equation (8), λ_1^i ≤ λ_2^i ≤ λ_3^i are the three eigenvalues of the covariance matrix at scale i. The property 1 − 3λ_1/(λ_1 + λ_2 + λ_3) in conf(p) measures the quality of fitting a tangent plane in the local neighborhood. A value closer to 0 indicates a poor point distribution, whereas a value of 1 implies a perfectly fitting plane. The property λ_2/λ_3 in conf(p) gauges the uniformity of point sampling in the local neighborhood. Its ratio ranges from 0 to 1, with 0 indicating a perfect line distribution and 1 representing a uniform disk distribution.

In essence, the data fitting term biases the final result towards selecting candidate faces that are proximal to the input points and have a dense and uniform point distribution.
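As a concrete reading of Equation (8), the sketch below computes conf(p) from the covariance eigenvalues of three neighborhoods of p (one per scale); the neighborhood sizes are left to the caller and follow [31,44].

```python
import numpy as np

def conf(neighborhoods):
    """Equation (8): average, over three scales, of a tangent-plane fitting
    score times a sampling-uniformity score. `neighborhoods` holds three
    (k, 3) arrays of neighbors of the point p."""
    total = 0.0
    for pts in neighborhoods:
        centered = pts - pts.mean(axis=0)
        l1, l2, l3 = np.linalg.eigvalsh(centered.T @ centered / len(pts))
        planarity = 1.0 - 3.0 * l1 / (l1 + l2 + l3)  # 1 = perfect plane fit
        uniformity = l2 / l3                         # 1 = uniform disk sampling
        total += planarity * uniformity
    return total / 3.0
```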
3D structural similarity. To ensure the reliability of the reconstruction results, we cannot rely solely on data fitting, because it may stubbornly select surfaces around data points, and the input point cloud may contain defects such as noise or missing data. Additionally, there may be gaps in the boundary area after the primitive detection module, and mesh splitting may lead to nonunique intersections in the missing or gap areas (shown in Figure 7a). These factors make data fitting unable to choose a reasonable result. Hence, we add a 3D structural similarity term that considers the global structure information of the model to the objective function.


Figure 7. (a) There may be data missing (self-defective or discarded by the primitive detection module)
and nonunique intersections at boundaries. (b) Reconstruction result obtained solely through the
data fitting term. (c) Reconstruction result with the 3D structural similarity added.

The input point set P contains structural information, and the final selected surface set S_out ⊆ S_candi should be structurally similar to the input point cloud, defined as

$$E_{ss} = 1 - \mathrm{similarity}(S_{out}, \mathcal{P}) \quad (9)$$

$$S_{out} = \sum_{i=1}^{N_{sc}} x_i \cdot s_i \quad (10)$$

where P is the known segmented patches, and the range of similarity(S_out, P) is (0, 1], where the closer the value is to 1, the higher the similarity. It is defined as

$$\mathrm{similarity}(S_{out}, \mathcal{P}) = \frac{2\mu_S \cdot \mu_P}{\mu_S^2 + \mu_P^2 + \eta} \cdot \frac{2\sigma_S \cdot \sigma_P}{\sigma_S^2 + \sigma_P^2 + \eta} \cdot \frac{\sigma_{SP}}{\sigma_S \cdot \sigma_P + \eta} \quad (11)$$

where η is a small number to avoid division by zero. μ_P and σ_P represent the mean and variance of P, μ_S and σ_S represent the mean and variance of S_out, and σ_SP represents the covariance between P and S_out. These values are computed by randomly sampling the same number of points from both the point cloud and the mesh and calculating them based on the coordinates of the sampled points. According to Equation (10), they can be written as follows:

$$\mu_S = \sum_{i=1}^{N_{sc}} x_i \cdot \mu_{s_i}, \quad \sigma_S = \sigma\!\left(\sum_{i=1}^{N_{sc}} x_i \cdot s_i\right), \quad \sigma_{SP} = \sigma\!\left(\sum_{i=1}^{N_{sc}} x_i \cdot s_i, \mathcal{P}\right) \quad (12)$$

Similarly, a fixed number of sampling points is used to calculate them. This transforms
the problem into a binary linear combination optimization problem, which is optimized
together with the data fitting term.
The 3D structural similarity term aims to minimize the distribution difference between
the surface set Sout and the point set P , resulting in a reconstruction that has a similar
global structure to P .
Figure 7 illustrates the effect of this term. When there are gaps between patches and
nonunique intersections, the data fitting term alone may not result in a reasonable recon-
struction (Figure 7b). However, after adding the 3D structural similarity term (Figure 7c),
the optimizer can achieve the desired result.
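For reference, a minimal sketch of Equation (11) on two equal-size coordinate samples is given below; how points are sampled and paired between S_out and P is simplified here to flattened coordinate arrays.

```python
import numpy as np

def structural_similarity(samples_s, samples_p, eta=1e-8):
    """Equation (11): SSIM-style similarity between point samples drawn from
    the selected surfaces S_out and the input cloud P. samples_* are flattened
    coordinate arrays of equal length; eta avoids division by zero."""
    mu_s, mu_p = samples_s.mean(), samples_p.mean()
    sd_s, sd_p = samples_s.std(), samples_p.std()
    cov = ((samples_s - mu_s) * (samples_p - mu_p)).mean()
    return ((2 * mu_s * mu_p) / (mu_s**2 + mu_p**2 + eta)
            * (2 * sd_s * sd_p) / (sd_s**2 + sd_p**2 + eta)
            * (cov / (sd_s * sd_p + eta)))
```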

3.3.2. Optimization
We use the energy terms defined above to formulate the following optimization model, which selects the best set of candidate surfaces and ensures the watertightness of the final model through hard constraints:

$$\min_{X}\ \lambda_f \cdot E_f + \lambda_{ss} \cdot E_{ss} \quad \text{s.t.} \quad \sum_{j \in N(e_i)} x_j = 2 \text{ or } 0,\ 1 \leq i \leq |E|; \quad x_i \in \{0, 1\},\ 1 \leq i \leq N_{sc} \quad (13)$$

The constraint Σ_{j∈N(e_i)} x_j = 2 or 0 ensures that an intersecting boundary e_i is adjacent to either exactly two surfaces or none in the final result, thereby ensuring the watertightness of the final model; here, E is the set of intersecting boundaries and N(e_i) denotes the candidate surfaces adjacent to e_i. We solve this binary optimization model using the Gurobi [45] optimizer.
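A hedged sketch of this binary program in gurobipy is shown below. It assumes the energy has been reduced to per-surface linear coefficients (which Equation (12) permits once the per-surface statistics are precomputed), and it models the "= 2 or 0" condition with one auxiliary binary variable per boundary; the weight values are placeholders.

```python
import gurobipy as gp
from gurobipy import GRB

def select_surfaces(n_surf, fit_cost, ss_cost, edge_adjacency,
                    lam_f=0.5, lam_ss=0.5):
    """fit_cost/ss_cost: per-surface linear coefficients of E_f and E_ss;
    edge_adjacency: for each intersecting boundary e_i, the indices of the
    candidate surfaces adjacent to it."""
    m = gp.Model("surface_selection")
    x = m.addVars(n_surf, vtype=GRB.BINARY, name="x")
    # Watertightness: every boundary is used by exactly two surfaces or none;
    # "sum == 2 or 0" becomes sum == 2*y with a fresh binary y per boundary.
    for i, adj in enumerate(edge_adjacency):
        y = m.addVar(vtype=GRB.BINARY, name=f"y_{i}")
        m.addConstr(gp.quicksum(x[j] for j in adj) == 2 * y)
    m.setObjective(gp.quicksum((lam_f * fit_cost[j] + lam_ss * ss_cost[j]) * x[j]
                               for j in range(n_surf)), GRB.MINIMIZE)
    m.optimize()
    return [j for j in range(n_surf) if x[j].X > 0.5]
```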

4. Results
4.1. Datasets
We present experimental results on two widely used datasets of manmade objects, namely, ABCParts [9] and Thingi10K [46]. ABCParts is a subset of the ABC dataset [47], which has recently been considered a standard benchmark for learning-based primitive detection methods [9,10,48]. It comprises the point clouds of 30k CAD models, where each point cloud has 10,000 points and at least one curved surface. We trained and evaluated our primitive detection network on this dataset.

For the nonlearnable modules, Thingi10K is more challenging for demonstrating algorithm performance. It consists of over 10k objects that have been uploaded by users for 3D printing. We selected models containing curved surfaces and sharp edges to demonstrate the framework's efficacy.

4.2. Experiment Details and Results Analysis


Training strategies. For the convenience of comparison, except for the improved parts, we adopt the same training strategies as HPNet, including the same loss function and weights; refer to Section 3.1 for the improvements. The shared strategies include the 24k/4k/4k dataset division, downsampling each point cloud to 7000 points, an input channel containing point + normal, the same data augmentation, and a learning rate of lr = 0.001 for 150 epochs. The throughput is measured with an Nvidia GeForce RTX 3090 24 GB GPU and a 16-core Intel i7 @ 2.8 GHz CPU.
Evaluation metrics. We use the following metrics for evaluating the segmentation and
primitive labeling.

• Seg-IoU: this metric measures the similarity between the predicted patches and ground truth segments: (1/K) Σ_{k=1}^{K} IoU(W[:, k], Ŵ[:, k]), where W is the predicted segmentation membership for each point cloud, Ŵ is the ground truth, and K is the number of ground truth segments.
• Type-IoU: this metric measures the classification accuracy of primitive type prediction: (1/K) Σ_{k=1}^{K} I[t_k = t̂_k], where t_k is the predicted primitive type for the kth segment patch and t̂_k is the ground truth. I is an indicator function.
• Throughput: this metric measures the efficiency of the network in ins./sec., meaning the maximum number of instances the network can handle per second.
We use Hungarian matching [49] to find correspondences between predicted segments
and ground-truth segments.
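The sketch below spells out Seg-IoU with Hungarian matching on per-point labels; it follows the metric's definition rather than the benchmark's exact evaluation code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def seg_iou(pred, gt):
    """Mean Seg-IoU between predicted and ground-truth per-point segment
    labels (both (N,) integer arrays). Hungarian matching [49] aligns the
    predicted segment ids with the ground-truth ones."""
    pred_ids, gt_ids = np.unique(pred), np.unique(gt)
    iou = np.zeros((len(gt_ids), len(pred_ids)))
    for a, g in enumerate(gt_ids):
        for b, q in enumerate(pred_ids):
            inter = np.sum((gt == g) & (pred == q))
            union = np.sum((gt == g) | (pred == q))
            iou[a, b] = inter / union
    rows, cols = linear_sum_assignment(-iou)    # maximize total matched IoU
    return iou[rows, cols].sum() / len(gt_ids)  # average over K GT segments
```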
Analysis of results. Our framework's reconstruction results on the ABCParts benchmark, both quantitative and qualitative, are presented in Table 1 and Figure 8, respectively. The detection performance of our primitive detection module, as shown in Table 1, outperforms other state-of-the-art methods, particularly in terms of throughput efficiency, which can be attributed to the PointNet++ backbone. Additionally, Figure 8 shows that our method produces high-quality watertight models that are close to the ground truth and surpass all previous methods. This is thanks to our proposed mesh fitting and splitting module and selection module.


Figure 8. (a–k) Primitive segmentation and surface reconstruction results. From top to bottom: input,
ground truth, segments produced by ParSeNet, surfaces fitted by ParSeNet, segments produced by
HPNet, surfaces produced by HPNet and ball pivoting [50], our segments produced by primitive
detection module, and final reconstructed models.

Table 1. Benchmark evaluation of our primitive detection module and baseline approaches.

| Method | Seg-IoU (%) | Type-IoU (%) | Throughput (ins./sec.) |
|---|---|---|---|
| SPFN [8] | 73.41 | 80.04 | 21 |
| ParSeNet [9] | 82.14 | 88.60 | 8 |
| HPNet [10] | 85.24 | 91.04 | 8 |
| Ours | 88.42 | 92.85 | 28 |

In contrast, the primitive blocks detected by ParSeNet [9] may contain breaks or errors,
particularly near boundaries. Although it incorporates a learnable fit module to reconstruct
mesh models, it may generate gaps or overlaps (such as (a), (f), (h), and (j) in the third
row). This highlights the difficulty of training the surface fitting task compared to primitive
detection. HPNet produces more precise primitive detection results than ParSeNet, but it
may encounter oversegmentation issues ((c), (d), (e), (i), and (k) in the fifth row). As HPNet
does not consider mesh model reconstruction, for comparison, we used the classical mesh
reconstruction algorithm ball pivoting [50] to obtain the mesh models (sixth row).
Ablation analysis. Table 2 displays the enhancements of our primitive detection module
compared to the baseline. The most significant improvement comes from the backbone,
which not only boosts the throughput from 8 to 28 instances per second but also enhances
the detection score. Moreover, incorporating more advanced training strategies such as
AdamW [51], cosine learning rate decay, and label smoothing [52] can also lead to improved
performance.

Table 2. Ablation analysis of improvements.

| Improvements | Seg-IoU (%) | Type-IoU (%) | Throughput (ins./sec.) |
|---|---|---|---|
| Baseline (HPNet [10]) | 85.24 | 91.04 | 8 |
| + DGCNN [34] → PointNeXt-b [36] | 86.88 | 92.23 | 28 |
| + Adam [53] → AdamW [51] | 87.34 | 92.74 | 28 |
| + Step → Cosine | 87.50 | 92.67 | 28 |
| + Label Smoothing [52] | 88.42 | 92.85 | 28 |

Comparison with learning-based reconstruction methods. Figure 9 provides a comparison of our method with recently popular deep-learning-based surface reconstruction methods. Please refer to Section 2.1.2 for more information on these methods. We utilized the authors' open-source code and pretrained models while following the recommended settings in their papers.

Figure 9. Comparison with deep-learning-based reconstruction methods. From top to bottom: ground truth, Points2Surf, LIG, ConvONet (3-plane), ConvONet (grid 32), and ours.

Overall, most learning-based methods train networks to predict occupancy or SDF values, making them unsuitable for representing hollow shapes such as those indicated by the blue boxes in the figure. The reconstruction results of Points2Surf [26] have surface artifacts and poor boundary quality, possibly because of their design of separate local and global branches. Local Implicit Grid (LIG) [27] performs better since it trains the network to learn parts rather than the entire object, making it more local and giving better generalization in surface reconstruction tasks. However, as mentioned by the authors, it may suffer from the "back-faces" problem, as shown by the yellow boxes. Convolutional Occupancy Networks (ConvONet) [28] proposes two networks that use a 2D CNN and a 3D CNN, respectively, named ConvONet-3plane (shown as ConvONet-3p in the figure) and ConvONet-grid (shown as ConvONet-grid32 in the figure, with a default resolution of 32^3). We conducted comparative tests on both models. The three-plane network is more efficient but has poorer representation capabilities, which may result in artifacts in the reconstruction results (as shown by the red boxes in the figure). The grid model has better representation capabilities but is relatively less efficient. Moreover, since ConvONet uses one latent vector to represent one object, it may make semantic recognition errors and not be as local as Points2Surf or LIG, as shown by the green box in the figure.

In contrast, our method achieved the best reconstruction results, with a clean surface and sharp edges, and only a small amount of distortion at the boundary of the last model (shown by the green box in the last row).
Comparison with traditional reconstruction methods. Figure 10 illustrates a comprehensive comparison between our method and five other well-known reconstruction techniques. They include the classical screened Poisson reconstruction (SCP) [18], as well as four methods specifically designed for preserving sharp features: APSS [54], RIMLS [20], EAR [21], and PolyFit [31], which are detailed in Section 2. Two models of varying complexity are used: a Vase with only four surfaces (top of Figure 10) and a Fandisk with approximately 20 faces (bottom of Figure 10). Both models use point clouds with 10k points and moderate Gaussian noise (σ = 0.02d, where d is the length of the diagonal of the bounding box).

Figure 10. Comparison with traditional meshing methods (top: Vase; bottom: Fandisk). From left to right: input points (with noise), SCP, APSS, RIMLS, EAR, PolyFit, and ours.

SCP faithfully reconstructs the surface, including its noise, lacking specific smoothing on the surface or sharpening effects at the boundaries. APSS and RIMLS, with their piecewise smoothing design, partially smooth the surface noise but introduce jagged edges at the boundaries. EAR effectively smooths the surface noise and enhances sharp edges; however, its extensive upsampling leads to deformations near the boundaries and an excessive number of triangles in the reconstruction. PolyFit represents all surfaces as planes, rendering it unsuitable for models with curved surfaces. In contrast, our method excels at accurate shape reconstruction while preserving clear and sharp boundaries, resulting in superior visual outcomes compared to the other techniques.
Timeliness analysis. Table 3 provides statistics on the reconstruction results in Figure 10, including the number of triangles in the reconstructed meshes and the runtime of the algorithms. For SCP, APSS, and RIMLS, the open-source versions available in MeshLab [55] with their default settings were utilized. Regarding EAR and PolyFit, we employed the executable programs provided by their respective authors and followed the guidelines outlined in their papers for parameter tuning and usage.
The comparison reveals that traditional methods such as SCP, APSS, and EAR, which
do not incorporate segmentation, exhibit relatively fast algorithm speeds. However, when
dealing with more complex models, they necessitate dense triangle representations for
accurate reconstruction. Specifically, EAR, with its extensive upsampling for edge en-
hancement, produces an excessive number of triangles in the reconstruction, leading to
significantly longer runtime. Conversely, PolyFit represents the entire model using planes
for lightweight representation, but it struggles to accurately capture shapes containing
curved surfaces.

Table 3. Statistics on the examples presented in Figure 10.

| Method | Vase Faces | Vase Sec. | Fandisk Faces | Fandisk Sec. |
|---|---|---|---|---|
| Screened Poisson [18] | 9996 | 1.98 | 40028 | 3.08 |
| APSS [54] | 9996 | 1.59 | 17815 | 4.34 |
| RIMLS [20] | 9996 | 2.41 | 17801 | 6.63 |
| EAR [21] | 181170 | 128.98 | 272593 | 202.39 |
| PolyFit [31] | 38 | 1.81 | 17 | 3.39 |
| Ours | 4068 | 7.92 | 6403 | 69.09 |
| Ours + OpenMP [56] | 4068 | 2.26 | 6403 | 7.35 |

In our reconstruction framework, explicit segmentation is incorporated, yet the algorithm's runtime remains within an acceptable range. We also implemented a parallel accelerated version utilizing OpenMP [56]. Since our algorithm operates on each face independently, it can be effectively parallelized, further reducing the runtime. Furthermore, our method does not require dense triangles to preserve sharp features, allowing for lightweight representations. Users can adjust the number of triangles according to their needs, which is discussed in detail in Section 4.3.
Geometric error analysis. To analyze the errors of various reconstruction methods, we
visualized the noise-free original point cloud and computed the shortest distances to the
reconstructed meshes. The resulting heatmap, colored based on the shortest distance, is
presented in Figure 11. Additionally, we provide statistical results for relevant error metrics
in Table 4.
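For reproducibility, a sketch of how such error statistics can be gathered is shown below, using trimesh for point-to-mesh distance queries; note it computes the one-sided (points-to-mesh) variant of the Hausdorff distance, which is an assumption about the table's exact protocol.

```python
import numpy as np
import trimesh

def geometric_errors(mesh, points):
    """Shortest distance from every noise-free input point to the
    reconstructed mesh surface, summarized as in Table 4."""
    _, dist, _ = trimesh.proximity.closest_point(mesh, points)
    return {
        "shortest": dist.min(),
        "hausdorff": dist.max(),   # one-sided: points -> mesh
        "mean": dist.mean(),
        "median": np.median(dist),
    }
```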

Figure 11. Heatmap of geometric error (top: Vase; bottom: Fandisk). From left to right: original points (without noise), SCP, APSS, RIMLS, EAR, PolyFit, and ours. To visualize the reconstruction error, the point cloud is colored based on the distance between each point and the surface of the reconstructed model (Figure 10). The color scale ranges from blue (representing a shorter distance) to red (indicating a longer distance).

It can be observed that SCP exhibits a global error distribution, primarily due to the
presence of noise. APSS and RIMLS can only smooth relatively low-level surface noise
and may increase errors at the boundaries. EAR can effectively smooth the surface and
enhance the boundaries, but it also introduces increased boundary errors. In comparison,
our method achieves the overall minimum geometric error and accurately reconstructs the
boundaries, resulting in the most accurate reconstruction.

Table 4. Statistics on the examples presented in Figure 11.

| Method | Vase Shortest Dis. (10^-9 mm) | Vase Hausdorff Dis. (10^-3 mm) | Vase Mean Dis. (10^-3 mm) | Vase Median Dis. (10^-3 mm) | Fandisk Shortest Dis. (10^-9 mm) | Fandisk Hausdorff Dis. (10^-3 mm) | Fandisk Mean Dis. (10^-3 mm) | Fandisk Median Dis. (10^-3 mm) |
|---|---|---|---|---|---|---|---|---|
| Screened Poisson [18] | 27.35 | 8.014 | 1.766 | 1.390 | 117.8 | 20.24 | 2.983 | 2.572 |
| APSS [54] | 149.0 | 5.579 | 1.120 | 0.897 | 26.28 | 19.32 | 3.051 | 2.806 |
| RIMLS [20] | 31.28 | 5.663 | 1.296 | 1.022 | 97.25 | 19.86 | 3.515 | 3.303 |
| EAR [21] | 8.227 | 10.57 | 1.647 | 1.152 | 197.7 | 19.95 | 3.843 | 3.575 |
| PolyFit [31] | 3525 | 210.7 | 53.52 | 36.32 | 1586 | 260.4 | 45.34 | 31.94 |
| Ours | 0.003 | 8.132 | 1.499 | 1.343 | 15.82 | 18.85 | 1.162 | 1.008 |

4.3. Exploratory Experiments


In this subsection, we present exploratory experiments that demonstrate the extensible possibilities of our proposed framework and modules. These experiments include integrating with other tasks, testing low-density mesh representation performance, and performing additional robustness testing.
Combined with edge extraction. Our proposed framework can be applied not only to primitive detection but can also be flexibly combined with other point cloud processing tasks, such as edge extraction, to achieve high-fidelity reconstruction models.

To demonstrate this, we selected three state-of-the-art and representative works and tested them on the Thingi10K dataset. They respectively belong to heuristic traversal algorithms, learning-based methods, and semiautomatic methods with manual assistance: Merigot et al. [57] detected edges by thresholding the Voronoi covariance measure (VCM), Wang et al. [48] detected edges by supervised training of a neural network named PIE-Net, and Zhuang et al. [58] extracted feature edges semiautomatically by combining geodesic distance and hand-marked labels in a method named Live-Wire.

Given the boundaries output by these methods, we segmented the original point cloud into patches using a fixed small neighborhood and performed reclustering. Then, we combined our proposed mesh fitting and splitting module and selection module to produce reconstruction models. Figure 12 shows the visualization results. It is evident that our framework can still achieve high-fidelity and watertight mesh models from the extracted boundaries.
Combined with RANSAC. Our method can also be combined with the classical RANSAC [4,5] method for more application scenarios. Figure 2 shows the results on a real scanned urban building point cloud, where it can be seen that our proposed modules still produce high-quality models with preserved sharp features. It should be noted that RANSAC [5] requires careful tuning for each input and needs to be combined with our proposed refine submodule to obtain satisfactory segmentation patches.
Low-density mesh representation. Our method is distinct from common reconstruction
algorithms that rely on dense triangle meshes to achieve high fidelity. To demonstrate its
ability to handle low-density meshes, we designed the following experiment. As shown
in Figure 13, we gradually reduced the triangle density during mesh fitting for the same
input point cloud and tested our method’s reconstruction effect. As the number of triangle
faces decreases, the surface becomes more distorted, but the intersecting boundary remains
sharp. Therefore, users can adjust the appropriate number of triangular faces according to
their needs to obtain lightweight models and avoid poor boundaries. This is beneficial for
subsequent efficient storage and computation.


Figure 12. Demonstration of the effectiveness of our framework combined with edge extraction.
From top to bottom: input, results produced by three different types of edge detection methods,
results after clustering and segmentation, candidate surfaces produced by the proposed mesh splitting
module, and final outputs of selection module. Panels (a,b) are examples of VCM [57], (c,d) are examples
of the learning-based method PIE-Net [48], and (e,f) are examples of the semiautomatic method
Live-Wire [58].

Figure 13. Our method can easily handle meshes represented by low-density triangles without sacrificing boundary quality, because we do not require dense triangles to ensure high fidelity. Users can choose the triangle density according to their needs. (a) Input points, (b) 7758 faces, (c) 3408 faces, (d) 2275 faces, (e) 1382 faces, (f) 698 faces.

Robustness evaluation. To test the robustness of our algorithm, we examined the reconstruction effect of our framework on noisy and missing data.

As depicted in Figure 14, we added random Gaussian noise during the training of the primitive detection module as a data augmentation technique. During testing, we added varying degrees of Gaussian noise to the same point cloud, with d representing the length of an edge of the object. Our algorithm demonstrated robustness to moderate Gaussian noise.
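The noise model used in this test is sketched below, assuming d is approximated by the longest edge of the input's bounding box; only positions are perturbed, and normals are left untouched.

```python
import numpy as np

def add_gaussian_noise(points, sigma_ratio=0.01, seed=0):
    """Perturb point positions with isotropic Gaussian noise whose standard
    deviation is a fraction of the object's edge length d (Figure 14 uses
    sigma = 0.01d to 0.06d). points: (N, 6) positions + normals."""
    rng = np.random.default_rng(seed)
    xyz = points[:, :3]
    d = (xyz.max(axis=0) - xyz.min(axis=0)).max()   # longest bounding-box edge
    noisy = points.copy()
    noisy[:, :3] = xyz + rng.normal(scale=sigma_ratio * d, size=xyz.shape)
    return noisy
```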

Figure 14. Reconstruction results of the CAD component with increasing Gaussian noise. Top: input.
Bottom: reconstruction. (a) Input without noise. (b) σ = 0.01d. (c) σ = 0.04d. (d) σ = 0.06d.

For missing data, as shown in Figure 15, we artificially cropped the input point cloud. Despite this, the refine submodule (Section 3.1.2) was still able to detect all patches and merge them via normal angles. The mesh fitting and splitting module is responsible for filling in the missing parts based on the merged patches. It is worth noting that our default HRBF-based meshing method [16] is required here, as screened Poisson will not work. The reconstruction framework was still able to produce the desired result.


Figure 15. Reconstruction of missing data. (a) Point cloud with missing data. (b) Candidate surfaces.
(c) Reconstruction result.

5. Conclusions
Our work presents a novel framework for 3D mesh reconstruction from point clouds,
which is based on primitive detection and is designed to preserve sharp features with high
fidelity. Unlike previous methods, our approach emphasizes achieving high-quality overall
reconstruction, particularly on sharp boundaries. To achieve this goal, we developed
multiple modules that cover the entire reconstruction process and result in watertight
and high-fidelity models. Our method outperforms the state-of-the-art on most metrics,
and produces models that are closer to the ground truth and have smaller errors than recent
learning-based reconstruction and classic mesh reconstruction methods. Additionally, we
demonstrate the versatility of our designed modules by applying them to other tasks, such
as edge extraction and RANSAC, resulting in high-quality models.
As larger networks and datasets become available, we expect further improvements
in feature extraction and primitive detection for point clouds, making our framework even
more valuable for various applications in the future.
Future prospects. We believe that, compared to 2D datasets, the current availability and
quality of 3D segmentation datasets are still limited. This particularly applies to primitive
segmentation data, which is why our method currently performs well only on CAD
data. However, the modules within our framework are designed to be flexible, enabling
them to adapt to future developments. We anticipate that as larger networks and higher-
quality datasets, including urban buildings and objects in indoor scenes, become available,
or with further advancements in unsupervised segmentation methods, the reconstruction
performance of our method will continue to improve, making our framework even more
valuable for various applications in the future.
Limitations. It is worth noting that primitive representation is a strong prior, and our
method may not be applicable to all types of objects that are not suitable for representation
using primitives. Additionally, in the selection module, we treat the reconstruction problem
as a binary linear combination optimization problem, which may limit our method’s
ability to complete large missing areas in the input point cloud, such as in the case of
extensive missing data. In these cases, our method may not be able to achieve the correct
reconstruction result, unlike recent deep-learning-based methods [23,24,28].

Author Contributions: Conceptualization, Q.L.; funding acquisition, J.X.; project administration, J.X.;
supervision, J.X., S.X. and Y.W.; validation, Q.L., J.X., S.X. and Y.W.; visualization, Q.L.; writing—
original draft, Q.L.; writing—review and editing, J.X., S.X. and Y.W. All authors have read and agreed
to the published version of the manuscript.
Funding: This work is supported by the National Natural Science Foundation of China (U2003109, U21A20515, 62102393, 62206263, 62271467), the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA23090304), the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Y201935), the State Key Laboratory of Robotics and Systems (HIT) (SKLRS-2022-KF-11), and the Fundamental Research Funds for the Central Universities and China Postdoctoral Science Foundation (2022T150639, 2021M703162).
Data Availability Statement: This work uses the following datasets, all of which can be obtained
from the internet. ABC at https://deep-geometry.github.io/abc-dataset/ (accessed on 14 October 2022); Thingi10K at https://ten-thousand-models.appspot.com/ (accessed on 14 October 2022).
Acknowledgments: The authors would like to thank the reviewers for their valuable comments
and suggestions.
Conflicts of Interest: The authors declare no conflicts of interest.

References
1. Berger, M.; Tagliasacchi, A.; Seversky, L.M.; Alliez, P.; Guennebaud, G.; Levine, J.A.; Sharf, A.; Silva, C.T. A Survey of Surface
Reconstruction from Point Clouds. Comput. Graph. Forum 2017, 36, 301–329. [CrossRef]
2. Berger, M.; Tagliasacchi, A.; Seversky, L.M.; Alliez, P.; Levine, J.A.; Sharf, A.; Silva, C.T. State of the Art in Surface Reconstruction from Point Clouds. In Proceedings of the 35th Annual Conference of the European Association for Computer Graphics, Eurographics 2014 State of the Art Reports, Strasbourg, France, 7–11 April 2014; Volume 1, pp. 161–185.
3. Kaiser, A.; Ybanez Zepeda, J.A.; Boubekeur, T. A Survey of Simple Geometric Primitives Detection Methods for Captured 3D
Data. Comput. Graph. Forum 2019, 38, 167–196. [CrossRef]
4. Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and
Automated Cartography. In Readings in Computer Vision; Fischler, M.A., Firschein, O., Eds.; Morgan Kaufmann: San Francisco,
CA, USA, 1987; pp. 726–740. [CrossRef]
5. Schnabel, R.; Wahl, R.; Klein, R. Efficient RANSAC for point-cloud shape detection. Comput. Graph. Forum 2007, 26, 214–226.
[CrossRef]
6. Yan, D.M.; Wang, W.; Liu, Y.; Yang, Z. Variational mesh segmentation via quadric surface fitting. CAD Comput. Aided Des. 2012,
44, 1072–1082. [CrossRef]
7. Lafarge, F.; Mallet, C. Creating large-scale city models from 3D-point clouds: A robust approach with hybrid representation. Int.
J. Comput. Vis. 2012, 99, 69–85. [CrossRef]
8. Li, L.; Sung, M.; Dubrovina, A.; Yi, L.; Guibas, L.J. Supervised fitting of geometric primitives to 3D point clouds. In Proceedings
of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019;
pp. 2647–2655.
9. Sharma, G.; Liu, D.; Maji, S.; Kalogerakis, E.; Chaudhuri, S.; Měch, R. ParSeNet: A Parametric Surface Fitting Network for 3D
Point Clouds. Lect. Notes Comput. Sci. 2020, 12352, 261–276.
10. Yan, S.; Yang, Z.; Ma, C.; Huang, H.; Vouga, E.; Huang, Q. HPNet: Deep Primitive Segmentation Using Hybrid Representations. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021.
11. Lorensen, W.E.; Cline, H.E. Marching Cubes: A High Resolution 3D Surface Construction Algorithm. SIGGRAPH Comput. Graph.
1987, 21, 163–169. [CrossRef]
12. Alexa, M.; Behr, J.; Cohen-Or, D.; Fleishman, S.; Levin, D.; Silva, C.T. Computing and rendering point set surfaces. IEEE Trans.
Vis. Comput. Graph. 2003, 9, 3–15. [CrossRef]
13. Wang, H.; Scheidegger, C.E.; Silva, C.T. Bandwidth selection and reconstruction quality in point-based surfaces. IEEE Trans. Vis.
Comput. Graph. 2009, 15, 572–582. [CrossRef]
14. Carr, J.C.; Beatson, R.K.; Cherrie, J.B.; Mitchell, T.J.; Fright, W.R.; McCallum, B.C.; Evans, T.R. Reconstruction and representation
of 3D objects with radial basis functions. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive
Techniques, SIGGRAPH 2001, Los Angeles, CA, USA, 12–17 August 2001; pp. 67–76. [CrossRef]
15. Brazil, E.V.; Macedo, I.; Sousa, M.C.; de Figueiredo, L.H.; Velho, L. Sketching Variational Hermite-RBF Implicits. In Proceedings
of the Seventh Sketch-Based Interfaces and Modeling Symposium, Annecy, France, 7–10 June 2010; SBIM ’10, pp. 1–8.
16. Huang, Z.; Carr, N.; Ju, T. Variational implicit point set surfaces. ACM Trans. Graph. 2019, 38, 124. [CrossRef]
17. Kazhdan, M.; Bolitho, M.; Hoppe, H. Poisson Surface Reconstruction. In Proceedings of the Fourth Eurographics Symposium on Geometry Processing, Sardinia, Italy, 26–28 June 2006; SGP ’06, pp. 61–70.
18. Kazhdan, M.; Hoppe, H. Screened poisson surface reconstruction. ACM Trans. Graph. 2013, 32, 29. [CrossRef]
19. Fleishman, S.; Cohen-Or, D.; Silva, C.T. Robust moving least-squares fitting with sharp features. ACM Trans. Graph. 2005,
24, 544–552. [CrossRef]
20. Öztireli, A.C.; Guennebaud, G.; Gross, M. Feature Preserving Point Set Surfaces based on Non-Linear Kernel Regression. Comput.
Graph. Forum 2009, 28, 493–501. [CrossRef]
21. Huang, H.; Wu, S.; Gong, M.; Cohen-Or, D.; Ascher, U.; Zhang, H.R. Edge-aware point set resampling. ACM Trans. Graph. 2013,
32, 9. [CrossRef]
22. Lipman, Y.; Cohen-Or, D.; Levin, D.; Tal-Ezer, H. Parameterization-free projection for geometry reconstruction. In Proceedings of
the ACM SIGGRAPH Conference on Computer Graphics, San Diego, CA, USA, 5–9 August 2007. [CrossRef]
23. Chen, Z.; Zhang, H. Learning Implicit Fields for Generative Shape Modeling. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
24. Park, J.J.; Florence, P.; Straub, J.; Newcombe, R.; Lovegrove, S. DeepSDF: Learning Continuous Signed Distance Functions for
Shape Representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long
Beach, CA, USA, 15–20 June 2019.
25. Mescheder, L.; Oechsle, M.; Niemeyer, M.; Nowozin, S.; Geiger, A. Occupancy Networks: Learning 3D Reconstruction in Function
Space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20
June 2019.
26. Erler, P.; Guerrero, P.; Ohrhallinger, S.; Mitra, N.J.; Wimmer, M. Points2Surf: Learning Implicit Surfaces from Point Clouds. In
Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; pp. 108–124. [CrossRef]
27. Jiang, C.M.; Sud, A.; Makadia, A.; Huang, J.; Nießner, M.; Funkhouser, T. Local Implicit Grid Representations for 3D Scenes. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
28. Peng, S.; Niemeyer, M.; Mescheder, L.; Pollefeys, M.; Geiger, A. Convolutional Occupancy Networks. In Proceedings of the European
Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020.
29. Schnabel, R.; Degener, P.; Klein, R. Completion and reconstruction with primitive shapes. Comput. Graph. Forum 2009, 28, 503–512.
[CrossRef]
30. Lafarge, F.; Alliez, P. Surface reconstruction through point set structuring. Comput. Graph. Forum 2013, 32, 225–234. [CrossRef]
31. Nan, L.; Wonka, P. PolyFit: Polygonal Surface Reconstruction from Point Clouds. In Proceedings of the IEEE International
Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2372–2380. [CrossRef]
32. Sharma, G.; Goyal, R.; Liu, D.; Kalogerakis, E.; Maji, S. CSGNet: Neural Shape Parser for Constructive Solid Geometry. In
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA,
18–23 June 2018; pp. 5515–5523.
33. Yu, F.; Chen, Z.; Li, M.; Sanghi, A.; Shayani, H.; Mahdavi-Amiri, A.; Zhang, H. CAPRI-Net: Learning Compact CAD Shapes with Adaptive Primitive Assembly. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021. [CrossRef]
34. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic Graph CNN for Learning on Point Clouds. ACM
Trans. Graph. 2019, 38, 146. [CrossRef]
35. Comaniciu, D.; Meer, P. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2002,
24, 603–619. [CrossRef]
36. Qian, G.; Li, Y.; Peng, H.; Mai, J.; Hammoud, H.; Elhoseiny, M.; Ghanem, B. PointNeXt: Revisiting PointNet++ with Improved
Training and Scaling Strategies. In Advances in Neural Information Processing Systems; Koyejo, S., Mohamed, S., Agarwal, A.,
Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2022; Volume 35, pp. 23192–23204.
37. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In
Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December
2017; NIPS’17, pp. 5105–5114.
38. Ma, X.; Qin, C.; You, H.; Ran, H.; Fu, Y. Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP
Framework. In Proceedings of the International Conference on Learning Representations, Virtual Event, 25–29 April 2022.
39. Zhao, H.; Jiang, L.; Jia, J.; Torr, P.H.; Koltun, V. Point Transformer. In Proceedings of the IEEE/CVF International Conference on
Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 16259–16268.
40. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In
Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June
2018; pp. 4510–4520. [CrossRef]
41. Lu, X.; Yao, J.; Tu, J.; Li, K.; Li, L.; Liu, Y. Pairwise linkage for point cloud segmentation. ISPRS Ann. Photogramm. Remote Sens.
Spat. Inf. Sci. 2016, 3, 201–208. [CrossRef]
42. Alliez, P.; Giraudot, S.; Jamin, C.; Lafarge, F.; Mérigot, Q.; Meyron, J.; Saboret, L.; Salman, N.; Wu, S.; Yildiran, N.F. Point Set
Processing. In CGAL User and Reference Manual, 4th ed.; CGAL Editorial Board: New York, NY, USA, 2022.
43. Kettner, L.; Meyer, A.; Zomorodian, A. Intersecting Sequences of dD Iso-oriented Boxes. In CGAL User and Reference Manual, 3rd
ed.; CGAL Editorial Board: New York, NY, USA, 2021.
44. Nan, L.; Sharf, A.; Zhang, H.; Cohen-Or, D.; Chen, B. SmartBoxes for interactive urban reconstruction. In ACM Siggraph 2010
Papers, Siggraph 2010; ACM: New York, NY, USA, 2010. [CrossRef]
45. Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual. 2021. Available online: https://www.gurobi.com (accessed on 27 April 2023).
46. Zhou, Q.; Jacobson, A. Thingi10K: A Dataset of 10,000 3D-Printing Models. arXiv 2016, arXiv:1605.04797.
47. Koch, S.; Matveev, A.; Williams, F.; Alexa, M.; Zorin, D.; Panozzo, D. ABC: A Big CAD Model Dataset For Geometric Deep Learning. arXiv 2019, arXiv:1812.06216.
48. Wang, X.; Xu, Y.; Xu, K.; Tagliasacchi, A.; Zhou, B.; Mahdavi-Amiri, A.; Zhang, H. PIE-NET: Parametric Inference of Point Cloud Edges. Adv. Neural Inf. Process. Syst. 2020, 33, 20167–20178.
49. Kuhn, H.W. The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 1955, 2, 83–97. [CrossRef]
50. Bernardini, F.; Mittleman, J.; Rushmeier, H.; Silva, C.; Taubin, G. The Ball-Pivoting Algorithm for Surface Reconstruction. IEEE
Trans. Vis. Comput. Graph. 1999, 5, 349–359. [CrossRef]
51. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the ICLR, New Orleans, LA, USA, 6–9 May
2019.
52. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June
2016; pp. 2818–2826. [CrossRef]
53. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
54. Guennebaud, G.; Gross, M. Algebraic point set surfaces. In Proceedings of the ACM SIGGRAPH Conference on Computer Graphics, San Diego, CA, USA, 5–9 August 2007. [CrossRef]
55. Cignoni, P.; Callieri, M.; Corsini, M.; Dellepiane, M.; Ganovelli, F.; Ranzuglia, G. MeshLab: An Open-Source Mesh Processing Tool. In Proceedings of the Eurographics Italian Chapter Conference, Salerno, Italy, 2–4 July 2008; Scarano, V., Chiara, R.D., Erra, U., Eds.; The Eurographics Association: Crete, Greece, 2008. [CrossRef]
56. Chandra, R.; Dagum, L.; Kohr, D.; Menon, R.; Maydan, D.; McDonald, J. Parallel Programming in OpenMP; Morgan Kaufmann:
Burlington, MA, USA, 2001.
57. Mérigot, Q.; Ovsjanikov, M.; Guibas, L.J. Voronoi-based curvature and feature estimation from point clouds. IEEE Trans. Vis.
Comput. Graph. 2011, 17, 743–756. [CrossRef]
58. Zhuang, Y.; Zou, M.; Carr, N.; Ju, T. Anisotropic geodesics for live-wire mesh segmentation. Comput. Graph. Forum 2014,
33, 111–120. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.