Article
Sharp Feature-Preserving 3D Mesh Reconstruction from Point
Clouds Based on Primitive Detection
Qi Liu 1, Shibiao Xu 2, Jun Xiao 1,* and Ying Wang 1
1 School of Artificial Intelligence, University of Chinese Academy of Sciences, No. 19 Yuquan Road,
Shijingshan District, Beijing 100049, China
2 School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
* Correspondence: [email protected]; Tel.: +86-10-8825-6566
Abstract: High-fidelity mesh reconstruction from point clouds has long been a fundamental research
topic in computer vision and computer graphics. Traditional methods require dense triangle meshes
to achieve high fidelity, but excessively dense triangles may lead to unnecessary storage and computational burdens, while also struggling to capture clear, sharp, and continuous edges. This paper argues
that the key to high-fidelity reconstruction lies in preserving sharp features. Therefore, we introduce
a novel sharp-feature-preserving reconstruction framework based on primitive detection. It includes
an improved deep-learning-based primitive detection module and two novel mesh splitting and
selection modules that we propose. Our framework can accurately and reasonably segment primitive
patches, fit meshes in each patch, and split overlapping meshes at the triangle level to ensure true
sharpness while obtaining lightweight mesh models. Quantitative and visual experimental results
demonstrate that our framework outperforms both the state-of-the-art learning-based primitive
detection methods and traditional reconstruction methods. Moreover, our designed modules are
plug-and-play: they not only apply to learning-based primitive detectors but can also be combined with other point cloud processing tasks, such as edge extraction or random sample consensus (RANSAC), to achieve high-fidelity results.
Learning-based methods leverage deep learning to learn semantics from large amounts of data, achieving better
generalization performance and higher extraction accuracy. However, as shown in Figure 1,
the recent state-of-the-art methods such as ParSeNet [9] and HPNet [10] extract primitives
by training neural networks in a supervised manner, but they are more focused on the
accuracy of extraction and prediction and have not considered reconstruction results in
detail. For a more detailed introduction of related work, please refer to Section 2.
Figure 1. Comparison results between our method and the previous state-of-the-art methods
(ParSeNet [9] and HPNet [10]) for primitive detection and reconstruction. Our method aims to
preserve sharp boundary features and ultimately produce high-fidelity mesh models that are close to
the ground truth.
Figure 2. Results of our method combined with RANSAC [5] applied to real-scanned urban building
point clouds. Our proposed modules can be used as plug-and-play post-processing modules to
produce sharp and lightweight high-quality mesh models. (a) Input points, (b) RANSAC segments,
(c) our refined submodule outputs, (d) our selection module outputs, (e) reconstructed mesh model.
Next, in Section 2, we introduce related work, and in Section 3, we detail our framework and each module's specifics. Then, in Section 4, we present visualization and quantitative experimental results, and explore the flexibility and robustness of our proposed framework for transfer to other tasks through a series of exploratory experiments.
2. Related Work
The technologies most relevant to this article are surface reconstruction and primitive detection from point clouds, which we briefly describe in this section. For a more comprehensive discussion, please refer to the recent surveys [1–3].
projection upsamples and enhances sharp feature areas. Though these methods preserve or enhance sharp edges, they still produce jagged edges or artifacts at the boundaries and do not obtain complete and continuous boundaries, in contrast to our method.
candidate faces through the intersection of planes detected by RANSAC, then selected the optimal subset through binary linear optimization to generate lightweight reconstruction results. However, it is suitable only for objects composed entirely of planes.
In general, non-learning primitive-based methods suffer from burdensome parameter tuning, which requires manual adjustment for each model and primitive. However, the idea of generating models by selecting and combining primitives through optimization inspired our work.
3. Method
The input of our framework is a 3D point cloud $\mathcal{P} = \{p_i \mid 1 \leq i \leq N\}$, where each point $p_i$ contains a position $p_i \in \mathbb{R}^3$ and a normal $n_i \in \mathbb{R}^3$; therefore, $p_i \in \mathbb{R}^6$. Our goal is to finally obtain a high-fidelity watertight mesh model.
Figure 3 shows the pipeline of our framework; this section details the implementation of the primitive detection module, the mesh fitting and splitting module, and the selection module. More experiments are detailed in Section 4.
prediction vector $t_i \in \{0,1\}^6$ (corresponding to six primitive types: plane, sphere, cylinder, cone, B-spline-open, and B-spline-closed), and a shape parameter prediction vector $s_i \in \mathbb{R}^{22}$. With regard to the nontrainable post-processing part, HPNet constructs two constraints, named geometric consistency and smoothness consistency, according to the primitive parameters and the known normals. According to these two constraints, the embedding descriptors $e \in \mathbb{R}^{N \times 128}$ are then clustered by mean-shift [35] to obtain the final $K$ patches $P_k$, $1 \leq k \leq K$, with $\mathcal{P} = P_1 \cup \cdots \cup P_K$.
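As a concrete illustration, the following sketch clusters per-point embeddings with mean-shift; the scikit-learn MeanShift call and the bandwidth value are illustrative assumptions, and HPNet's actual post-processing additionally applies the two consistency constraints to the descriptors.

```python
# Hypothetical sketch of the clustering stage: group per-point embedding
# descriptors into primitive patches with mean-shift. Bandwidth and the use
# of scikit-learn are illustrative assumptions, not HPNet's exact procedure.
import numpy as np
from sklearn.cluster import MeanShift

def cluster_embeddings(e: np.ndarray, bandwidth: float = 0.8) -> np.ndarray:
    """e: (N, 128) embedding descriptors -> (N,) patch labels in {0, ..., K-1}."""
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    return ms.fit_predict(e)

e = np.random.randn(2000, 128).astype(np.float32)   # stand-in for network output
labels = cluster_embeddings(e)
patches = [np.where(labels == k)[0] for k in np.unique(labels)]  # patches P_k
```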
HPNet achieves the best primitive detection performance due to its use of geometric
constraints, but the main limitation comes from the DGCNN backbone, which makes the
throughput during training significantly lower than other learning-based point cloud pro-
cessing methods, resulting in higher training costs. In addition, the recent work PointNeXt [36] shows that even the most classic and widely used backbone, PointNet++ [37], after simply adopting improved training strategies, can outperform some recently designed complex backbones (such as PointMLP [38] and Point Transformer [39]).
Therefore, as shown in Figure 4, we keep the same two-stage design as HPNet and make the following enhancements: (1) we replaced DGCNN with PointNeXt-b (a classic PointNet++ equipped with the bottleneck structure [40]), resulting in a significant improvement in throughput and detection performance; (2) we switched from the Adam optimizer to AdamW; (3) we implemented cosine learning rate decay; and (4) we incorporated label smoothing.
As a result, our primitive detection network achieved higher segmentation mean IoU and
primitive type mean IoU scores of 88.42/92.85 on the ABCParts [9] benchmark, compared
to the original scores of 85.24/91.04. Additionally, we improved throughput from 8 ins/sec
to 28 ins/sec. For more detailed experimental information, please refer to Section 4.
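The sketch below illustrates this training recipe in PyTorch; the tiny stand-in model and all hyperparameter values are assumptions for illustration, not the paper's exact settings.

```python
# Sketch of the modified training recipe: AdamW, cosine learning rate decay,
# and label smoothing. The stand-in model and hyperparameters are illustrative.
import torch

net = torch.nn.Linear(6, 6)  # stand-in for the PointNeXt-b backbone
optimizer = torch.optim.AdamW(net.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)  # primitive-type loss

for epoch in range(100):
    # ... iterate over point-cloud batches; random tensors as stand-ins ...
    logits = net(torch.randn(8, 6))
    loss = criterion(logits, torch.randint(0, 6, (8,)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # decay the learning rate once per epoch
```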
Figure 4. Primitive detection network. The learnable encoder predicts embedding descriptors, primitive type prediction vectors, and shape parameter prediction vectors from input point clouds. Then, the mean-shift module clusters the embedding descriptors through geometric consistency and smoothness consistency to produce primitive segments.
If the centroid normals of two adjacent patches satisfy the following Equation (1), it indicates that the two patches $P_p$ and $P_q$ have a good adjacent smooth transition, and they will be merged:

$$\arccos\left|\,\overrightarrow{n(c_p)}^{\top} \cdot \overrightarrow{n(c_q)}\,\right| \leq \theta_t \quad (1)$$

where $\theta_t$ is the angle threshold that determines the curvature of the surfaces to be merged.
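A minimal sketch of this merge test, assuming unit normals at the patch centroids have already been estimated:

```python
# Minimal sketch of the merge test in Equation (1); centroid normals are
# assumed to be unit vectors estimated beforehand.
import numpy as np

def should_merge(n_p: np.ndarray, n_q: np.ndarray, theta_t: float) -> bool:
    """True if the angle between centroid normals n(c_p), n(c_q) is <= theta_t."""
    cos_angle = abs(np.dot(n_p, n_q))                 # |n(c_p)^T . n(c_q)|
    return np.arccos(np.clip(cos_angle, -1.0, 1.0)) <= theta_t

n_q = np.array([0.05, 0.0, 1.0])
n_q /= np.linalg.norm(n_q)
# merge patches whose centroid normals deviate by less than 10 degrees
print(should_merge(np.array([0.0, 0.0, 1.0]), n_q, np.deg2rad(10.0)))  # True
```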
In summary, our refine clustering submodule effectively removes disturbances caused
by trivial patches and merges oversegmented patches.
Figure 5. Pairwise splitting. (a) A pair of intersecting triangles $\triangle a$ and $\triangle b$. (b) Fit $\triangle b$ to a plane $\bar{b}$. (c) Split $\triangle a$ into three small triangles ($\triangle p_1 a_3 p_2$, $\triangle p_1 p_2 a_2$, and $\triangle p_1 a_2 a_1$) according to intersection points $p_1$ and $p_2$, then divide them according to vectors $\vec{v}$ and $\vec{u}$. (d) Process the triangles on the intersection line.
$$\vec{v} = \overrightarrow{p_1 p_2} \times \vec{u} \quad (2)$$

$$\triangle p_1 a_3 p_2 \in \begin{cases} S_a(A), & \overrightarrow{p_1 a_3} \cdot \vec{v} \geq 0 \\ S_a(B), & \overrightarrow{p_1 a_3} \cdot \vec{v} < 0 \end{cases} \quad (3)$$

where $p_1, p_2$ are the intersection points of plane $\bar{b}$ and triangle $\triangle a$, and $\vec{u}$ is the known normal vector pointing either inward or outward of the model. Vector $\vec{v}$ lies on the plane of triangle $\triangle a$ and is perpendicular to vector $\overrightarrow{p_1 p_2}$ (Equation (2)). If $\overrightarrow{p_1 a_3} \cdot \vec{v} \geq 0$, triangle $\triangle p_1 a_3 p_2$ is included in set $S_a(A)$ (marked in red in the figure), and triangles $\triangle p_1 p_2 a_2$ and $\triangle p_1 a_2 a_1$ are included in set $S_a(B)$ (marked in green in the figure). Otherwise, triangle $\triangle p_1 a_3 p_2$ is included in set $S_a(B)$, and triangles $\triangle p_1 p_2 a_2$ and $\triangle p_1 a_1 a_2$ are included in set $S_a(A)$. Note that the vectors $\overrightarrow{p_1 p_2}$ of adjacent triangles should point in the same direction.
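A minimal sketch of the side classification in Equations (2) and (3), with illustrative inputs:

```python
# Minimal sketch of the side test in Equations (2) and (3): after the cut
# points p1, p2 are found, the sub-triangle containing vertex a3 is assigned
# to S_a(A) or S_a(B) by the sign of a dot product. Inputs are illustrative.
import numpy as np

def classify_subtriangle(p1, p2, a3, u):
    """Return 'A' if sub-triangle (p1, a3, p2) belongs to S_a(A), else 'B'."""
    v = np.cross(p2 - p1, u)          # Eq. (2): in-plane, perpendicular to p1p2
    return 'A' if np.dot(a3 - p1, v) >= 0 else 'B'   # Eq. (3)

p1, p2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
a3 = np.array([0.5, 1.0, 0.0])        # apex of the cut triangle
u = np.array([0.0, 0.0, 1.0])         # known inward/outward direction
# The remaining two sub-triangles go to the opposite set.
print(classify_subtriangle(p1, p2, a3, u))
```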
$$\sum_{i=1}^{k} \mathrm{dis}(o, a_i) - \sum_{i=1}^{k} \mathrm{dis}(p, a_i) \leq \sum_{i=1}^{k} \mathrm{dis}(o, b_i) - \sum_{i=1}^{k} \mathrm{dis}(p, b_i) \quad (4)$$
where $k$ is the size of the neighborhood taken at point $p$ in each of the two divided triangle sets $S_a(A)$ and $S_a(B)$, and $\mathrm{dis}(a, b)$ represents the Euclidean distance from $\triangle a$ to $\triangle b$. In this way, we split $S_a$ into two subsurfaces, $S_a = S_a(A) \cup S_a(B)$.
We apply these operations to all surfaces, splitting them into a set of candidate faces $S_{candi}$ that are segmented along intersecting lines. This approach is crucial in producing high-quality boundaries in the final model. Note that this meshing and splitting algorithm operates discretely on each surface and triangle and can be well parallelized. In practice, the C++ implementation of the parallelized version yields an efficiency improvement of approximately 10×, as discussed in Section 4.
where $N$ represents the total number of points in the point cloud $\mathcal{P}$, while $\mathrm{supp}(s_i)$ factors in the distance between the point and the surface, the point distribution, and the local sampling uniformity, as follows:
$$\mathrm{supp}(s) = \sum_{p, s \,\mid\, \mathrm{dist}(p, s) < \epsilon} \left(1 - \frac{\mathrm{dist}(p, s)}{\epsilon}\right) \cdot \mathrm{conf}(p) \quad (6)$$

$$\mathrm{dist}(p, s) = \min\left\{\mathrm{dist}(p, f_j) \mid f \in s,\ 1 \leq j \leq N_f\right\} \quad (7)$$

$$\mathrm{conf}(p) = \frac{1}{3} \sum_{i=1}^{3} \left(1 - \frac{3\lambda_1^i}{\lambda_1^i + \lambda_2^i + \lambda_3^i}\right) \cdot \frac{\lambda_2^i}{\lambda_3^i} \quad (8)$$
where $\mathrm{dist}(p, s)$ in Equation (7) represents the Euclidean distance from a point $p \in \mathcal{P}$ to the candidate surface $s$. Only points whose distance from the surface is less than $\epsilon$ are considered. Mesh $s$ contains $N_f$ faces, denoted by $f$.
$\lambda_1^i \leq \lambda_2^i \leq \lambda_3^i$ in Equation (8) are the three eigenvalues of the covariance matrix at scale $i$. The term $1 - 3\lambda_1/(\lambda_1 + \lambda_2 + \lambda_3)$ in $\mathrm{conf}(p)$ measures the quality of fitting a tangent plane in the local neighborhood. A value closer to 0 indicates a poor point distribution, whereas a value of 1 implies a perfectly fitting plane. The term $\lambda_2/\lambda_3$ in $\mathrm{conf}(p)$ gauges the uniformity of point sampling in the local neighborhood. Its value ranges from 0 to 1, with 0 indicating a perfect line distribution and 1 representing a uniform disk distribution. In essence, the data fitting term biases the final result towards selecting candidate faces that are proximal to the input points and have a dense and uniform point distribution.
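A sketch of Equation (8), assuming SciPy's cKDTree for neighborhood queries and illustrative neighborhood sizes for the three scales:

```python
# Sketch of the confidence term in Equation (8): eigenvalues of the local
# covariance at three neighborhood scales rate planarity and sampling
# uniformity. The neighborhood sizes are illustrative assumptions.
import numpy as np
from scipy.spatial import cKDTree

def conf(points: np.ndarray, p: np.ndarray, scales=(8, 16, 24)) -> float:
    tree = cKDTree(points)
    total = 0.0
    for k in scales:
        _, idx = tree.query(p, k=k)
        cov = np.cov(points[idx].T)                    # 3x3 local covariance
        l1, l2, l3 = np.sort(np.linalg.eigvalsh(cov))  # l1 <= l2 <= l3
        total += (1.0 - 3.0 * l1 / (l1 + l2 + l3)) * (l2 / l3)
    return total / len(scales)

pts = np.random.rand(1000, 3)
print(conf(pts, pts[0]))   # per-point confidence in [0, 1]
```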
3D structural similarity. To ensure the reliability of the reconstruction results, we cannot
rely solely on data fitting because it may stubbornly select surfaces around data points,
and the input point cloud may contain defects such as noise or missing data. Additionally,
there may be gaps in the boundary area after the primitive detection module, and mesh splitting may lead to nonunique intersections in the missing or gap areas (shown in Figure 7a).
These factors make data fitting unable to choose a reasonable result. Hence, we add a 3D
structural similarity term that considers the global structure information of the model to the
objective function.
Figure 7. (a) There may be data missing (self-defective or discarded by the primitive detection module)
and nonunique intersections at boundaries. (b) Reconstruction result obtained solely through the
data fitting term. (c) Reconstruction result with the 3D structural similarity added.
The input point set $\mathcal{P}$ contains structural information, and the final selected surface set $S_{out} \subseteq S_{candi}$ should be structurally similar to the input point cloud. Here, $\mathcal{P}$ is the known segmented patches, and the range of $\mathrm{similarity}(S_{out}, \mathcal{P})$ is $(0, 1]$, where the closer the value is to 1, the higher the similarity. It is defined as
$$\mathrm{similarity}(S_{out}, \mathcal{P}) = \frac{2\mu_S \cdot \mu_P}{\mu_S^2 + \mu_P^2 + \eta} \cdot \frac{2\sigma_S \cdot \sigma_P}{\sigma_S^2 + \sigma_P^2 + \eta} \cdot \frac{\sigma_{SP}}{\sigma_S \cdot \sigma_P + \eta} \quad (11)$$
where $\eta$ is a small number to avoid division by zero. $\mu_P$ and $\sigma_P$ represent the mean and variance of $\mathcal{P}$, $\mu_S$ and $\sigma_S$ represent the mean and variance of $S_{out}$, and $\sigma_{SP}$ represents the covariance between $\mathcal{P}$ and $S_{out}$. These values are computed by randomly sampling the same number of points from both the point cloud and the mesh and calculating them based on the coordinates of the sampled points. According to Equation (10), they can be written as follows:
$$\mu_S = \sum_{i=1}^{N_{sc}} x_i \cdot \mu_{s_i}, \qquad \sigma_S = \sigma\!\left(\sum_{i=1}^{N_{sc}} x_i \cdot s_i\right), \qquad \sigma_{SP} = \sigma\!\left(\sum_{i=1}^{N_{sc}} x_i \cdot s_i,\ \mathcal{P}\right) \quad (12)$$
Similarly, a fixed number of sampling points is used to calculate them. This transforms
the problem into a binary linear combination optimization problem, which is optimized
together with the data fitting term.
The 3D structural similarity term aims to minimize the distribution difference between the surface set $S_{out}$ and the point set $\mathcal{P}$, resulting in a reconstruction that has a global structure similar to that of $\mathcal{P}$.
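A sketch of Equation (11) over equal-sized coordinate samples, as described above; $\eta$ and the flattened treatment of the xyz coordinates are illustrative assumptions:

```python
# Sketch of the 3D structural similarity in Equation (11), computed from
# equal-sized point samples of S_out and P. eta and the flattened handling
# of coordinates are assumptions for illustration.
import numpy as np

def structural_similarity(sample_S: np.ndarray, sample_P: np.ndarray,
                          eta: float = 1e-8) -> float:
    """sample_S, sample_P: (M, 3) points sampled from S_out and P."""
    mu_s, mu_p = sample_S.mean(), sample_P.mean()
    sig_s, sig_p = sample_S.std(), sample_P.std()
    cov_sp = np.mean((sample_S - mu_s) * (sample_P - mu_p))   # sigma_SP
    return ((2 * mu_s * mu_p) / (mu_s**2 + mu_p**2 + eta)
            * (2 * sig_s * sig_p) / (sig_s**2 + sig_p**2 + eta)
            * cov_sp / (sig_s * sig_p + eta))

S = np.random.rand(5000, 3)
print(structural_similarity(S, S + 0.001))   # near 1 for nearly identical sets
```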
Figure 7 illustrates the effect of this term. When there are gaps between patches and
nonunique intersections, the data fitting term alone may not result in a reasonable recon-
struction (Figure 7b). However, after adding the 3D structural similarity term (Figure 7c),
the optimizer can achieve the desired result.
3.3.2. Optimization
We use the energy terms defined above to formulate a binary linear optimization model, which selects the best set of candidate surfaces and ensures the watertightness of the final model through hard constraints.
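A minimal sketch of such a binary selection program, assuming gurobipy (the paper reports using Gurobi [45]); the per-face scores and the simplified pairwise watertightness constraints are illustrative stand-ins, not the paper's exact model. In the full model, the data fitting and 3D structural similarity terms both contribute to the objective.

```python
# Minimal sketch of face selection as a binary linear program with gurobipy.
# Scores and the simplified pairwise constraints are illustrative only.
import gurobipy as gp

def select_faces(scores, pair_constraints):
    """scores: per-candidate gain; pair_constraints: (i, j) pairs of faces
    that must be kept or dropped together to stay watertight (simplified)."""
    m = gp.Model("face_selection")
    n = len(scores)
    x = m.addVars(n, vtype=gp.GRB.BINARY)          # x_i = 1: keep face i
    m.setObjective(gp.quicksum(scores[i] * x[i] for i in range(n)),
                   gp.GRB.MAXIMIZE)
    for i, j in pair_constraints:                  # hard constraints
        m.addConstr(x[i] == x[j])
    m.optimize()
    return [i for i in range(n) if x[i].X > 0.5]

print(select_faces([1.0, -0.2, 0.5], [(0, 1)]))    # keeps faces 0, 1, and 2
```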
4. Results
4.1. Datasets
We present experimental results on two widely used datasets of manmade objects, namely, ABCParts [9] and Thingi10K [46]. ABCParts is a subset of the ABC dataset [47] and has recently been considered a standard benchmark for learning-based primitive detection methods [9,10,48]. It comprises point clouds of 30k CAD models, where each point cloud has 10,000 points and at least one curved surface. We trained and evaluated
our primitive detection network on this dataset.
For the nonlearnable modules, Thingi10K is more challenging for demonstrating algorithm performance. It consists of over 10k objects that have been uploaded by users for 3D
printing. We selected models containing curved surfaces and sharp edges to demonstrate
the framework’s efficacy.
• Seg-IoU: this metric measures the similarity between the predicted patches and ground truth segments: $\frac{1}{K}\sum_{k=1}^{K} \mathrm{IoU}(W[:, k], \hat{W}[:, k])$, where $W$ is the predicted segmentation membership for each point cloud, $\hat{W}$ is the ground truth, and $K$ is the number of ground truth segments.
• Type-IoU: this metric measures the classification accuracy of primitive type prediction: $\frac{1}{K}\sum_{k=1}^{K} \mathbb{I}[t_k = \hat{t}_k]$, where $t_k$ is the predicted primitive type for the $k$th segment patch, $\hat{t}_k$ is the ground truth, and $\mathbb{I}$ is an indicator function.
• Throughput: this metric measures the efficiency of the network in instances per second (ins./sec.), i.e., the maximum number of instances the network can process per second.
We use Hungarian matching [49] to find correspondences between predicted segments
and ground-truth segments.
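A sketch of this evaluation protocol, assuming SciPy's linear_sum_assignment for the Hungarian matching:

```python
# Sketch of Seg-IoU evaluation: build a pairwise IoU matrix between predicted
# and ground-truth segments, match them with the Hungarian algorithm, and
# average the matched IoUs.
import numpy as np
from scipy.optimize import linear_sum_assignment

def seg_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred, gt: (N,) per-point segment labels."""
    P, G = np.unique(pred), np.unique(gt)
    iou = np.zeros((len(P), len(G)))
    for a, p in enumerate(P):
        for b, g in enumerate(G):
            inter = np.sum((pred == p) & (gt == g))
            union = np.sum((pred == p) | (gt == g))
            iou[a, b] = inter / union
    rows, cols = linear_sum_assignment(-iou)       # maximize total IoU
    return iou[rows, cols].sum() / len(G)          # average over GT segments

print(seg_iou(np.array([0, 0, 1, 1, 2]), np.array([0, 0, 1, 1, 1])))  # ~0.83
```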
Analysis of results. Our framework’s reconstruction results, both quantitative and qual-
itative, are presented in Table 1 and Figure 8, respectively, for the ABCParts benchmark.
The detection performance of our primitive detection module, as shown in Table 1, out-
performs other state-of-the-art methods, particularly in terms of throughput efficiency,
which can be attributed to the PointNet++ backbone. Additionally, Figure 8 shows that
our method produces high-quality watertight models that are close to the ground truth
and surpass all previous methods. This is thanks to our proposed mesh fitting and splitting
module and selection module.
Figure 8. (a–k) Primitive segmentation and surface reconstruction results. From top to bottom: input,
ground truth, segments produced by ParSeNet, surfaces fitted by ParSeNet, segments produced by
HPNet, surfaces produced by HPNet and ball pivoting [50], our segments produced by primitive
detection module, and final reconstructed models.
In contrast, the primitive blocks detected by ParSeNet [9] may contain breaks or errors,
particularly near boundaries. Although it incorporates a learnable fit module to reconstruct
mesh models, it may generate gaps or overlaps (such as (a), (f), (h), and (j) in the third
row). This highlights the difficulty of training the surface fitting task compared to primitive
detection. HPNet produces more precise primitive detection results than ParSeNet, but it
may encounter oversegmentation issues ((c), (d), (e), (i), and (k) in the fifth row). As HPNet
does not consider mesh model reconstruction, for comparison, we used the classical mesh
reconstruction algorithm ball pivoting [50] to obtain the mesh models (sixth row).
Ablation analysis. Table 2 displays the enhancements of our primitive detection module
compared to the baseline. The most significant improvement comes from the backbone,
which not only boosts the throughput from 8 to 28 instances per second but also enhances
the detection score. Moreover, incorporating more advanced training strategies such as
AdamW [51], cosine learning rate decay, and label smoothing [52] can also lead to improved
performance.
Figure 9. Comparison with learning-based reconstruction methods. From top to bottom: ground truth, Points2Surf, LIG, ConvONet (3p), ConvONet (grid 32), and ours.
by the blue boxes in the figure. The reconstruction results of Points2Surf [26] have surface artifacts and poor boundary quality, possibly because of its design of separate local and global branches. Local Implicit Grid (LIG) [27] performs better since it trains the network to learn parts rather than the entire object, making it more local and giving it better generalization in surface reconstruction tasks. However, as mentioned by the authors, it may suffer from the "back-faces" problem, as shown by the yellow boxes. Convolutional Occupancy Networks (ConvONet) [28] proposes two networks that use a 2D CNN and a 3D CNN, respectively, named ConvONet-3plane (shown as ConvONet-3p in the figure) and ConvONet-grid (shown as ConvONet-grid32 in the figure, with a default resolution of $32^3$). We conducted comparative tests on both models. The three-plane network is more efficient but has poorer representation capabilities, which may result in artifacts in the reconstruction results (as shown by the red boxes in the figure). The grid model has better representation capabilities but is relatively less efficient. Moreover, since ConvONet uses one latent vector to represent one object, it may make semantic recognition errors and not be as local as Points2Surf or LIG, as shown by the green box in the figure.
In contrast, our method achieved the best reconstruction results, producing a clean surface and sharp edges, with only a small amount of distortion at the boundary of the last model (shown by the green box in the last row).
Comparison with traditional reconstruction methods. Figure 10 illustrates a comprehen-
sive comparison between our method and five other well-known reconstruction techniques.
They include the classical reconstruction algorithm screened Poisson (SCP) [18], as well as
four methods specifically designed for preserving sharp features: APSS [54], RIMLS [20],
EAR [21], and PolyFit [31], which are detailed in Section 2. Two models of varying complexity are used: a Vase with only four surfaces (top of Figure 10) and a Fandisk with approximately 20 faces (bottom of Figure 10). Both models use point clouds with 10k points and moderate levels of Gaussian noise ($\sigma = 0.02d$, where $d$ represents the length of the diagonal of the bounding box).
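A sketch of this noise setting; the sampling routine and seed are assumptions:

```python
# Sketch of the noise setting used for this comparison: zero-mean Gaussian
# noise with sigma = 0.02 * d, where d is the bounding-box diagonal length.
import numpy as np

def add_noise(points: np.ndarray, ratio: float = 0.02, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(points.max(axis=0) - points.min(axis=0))  # diagonal d
    return points + rng.normal(scale=ratio * d, size=points.shape)

pts = np.random.rand(10000, 3)   # stand-in 10k-point cloud
noisy = add_noise(pts)           # sigma = 0.02d, as in the comparison
```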
Figure 10. Comparison with traditional meshing methods (top: Vase; bottom: Fandisk). From left to right: input points (with noise), SCP, APSS, RIMLS, EAR, PolyFit, and ours.
SCP accurately reconstructs the surface, including the noise, but applies neither smoothing on the surface nor sharpening effects on the boundaries. APSS and RIMLS, with their piecewise smoothing design, partially smooth the surface noise but introduce jagged edges at the boundaries.
EAR effectively smooths the surface noise and enhances sharp edges; however, exten-
sive upsampling leads to deformations near the boundaries and an excessive number
of triangles in the reconstruction. PolyFit represents all surfaces as planes, rendering it
unsuitable for models with curved surfaces. In contrast, our method excels at accurate
shape reconstruction while preserving clear and sharp boundaries, resulting in superior
visual outcomes compared to the other techniques.
Timeliness analysis. Table 3 provides statistics on the reconstruction results in Figure 10,
including the number of triangles in the reconstructed meshes and the runtime of the
algorithms. For SCP, APSS, and RIMLS, the open-source versions available in MeshLab [55]
with their default settings were utilized. Regarding EAR and PolyFit, we employed the
executable programs provided by their respective authors and followed the guidelines
outlined in their papers for parameter tuning and usage.
The comparison reveals that traditional methods such as SCP, APSS, and EAR, which
do not incorporate segmentation, exhibit relatively fast algorithm speeds. However, when
dealing with more complex models, they necessitate dense triangle representations for
accurate reconstruction. Specifically, EAR, with its extensive upsampling for edge en-
hancement, produces an excessive number of triangles in the reconstruction, leading to
significantly longer runtime. Conversely, PolyFit represents the entire model using planes
for lightweight representation, but it struggles to accurately capture shapes containing
curved surfaces.
Table 3. Number of faces and runtime (in seconds) of the reconstruction results in Figure 10.

                         Vase                 Fandisk
Method                   Faces      Sec.      Faces      Sec.
Screened Poisson [18]    9996       1.98      40028      3.08
APSS [54]                9996       1.59      17815      4.34
RIMLS [20]               9996       2.41      17801      6.63
EAR [21]                 181170     128.98    272593     202.39
PolyFit [31]             38         1.81      17         3.39
Ours                     4068       7.92      6403       69.09
Ours + OpenMP [56]       4068       2.26      6403       7.35
Figure 11. Heatmap of geometric error (top: Vase; bottom: Fandisk). From left to right: original points (without noise), SCP, APSS, RIMLS, EAR, PolyFit, and ours. To visualize the reconstruction error, the point cloud is colored based on the distance between each point and the surface of the reconstructed model (Figure 10). The color scale ranges from blue (representing a shorter distance) to red (indicating a longer distance).
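A sketch of this visualization, assuming the trimesh library for point-to-mesh distances; the stand-in mesh and noise are illustrative:

```python
# Sketch of the error heatmap: color each input point by its distance to the
# reconstructed mesh (blue = small error, red = large error).
import numpy as np
import trimesh

def error_colors(mesh: trimesh.Trimesh, points: np.ndarray) -> np.ndarray:
    """Per-point RGB: blue for small point-to-mesh distance, red for large."""
    _, dist, _ = trimesh.proximity.closest_point(mesh, points)
    t = dist / (dist.max() + 1e-12)                # normalize errors to [0, 1]
    rgb = np.stack([255 * t, np.zeros_like(t), 255 * (1 - t)], axis=1)
    return rgb.astype(np.uint8)

mesh = trimesh.creation.icosphere()                # stand-in reconstruction
pts = mesh.sample(5000) + np.random.normal(scale=0.01, size=(5000, 3))
cloud = trimesh.PointCloud(pts, colors=error_colors(mesh, pts))  # heatmap cloud
```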
It can be observed that SCP exhibits a global error distribution, primarily due to the
presence of noise. APSS and RIMLS can only smooth relatively low-level surface noise
and may increase errors at the boundaries. EAR can effectively smooth the surface and
enhance the boundaries, but it also introduces increased boundary errors. In comparison,
our method achieves the overall minimum geometric error and accurately reconstructs the
boundaries, resulting in the most accurate reconstruction.
Table 4. Geometric error statistics of the reconstruction results (left four columns: Vase; right four columns: Fandisk).

                         Vase                                  Fandisk
Screened Poisson [18]    27.35    8.014    1.766    1.390      117.8    20.24    2.983    2.572
APSS [54]                149.0    5.579    1.120    0.897      26.28    19.32    3.051    2.806
RIMLS [20]               31.28    5.663    1.296    1.022      97.25    19.86    3.515    3.303
EAR [21]                 8.227    10.57    1.647    1.152      197.7    19.95    3.843    3.575
PolyFit [31]             3525     210.7    53.52    36.32      1586     260.4    45.34    31.94
Ours                     0.003    8.132    1.499    1.343      15.82    18.85    1.162    1.008
Figure 12. Demonstration of the effectiveness of our framework combined with edge extraction.
From top to bottom: input, results produced by three different types of edge detection methods,
results after clustering and segmentation, candidate surfaces produced by the proposed mesh splitting module, and final outputs of the selection module. Panels (a,b) are examples of VCM [57], (c,d) are examples
of the learning-based method PIE-Net [48], and (e,f) are examples of the semiautomatic method
Live-Wire [58].
Robustness evaluation. To test the robustness of our algorithm, we examined the recon-
struction effect of our framework on noisy and missing data.
As depicted in Figure 14, we added random Gaussian noise during the training of
the primitive detection module as a data augmentation technique. During testing, we
added varying degrees of Gaussian noise to the same point cloud, with d representing
the length of an edge of the object. Our algorithm demonstrated robustness to moderate
Gaussian noise.
Figure 14. Reconstruction results of the CAD component with increasing Gaussian noise. Top: input.
Bottom: reconstruction. (a) Input without noise. (b) σ = 0.01d. (c) σ = 0.04d. (d) σ = 0.06d.
For missing data, as shown in Figure 15, we artificially cropped the input point cloud.
Despite this, the refine submodule (Section 3.1.2) was still able to detect all patches and merge them via the normal-angle criterion. The mesh fitting and splitting module is responsible for
filling in the missing parts based on the merged patches. It is worth noting that our default
HRBF-based meshing method [16] is required here, as screened Poisson will not work. The
reconstruction framework was still able to produce the desired result.
Figure 15. Reconstruction of missing data. (a) Point cloud with missing data. (b) Candidate surfaces.
(c) Reconstruction result.
5. Conclusions
Our work presents a novel framework for 3D mesh reconstruction from point clouds,
which is based on primitive detection and is designed to preserve sharp features with high
fidelity. Unlike previous methods, our approach emphasizes achieving high-quality overall
reconstruction, particularly on sharp boundaries. To achieve this goal, we developed
multiple modules that cover the entire reconstruction process and result in watertight
and high-fidelity models. Our method outperforms the state-of-the-art on most metrics,
and produces models that are closer to the ground truth and have smaller errors than recent
learning-based reconstruction and classic mesh reconstruction methods. Additionally, we
demonstrate the versatility of our designed modules by applying them to other tasks, such
as edge extraction and RANSAC, resulting in high-quality models.
As larger networks and datasets become available, we expect further improvements
in feature extraction and primitive detection for point clouds, making our framework even
more valuable for various applications in the future.
Future prospects. We believe that, compared to 2D datasets, the current availability and
quality of 3D segmentation datasets are still limited. This particularly applies to primitive
segmentation data, which is why our method currently performs well only on CAD
data. However, the modules within our framework are designed to be flexible, enabling
Remote Sens. 2023, 15, 3155 18 of 20
them to adapt to future developments. We anticipate that as larger networks and higher-
quality datasets, including urban buildings and objects in indoor scenes, become available,
or with further advancements in unsupervised segmentation methods, the reconstruction
performance of our method will continue to improve, making our framework even more
valuable for various applications in the future.
Limitations. It is worth noting that primitive representation is a strong prior, and our
method may not be applicable to all types of objects that are not suitable for representation
using primitives. Additionally, in the selection module, we treat the reconstruction problem
as a binary linear combination optimization problem, which may limit our method’s
ability to complete large missing areas in the input point cloud, such as in the case of
extensive missing data. In these cases, our method may not be able to achieve the correct
reconstruction result, unlike recent deep-learning-based methods [23,24,28].
Author Contributions: Conceptualization, Q.L.; funding acquisition, J.X.; project administration, J.X.;
supervision, J.X., S.X. and Y.W.; validation, Q.L., J.X., S.X. and Y.W.; visualization, Q.L.; writing—
original draft, Q.L.; writing—review and editing, J.X., S.X. and Y.W. All authors have read and agreed
to the published version of the manuscript.
Funding: This work is supported by the National Natural Science Foundation of China (U2003109,
U21A20515, 62102393, 62206263, 62271467), the Strategic Priority Research Program of the Chinese
Academy of Sciences (No. XDA23090304), the Youth Innovation Promotion Association of the Chinese
Academy of Sciences (Y201935), the State Key Laboratory of Robotics and Systems (HIT) (SKLRS-2022-
KF-11), and the Fundamental Research Funds for the Central Universities and China Postdoctoral
Science Foundation (2022T150639, 2021M703162).
Data Availability Statement: This work uses the following datasets, all of which can be obtained
from the internet. ABC at https://deep-geometry.github.io/abc-dataset/ (accessed on 14 October
2022); Thingi10K at https://ten-thousand-models.appspot.com/ (accessed on 14 October 2022).
Acknowledgments: The authors would like to thank the reviewers for their valuable comments
and suggestions.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Berger, M.; Tagliasacchi, A.; Seversky, L.M.; Alliez, P.; Guennebaud, G.; Levine, J.A.; Sharf, A.; Silva, C.T. A Survey of Surface
Reconstruction from Point Clouds. Comput. Graph. Forum 2017, 36, 301–329. [CrossRef]
2. Berger, M.; Tagliasacchi, A.; Seversky, L.M.; Alliez, P.; Levine, J.A.; Sharf, A.; Silva, C.T. State of the Art in Surface Reconstruction from Point Clouds. In Proceedings of the 35th Annual Conference of the European Association for Computer Graphics, Eurographics 2014 State of the Art Reports, Strasbourg, France, 7–11 April 2014; Volume 1, pp. 161–185.
3. Kaiser, A.; Ybanez Zepeda, J.A.; Boubekeur, T. A Survey of Simple Geometric Primitives Detection Methods for Captured 3D
Data. Comput. Graph. Forum 2019, 38, 167–196. [CrossRef]
4. Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and
Automated Cartography. In Readings in Computer Vision; Fischler, M.A., Firschein, O., Eds.; Morgan Kaufmann: San Francisco,
CA, USA, 1987; pp. 726–740. [CrossRef]
5. Schnabel, R.; Wahl, R.; Klein, R. Efficient RANSAC for point-cloud shape detection. Comput. Graph. Forum 2007, 26, 214–226.
[CrossRef]
6. Yan, D.M.; Wang, W.; Liu, Y.; Yang, Z. Variational mesh segmentation via quadric surface fitting. CAD Comput. Aided Des. 2012,
44, 1072–1082. [CrossRef]
7. Lafarge, F.; Mallet, C. Creating large-scale city models from 3D-point clouds: A robust approach with hybrid representation. Int.
J. Comput. Vis. 2012, 99, 69–85. [CrossRef]
8. Li, L.; Sung, M.; Dubrovina, A.; Yi, L.; Guibas, L.J. Supervised fitting of geometric primitives to 3D point clouds. In Proceedings
of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019;
pp. 2647–2655.
9. Sharma, G.; Liu, D.; Maji, S.; Kalogerakis, E.; Chaudhuri, S.; Měch, R. ParSeNet: A Parametric Surface Fitting Network for 3D
Point Clouds. Lect. Notes Comput. Sci. 2020, 12352, 261–276.
10. Yan, S.; Yang, Z.; Ma, C.; Huang, H.; Vouga, E.; Huang, Q. HPNet: Deep Primitive Segmentation Using Hybrid Representations. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021.
11. Lorensen, W.E.; Cline, H.E. Marching Cubes: A High Resolution 3D Surface Construction Algorithm. SIGGRAPH Comput. Graph.
1987, 21, 163–169. [CrossRef]
12. Alexa, M.; Behr, J.; Cohen-Or, D.; Fleishman, S.; Levin, D.; Silva, C.T. Computing and rendering point set surfaces. IEEE Trans.
Vis. Comput. Graph. 2003, 9, 3–15. [CrossRef]
13. Wang, H.; Scheidegger, C.E.; Silva, C.T. Bandwidth selection and reconstruction quality in point-based surfaces. IEEE Trans. Vis.
Comput. Graph. 2009, 15, 572–582. [CrossRef]
14. Carr, J.C.; Beatson, R.K.; Cherrie, J.B.; Mitchell, T.J.; Fright, W.R.; McCallum, B.C.; Evans, T.R. Reconstruction and representation
of 3D objects with radial basis functions. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive
Techniques, SIGGRAPH 2001, Los Angeles, CA, USA, 12–17 August 2001; pp. 67–76. [CrossRef]
15. Brazil, E.V.; Macedo, I.; Sousa, M.C.; de Figueiredo, L.H.; Velho, L. Sketching Variational Hermite-RBF Implicits. In Proceedings
of the Seventh Sketch-Based Interfaces and Modeling Symposium, Annecy, France, 7–10 June 2010; SBIM ’10, pp. 1–8.
16. Huang, Z.; Carr, N.; Ju, T. Variational implicit point set surfaces. ACM Trans. Graph. 2019, 38, 124. [CrossRef]
17. Kazhdan, M.; Bolitho, M.; Hoppe, H. Poisson Surface Reconstruction. In Proceedings of the Fourth Eurographics Symposium on Geometry Processing, Sardinia, Italy, 26–28 June 2006; SGP '06, pp. 61–70.
18. Kazhdan, M.; Hoppe, H. Screened poisson surface reconstruction. ACM Trans. Graph. 2013, 32, 29. [CrossRef]
19. Fleishman, S.; Cohen-Or, D.; Silva, C.T. Robust moving least-squares fitting with sharp features. ACM Trans. Graph. 2005,
24, 544–552. [CrossRef]
20. Öztireli, A.C.; Guennebaud, G.; Gross, M. Feature Preserving Point Set Surfaces based on Non-Linear Kernel Regression. Comput.
Graph. Forum 2009, 28, 493–501. [CrossRef]
21. Huang, H.; Wu, S.; Gong, M.; Cohen-Or, D.; Ascher, U.; Zhang, H.R. Edge-aware point set resampling. ACM Trans. Graph. 2013,
32, 9. [CrossRef]
22. Lipman, Y.; Cohen-Or, D.; Levin, D.; Tal-Ezer, H. Parameterization-free projection for geometry reconstruction. In Proceedings of the ACM SIGGRAPH Conference on Computer Graphics, San Diego, CA, USA, 5–9 August 2007. [CrossRef]
23. Chen, Z.; Zhang, H. Learning Implicit Fields for Generative Shape Modeling. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
24. Park, J.J.; Florence, P.; Straub, J.; Newcombe, R.; Lovegrove, S. DeepSDF: Learning Continuous Signed Distance Functions for
Shape Representation. In Proceedings of the The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long
Beach, CA, USA, 15–20 June 2019.
25. Mescheder, L.; Oechsle, M.; Niemeyer, M.; Nowozin, S.; Geiger, A. Occupancy Networks: Learning 3D Reconstruction in Function
Space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20
June 2019.
26. Erler, P.; Guerrero, P.; Ohrhallinger, S.; Mitra, N.J.; Wimmer, M. Points2Surf: Learning Implicit Surfaces from Point Clouds. In
Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; pp. 108–124. [CrossRef]
27. Jiang, C.M.; Sud, A.; Makadia, A.; Huang, J.; Nießner, M.; Funkhouser, T. Local Implicit Grid Representations for 3D Scenes. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
28. Songyou, P.; Michael, N.; Lars, M.; Marc, P.; Andreas, G. Convolutional Occupancy Networks. In Proceedings of the European
Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020.
29. Schnabel, R.; Degener, P.; Klein, R. Completion and reconstruction with primitive shapes. Comput. Graph. Forum 2009, 28, 503–512.
[CrossRef]
30. Lafarge, F.; Alliez, P. Surface reconstruction through point set structuring. Comput. Graph. Forum 2013, 32, 225–234. [CrossRef]
31. Nan, L.; Wonka, P. PolyFit: Polygonal Surface Reconstruction from Point Clouds. In Proceedings of the IEEE International
Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2372–2380. [CrossRef]
32. Sharma, G.; Goyal, R.; Liu, D.; Kalogerakis, E.; Maji, S. CSGNet: Neural Shape Parser for Constructive Solid Geometry. In
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA,
18–23 June 2018; pp. 5515–5523.
33. Yu, F.; Chen, Z.; Li, M.; Sanghi, A.; Shayani, H.; Mahdavi-Amiri, A.; Zhang, H. CAPRI-Net: Learning Compact CAD Shapes with Adaptive Primitive Assembly. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021. [CrossRef]
34. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic Graph CNN for Learning on Point Clouds. ACM
Trans. Graph. 2019, 38, 146. [CrossRef]
35. Comaniciu, D.; Meer, P. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2002,
24, 603–619. [CrossRef]
36. Qian, G.; Li, Y.; Peng, H.; Mai, J.; Hammoud, H.; Elhoseiny, M.; Ghanem, B. PointNeXt: Revisiting PointNet++ with Improved
Training and Scaling Strategies. In Advances in Neural Information Processing Systems; Koyejo, S., Mohamed, S., Agarwal, A.,
Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2022; Volume 35, pp. 23192–23204.
37. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In
Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December
2017; NIPS’17, pp. 5105–5114.
38. Ma, X.; Qin, C.; You, H.; Ran, H.; Fu, Y. Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP
Framework. In Proceedings of the International Conference on Learning Representations, Virtual Event, 25–29 April 2022.
39. Zhao, H.; Jiang, L.; Jia, J.; Torr, P.H.; Koltun, V. Point Transformer. In Proceedings of the IEEE/CVF International Conference on
Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 16259–16268.
40. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In
Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June
2018; pp. 4510–4520. [CrossRef]
41. Lu, X.; Yao, J.; Tu, J.; Li, K.; Li, L.; Liu, Y. Pairwise linkage for point cloud segmentation. ISPRS Ann. Photogramm. Remote Sens.
Spat. Inf. Sci. 2016, 3, 201–208. [CrossRef]
42. Alliez, P.; Giraudot, S.; Jamin, C.; Lafarge, F.; Mérigot, Q.; Meyron, J.; Saboret, L.; Salman, N.; Wu, S.; Yildiran, N.F. Point Set
Processing. In CGAL User and Reference Manual, 4th ed.; CGAL Editorial Board: New York, NY, USA, 2022.
43. Kettner, L.; Meyer, A.; Zomorodian, A. Intersecting Sequences of dD Iso-oriented Boxes. In CGAL User and Reference Manual, 3rd
ed.; CGAL Editorial Board: New York, NY, USA, 2021.
44. Nan, L.; Sharf, A.; Zhang, H.; Cohen-Or, D.; Chen, B. SmartBoxes for interactive urban reconstruction. In ACM Siggraph 2010
Papers, Siggraph 2010; ACM: New York, NY, USA, 2010. [CrossRef]
45. Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual. 2021. Available online: https://www.gurobi.com (accessed on 27 April 2023).
46. Zhou, Q.; Jacobson, A. Thingi10K: A Dataset of 10,000 3D-Printing Models. arXiv 2016, arXiv:1605.04797.
47. Koch, S.; Matveev, A.; Williams, F.; Alexa, M.; Zorin, D.; Panozzo, D. ABC: A Big CAD Model Dataset for Geometric Deep Learning. arXiv 2019, arXiv:1812.06216.
48. Wang, X.; Xu, Y.; Xu, K.; Tagliasacchi, A.; Zhou, B.; Mahdavi-Amiri, A.; Zhang, H. PIE-NET: Parametric Inference of Point Cloud Edges. Adv. Neural Inf. Process. Syst. 2020, 33, 20167–20178.
49. Kuhn, H.W. The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 1955, 2, 83–97. [CrossRef]
50. Bernardini, F.; Mittleman, J.; Rushmeier, H.; Silva, C.; Taubin, G. The Ball-Pivoting Algorithm for Surface Reconstruction. IEEE
Trans. Vis. Comput. Graph. 1999, 5, 349–359. [CrossRef]
51. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the ICLR, New Orleans, LA, USA, 6–9 May
2019.
52. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June
2016; pp. 2818–2826. [CrossRef]
53. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
54. Guennebaud, G.; Gross, M. Algebraic point set surfaces. In Proceedings of the ACM SIGGRAPH Conference on Computer Graphics, San Diego, CA, USA, 5–9 August 2007. [CrossRef]
55. Cignoni, P.; Callieri, M.; Corsini, M.; Dellepiane, M.; Ganovelli, F.; Ranzuglia, G. MeshLab: An Open-Source Mesh Processing Tool. In Proceedings of the Eurographics Italian Chapter Conference, Salerno, Italy, 2–4 July 2008; Scarano, V., Chiara, R.D., Erra, U., Eds.; The Eurographics Association: Crete, Greece, 2008. [CrossRef]
56. Chandra, R.; Dagum, L.; Kohr, D.; Menon, R.; Maydan, D.; McDonald, J. Parallel Programming in OpenMP; Morgan Kaufmann:
Burlington, MA, USA, 2001.
57. Mérigot, Q.; Ovsjanikov, M.; Guibas, L.J. Voronoi-based curvature and feature estimation from point clouds. IEEE Trans. Vis.
Comput. Graph. 2011, 17, 743–756. [CrossRef]
58. Zhuang, Y.; Zou, M.; Carr, N.; Ju, T. Anisotropic geodesics for live-wire mesh segmentation. Comput. Graph. Forum 2014,
33, 111–120. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.