Article
A Lightweight Network for Point Cloud Analysis via the Fusion
of Local Features and Distribution Characteristics
Qiang Zheng 1,2, Jian Sun 1,2,* and Wei Chen 1,2
1 State Key Laboratory for Strength and Vibration of Mechanical Structures, School of Aerospace Engineering,
Xi’an Jiaotong University, Xi’an 710049, China; [email protected] (Q.Z.);
[email protected] (W.C.)
2 Shaanxi Engineering Laboratory for Vibration Control of Aerospace Structures, Xi’an Jiaotong University,
Xi’an 710049, China
* Correspondence: [email protected]
Abstract: Effectively integrating local features and their spatial distribution information for point cloud analysis is a subject that has long been explored. Inspired by
convolutional neural networks (CNNs), this paper studies the relationship between local features and
their spatial characteristics and proposes a concise architecture to effectively integrate them instead of
designing more sophisticated feature extraction modules. Different positions in the feature map of the
2D image correspond to different weights in the convolution kernel, making the obtained features sensitive to local distribution characteristics. Thus, the spatial distribution of the input features of
the point cloud within the receptive field is critical for capturing abstract regional aggregated features.
We design a lightweight structure to extract local features by explicitly supplementing the distribution
information of the input features to obtain distinctive features for point cloud analysis. Compared
with the baseline, our model shows improvements in accuracy and convergence speed, and these
advantages facilitate the introduction of the snapshot ensemble. Aiming at the shortcomings of the
commonly used cosine annealing learning schedule, we design a new annealing schedule that can be
flexibly adjusted for the snapshot ensemble technology, which improves the performance by a large margin. Extensive experiments on typical benchmarks verify that, although it adopts the basic shared multi-layer perceptrons (MLPs) as feature extractors, the proposed model with a lightweight structure achieves on-par performance with previous state-of-the-art (SOTA) methods (e.g., ModelNet40 classification, 0.98 million parameters and 93.5% accuracy; S3DIS segmentation, 1.4 million parameters and 68.7% mIoU).
Keywords: lightweight network; deep learning; point cloud classification; point cloud segmentation
with the increase in resolution. There are also other methods, such as [22–26], that render the point cloud from multiple viewpoints to obtain a set of images so that CNNs can be directly applied to 3D point cloud analysis; this leads to spatial information loss and causes difficulties for tasks such as semantic segmentation.
Since the conversion of point clouds causes information loss and brings extra burdens
on storage and computation, it is feasible to directly use the point cloud as the input of
deep networks. The point cloud analysis network closely follows the development of
image processing technology. The core problem of point cloud analysis is aggregating
and extracting features within the perceptual field. There are many methods devoted to
designing sophisticated local feature extraction modules; representative works are MLP-
based (such as PointNet [27] and PointNet++ [28]), convolution-based (PointConv [29]),
graph-based (DGCNN [30]), relation-based (RPNet [31]), and transformer-based (point
transformer [32]; PCT [33]) methods. These methods have contributed to the advancement
of the point cloud analysis community. However, the pursuit of sophisticated local feature
extractors also has its limitations. Delicate designs often correspond to complex modules,
resulting in huge computational costs, which hinders the application of these methods
to 3D point cloud analysis. In addition, the performance gains from more sophisticated
extractors have also been saturated recently. Experiments in [34] show that under similar
network frameworks, the performance improvements brought by most refined local feature
extractors are not significantly different. Therefore, this paper aims to design a lightweight
and efficient network instead of pursuing more refined feature extractors.
It is feasible to rethink MLP-based methods, analyze their inherent limitations, and
modify them to improve feature extraction capabilities significantly. The advantage of
simple MLP-based methods is that they do not require complex operations such as building
graphs to extract edge features or generating adaptive convolution kernels. In addition,
shared MLP regards all points in the receptive field as equivalent, extracts point features,
and then obtains local aggregated features through a symmetric function, which makes the
MLP-based methods less computationally expensive and well suited to the unordered nature of
point clouds. However, treating all points as equivalent tends to ignore the difference in
the spatial distribution, which leads to the deterioration of features. Looking back at the
process of CNN using convolution kernels to perform convolution operations on the image
patch to extract local features, the weight values in the convolution kernels are usually
different, which means that pixels at different positions in the feature map correspond
to different weights. Even if the features at different locations are the same, different
activation values will be output due to different weights, so the distribution characteristics
of elements in the local area also have a potential impact on the extraction of local features.
Local features are not only the aggregation of input features in the local receptive field but
also potentially encode the spatial distribution information of each element in the local area,
see Figure 1a. When performing MLP-based point cloud analysis, shared MLP is usually
implemented with a 1 × 1 convolution kernel, which is equivalent to forcing the weights of
the convolution kernels corresponding to each position in the local area of the 2D image
to be the same. Thus, the feature extraction is independent of the relative position of the
pixels, which seriously weakens the feature extraction ability of the convolution kernel, see
Figure 1b.
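As a toy illustration (a NumPy sketch of ours, not part of the proposed model), the following snippet applies a 3 × 3 convolution kernel and a position-shared weight to the same patch; shuffling the patch changes the convolution response but leaves the shared-weight-plus-max response unchanged, because the latter ignores where each value sits.

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.normal(size=(3, 3))      # a 3x3 single-channel image patch
kernel = rng.normal(size=(3, 3))     # convolution kernel: a different weight per position
w_shared = 0.7                       # shared "1x1" weight: identical for every position

conv_out = float(np.sum(patch * kernel))    # position-aware aggregation (CNN behaviour)
mlp_out = float(np.max(patch * w_shared))   # shared weight + max-pooling (position-blind)

# Shuffling the patch changes its spatial distribution but not its values:
# the convolution output changes, the shared-weight output does not.
shuffled = rng.permutation(patch.reshape(-1)).reshape(3, 3)
print(conv_out, float(np.sum(shuffled * kernel)))    # different
print(mlp_out, float(np.max(shuffled * w_shared)))   # identical
```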
Typical MLP-based methods such as PointNet [27] first extract features for each point
independently and then aggregate over the global receptive field to obtain shape-level fea-
tures. In addition to ignoring local geometric properties, PointNet [27] completely ignores
spatial distribution information when aggregating all point features. PointNet++ [28] is an
upgraded version of PointNet [27]: it uses mini-PointNet to aggregate point features in local areas but only concatenates the features and relative coordinates in the input layer of mini-PointNet, so PointNet++ [28] does not inherently overcome the limitations
of PointNet [27]. Although DGCNN [30] is a graph-based method, it still uses the MLP
operation for local feature extraction. DGCNN [30] is aware of the role of spatial distribu-
tion information, so it adds the relative coordinates of the neighborhood points and the
absolute coordinates of the centroid to the input, which makes the local features depend
on the absolute position of the centroid, thus reducing the representation ability of the
local features. In addition, DGCNN [30] dynamically builds graphs in Euclidean space and
feature space, which introduces considerable computational cost. Although CNN-based
methods can extract local characteristics based on the spatial distribution information
of points, the process of adaptively learning convolution kernels is still computationally
expensive compared to MLP-based methods. A concise and effective way to alleviate the
limitation of shared MLPs ignoring spatial distribution information is to explicitly provide
the relative coordinates of the current receptive field at each MLP layer (for PointNet [27],
the receptive field is the entire point cloud, so global coordinates need to be provided).
Although this modification is simple, experiments validate its effectiveness. This design
allows us to achieve outstanding performance with fewer layers and a more lightweight
network.
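A rough sketch of this design (our simplified formulation with hypothetical layer widths, not the exact module used in the network) concatenates the relative coordinates of the neighborhood points to the input of every shared-MLP layer before a symmetric max-pooling:

```python
import torch
import torch.nn as nn

class CoordSupplementedMLP(nn.Module):
    """Stacked shared MLPs; every layer re-receives the relative coordinates
    of the points in the receptive field (hypothetical layer widths)."""
    def __init__(self, in_dim, dims=(64, 128)):
        super().__init__()
        self.layers = nn.ModuleList()
        d = in_dim
        for out_dim in dims:
            # +3 channels: the relative (x, y, z) supplied again at every layer
            self.layers.append(nn.Sequential(nn.Linear(d + 3, out_dim), nn.ReLU()))
            d = out_dim

    def forward(self, feats, rel_xyz):
        # feats: (B, N, k, C) grouped point features; rel_xyz: (B, N, k, 3)
        x = feats
        for layer in self.layers:
            x = layer(torch.cat([x, rel_xyz], dim=-1))  # supplement distribution information
        return x.max(dim=2).values                      # symmetric aggregation over the k neighbors

# usage: local features for 128 centroids with 16 neighbors each
out = CoordSupplementedMLP(in_dim=6)(torch.randn(2, 128, 16, 6), torch.randn(2, 128, 16, 3))
```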
Figure 1. Two local feature extraction methods: (a) The convolution kernel provides different weights
for input features at different locations, potentially encoding the spatial distribution information of
the input features; (b) Points at different locations share the same MLP feature extractor, so shared
MLP ignores the spatial distribution information.
Moreover, the vanilla version of the proposed model with an exponential decay
learning schedule exhibits rapid convergence, see Figure 2, which implies that performance
gains are limited for most epochs beyond the initial growth, accompanied by an unnecessary cost in computing resources. We introduce the snapshot ensemble technology to the
proposed model to address this issue. Snapshot ensemble can integrate a series of trained
models in one complete training session without additional computation cost and fully
utilize the model’s rapid convergence to improve the performance further. However,
for the snapshot ensemble, the commonly used cosine annealing learning rate cannot be
flexibly adjusted when the annealing cycle is fixed, so we propose a novel learning schedule
denoted as Rectified-Tanh (Re-Tanh) with an adjustable parameter that can flexibly adapt to
different scenarios. Ablation studies also demonstrate that the learning strategy is beneficial
to improving the performance of the ensemble model.
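The snapshot-ensemble procedure of [62] can be summarized by the following sketch (schematic code with hypothetical callables such as train_one_epoch and predict_proba; saving one checkpoint per annealing cycle and averaging the predicted probabilities follow the standard recipe):

```python
import copy

def train_with_snapshots(model, train_one_epoch, lr_schedule, epochs_per_cycle, n_cycles):
    """Run n_cycles annealing cycles; at the end of each cycle, when the learning
    rate has decayed to its minimum, store a snapshot of the model."""
    snapshots = []
    for cycle in range(n_cycles):
        for epoch in range(epochs_per_cycle):
            lr = lr_schedule(epoch, epochs_per_cycle)  # e.g., the Re-Tanh schedule of Section 3
            train_one_epoch(model, lr)
        snapshots.append(copy.deepcopy(model))         # one trained model per cycle, no extra cost
    return snapshots

def ensemble_predict(snapshots, predict_proba, x):
    """Average the class probabilities predicted by the saved snapshots."""
    probs = [predict_proba(m, x) for m in snapshots]
    return sum(probs) / len(probs)
```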
Figure 2. Accuracy curve (red) of the proposed model on the ModelNet40 classification task with an
exponential decay learning schedule. Our model converges rapidly and steadily.
2. Related Work
2.1. Methods Based on 3D Voxel
Due to the success of CNN in the field of image processing, it is an intuitive attempt to
represent the point cloud with a regular voxel grid. VoxNet [16,17] uses two states of 0 and
1 to indicate whether the voxel is occupied. In order to reduce over-fitting, the approach
in [23] predicts the corresponding partial subvolumes of the voxelized data from various
directions, uses orientation pooling to fuse the shape features from different directions, and converts the analysis of 3D voxels into the extraction of 2D features. The approach
in [35] designs a voxel-based variational autoencoder, and the obtained features are used
for shape recognition. Voxel-based methods have achieved excellent performance in object
recognition tasks, but they have inherent shortcomings. The 3D point cloud is distributed on
a 2D manifold, so the 3D voxel data obtained by voxelizing the 2D manifold is highly
sparse, and some fine-grained information is lost due to quantization. Moreover, the
size of voxel data grows cubically with the resolution, which restricts a higher resolution
of voxel data. All these factors will cause a considerable consumption of storage and
computation resources. Some methods [19–21,36–38] have been used to alleviate these
difficulties, but they still cannot fundamentally eliminate these limitations. The recently
proposed Minkowski Engine [39] is an extension of sparse convolutional networks to high-
dimensional space, which facilitates the use of deep networks commonly used in 2D vision
for point cloud analysis. Minkowski Engine [39] significantly reduces the storage and
computing requirements, enabling voxel-based methods to be applied to higher resolution
voxel inputs. Following paper [39], some works [40–42] based on Minkowski Engine have
demonstrated excellent performance in point cloud analysis.
3. Methodology
In this section, we first introduce the methods of feature extraction and the supplement
of distribution information, then review the snapshot ensemble and illustrate the proposed
annealing schedule, and finally, we illustrate the network architecture in detail.
where MAX and h represent the max-pooling operation and the stacked shared MLPs,
respectively. See Figure 3.
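A compact way to write this aggregation, assuming the input to h is each neighbor's feature concatenated with its relative coordinates, is F_p = MAX_{i ∈ N(p)} h([f_i ; x_i − x_p]), where N(p) denotes the receptive field of the centroid p, f_i the input feature of its i-th neighbor, and x_i − x_p the corresponding relative coordinates.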
Figure 4. The Fusion Modules corresponding to local (a) and global (b) perception fields.
where lr_max and lr_min are the initial and final learning rate in one cycle, t is the iteration
number, T is the total number of training iterations in one cycle, and f is a monotonically
decreasing function. In general, f is set to be the shifted cosine function:
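f(x) = (cos(x) + 1)/2, x ∈ [0, π], (3)
i.e., the form popularized by the snapshot ensemble [62], which decreases monotonically from 1 to 0.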
The experiments reveal that the non-ensemble model rapidly converges when trained
with an exponential decay learning rate, see Figure 2. Rapid convergence implies that
the proposed model can quickly reach a local minimum in several epochs, facilitating the
introduction of the snapshot ensemble.
However, when T, lr_max, and lr_min are fixed, the commonly used cosine annealing learning schedule is also fixed, which makes it unable to be flexibly adjusted to adapt to diverse scenarios. Thus, we need to design a function that, like the shifted cosine, decreases monotonically from 1 to 0 on the interval [0, π], and whose shape can be
flexibly adjusted. As shown in Equation (4), the tanh function increases monotonically, and
we introduce a new annealing schedule based on it.
tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x}). (4)
The steps to rectify tanh to generate a new annealing curve are displayed as follows:
• Use −x instead of x to obtain a monotonically decreasing function f_1:
f_1(x) = tanh(−x). (5)
• The x is replaced by sx to scale f_1, and then f_1 is truncated on the interval [−π/2, π/2] to obtain the function f_2:
f_2(x) = tanh(−sx), x ∈ [−π/2, π/2]. (6)
• By replacing x with x − π/2, the function f_2 is shifted to the right by π/2, thereby obtaining the function f_3 defined on the interval [0, π]:
f_3(x) = tanh[−s(x − π/2)], x ∈ [0, π]. (7)
• Since the values of f_3 at both ends of the interval [0, π] are not strictly equal to ±1, it needs to be normalized to obtain the function f_4 with a range of [−1, 1]:
f_4(x) = tanh[−s(x − π/2)] / tanh(sπ/2), x ∈ [0, π]. (8)
• Scale the value range of f_4 to [−0.5, 0.5] and move it up by 0.5 to obtain a new annealing function Re-Tanh defined on [0, π]. The function value decreases monotonically from 1 to 0, and the shape can be adjusted by s. The expression is:
Re-Tanh(x) = tanh[−s(x − π/2)] / (2 tanh(sπ/2)) + 0.5, x ∈ [0, π]. (9)
Figure 5 illustrates the Re-Tanh curves corresponding to different s values. The shifted
cosine is also shown for reference. It is noted that when s equals 1, the middle part of
the Re-Tanh and the shifted cosine are almost coincident. It can be shown mathematically that the slopes of the two curves are almost equal near the center of symmetry (x = π/2), which indicates that Re-Tanh can be regarded as a generalization of the shifted cosine, so the learning schedule can be flexibly adjusted in specific scenarios to improve the performance of the model. In practical applications, x is usually replaced with π · mod(t, T)/T, where the mathematical quantities represented
by t and T are the same as those in Equation (2).
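As a concrete illustration, the following Python sketch evaluates the Re-Tanh schedule inside a training loop. Expressing the learning rate as lr_min + (lr_max − lr_min) · Re-Tanh(x) is our reading of the cyclic form in Equation (2); the default values (s = 1.5, learning rate decaying from 0.01 to 0.00001) follow the implementation details reported below.

```python
import math

def re_tanh_lr(t, T, lr_max=0.01, lr_min=0.00001, s=1.5):
    """Cyclic Re-Tanh annealing (Equation (9)): within each cycle of T iterations
    the learning rate decays monotonically from lr_max to lr_min; the scale
    factor s adjusts the steepness of the curve around x = pi/2."""
    x = math.pi * (t % T) / T                                  # map the iteration into [0, pi)
    f = math.tanh(-s * (x - math.pi / 2)) / (2 * math.tanh(s * math.pi / 2)) + 0.5
    return lr_min + (lr_max - lr_min) * f                      # assumed cyclic form

# learning rate at the start, middle, and end of a 100-iteration cycle
for t in (0, 50, 99):
    print(t, round(re_tanh_lr(t, 100), 6))
```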
Figure 5. The Re-Tanh curves corresponding to different scale factors. The shifted cosine curve is
also illustrated for reference.
Figure 6. Network architecture for classification and part segmentation tasks. Some network layers
are omitted for clarity.
factor of the Re-Tanh is set to 1.5. In each cycle, the learning rate starts at 0.01 and decreases
monotonically to 0.00001. For semantic segmentation, the momentum of BN is fixed to
0.99, and a learning rate that starts at 0.01 with an exponential decay rate of 0.95 per epoch is
employed without ensemble technology.
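The per-epoch exponential decay used for semantic segmentation can be written as a one-line schedule (a sketch of the stated hyperparameters, not the training script itself):

```python
def exp_decay_lr(epoch, lr0=0.01, gamma=0.95):
    """Exponential decay: the learning rate shrinks by a factor of 0.95 every epoch."""
    return lr0 * gamma ** epoch
```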
4. Results
4.1. ModelNet40 Classification
Our model is evaluated on the ModelNet40 benchmark, which contains 12,311 CAD models from 40 categories, divided into 9843 for training and 2468 for testing. For each object,
1024 points are uniformly sampled on the mesh surface, and then they are normalized into
a unit sphere. The result is shown in Table 1. Our model achieves on-par performance
with previous SOTA methods with only 1024 points as input. Note that methods such
as RS-CNN [63] improve the accuracy from 92.9% to 93.6% by a tricky 10-voting strategy
with randomly scaled shapes. The 10-voting evaluation is repeated 300 times, and then
the best result is selected, so we report the result of RS-CNN [63] without voting for a
fair comparison.
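The normalization into a unit sphere mentioned above is the usual centering-and-scaling step; a minimal sketch (our own formulation of this standard preprocessing, not the authors' code) is:

```python
import numpy as np

def normalize_to_unit_sphere(points):
    """points: (N, 3) sampled coordinates -> centered at the centroid and scaled
    so that the farthest point lies on the unit sphere."""
    centered = points - points.mean(axis=0)
    radius = np.linalg.norm(centered, axis=1).max()
    return centered / radius
```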
Table 1. Shape classification and part segmentation results (%) (“nor”: normal; “-”: unknown;
“k”: 1024). The best scores in the table are marked in bold font.
ModelNet40 ShapeNet
Method Input Acc. Input Cls. mIoU Ins. IoU
PointNet [27] 1k 89.2 2k 80.4 83.7
SCN [64] 1k 90.0 1k 81.8 84.6
KD-Net (depth = 10) [19] 1k 90.6 4k 77.4 82.3
PointNet++ [28] 1k 90.7 2 k, nor 81.9 85.1
KCNet [65] 1k 91.0 2k 82.2 84.7
MRTNet [66] 1k 91.2 - - -
SpecGCN [67] 1k 91.5 2k - 85.4
KD-Net (depth = 15) [19] 32 k 91.8 - - -
PointCNN [68] 1k 92.2 - - -
PCNN [48] 1k 92.3 2k 81.8 85.1
DGANet [51] 1k 92.3 2k - 85.2
Point2Sequence [69] 1k 92.6 - - -
A-CNN [49] 1k 92.6 - - -
Hybrid-CNN [59] 1k 92.6 - - -
DGCNN [30] 1k 92.9 2k 82.3 85.1
RS-CNN w/o vot. [63] 1k 92.9 2k 84.0 86.2
Point2Node [56] 1k 93.0 - - -
GAPointNet [53] 1k 93.0 2k - 84.9
PCT [33] 1k 93.2 2k - 86.4
Point2SpatialCapsule [55] 1k 93.4 2k 83.0 85.3
AGNet [70] 1k 93.4 2k 82.7 85.4
Ours 1k 93.5 2k 82.7 85.7
Figure 8. Segmentation examples of ShapeNet. For each sample, (left column) ground truth;
(right column) prediction.
Table 2. Results of 6-fold cross validation on the S3DIS dataset (%). The best scores in the table are
marked in bold font.
Table 3. Results of S3DIS Area-5 (%). The best scores in the table are marked in bold font.
Figure 9. Qualitative results for the semantic segmentation task on S3DIS. Regions with large
deviations between the predicted results and the ground truths are marked with red boxes.
5. Discussion
In this section, we first evaluate the complexity of the proposed model. Then, we
perform ablation experiments on our model to prove the effectiveness of the architecture
design. Finally, we conduct a series of experiments to evaluate the influence of some critical
parameters. All these experiments are performed on the ModelNet40 classification task.
Table 4. Complexity of the model for ModelNet40 classification (res: resolution; k: 1024; M: million).
The three parts from top to bottom are voxel-based, view-based and point-based methods.
only once. Model C takes the spatial coordinates as supplements for local patterns in each
layer in the Fusion Module and achieves an accuracy of 92.9%. Comparing Model A and
C, it can be verified that the supplement of spatial information is critical to improving the
performance significantly. Model D adopts a cosine annealing strategy for snapshot and
achieves 93.0% accuracy, and the improvement is not significant compared with Model
C (92.9%). The model (Ours) with the proposed Re-Tanh annealing strategy outperforms
all of the models above, especially Model D, proving that an adjustable annealing learning schedule is essential for better performance.
Size 12 24 36 48 60
Acc. 92.3 92.4 92.7 93.5 93.1
6. Conclusions
This paper studies the limitation of MLP-based methods and proposes a lightweight
architecture for point cloud analysis. Inspired by CNN, we propose the supplementary
mechanism of distribution information for shared MLPs and perform it concisely. Moreover,
the model converges rapidly with an exponential decay learning rate, so we promote it
with the snapshot ensemble strategy and design a new cyclic annealing schedule that can
be flexibly adjusted. Our network achieves on-par performance with the previous SOTA
methods, with the fewest parameters. Although effective, there are also some limitations. The kNN search is performed for each point in classification tasks, resulting in overlapping points between the local regions of adjacent centroids, which makes the computation redundant. Adaptively learning a subset of keypoints in a point cloud and designing more effective architectures for efficient point cloud analysis will be explored
in our future studies.
Author Contributions: Conceptualization: Q.Z. and J.S.; methodology: Q.Z.; validation: Q.Z.;
investigation: Q.Z. and W.C.; resources: J.S.; writing—original draft preparation: Q.Z.; writing—
review and editing: Q.Z., J.S. and W.C.; supervision: J.S. All authors have read and agreed to the
published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: ModelNet40: https://shapenet.cs.stanford.edu/media/modelnet40_
ply_hdf5_2048.zip; ShapeNet part: https://cs.stanford.edu/~ericyi/project_page/part_annotation/
index.html; S3DIS: https://goo.gl/forms/4SoGp4KtH1jfRqEj2, accessed on 8 October 2021.
Acknowledgments: The authors thank Stanford University for providing the experimental datasets.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
References
1. Golovinskiy, A.; Kim, V.G.; Funkhouser, T. Shape-based recognition of 3D point clouds in urban environments. In Proceedings of
the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 27 September–4 October 2009; pp. 2154–2161.
[CrossRef]
2. Gao, G.; Liu, Y.S.; Lin, P.; Wang, M.; Gu, M.; Yong, J.H. BIMTag: Concept-based automatic semantic annotation of online BIM
product resources. Adv. Eng. Inform. 2017, 31, 48–61. [CrossRef]
3. Han, Z.; Liu, Z.; Han, J.; Vong, C.M.; Bu, S.; Chen, C.L.P. Mesh Convolutional Restricted Boltzmann Machines for Unsupervised
Learning of Features with Structure Preservation on 3-D Meshes. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 2268–2281.
[CrossRef] [PubMed]
4. Zhong, S.; Zhong, Z.; Hua, J. Surface reconstruction by parallel and unified particle-based resampling from point clouds. Comput.
Aided Geom. Des. 2019, 71, 43–62. [CrossRef]
5. Skrodzki, M.; Jansen, J.; Polthier, K. Directional density measure to intrinsically estimate and counteract non-uniformity in point
clouds. Comput. Aided Geom. Des. 2018, 64, 73–89. [CrossRef]
6. Zheng, Y.; Li, G.; Xu, X.; Wu, S.; Nie, Y. Rolling normal filtering for point clouds. Comput. Aided Geom. Des. 2018, 62, 16–28.
[CrossRef]
7. Gao, G.; Liu, Y.S.; Wang, M.; Gu, M.; Yong, J.H. A query expansion method for retrieving online BIM resources based on Industry
Foundation Classes. Autom. Constr. 2015, 56, 14–25. [CrossRef]
8. Zhu, Y.; Mottaghi, R.; Kolve, E.; Lim, J.J.; Gupta, A.; Fei-Fei, L.; Farhadi, A. Target-driven visual navigation in indoor scenes using
deep reinforcement learning. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA),
Singapore, 29 May–3 June 2017; pp. 3357–3364. [CrossRef]
9. Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D Object Detection from RGB-D Data. In Proceedings of the
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 918–927.
[CrossRef]
10. Yi, C.; Lu, D.; Xie, Q.; Liu, S.; Li, H.; Wei, M.; Wang, J. Hierarchical tunnel modeling from 3D raw LiDAR point cloud. Comput.
Aided Des. 2019, 114, 143–154. [CrossRef]
11. Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf.
Process. Syst. 2012, 25, 84–90. [CrossRef]
12. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [CrossRef]
13. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
[CrossRef]
14. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020,
42, 2011–2023. [CrossRef]
15. Zhou, Y.; Tuzel, O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In Proceedings of the 2018
IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499.
[CrossRef]
16. Maturana, D.; Scherer, S. VoxNet: A 3D Convolutional Neural Network for real-time object recognition. In Proceedings of the
2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October
2015; pp. 922–928. [CrossRef]
17. Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3D ShapeNets: A deep representation for volumetric shapes. In
Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June
2015; pp. 1912–1920. [CrossRef]
18. Dai, A.; Chang, A.X.; Savva, M.; Halber, M.; Funkhouser, T.; Nießner, M. ScanNet: Richly-Annotated 3D Reconstructions of
Indoor Scenes. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI,
USA, 21–26 July 2017; pp. 2432–2443. [CrossRef]
19. Klokov, R.; Lempitsky, V. Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models. In Proceedings
of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 863–872. [CrossRef]
20. Wang, P.S.; Liu, Y.; Guo, Y.X.; Sun, C.Y.; Tong, X. O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis.
Acm Trans. Graph. 2017, 36, 72. [CrossRef]
21. Riegler, G.; Ulusoy, A.O.; Geiger, A. OctNet: Learning Deep 3D Representations at High Resolutions. In Proceedings of the 2017
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6620–6629.
[CrossRef]
22. Su, H.; Maji, S.; Kalogerakis, E.; Learned-Miller, E. Multi-view Convolutional Neural Networks for 3D Shape Recognition.
In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015;
pp. 945–953. [CrossRef]
23. Qi, C.R.; Su, H.; Nießner, M.; Dai, A.; Yan, M.; Guibas, L.J. Volumetric and Multi-view CNNs for Object Classification on 3D Data.
In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30
June 2016; pp. 5648–5656. [CrossRef]
24. Xie, J.; Dai, G.; Zhu, F.; Wong, E.K.; Fang, Y. DeepShape: Deep-Learned Shape Descriptor for 3D Shape Retrieval. IEEE Trans.
Pattern Anal. Mach. Intell. 2017, 39, 1335–1345. [CrossRef] [PubMed]
25. Kalogerakis, E.; Averkiou, M.; Maji, S.; Chaudhuri, S. 3D Shape Segmentation with Projective Convolutional Networks. In
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July
2017; pp. 6630–6639. [CrossRef]
26. Feng, Y.; Zhang, Z.; Zhao, X.; Ji, R.; Gao, Y. GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition. In
Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA,
18–23 June 2018; pp. 264–272. [CrossRef]
27. Charles, R.Q.; Su, H.; Kaichun, M.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July
2017; pp. 77–85. [CrossRef]
28. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings
of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017;
pp. 5099–5108. [CrossRef]
29. Wu, W.; Qi, Z.; Fuxin, L. PointConv: Deep Convolutional Networks on 3D Point Clouds. In Proceedings of the 2019 IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9613–9622.
[CrossRef]
30. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic Graph CNN for Learning on Point Clouds. ACM
Trans. Graph. 2018, 38, 1–12. [CrossRef]
31. Ran, H.; Zhuo, W.; Liu, J.; Lu, L. Learning Inner-Group Relations on Point Clouds. In Proceedings of the 2021 IEEE/CVF
International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 15457–15467. [CrossRef]
32. Zhao, H.; Jiang, L.; Jia, J.; Torr, P.H.; Koltun, V. Point Transformer. In Proceedings of the 2021 IEEE/CVF International Conference
on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 16259–16268. [CrossRef]
33. Guo, M.H.; Cai, J.X.; Liu, Z.N.; Mu, T.J.; Martin, R.; Hu, S.M. PCT: Point cloud transformer. Comput. Vis. Media 2021, 7, 187–199.
[CrossRef]
34. Liu, Z.; Hu, H.; Cao, Y.; Zhang, Z.; Tong, X. A Closer Look at Local Aggregation Operators in Point Cloud Analysis. In
Proceedings of the 2020 European conference on computer vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 326–342.
[CrossRef]
35. Brock, A.; Lim, T.; Ritchie, J.M.; Weston, N. Generative and Discriminative Voxel Modeling with Convolutional Neural Networks.
arXiv 2016, arXiv:1608.04236. [CrossRef]
36. Wang, D.Z.; Posner, I. Voting for voting in online point cloud object detection. In Proceedings of the Robotics: Science and
Systems, Rome, Italy, 13–17 July 2015; Volume 1, pp. 10–15. [CrossRef]
37. Li, Y.; Pirk, S.; Su, H.; Qi, C.R.; Guibas, L.J. FPNN: Field Probing Neural Networks for 3D Data. In Proceedings of the
30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 4–9 December 2016; pp. 307–315.
[CrossRef]
38. Engelcke, M.; Rao, D.; Wang, D.Z.; Tong, C.H.; Posner, I. Vote3Deep: Fast object detection in 3D point clouds using efficient
convolutional neural networks. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA),
Singapore, 29 May–3 June 2017; pp. 1355–1361. [CrossRef]
39. Choy, C.; Gwak, J.; Savarese, S. 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. In Proceedings of
the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019;
pp. 3070–3079. [CrossRef]
40. Cheng, R.; Razani, R.; Ren, Y.; Bingbing, L. S3Net: 3D LiDAR Sparse Semantic Segmentation Network. In Proceedings of the
2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 14040–14046.
[CrossRef]
41. Akhtar, A.; Gao, W.; Zhang, X.; Li, L.; Li, Z.; Liu, S. Point Cloud Geometry Prediction Across Spatial Scale using Deep Learning.
In Proceedings of the 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), Macau,
China, 1–4 December 2020; pp. 70–73. [CrossRef]
42. Xie, S.; Gu, J.; Guo, D.; Qi, C.R.; Guibas, L.; Litany, O. Pointcontrast: Unsupervised pre-training for 3d point cloud understanding.
In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 574–591.
43. Wang, C.; Pelillo, M.; Siddiqi, K. Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition. In Proceedings of
the British Machine Vision Conference (BMVC), London, UK, 4–7 September 2017; Volume 64, pp. 1–12. [CrossRef]
44. Wang, W.; Wang, T.; Cai, Y. Multi-view attention-convolution pooling network for 3D point cloud classification. Appl. Intell. 2021,
34, 3201–3212. [CrossRef]
45. Hamdi, A.; Giancola, S.; Ghanem, B. MVTN: Multi-View Transformation Network for 3D Shape Recognition. In Proceedings of
the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 1–11.
[CrossRef]
46. Wang, W.; Zhou, H.; Chen, G.; Wang, X. Fusion of a Static and Dynamic Convolutional Neural Network for Multiview 3D Point
Cloud Classification. Remote Sens. 2022, 14, 1996. [CrossRef]
47. Zhao, H.; Jiang, L.; Fu, C.W.; Jia, J. PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing. In Proceedings
of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019;
pp. 5560–5568. [CrossRef]
48. Atzmon, M.; Maron, H.; Lipman, Y. Point Convolutional Neural Networks by Extension Operators. ACM Trans. Graph. 2018,
37, 1–12. [CrossRef]
49. Komarichev, A.; Zhong, Z.; Hua, J. A-CNN: Annularly Convolutional Neural Networks on Point Clouds. In Proceedings of
the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019;
pp. 7413–7422. [CrossRef]
50. Zhou, H.; Feng, Y.; Fang, M.; Wei, M.; Qin, J.; Lu, T. Adaptive Graph Convolution for Point Cloud Analysis. In Proceedings of the
2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 4945–4954.
[CrossRef]
51. Wan, J.; Xie, Z.; Xu, Y.; Zeng, Z.; Yuan, D.; Qiu, Q. DGANet: A Dilated Graph Attention-Based Network for Local Feature
Extraction on 3D Point Clouds. Remote Sens. 2021, 13, 3484. [CrossRef]
52. Lin, H.; Zheng, W.; Peng, X. Orientation-Encoding CNN for Point Cloud Classification and Segmentation. Mach. Learn. Knowl.
Extr. 2021, 3, 601–614. [CrossRef]
53. Chen, C.; Fragonara, L.Z.; Tsourdos, A. GAPointNet: Graph attention based point neural network for exploiting local feature of
point cloud. Neurocomputing 2021, 438, 122–132. [CrossRef]
54. Zhang, C.; Song, Y.; Yao, L.; Cai, W. Shape-oriented convolution neural network for point cloud analysis. In Proceedings of the
AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12773–12780. [CrossRef]
55. Wen, X.; Han, Z.; Liu, X.; Liu, Y.S. Point2SpatialCapsule: Aggregating Features and Spatial Relationships of Local Regions on
Point Clouds Using Spatial-Aware Capsules. IEEE Trans. Image Process. 2020, 29, 8855–8869. [CrossRef] [PubMed]
56. Han, W.; Wen, C.; Wang, C.; Li, X.; Li, Q. Point2Node: Correlation Learning of Dynamic-Node for Point Cloud Feature
Modeling. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34,
pp. 10925–10932. [CrossRef]
57. Wu, W.; Zhang, Y.; Wang, D.; Lei, Y. SK-Net: Deep learning on point cloud via end-to-end discovery of spatial keypoints. In
Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 6422–6429.
[CrossRef]
58. Xu, M.; Zhou, Z.; Qiao, Y. Geometry sharing network for 3d point cloud classification and segmentation. In Proceedings of the
AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12500–12507. [CrossRef]
59. Hu, M.; Ye, H.; Cao, F. Convolutional neural networks with hybrid weights for 3D point cloud classification. Appl. Intell. 2021, 51.
[CrossRef]
60. Bello, S.A.; Wang, C.; Wambugu, N.M.; Adam, J.M. FFPointNet: Local and global fused feature for 3D point clouds analysis.
Neurocomputing 2021, 461, 55–62. [CrossRef]
61. Hao, F.; Song, R.; Li, J.; Cao, K.; Li, Y. Cascaded geometric feature modulation network for point cloud processing. Neurocomputing
2022, 492, 474–487. [CrossRef]
62. Huang, G.; Li, Y.; Pleiss, G.; Liu, Z.; Hopcroft, J.E.; Weinberger, K.Q. Snapshot Ensembles: Train 1, get M for free. arXiv 2017,
arXiv:1704.00109.
63. Liu, Y.; Fan, B.; Xiang, S.; Pan, C. Relation-Shape Convolutional Neural Network for Point Cloud Analysis. In Proceedings of
the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019;
pp. 8887–8896. [CrossRef]
64. Xie, S.; Liu, S.; Chen, Z.; Tu, Z. Attentional ShapeContextNet for Point Cloud Recognition. In Proceedings of the 2018 IEEE/CVF
Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4606–4615. [CrossRef]
65. Shen, Y.; Feng, C.; Yang, Y.; Tian, D. Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling. In
Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June
2018; pp. 4548–4557. [CrossRef]
66. Gadelha, M.; Wang, R.; Maji, S. Multiresolution Tree Networks for 3D Point Cloud Processing. In Proceedings of the European
Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 105–122. [CrossRef]
67. Wang, C.; Samari, B.; Siddiqi, K. Local Spectral Graph Convolution for Point Set Feature Learning. In Proceedings of the
European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu,
C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 56–71.
68. Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; Chen, B. PointCNN: Convolution on X-transformed points. In Proceedings of the International
Conference on Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 828–838.
69. Liu, X.; Han, Z.; Liu, Y.S.; Zwicker, M. Point2Sequence: Learning the Shape Representation of 3D Point Clouds with an Attention-
based Sequence to Sequence Network. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA,
27 January–1 February 2019; Volume 33, pp. 8778–8785. [CrossRef]
70. Jing, W.; Zhang, W.; Li, L.; Di, D.; Chen, G.; Wang, J. AGNet: An Attention-Based Graph Network for Point Cloud Classification
and Segmentation. Remote Sens. 2022, 14, 1036. [CrossRef]
71. Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. RandLA-Net: Efficient Semantic Segmentation
of Large-Scale Point Clouds. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition
(CVPR), Seattle, WA, USA, 16–20 June 2020; pp. 11105–11114. [CrossRef]
72. Poux, F.; Ponciano, J.J. Self-Learning Ontology for Instance Segmentation of 3d Indoor Point Cloud. ISPRS—Int. Arch. Photogramm.
Remote Sens. Spat. Inf. Sci. 2020, 43B2, 309–316. [CrossRef]
73. Huang, Q.; Wang, W.; Neumann, U. Recurrent Slice Networks for 3D Segmentation of Point Clouds. In Proceedings of the 2018
IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2626–2635.
[CrossRef]