
An Efficient Approach Based on Representation Learning and CRF Models for Unsupervised 3D Shape Co-segmentation
Youness Abouqora
Laboratory of Mathematics, Computer Science and Engineering Sciences
Faculty of Sciences and Techniques, Hassan 1st University
Settat, Morocco
[email protected]

Zakaria Benhaila
Laboratory of Mathematics, Computer Science and Engineering Sciences
Faculty of Sciences and Techniques, Hassan 1st University
Settat, Morocco
[email protected]

Lahcen Moumoun
Laboratory of Mathematics, Computer Science and Engineering Sciences
Department of Computer Engineering and Mathematics, Hassan 1st University
Settat, Morocco
[email protected]

Abstract—We present in this article an approach for unsupervised 3D shape co-segmentation based on local geometric features. Unsupervised 3D shape co-segmentation consists in dividing a collection of unlabeled 3D shapes into distinct parts by finding the relationships between them. Typically, two main steps are followed: finding correspondences and learning a representation. In our contribution, we first propose to segment each shape into primitive patches and to find the correspondence between patches from the same class. Secondly, for the learning process, we propose to extract latent features from the geometric information through a representation learning module, combined with a second module that aggregates the generated features to assign consistent labels to each of the resulting clusters. However, the problem of mislabeling on occluded or adjacent regions of the shapes may persist in this process. To overcome this problem, the label of each patch is viewed as a random variable in a sequence trained using a Conditional Random Field (CRF) model. The experimental stage shows that the adopted approach generates better co-segmentation results.

Keywords—shape correspondence, representation learning, 3D shape co-segmentation, Conditional Random Field

I. INTRODUCTION

Recently, 3D shape co-analysis has gained a lot of interest. Current research has shown that processing a collection of shapes at once yields more semantic information than analyzing each shape separately, and that using this semantic information to co-segment a group of shapes leads to better segmentation outcomes [1, 6]. Co-segmentation is a key task in high-level pattern co-analysis, due to its ability to simultaneously segment shapes of the same class into semantically consistent parts with correspondence [8, 13]. Because of its consistency and its inherently contextual nature, co-segmentation has been used in broad areas such as modelling [9], pattern matching [10], texturing [11] and medical image analysis [12].

Extracting crucial knowledge associated with multiple shapes for consistent segmentation is a challenging task [13], especially when the corresponding shape segments differ greatly in geometry or even topology. To address this issue, we developed an unsupervised co-segmentation approach founded on sparse feature learning from a collection of shapes belonging to the same class. Firstly, we decompose each 3D mesh into primitive patches to generate an over-segmentation using the p-spectral clustering segmentation algorithm [21]. This decomposition provides the consistency that allows computing various shape signatures for the input mesh. Each signature can pertinently describe a portion of the geometric features. Secondly, we construct the correspondence between parts carrying the same semantic information. The following step is to build a deep neural network, a Stacked Sparse AutoEncoder (SSAE), for representation learning, so that high-level features can be extracted from the computed geometric signatures. Thirdly, we combine a Gaussian Mixture Model (GMM) as a clustering model to predict the final label of a patch based on the inherent features. Our architecture incorporates a surface-based Conditional Random Field (CRF) [35] as the last layer to promote consistent patch labelling and refine patch boundaries. The entire network is trained from handcrafted features designed by domain experts to achieve optimal performance. To evaluate the effectiveness of our approach, we use the Princeton Segmentation Benchmark [32] and the Shape COSEG dataset [33] for testing.

The rest of the paper is structured as follows. Section 2 reviews related work on 3D descriptors and shape co-segmentation. Section 3 presents the general framework for co-segmentation, followed by a detailed process for constructing high-level features and the prediction paradigm. In Section 4, we discuss detailed experimental results, as well as a comparison with the state of the art. Finally, we conclude in Section 5.

II. RELATED WORK

Various techniques [2, 18] have been developed for 3D shape co-segmentation. Their goal is to establish correspondences between a group of shapes in order to divide them into meaningful parts. By minimizing a specified distortion energy, the search for correspondences between shapes aims to locate point-to-point maps [23, 24] or matches between feature points [25]. Nevertheless, these approximations frequently fail to consider the unique structural attributes of the shapes. To work around these restrictions, Kleiman et al. [22] created a flexible method that matches shapes with distinct geometries from various classes by using global information. Golovinskiy and Funkhouser [1] achieved a consistent segmentation of a set by aligning all input shapes and grouping their primitive patches. Based on affinity aggregation spectral clustering, Wu et al. [4] presented a variety of feature fusion approaches for shape co-segmentation. In order to learn high-level features built on the patches, Shu et al. [6] suggested an unsupervised method.



The performance of these methods is generally quite promising. However, because of their constrained and limited low-level feature space, along with a lack of diversity, these methods frequently encounter difficulties when dealing with datasets that exhibit significant variation.

To overcome these limitations, supervised methods have been suggested that treat 3D shape segmentation as a labeling problem. For instance, Kalogerakis et al. [14] conceive classifiers that predict class labels using traditional machine learning methods. Furthermore, building on the positive results of Convolutional Neural Networks (CNNs) in computer vision tasks, Guo et al. [15] introduce a deep learning framework that uses a CNN to predict labels. Further techniques include PointNet [17], which directly takes points as input and generates class labels. Meanwhile, Kalogerakis et al. [19] combine image-based Fully Convolutional Networks (FCNs) with Conditional Random Fields (CRFs) and surface-based networks to obtain consistent 3D shape segmentations. Our previous work [20] used an end-to-end supervised CNN-CRF architecture to predict mesh segment labels based on the geometric features of neighbouring triangular faces.

We observed that the effectiveness of supervised methods heavily relies on high-quality training data that encompasses a wide range of shape variations. The creation of such datasets is a manual process with limited coverage. Moreover, supervised approaches come with significant training costs.

Compared to the previously mentioned approaches, we introduce a deep architecture based on representation learning from geometric features, joined with a conditional random fields model, to obtain a coherent segmentation of the shape collection. The proposed approach employs a bag of features to efficiently integrate a variety of shape signatures and generate the relevant high-level characteristics. The generated features can then be successfully clustered and inferred using CRF layers, which promote a coherent labelling of the whole surface of the shapes. Lastly, our architecture is an end-to-end trained model that includes all the surface processing steps, to achieve optimal performance.
III. RESEARCH METHOD

Inspired by a variety of unsupervised segmentation techniques [33, 2, 17], our proposed method focuses simultaneously on segmenting a collection of shapes into primitive parts and constructing correspondences between them, as shown in Fig. 1. Firstly, triangular geometric features are computed and quantified for each segment using the bag-of-features paradigm. Secondly, we introduce an unsupervised network that integrates sparse regularization and classification for latent feature extraction. Finally, the encoded features are used as input to the mixture classification model to predict clusters. Afterward, the network is connected to the CRF model as a last layer, which promotes a coherent labeling estimated by considering the neighbors of each patch.
A. Over-segmentation and Descriptors Calculation

The primary step in our approach is to divide each shape in the input collection into smaller regions. We carry out this over-segmentation using p-spectral clustering [21] to produce patches, initially set to 50 per shape. Then, geometric features are calculated to characterize the patches, which allows informative features to be chosen in order to distinguish them. Therefore, six geometric descriptors were selected and extracted for each face of the mesh, namely the spin image (SI) [29], average geodesic distance (AGD) [28], shape context (SC) [27], shape diameter function (SDF) [30], scale-invariant HKS (SIHKS) [31], and conformal factor (CF) [26]. As in the work of Hu et al. [5], a histogram is computed for each feature descriptor over all faces to quantify its distribution on each patch. The number of bins in each histogram is set to 100. We then concatenate the six histograms into a feature vector serving as the final feature descriptor f_i for the patch indexed by i. Lastly, co-analysis is performed to compute the co-segmentation of the patches and their labelling. Hence, correspondence between segments of the same class of patches is required, and corresponding segments should share a joint label.
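As an illustration, the following is a minimal NumPy sketch of this bag-of-features step. It assumes the per-face descriptor values and the face-to-patch assignment are already computed; the helper name, the area weighting and the per-shape bin ranges are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

N_BINS = 100  # number of histogram bins per descriptor, as in the paper

def patch_feature_vector(face_descriptors, face_areas, patch_faces):
    """Build the bag-of-features descriptor f_i for one patch.

    face_descriptors : (n_faces, 6) array, one column per descriptor
                       (SI, AGD, SC, SDF, SIHKS, CF), assumed precomputed.
    face_areas       : (n_faces,) triangle areas (area weighting is an
                       assumption for this sketch).
    patch_faces      : indices of the faces belonging to this patch.
    """
    histograms = []
    for d in range(face_descriptors.shape[1]):
        values = face_descriptors[patch_faces, d]
        # Bin edges are taken over the whole shape so that histograms of
        # different patches remain comparable.
        lo, hi = face_descriptors[:, d].min(), face_descriptors[:, d].max()
        hist, _ = np.histogram(values, bins=N_BINS, range=(lo, hi),
                               weights=face_areas[patch_faces])
        hist = hist / (hist.sum() + 1e-12)   # normalize the distribution
        histograms.append(hist)
    return np.concatenate(histograms)        # length 6 * 100 = 600
```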
B. High-level Features Learning and Clustering Approach

To build high-level features, we adopt an unsupervised architecture based on a Stacked AutoEncoder network [16]. The proposed Stacked Sparse AutoEncoder is a neural network made up of several sparse autoencoders connected end-to-end. To produce higher-level feature representations of the input data, the output of the previous sparse autoencoder layer is used as the input of the following layer. The decoder then reconstructs the input y from this new representation according to (1) and (2):

L = h(W y + b)   (1)

y' = s(W' L + b')   (2)

where L is the new representation, y is the input and y' is its reconstruction. W and W' are weight matrices, b and b' are the encoder and decoder bias vectors, respectively. h is the activation function of the hidden-layer neurons, and s is the activation function of the output layer. Instead of using ReLU, Tanh, or other activation functions, we use the sigmoid activation function given in (3):

h(y) = s(y) = \frac{1}{1 + e^{-y}}   (3)

The mean squared error (MSE) is used as the reconstruction error between the input y and the rebuilt input y' (4):

MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - y'_i)^2   (4)

where N represents the number of input samples.

The sparse autoencoder is used in this work to produce an efficient low-dimensional representation of the source data under sparsity constraints. Therefore, we include a sparsity penalty term in the cost function (7), based on the Kullback-Leibler (KL) divergence (5) [34]:

KL(\rho \| \hat{\rho}_j) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log\frac{1 - \rho}{1 - \hat{\rho}_j}   (5)

where \hat{\rho}_j is the average activation of hidden unit j over the training samples.

To control the weights and avoid overfitting, an L2 regularization term (6) is added to the cost function (7):

L2R = \frac{1}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( w_{ji}^{(l)} \right)^2   (6)

where n_l denotes the number of layers, s_l is the number of neurons in hidden layer l, and w_{ji}^{(l)} is the weight between node i in layer l and node j in layer l+1.

The cost function E for training a sparse autoencoder is given by (7):

E = MSE + \lambda \cdot L2R + \beta \sum_{j=1}^{s_l} KL(\rho \| \hat{\rho}_j)   (7)

where \lambda is the coefficient of the L2R term, \beta is the sparsity regularization parameter that weights the sparsity penalty, and \rho is the sparsity proportion that controls the required level of sparsity.
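For concreteness, the following NumPy sketch evaluates the cost (7) of a single sparse autoencoder layer, combining the sigmoid activations (3), the MSE (4), the KL sparsity penalty (5) and the L2 weight decay (6). The default values of lam, beta and rho follow the values reported later in the paper; the row-vector convention, function names and batch handling are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):                       # activation function (3)
    return 1.0 / (1.0 + np.exp(-z))

def sparse_ae_cost(Y, W, b, W2, b2, lam=1e-4, beta=0.01, rho=0.5):
    """Cost (7) of one sparse autoencoder layer on a batch Y of shape (N, d_in).

    W, b   : encoder parameters, shapes (d_in, d_hidden) and (d_hidden,)
    W2, b2 : decoder parameters, shapes (d_hidden, d_in) and (d_in,)
    Rows of Y are samples, so Y @ W + b corresponds to eq. (1).
    """
    L = sigmoid(Y @ W + b)                # encoding, eq. (1)
    Y_rec = sigmoid(L @ W2 + b2)          # reconstruction, eq. (2)

    mse = np.mean(np.sum((Y - Y_rec) ** 2, axis=1))           # eq. (4)

    rho_hat = np.clip(L.mean(axis=0), 1e-6, 1 - 1e-6)         # average activation per hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))  # eq. (5), summed as in (7)

    l2 = 0.5 * (np.sum(W ** 2) + np.sum(W2 ** 2))              # eq. (6)

    return mse + lam * l2 + beta * kl                          # eq. (7)
```

In practice the gradients of this cost would be obtained with an automatic-differentiation framework, and the stacked network is built by training each layer on the codes produced by the previous one.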

In order to co-segment a set of 3D objects, we replace the decoder layer of the proposed autoencoder with an output layer for label prediction by clustering all shape patches on a common basis. Assuming that the models possess an equal number of clusters and that corresponding clusters exhibit similarity, we adopt GMMs to predict the probabilities that a patch belongs to each cluster. During feature extraction, a patch of the mesh is associated with a point x in the feature space. However, as a result of the clustering model, the probability distribution over the set of semantic classes for each patch is estimated independently of its neighbors, leading to likely local label inconsistencies. As a result, we enforce local label consistency using a surface-based CRF (Conditional Random Field) approach [35] that makes dense and structured predictions by utilizing contextual information.
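A brief sketch of this clustering step using scikit-learn is shown below: the encoded patch features are clustered with a Gaussian Mixture Model, and the per-patch class probabilities are converted into the negative log-probability unaries used by the CRF in (9). The function name, the full covariance type and the fixed random seed are illustrative choices, not reported settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_unaries(encoded_patches, n_parts, seed=0):
    """Cluster encoded patch features and return CRF unary costs.

    encoded_patches : (n_patches, d) array of SSAE codes.
    n_parts         : expected number of semantic parts for this shape class.
    Returns (labels, unaries) where unaries[i, k] = -log P(patch i in part k).
    """
    gmm = GaussianMixture(n_components=n_parts, covariance_type="full",
                          random_state=seed).fit(encoded_patches)
    probs = gmm.predict_proba(encoded_patches)       # P(x_i) per class
    unaries = -np.log(np.clip(probs, 1e-12, 1.0))    # unary term, eq. (9)
    return probs.argmax(axis=1), unaries
```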
C. Co-Segmentation Refinement

Conditional Random Fields (CRFs) have been successfully employed for co-segmentation refinement as a post-processing step [20]; however, their performance relies on the appropriate optimization of the settings for each dataset and the prediction module used.

We assume that for each patch i ∈ M (M being the set of 3D models), x_i is a random variable associated with the N patches of M. This random variable is employed to assign a label l_i ∈ L to i, where L = (l_1, ..., l_n) is the set of labels. Conditional random fields define the energy term as in (8):

E(x) = \sum_i \phi(x_i) + \sum_{i \neq j} \psi(x_i, x_j)   (8)

The unary term \phi(x_i) quantifies the cost of assigning x_i to patch i with probability P(x_i), and the pairwise term \psi(x_i, x_j) quantifies the cost of jointly assigning (x_i, x_j) to patches i and j. The unary energies are acquired from the GMM clustering, which predicts the label of each patch i of the mesh; nevertheless, the unary term alone does not account for the smoothness and consistency of the label assignment. The unary term is defined as in (9):

\phi(x_i) = -\log P(x_i)   (9)

The pairwise energy is a smoothness term that promotes the assignment of similar labels to boundary faces with similar properties. Similarly to our previous work [20], we use both proximity and surface curvature when modeling the pairwise connections among these variables. For each neighboring pair of faces (f_i, f_j), we set a factor that promotes the same label for faces sharing a normal vector, and different labels otherwise. Considering the angle \theta_{f_i,f_j} between their normals (divided by \pi to map it into [0, 1]), the factor is given in (10):

\psi_{adj}(x_i, x_j) = \begin{cases} -\omega_{adj} \, \omega_{l_i,l_j} \, \theta_{f_i,f_j}^2, & l_i \neq l_j \\ -\omega_{adj} \, \omega_{l_i,l_j} \, (1 - \theta_{f_i,f_j})^2, & l_i = l_j \end{cases}   (10)

where \omega_{adj} and \omega_{l_i,l_j} are the learned factor and the label-dependent weights.

For our implementation, such factors are set for pairs of polygons whose geodesic distance is less than 10% of the bounding sphere's radius. This renders our CRF model relatively dense and more sensitive to long-range interactions between the surface variables. The geodesic distance-based factors are defined as in (11):

\psi_{geo}(x_i, x_j) = \begin{cases} -\omega_{geo} \, \omega_{l_i,l_j} \, d_{f_i,f_j}^2, & l_i \neq l_j \\ -\omega_{geo} \, \omega_{l_i,l_j} \, (1 - d_{f_i,f_j})^2, & l_i = l_j \end{cases}   (11)

The factor weight \omega_{geo} and the label-dependent weights \omega_{l_i,l_j} are learned parameters, and d_{f_i,f_j} represents the geodesic distance between f_i and f_j (distances are normalized to [0, 1]).

The predicted labeling x for M is generated by minimizing E(x) in (8). Exact inference of the CRF distribution is intractable. Instead, we use mean-field inference to approximate the most likely joint assignment of all random variables P(x | M) using a simpler distribution Q(x | M), which can be written as a product of independent marginals (12):

Q(x | M) = \prod_i Q_i(x_i | M)   (12)

This approximate distribution is obtained by minimizing the KL-divergence between the distributions P and Q. In the case of a Gaussian distribution [35], the mean-field approximation Q and the original distribution P have the same mean. Hence, finding the Maximum A Posteriori (MAP) solution x is equivalent to finding the mean of the distribution Q.
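To make the inference step concrete, here is a simplified mean-field sketch over patch variables. It assumes the unaries from the GMM step and a precomputed sparse list of pairwise factors built from (10)-(11); the fixed number of iterations, the synchronous update and the generic edge list are illustrative simplifications of the dense Gaussian-potential inference of [35], not the authors' exact implementation.

```python
import numpy as np

def adjacency_potential(theta, w_adj, w_label):
    """Edge potential (10) for one face pair with normalized normal angle theta."""
    K = w_label.shape[0]
    same = np.eye(K, dtype=bool)
    return w_adj * w_label * np.where(same, -(1.0 - theta) ** 2, -theta ** 2)

def mean_field_crf(unaries, edges, edge_potentials, n_iters=10):
    """Approximate MAP labels of a pairwise CRF with mean-field marginals, eq. (12).

    unaries         : (n, K) array, unaries[i, k] = phi_i(k) = -log P(x_i = k), eq. (9)
    edges           : list of (i, j) index pairs carrying a pairwise factor
    edge_potentials : list of (K, K) arrays psi_ij(k, k'), e.g. from (10)-(11)
    """
    n, K = unaries.shape
    Q = np.exp(-unaries)
    Q /= Q.sum(axis=1, keepdims=True)          # initialize Q_i from the unaries

    for _ in range(n_iters):
        msg = np.zeros((n, K))
        for (i, j), psi in zip(edges, edge_potentials):
            # expected pairwise energy under the current marginal of the neighbor
            msg[i] += psi @ Q[j]
            msg[j] += psi.T @ Q[i]
        logQ = -(unaries + msg)                # mean-field update (up to a constant)
        logQ -= logQ.max(axis=1, keepdims=True)
        Q = np.exp(logQ)
        Q /= Q.sum(axis=1, keepdims=True)

    return Q.argmax(axis=1)                    # labels read off the marginals
```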
IV. RESULTS AND DISCUSSION

A. Dataset and Ground Truth

The performance of our co-segmentation is evaluated for each category of a segmentation dataset composed of 16 categories taken from the Princeton Segmentation Benchmark (PSB) [32] (except Bearing, Mech and Bust, which have no obvious significant correspondence between segments) and four categories taken from COSEG [33]. Given that our approach is completely unsupervised, the ground truth is only used for statistical evaluation. The choice of optimization parameters for our system is a crucial step. For the sparsity-based regularization parameters, the values of λ, β, and ρ are set to 0.0001, 0.01 and 0.5, tuned experimentally, and the number of iterations T is set to 300. To evaluate the co-segmentation results, the average accuracy, which measures the correctly labelled shape area, is adopted, as in (13):

Accuracy(l, t) = \frac{\sum_{i \in T} a_i \, \delta(map(l_i) - t_i)}{\sum_{i \in T} a_i}   (13)

Let a_i represent the area of patch i, l_i denote the category label obtained from our co-segmentation, and t_i indicate the corresponding ground-truth label; the function \delta(x) equals 1 if and only if x equals 0. Additionally, the function map(.) refers to an optimal permutation function that uses the Hungarian algorithm [36] to associate each cluster label with a category label.
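A small sketch of this evaluation metric is given below: the cluster-to-category mapping map(.) is found with the Hungarian algorithm on an area-weighted agreement matrix, and the accuracy (13) is the correctly labelled area fraction. The function and argument names are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def labeling_accuracy(pred, gt, areas):
    """Area-weighted accuracy (13) with an optimal cluster-to-category mapping.

    pred  : (n,) predicted cluster labels l_i in {0, ..., K-1}
    gt    : (n,) ground-truth category labels t_i in {0, ..., K-1}
    areas : (n,) patch (or face) areas a_i
    """
    K = max(pred.max(), gt.max()) + 1
    # agreement[c, t] = total area on which cluster c overlaps category t
    agreement = np.zeros((K, K))
    np.add.at(agreement, (pred, gt), areas)
    # Hungarian algorithm: permutation map(.) maximizing the matched area
    rows, cols = linear_sum_assignment(-agreement)
    mapping = dict(zip(rows, cols))
    mapped = np.array([mapping[c] for c in pred])
    return np.sum(areas * (mapped == gt)) / np.sum(areas)
```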
B. Results

As shown in Figure 2, the proposed co-segmentation approach segments shapes with different orientations, locations and shape variations into consistent parts. By setting the number of segmented components as a parameter, reasonable co-segmentation results are obtained.

We can naturally obtain good co-segmentation results with distinct segmented parts because the proposed multi-scale feature descriptor is an effective representation for a variety of shapes, and the symmetry-aware embedding used can also construct correspondences between the same semantic components. Despite the topological differences between the models in the vases and candelabra categories, particularly the shapes in the vases category that contain various structural handles, their identical semantic components can be segmented by our unsupervised co-segmentation algorithm.

Moreover, to assess the quality of the results quantitatively, we show a statistical evaluation with and without refinement in Table 1. The first column corresponds to the segmentation accuracy of the initial co-segmentation, and the second column shows the segmentation accuracy after the refined labeling. Each entry is the average accuracy over all the shapes in the same category. The accuracy of a single shape is defined by Equation (13). Since the proposed co-segmentation does not return labels associated with particular semantic classes, we first determine the best one-to-one match between the obtained labels and the ground-truth labels before calculating accuracy. The matching is applied consistently to the entire set.

TABLE I. AVERAGE CO-ANALYSIS LABELLING ACCURACY USING LEAVE-ONE-OUT

Category | Without CRF Refinement (%) | With CRF Refinement (%)
Human | 90.5 | 91.30
Cup | 91.8 | 92.28
Glasses | 96.57 | 98.05
Airplane | 91.04 | 92.84
Ant | 98.6 | 98.79
Chair | 97.23 | 98.35
Octopus | 97.3 | 98.49
Table | 98.5 | 99.69
Teddy | 97.86 | 98.93
Hand | 81.83 | 82.54
Plier | 95.45 | 96.30
Fish | 92.0 | 92.28
Bird | 88.5 | 89.43
Armadillo | 92.8 | 94.59
Vase | 84.8 | 85.01
Fourleg | 83.0 | 85.27
Average | 92.36 | 93.50

The average labeling accuracy over all classes stands at approximately 92% for the initial co-segmentation and increases to 93.5% after refinement. With the exception of one class, all other classes achieve accuracies of at least 84%. The 10% accuracy gap between certain classes can be attributed to the increased variability observed in the hand, bird, and vase categories. Additionally, there is a 2% improvement between the initial and refined co-segmentation processes. This improvement is a result of the boundary refinement, which corrects over-segmented or mislabeled parts of the initial co-segmentation using statistical models, as depicted in Figure 3.

C. Comparison

To demonstrate the effectiveness of our approach, we conduct a comparison with state-of-the-art methods. As indicated in [4], accuracy is calculated as the percentage of correctly classified faces' area over the total surface area. The performance evaluation of existing methods relies on publicly reported results in the literature. The statistical evaluation results are presented in Table 2 and Table 3, with the most significant outcomes highlighted in bold. For our experiments, we designed two sets of tests: leave-one-out cross-validation on the PSB dataset [32] and 5-fold cross-validation on the COSEG dataset [33].

TABLE II. LEAVE-ONE-OUT CROSS VALIDATION ON THE PSB DATASET

Category | Kalo [13] | Sidi [3] | Hu [5] | Wang [4] | Ours (%)
Human | 94.5 | – | 70.40 | 78.22 | 91.30
Cup | 93.8 | – | 97.40 | 97.01 | 92.28
Glasses | 96.6 | – | 98.30 | 98.50 | 98.05
Airplane | 93.0 | – | 83.30 | 85.79 | 92.84
Ant | 98.6 | – | 92.90 | 97.58 | 98.79
Chair | 98.5 | – | 89.60 | 91.18 | 98.35
Octopus | 98.3 | – | 97.50 | 97.79 | 98.49
Table | 99.5 | – | 99.00 | 99.25 | 99.69
Teddy | 97.7 | – | 97.10 | 98.42 | 98.93
Hand | 84.8 | – | 91.90 | 94.68 | 82.54
Plier | 95.5 | – | 86.00 | 87.63 | 96.30
Fish | 96.0 | – | 85.60 | 86.59 | 92.28
Bird | 88.5 | – | 71.50 | 85.54 | 89.43
Armadillo | 92.8 | – | 87.30 | 96.09 | 94.59
Vase | 86.8 | – | 80.20 | 86.32 | 85.01
Fourleg | 85.0 | 77.3 | 88.70 | 84.30 | 85.27
Average | 93.74 | 77.3 | 88.54 | 91.55 | 93.38
Comparing the performance of our approach with unsupervised methods from the literature [3, 5, 4] and the supervised method [14], the proposed method produces comparable results, as shown in Table 2 and Table 3. Note that we only discuss the sets with low accuracies. Furthermore, to make the results comparable for the Bird and Plane sets, we use results from segmenting them into three classes. The comparison is made between our co-segmentation results and the public results shared by the authors. In particular, for [14], we adopted their leave-one-out experimental results, which are the most accurate matching results shared by the authors.

We quantify the co-segmentation results by computing the total area of the given mesh that is labelled correctly [5]. We notice that the average accuracy of our approach is at least 82.54% for these categories, which is higher than the accuracies provided by Hu's method (70.4% for Human, 71.5% for Bird). Furthermore, the accuracies for the Armadillo and Fourleg sets are also very close to ours. However, our method outperforms theirs significantly in the other three categories. This achievement can be explained by the use of our sparse network, which performs an efficient feature selection for the Gaussian-mixture-based clustering. Note also that the cited unsupervised methods are unstable when the data contains much noise. Compared to the Kalogerakis approach, our method achieves lower performance; however, their approach is supervised and requires labelled training data, which is tedious and time-consuming to produce.

TABLE III. 5-FOLD CROSS VALIDATION ON THE COSEG DATASET

Category | Kalo [13] | Sidi [3] | Hu [5] | Wang [4] | Ours (%)
Guitar | 98.0 | 87.2 | 98.0 | 98.64 | 96.67
Lamp | 93.0 | 94.3 | 90.7 | 95.27 | 94.65
Candelabra | 95.4 | 84.4 | 93.9 | 96.72 | 97.96
Goblets | 97.2 | 98.2 | 99.2 | 99.50 | 99.61
Average | 95.9 | 91.02 | 95.45 | 97.53 | 97.22

V. CONCLUSION

This paper introduces a novel unsupervised approach for consistently segmenting a collection of 3D shapes from the same family into their corresponding parts. Initially, we perform an over-segmentation of each mesh, breaking it into primitive patches and computing various features for them. These features are used to assess the similarity between patches. Subsequently, we reassign labels to these patches based on an established correspondence scheme for the entire class, and the features are simultaneously quantized using the bag-of-features paradigm.

To enhance the clustering results, we propose the use of a stacked sparse autoencoder, enabling a nonlinear mapping of high-dimensional features to lower dimensions. This process contributes to improving the clustering based on the Gaussian Mixture Model (GMM). Moreover, we apply CRF (Conditional Random Fields) inference with a mean-field approximation for further refinement. In this step, the output probabilities of the GMM clustering serve as the unary term, while the pairwise term is formulated based on geodesic distances and dihedral angles between adjacent faces on boundaries.

The experimental results demonstrate the efficiency of our proposed approach in extracting consistent parts across the model set. Notably, it showcases insensitivity to pose and shape variations, as well as robustness in handling outliers. In future work, we will attempt to improve the performance of our proposed approach by using a large variety of 3D objects from multiple areas. We will also develop a new approach for shape recognition using end-to-end deep representation learning.

REFERENCES

[1] A. Golovinskiy and T. Funkhouser, "Consistent segmentation of 3D models," Computers & Graphics, vol. 33, no. 3, pp. 262–269, 2009.
[2] Z. Wu, R. Shou, Y. Wang, and X. Liu, "Interactive shape co-segmentation via label propagation," Computers & Graphics, vol. 38, pp. 248–254, 2014.
[3] O. Sidi, O. van Kaick, Y. Kleiman, H. Zhang, and D. Cohen-Or, "Unsupervised co-segmentation of a set of shapes via descriptor-space spectral clustering," ACM Transactions on Graphics, vol. 30, no. 6, pp. 126:1–126:9, 2011.
[4] Z. Wu, Y. Wang, R. Shou et al., "Unsupervised co-segmentation of 3D shapes via affinity aggregation spectral clustering," Computers & Graphics, vol. 37, no. 6, pp. 628–637, 2013.
[5] R. Hu, L. Fan and L. Liu, "Co-segmentation of 3D shapes via subspace clustering," Computer Graphics Forum, vol. 31, no. 5, pp. 1703–1713, 2012.
[6] Z. Shu, C. Qi, S. Xin et al., "Unsupervised 3D shape segmentation and co-segmentation via deep learning," Computer Aided Geometric Design, vol. 43, pp. 39–52, 2016.
[7] Q. Huang, V. Koltun and L. Guibas, "Joint-shape segmentation with linear programming," ACM Transactions on Graphics, vol. 30, no. 6, pp. 1–11, 2011.
[8] T. Funkhouser, M. Kazhdan, P. Shilane et al., "Modeling by example," ACM Transactions on Graphics, vol. 23, no. 3, pp. 652–663, 2004.
[9] P. Simari, D. Nowrouzezahrai, E. Kalogerakis et al., "Multi-objective shape segmentation and labeling," Computer Graphics Forum, vol. 28, no. 5, pp. 1415–1425, 2009.
[10] L. Shalom, A. Shapira, D. Shamir et al., "Part analogies in sets of objects," in Eurographics Workshop on 3D Object Retrieval, pp. 33–40.
[11] E. Zhang, K. Mischaikow and G. Turk, "Feature-based surface parameterization and texture mapping," ACM Transactions on Graphics, vol. 24, no. 1, pp. 1–27, 2005.
[12] I. Despotović, B. Goossens and W. Philips, "MRI segmentation of the human brain: challenges, methods, and applications," Computational and Mathematical Methods in Medicine, 2015.
[13] O. Van Kaick, A. Tagliasacchi, O. Sidi et al., "Prior knowledge for part correspondence," Computer Graphics Forum, vol. 30, no. 2, pp. 553–562, 2011.
[14] E. Kalogerakis, A. Hertzmann, and K. Singh, "Learning 3D mesh segmentation and labeling," ACM Transactions on Graphics, vol. 29, no. 4, pp. 1–11, 2010.
[15] K. Guo, D. Zou and X. Chen, "3D mesh labeling via deep convolutional neural networks," ACM Transactions on Graphics, vol. 35, no. 1, pp. 3:1–3:12, 2015.
[16] G. Liu, H. Bao and B. Han, "A stacked autoencoder-based deep neural network for achieving gearbox fault diagnosis," Mathematical Problems in Engineering, 2018.
[17] C. R. Qi, H. Su, K. Mo et al., "PointNet: deep learning on point sets for 3D classification and segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660, 2017.
[18] L. Yi, V. G. Kim, D. Ceylan et al., "A scalable active framework for region annotation in 3D shape collections," ACM Transactions on Graphics, vol. 35, no. 6, pp. 1–12, 2016.
[19] E. Kalogerakis, M. Averkiou, S. Maji et al., "3D shape segmentation with projective convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3779–3788, 2017.
[20] Y. Abouqora, O. Herouane, L. Moumoun et al., "A Hybrid CNN-CRF Inference Models for 3D Mesh Segmentation," in 6th IEEE Congress on Information Science and Technology, pp. 296–301, 2021.
[21] M. Chahhou, L. Moumoun, M. El Far et al., "Segmentation of 3D meshes using p-spectral clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 8, pp. 1687–1693, 2014.
[22] Y. Kleiman and M. Ovsjanikov, "Robust structure-based shape correspondence," Computer Graphics Forum, vol. 38, no. 1, pp. 7–20, 2019.
[23] M. Ovsjanikov, B. Chen, J. Solomon et al., "Functional maps: a flexible representation of maps between shapes," ACM Transactions on Graphics, vol. 31, no. 4, pp. 1–11, 2012.
[24] V. G. Kim, Y. Lipman and T. Funkhouser, "Blended intrinsic maps," ACM Transactions on Graphics, vol. 30, no. 4, pp. 1–12, 2011.
[25] A. C. Berg, T. L. Berg and J. Malik, "Shape matching and object recognition using low distortion correspondences," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 1, pp. 26–33, 2005.
[26] M. Ben-Chen and C. Gotsman, "Characterizing shape using conformal factors," in Eurographics Workshop on 3D Object Retrieval, pp. 1–8, 2008.
[27] S. Belongie, J. Malik and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509–522, 2002.
[28] M. Hilaga, Y. Shinagawa, T. Kohmura, and T. L. Kunii, "Topology matching for fully automatic similarity estimation of 3D shapes," in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pp. 203–212, 2001.
[29] A. E. Johnson and M. Hebert, "Using spin images for efficient object recognition in cluttered 3D scenes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 5, pp. 433–449, 1999.
[30] L. Shapira, A. Shamir, and D. Cohen-Or, "Consistent mesh partitioning and skeletonisation using the shape diameter function," The Visual Computer, vol. 24, no. 4, p. 249, 2008.
[31] M. M. Bronstein and I. Kokkinos, "Scale-invariant heat kernel signatures for non-rigid shape recognition," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1704–1711, 2010.
[32] X. Chen, A. Golovinskiy and T. Funkhouser, "A benchmark for 3D mesh segmentation," ACM Transactions on Graphics, vol. 28, no. 3, pp. 1–12, 2009.
[33] Y. Wang, S. Asafi, O. Van Kaick et al., "Active co-analysis of a set of shapes," ACM Transactions on Graphics, vol. 31, no. 6, pp. 1–10, 2012.
[34] F. Perez-Cruz, "Kullback–Leibler divergence estimation of continuous distributions," in IEEE International Symposium on Information Theory, pp. 1666–1670, 2008.
[35] P. Krähenbühl and V. Koltun, "Efficient inference in fully connected CRFs with Gaussian edge potentials," in Advances in Neural Information Processing Systems, pp. 109–117, 2011.
[36] C. H. Papadimitriou and K. Steiglitz, "Combinatorial Optimization: Algorithms and Complexity," Courier Corporation, 1998.
