https://zhouzheyuan.github.io/r3d-ad
R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection
Abstract
3D anomaly detection plays a crucial role in monitoring parts for localized inherent defects in precision manufacturing. Embedding-based and reconstruction-based approaches are among the most popular and successful methods. However, two major challenges hinder the practical application of current approaches: 1) embedding-based models suffer from prohibitive computational and storage costs due to their memory bank structure; 2) reconstruction-based models built on the MAE mechanism fail to detect anomalies in unmasked regions. In this paper, we propose R3D-AD, which reconstructs anomalous point clouds with a diffusion model for precise 3D anomaly detection. Our approach capitalizes on the data distribution conversion of the diffusion process to entirely obscure the input’s anomalous geometry. It learns, step by step, a strict point-level displacement behavior that methodically corrects the aberrant points. To increase the generalization of the model, we further present a novel 3D anomaly simulation strategy named Patch-Gen to generate realistic and diverse defect shapes, which narrows the domain gap between training and testing. Our R3D-AD ensures a uniform spatial transformation, which allows anomaly results to be generated straightforwardly by distance comparison. Extensive experiments show that our R3D-AD outperforms previous state-of-the-art methods, achieving 73.4% Image-level AUROC on the Real3D-AD dataset and 74.9% Image-level AUROC on the Anomaly-ShapeNet dataset with exceptional efficiency.
Keywords:
3D anomaly detection, industrial applications, 3D reconstruction, self-supervised learning
1 Introduction
Anomaly detection aims to identify instances containing anomalies and to precisely locate the specific positions of defects. This task is extensively applied across multiple fields and plays a crucial role in quality control within industrial production [29]. 3D anomaly detection [19] has emerged because the 3D modality is intrinsically better at avoiding blind spots in advanced processing and precision manufacturing. However, the discrete and unordered form of point cloud data makes features harder to acquire than from images. Because anomalies are scarce and only normal data are available during training, 3D anomaly detection also faces the problem of domain shift. These issues underscore the necessity and urgency of devising an efficient framework for the 3D anomaly detection task.
Similar to traditional 2D anomaly detection [29, 43], current 3D anomaly detection can be primarily categorized into embedding-based and reconstruction-based, as illustrated in Fig. 1. The embedding-based methods involve mapping features extracted with a pre-trained encoder onto a normal distribution for learning. Distributions that do not fall within the interval are classified as anomalies. Most existing 3D anomaly detection methods are based on a memory bank mechanism [11, 37, 19, 3], which stores some representative features during the training phase to implicitly construct a feature distribution. In the testing phase, the presence of anomalies is determined by calculating the Euclidean distance between the input test object and all template point clouds stored in memory. The reconstruction-based methods train a network capable of accurately reconstructing normal point clouds, under the presumption that anomalous point clouds will not be effectively reconstructed since they are not included during training. The anomaly map is produced through the comparison of discrepancies between the input point cloud and its reconstruction. IMRNet [18] employs PointMAE [26] to reconstruct the input in several iterations, getting the final anomaly map by calculating the explicit spatial coordinate differences and implicit deep feature differences of the point cloud, respectively.
However, existing methods face two key issues: high resource cost and irreparable reconstruction. Firstly, methods based on the memory bank [11, 37, 19, 3] store all features from the training phase, and each test point cloud needs to be compared with all samples in the memory bank, which significantly increases memory overhead and inference time. This inefficiency makes such methods almost inapplicable in real industrial production lines. Secondly, the masked autoencoder (MAE) mechanism [9, 39, 26] only reconstructs the masked portions of the input, so defects within unmasked portions may be preserved. This contradicts the fundamental assumption of detecting anomalies by comparing the original defect-containing point cloud with a reconstructed anomaly-free version. These methods inevitably lead to incorrect reconstructions, undermining their effectiveness in accurately localizing defects.
We propose R3D-AD, a novel 3D anomaly detection method that suffers from neither the storage and time overhead of memory-based embedding models nor the risk of leaving anomalies unmasked in MAE-based reconstructive models. In contrast to PointMAE, one of our key insights is to perform undifferentiated masking of 3D objects via the noise diffusion mechanism, which maximizes the preservation of anomaly-free shapes and reconstructs abnormal regions. In the reparameterized diffusion process, a one-step full mask and reconstruction are achieved by converting the point cloud distribution, instead of the multi-iteration procedure of [18]. We hypothesize that anomaly detection verifies the gap between the reconstructed shapes and the positive samples by learning point movement. Specifically, for input models with arbitrary anomalies, we encode them as latent shape embeddings that serve as decoding conditions and explicitly control the point cloud reconstruction process by step-wise displacement (SWD) decoding. The shape embedding carries abundant global features and makes the network easier to train because it does not dwell on local anomaly details. Another key to our approach is a controllable point-wise displacement during the diffusion process that refines the point cloud deformation iteratively. We inject the latent shape embedding into each step of the reverse denoising process, which drives the anomalous regions to converge to a smooth surface. We further adopt a 3D anomaly simulation strategy, Patch-Gen, to address the limitations of the dataset; it generates abundant defective samples by producing spatial irregularities that are faithful to real scenes, including bulges, sinks, etc. This point cloud data augmentation encourages the self-supervised model to reconstruct more realistic anomaly-free shapes when facing actual anomalies.
To the best of our knowledge, this is the very first attempt at exploring diffusion in reconstruction-based 3D anomaly detection. Our main contributions are summarized as follows: (i) We introduce a novel framework, termed R3D-AD, which performs a one-step full mask and anomaly-free reconstruction for fast and accurate 3D anomaly detection. (ii) We propose to learn the step-wise displacement in the reverse diffusion process to explicitly control the reconstruction of anomalous shapes. (iii) We introduce a 3D anomaly simulation strategy named Patch-Gen to address the limitation of the data anomaly patterns and improve the reconstruction performance in a supervised setting. (iv) Extensive experiments demonstrate that our R3D-AD has achieved state-of-the-art performance on both Real3D-AD and Anomaly-ShapeNet datasets.
2 Related work
2.1 2D Anomaly Detection
Anomaly detection has received increasing attention from researchers in recent years, and many new methods have been proposed to address the problem. Flow-based methods [30, 8, 36, 40] use learned distributions and flow’s bijective properties to spot defects, while Memory-based approaches [29, 14, 1] gauge anomaly scores by contrasting test sample features with memory bank-stored norms. Reconstruction-based models [2, 43, 42] flag anomalies by comparing inputs to their online reconstructions. Recent works [16, 33, 12, 44] augment the anomaly detection datasets with generated synthetic anomalies to compensate for the negative example scarcity problem.
2.2 3D Anomaly Detection
This field lags behind the development of 2D anomaly detection since 3D data are harder to obtain, while point cloud data are sparser and contain more noise than image data. BTF [11] integrates handcrafted 3D descriptors with the classic 2D method PatchCore [29], constructing a basic framework for 3D anomaly detection. M3DM [37] advances the field by separately analyzing features from point clouds and RGB images, then merging these for improved decision-making. CPMF [3] converts point clouds into two-dimensional images from multiple angles, extracts additional features from these images with a pre-trained network, and enhances detection capabilities through information fusion. Reg3D-AD [19] develops a registration-based method in which the RANSAC algorithm aligns each sample before it is compared to the stored template during the test phase. IMRNet [18] trains a PointMAE [26] to reconstruct anomaly-free samples and identifies anomalies by comparing the reconstructed point cloud against the initial input. Many of these methods use memory banks to store the features of the training samples or require multiple iterations to restore points. Unlike previous methods, our approach requires only one step of reconstruction and has significant advantages in both time and space efficiency.
2.3 Diffusion Models
Diffusion models have proven their effectiveness in several generative tasks, such as image generation [32], speech generation [15], and video generation [10]. Denoising Diffusion Probabilistic Models (DDPMs) [13, 35, 34] employ a forward noising mechanism that incrementally adds Gaussian noise to images, alongside a reverse process trained to counteract the forward mechanism. Denoise AD [22] applies DDPM to reconstruct within the feature space, generating images that contain less noise. In recent years, many studies [25, 20, 6, 17] have used diffusion models to explore the 3D reconstruction task. DPM [23] incorporates a shape latent variable to encapsulate the geometric intricacies of 3D shapes and models this variable’s distribution with Normalizing Flows [28, 7]. PVD [45] utilizes PVCNNs [21] for the point-voxel representation of 3D shapes and integrates structured locality into point clouds. This approach leverages the strengths of both point and voxel representations, improving the model’s ability to capture the spatial hierarchies and local geometries within 3D objects. Since diffusion-based reconstruction recovers the target shape from complete noise, the dilemma of reconstructing only the masked region in the MAE [9] mechanism does not exist.
3 Method
3.1 Overview
We model the anomaly detection problem as mapping an anomalous point cloud to a positive shape with which it is aligned. The framework of R3D-AD is shown in Fig. 2, where the simulated anomalous shapes are reconstructed in a self-supervised setting in the training phase and then compared with the original input to detect anomalies. The reconstructed anomaly-free model is aligned with the input, thus allowing direct computation of anomaly scores and segmentation of anomalous regions by conditioned distance functions. Simultaneously, the anomaly simulation strategy faithfully generates realistic defects and randomly synthesizes diverse anomaly shapes on normal samples, improving the generalization ability of the network in the case of limited anomaly samples.
3.2 Preliminary of denoising diffusion probabilistic models
A DDPM is inspired by the thermal diffusion process in an evolving thermodynamic system, which consists of a diffusion process and a reverse process.
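The forward Markovian process gradually adds Gaussian noise to a clean sample $x_0$ from a data distribution $q(x_0)$ and turns it into a Gaussian noise $x_T$, which is defined as
$$q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}) \tag{1}$$
where $q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t \mathbf{I}\right)$ is the Markov diffusion kernel, $t \in \{1, \dots, T\}$, $T$ is the number of diffusion steps, and $\beta_t$ is a variance schedule. We have $q(x_t \mid x_0) = \mathcal{N}\left(x_t; \sqrt{\bar{\alpha}_t}\, x_0, (1-\bar{\alpha}_t)\mathbf{I}\right)$ by reparameterization with $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$. $x_t$ can be sampled by
$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon \tag{2}$$
where $\epsilon$ is a standard Gaussian noise and $\epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. When $T$ is large enough, $x_T$ will eventually become a Gaussian noise.
The reverse process is also a Markovian process that denoises over a series of steps to generate meaningful data from the target distribution $q(x_0)$. The inverse process denoises the noise $x_T$ from a distribution $p(x_T)$, which is defined as
$$p_\theta(x_{0:T} \mid c) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t, c), \qquad p_\theta(x_{t-1} \mid x_t, c) = \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t, c), \sigma_t^2 \mathbf{I}\right) \tag{3}$$
where $p(x_T) = \mathcal{N}(\mathbf{0}, \mathbf{I})$, the mean $\mu_\theta$ is estimated by a neural network parameterized by $\theta$, $c$ is the latent condition encoding, and $\sigma_t^2$ is a step-dependent variance. $\mu_\theta$ can be reparameterized as
$$\mu_\theta(x_t, t, c) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_\theta(x_t, t, c) \right) \tag{4}$$
where $\epsilon_\theta$ is a neural network utilized to denoise the Gaussian noise from $x_t$.
The training objective is minimized by training $\epsilon_\theta$ to approximate $\epsilon$. The training objective is defined as
$$L(\theta) = \mathbb{E}_{t,\, x_0 \sim q(x_0),\, \epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I})} \left[ \left\| \epsilon - \epsilon_\theta(x_t, t, c) \right\|^2 \right] \tag{5}$$
where $t$ is sampled from the uniform distribution over $1, 2, \dots, T$, $q(x_0)$ is the distribution of $x_0$, and $\epsilon$ is the Gaussian noise.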
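The closed-form forward sample (Eq. 2), the reparameterized reverse mean (Eq. 4), and the noise-prediction objective (Eq. 5) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear schedule, its endpoints, the step count `T = 200`, the toy point cloud, and all function names are hypothetical, and the "predicted" noise is simply the true noise so that the identities can be checked.

```python
import numpy as np

def make_schedule(T=200, beta_start=1e-4, beta_end=0.05):
    """Linear variance schedule beta_1..beta_T (an assumed choice),
    together with alpha_t = 1 - beta_t and the cumulative product alpha-bar_t."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Eq. 2: draw x_t directly from x_0 (t is 1-indexed, as in the text)."""
    eps = rng.standard_normal(x0.shape)
    abar = alpha_bars[t - 1]
    xt = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps
    return xt, eps

def posterior_mean(xt, t, eps_pred, betas, alphas, alpha_bars):
    """Eq. 4: mean of the reverse step given a noise prediction eps_pred."""
    i = t - 1
    return (xt - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps_pred) / np.sqrt(alphas[i])

def noise_loss(eps, eps_pred):
    """Eq. 5 for a single sample: squared noise error, averaged over points."""
    return float(np.mean(np.sum((eps - eps_pred) ** 2, axis=-1)))

rng = np.random.default_rng(0)
betas, alphas, alpha_bars = make_schedule()
x0 = rng.uniform(-1.0, 1.0, size=(2048, 3))  # toy point cloud, not real data

x1, eps1 = q_sample(x0, 1, alpha_bars, rng)
xT, epsT = q_sample(x0, len(betas), alpha_bars, rng)

# With the true noise, the reverse mean at t = 1 lands exactly on x_0,
# because alpha-bar_1 = alpha_1 and 1 - alpha-bar_1 = beta_1.
x0_hat = posterior_mean(x1, 1, eps1, betas, alphas, alpha_bars)
```

Under this assumed schedule, $\bar{\alpha}_T$ is below 0.01, so $x_T$ retains almost none of the input geometry, which is the sense in which the forward process fully masks the shape.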
3.3 Diffusion-based 3D anomaly reconstruction
We formulate the reconstruction of the anomaly-free point cloud as conditional generation, which decodes the explicit displacement under the conditional reverse process of Eq. 3, where $c$ is the decoding condition. The essential question of anomaly detection in this paper is how to conditionally reconstruct anomaly-free shapes with reference to input point clouds under different spatial transformations. Since abnormal and normal samples share highly similar global features during self-supervised reconstruction, the most immediate approach is to extract an efficient global feature from the input to serve as an auxiliary conditional embedding for the denoising function $\epsilon_\theta$. We encode a latent shape embedding and use it as a conditional input to guide reconstruction in the reverse diffusion process.
3.3.1 Latent shape embedding
The feature encoder aims to encode the point cloud into a latent shape embedding with high-level features for the conditional generation process. Unlike other global-local extraction methods [38, 41], we focus on extracting global features, which characterize the semantic information of the shape and pose of the mostly anomaly-free regions in the point cloud. The feature encoder mainly consists of cascaded multi-layer perceptrons (MLPs) based on PointNet [5]. It applies max-pooling after mapping the points to different dimensions and then compresses the pooled features to extract the global shape embedding.
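To make the idea of a PointNet-style global embedding concrete, the sketch below applies the same small MLP to every point and then max-pools over the point dimension. It is an illustrative simplification, not the encoder used in the paper: the layer widths `(3, 64, 128, 256)`, the ReLU activations, the random untrained weights, and the single final pooling are all assumptions.

```python
import numpy as np

def init_encoder(dims=(3, 64, 128, 256), seed=0):
    """Random (untrained) weights for a shared per-point MLP; widths are assumed."""
    rng = np.random.default_rng(seed)
    return [(rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def encode(points, params):
    """Map an (N, 3) point cloud to one global embedding of size dims[-1]."""
    h = points
    for weight, bias in params:
        # The same linear layer + ReLU is applied to every point independently.
        h = np.maximum(h @ weight + bias, 0.0)
    # Max-pooling over points is symmetric, so the result ignores point order.
    return h.max(axis=0)

rng = np.random.default_rng(1)
points = rng.uniform(-1.0, 1.0, size=(1024, 3))  # toy input
params = init_encoder()

embedding = encode(points, params)
shuffled_embedding = encode(points[rng.permutation(len(points))], params)
```

Because the pooling is symmetric, shuffling the input points leaves the embedding unchanged, which is why this kind of encoder suits unordered point clouds.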
3.3.2 Step-wise displacement decoding
To achieve point cloud reconstruction with transformation consistency while preserving the structure of non-anomalous regions, our method injects the latent shape embedding into the decoder at each step of the reverse diffusion process, as shown in Fig. 2. In principle, in the training phase, the decoder $\epsilon_\theta$ learns the Gaussian noise added in the forward diffusion process in order to model the conditional probability distribution. Conditionally generating target shapes from Gaussian noise is a straightforward approach, but it struggles to reconstruct point cloud details and to keep the transformation consistent. Learning the relative deformation of points for anomalous objects is more efficient. Considering the mapping degradation of the vanilla autoencoder in the reconstruction training phase [22], we use the Gaussian noise of the forward process (Eq. 2) to fully mask the point cloud directly, without blind spots, which prevents the decoder from receiving anomalous shapes. The masked points and the latent shape embedding are the inputs of the SWD decoder. A point-wise displacement vector is generated at each step of the iterative process, thus disentangling the predicted noise from the desired anomaly-free shape. The reverse process can be defined according to Eq. 3, and the displacement vector can be represented by
(6) |
where is the variance. A PointwiseNet is adopted for to decode the from the previous step and . is used to generate trigonometric position embedding = (, sin(), cos()). is concatenated with and then fed into the concatenate-squash linear module of PointwiseNet with a residual function. The output reconstructed point cloud at the step is . The registered original and reconstructed objects are distinguished from the anomalous shape by the anomaly scores based on the conditioned distance function.
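Since the reconstruction stays registered with the input, scoring can be illustrated as a per-point distance between the two clouds. The snippet below is a hedged sketch of that idea only: the plain Euclidean distance, the max-pooling to an object-level score, the function names, and the toy sphere are assumptions, and the paper's conditioned distance function may differ.

```python
import numpy as np

def point_anomaly_scores(inp, recon):
    """Per-point score: distance between each input point and its reconstruction.
    Assumes both clouds have the same shape and the same point order."""
    inp = np.asarray(inp, dtype=float)
    recon = np.asarray(recon, dtype=float)
    if inp.shape != recon.shape:
        raise ValueError("input and reconstruction must be aligned point by point")
    return np.linalg.norm(inp - recon, axis=1)

def object_anomaly_score(point_scores):
    """Object-level score, taken here as the largest point score (an assumption)."""
    return float(np.max(point_scores))

rng = np.random.default_rng(0)
normal = rng.standard_normal((1024, 3))
normal /= np.linalg.norm(normal, axis=1, keepdims=True)  # points on the unit sphere

# Simulate a bulge: push the 32 points nearest to normal[0] outward by 20%.
patch = np.argsort(np.linalg.norm(normal - normal[0], axis=1))[:32]
defective = normal.copy()
defective[patch] *= 1.2

# Here `normal` stands in for an ideal anomaly-free reconstruction of `defective`.
scores = point_anomaly_scores(defective, normal)
flagged = np.argsort(scores)[-32:]
```

In this toy case the 32 highest-scoring points are exactly the displaced patch, and the object-level score equals the bulge height of 0.2.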
3.4 3D anomaly simulation strategy
Given that a small number of normal samples is insufficient for the model to learn diverse and essential features, we propose the Patch-Gen strategy to simulate defects on anomaly-free shapes for training data augmentation. Patch-Gen encourages the reconstruction model to learn to detect irregularity: the anomaly-free point clouds and their diverse anomaly patterns are combined into training pairs and used to learn the features that discriminate normal from anomalous surfaces. The intuition is that the diversity of simulated negative samples forces our network to learn how to reconstruct anomaly-free shapes instead of memorizing their complete appearance.
As shown in Fig. 3, the input normal point cloud is first randomly rotated. The random spatial rotation is designed to improve the generalization capability for test samples with very different spatial transformations, as defined by:
(7) |
where the input is the normal sample and the rotation matrix is obtained by randomly selecting rotation angles for all three axes. Beyond the global shape awareness that random rotation gives the model, we further simulate anomalies at a fine granularity. We randomly take a viewpoint on the surface of the cube. The patch of points nearest to this viewpoint can then be determined. The shape augmentation scheme Patch-Gen is defined as follows:
(8) |
Here, the normalization operation acts on a vector, a predefined hyper-parameter controls the scaling of the patch points, and the translation matrix is drawn from a Gaussian distribution. The simulated anomalous sample is finally obtained by updating only the patch region while keeping the remaining points unchanged.
With the proposed Patch-Gen, we can simulate multiple types of anomalies, mainly by controlling the translation matrix. A bulge or sink can be generated by sorting the values after sampling them from the distribution, while damage can be generated by applying the samples directly without manipulation. Fig. 7 further illustrates the contrast between the generated anomalies and actual ones, showing that our approach emulates real-world defects with a high degree of fidelity.
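The steps described for Patch-Gen (random rotation, viewpoint selection, nearest-point patch, scaled Gaussian displacement) can be sketched as follows. This is only an illustration of the described procedure and should not be read as the paper's Eq. 8: the displacement along the normalized viewpoint direction, the sign conventions for bulge and sink, the sorting rule, and every name are assumptions. The defaults `ratio=1/32` and `scale=0.1` mirror the best settings reported in the ablation.

```python
import numpy as np

def random_rotation(rng):
    """Rotation matrix built from three random angles, one per axis."""
    ax, ay, az = rng.uniform(0.0, 2.0 * np.pi, size=3)
    cx, sx, cy, sy, cz, sz = np.cos(ax), np.sin(ax), np.cos(ay), np.sin(ay), np.cos(az), np.sin(az)
    rot_x = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rot_z = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rot_z @ rot_y @ rot_x

def patch_gen(points, rng, ratio=1 / 32, scale=0.1, mode="bulge"):
    """Return (augmented, rotated, patch_indices) for an (N, 3) normal sample."""
    k = max(1, int(len(points) * ratio))
    rotated = points @ random_rotation(rng).T

    # Viewpoint on the surface of the cube [-1, 1]^3: clamp one coordinate to a face.
    view = rng.uniform(-1.0, 1.0, size=3)
    view[rng.integers(3)] = rng.choice([-1.0, 1.0])

    # Patch = the k points nearest to the viewpoint, ordered nearest first.
    idx = np.argsort(np.linalg.norm(rotated - view, axis=1))[:k]
    patch = rotated[idx]
    direction = patch - view  # points away from the viewpoint
    direction /= np.linalg.norm(direction, axis=1, keepdims=True) + 1e-12

    if mode == "damage":
        # Unsorted, signed Gaussian offsets give an irregular, damaged surface.
        offsets = rng.standard_normal(k)
    else:
        # Sorted magnitudes: largest at the patch centre, decaying outward.
        offsets = np.sort(np.abs(rng.standard_normal(k)))[::-1]
        if mode == "bulge":
            offsets = -offsets  # move toward the viewpoint, i.e. outward

    augmented = rotated.copy()
    augmented[idx] = patch + scale * offsets[:, None] * direction
    return augmented, rotated, idx

rng = np.random.default_rng(0)
pts = rng.uniform(-0.5, 0.5, size=(2048, 3))  # toy normal sample
aug, rot, idx = patch_gen(pts, rng)
changed = np.flatnonzero(np.any(aug != rot, axis=1))
```

Only the selected patch differs between `aug` and `rot`, so the pair can serve as an (anomalous input, anomaly-free target) training example.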
3.5 Training objective
In the reconstruction task of the object with points, the network learns a diffusion model with an mapping relation. Iterative denoising under the semantic condition of point embedding realizes the prediction of point offsets. Concretely, the network is trained to learn the noise that needs to be eliminated to recover the anomaly-free shape with the distance between the ground truth and the denoised reconstructed points. We make use of the mean squared error (MSE) loss as the primary reconstruction loss which evaluates the mean squared error of the element-wise distances between and . The MSE training loss is formulated as:
(9) |
4 Experiments
4.1 Datasets
Method | BTF[11] | BTF[11] | M3DM[37] | PatchCore[29] | PatchCore[29] | CPMF[3] | Reg3D-AD[19] | IMRNet[18] | Ours |
---|---|---|---|---|---|---|---|---|---|
Feat. | Raw | FPFH | PointMAE | FPFH | PointMAE | ResNet | PointMAE | PointMAE | Raw |
Airplane | 0.730 | 0.520 | 0.434 | 0.882 | 0.726 | 0.701 | 0.716 | 0.762 | 0.772 |
Candybar | 0.539 | 0.630 | 0.552 | 0.541 | 0.663 | 0.552 | 0.685 | 0.755 | 0.696 |
Car | 0.647 | 0.560 | 0.541 | 0.590 | 0.498 | 0.551 | 0.697 | 0.711 | 0.713 |
Chicken | 0.789 | 0.432 | 0.683 | 0.837 | 0.827 | 0.504 | 0.852 | 0.780 | 0.714 |
Diamond | 0.707 | 0.545 | 0.602 | 0.574 | 0.783 | 0.523 | 0.900 | 0.905 | 0.685 |
Duck | 0.691 | 0.784 | 0.433 | 0.546 | 0.489 | 0.582 | 0.584 | 0.517 | 0.909 |
Fish | 0.602 | 0.549 | 0.540 | 0.675 | 0.630 | 0.558 | 0.915 | 0.880 | 0.692 |
Gemstone | 0.686 | 0.648 | 0.644 | 0.370 | 0.374 | 0.589 | 0.417 | 0.674 | 0.665 |
Seahorse | 0.596 | 0.779 | 0.495 | 0.505 | 0.539 | 0.729 | 0.762 | 0.604 | 0.720 |
Shell | 0.396 | 0.754 | 0.694 | 0.589 | 0.501 | 0.653 | 0.583 | 0.665 | 0.840 |
Starfish | 0.530 | 0.575 | 0.551 | 0.441 | 0.519 | 0.700 | 0.506 | 0.674 | 0.701 |
Toffees | 0.703 | 0.462 | 0.450 | 0.565 | 0.585 | 0.390 | 0.827 | 0.774 | 0.703 |
Average | 0.635 | 0.603 | 0.552 | 0.593 | 0.595 | 0.586 | 0.704 | 0.725 | 0.734 |
4.1.1 Real3D-AD
Real3D-AD [19] is a 3D anomaly detection dataset based on real samples, exhibiting higher point precision and spatial distance per point cloud. Each category contains 4 training samples and 100 test samples. The training set contains 360° complete surface point clouds of the objects, which are obtained by manually calibrating and stitching the scans of multiple sides of the objects. The test samples are scans of only one side and therefore differ greatly from the training set. The distribution of the point clouds also varies across the 12 categories, further increasing the detection difficulty compared to 2D scenes.
4.1.2 Anomaly-ShapeNet
Anomaly-ShapeNet [18] is a 3D anomaly detection dataset crafted through modifications to the synthetic samples found in ShapeNetCorev2 [4]. It contains 40 diverse categories, featuring over 1600 complete-surface point cloud samples. Each category’s training set contains merely 4 samples, while the test sets are designed to assess the model’s performance across both normal samples and a spectrum of abnormal samples. It greatly broadens the range of anomaly types while keeping the number of points the same as in previous studies, which places higher demands on the robustness and generality of the proposed algorithms.
4.2 Evaluation metrics
For image-level (object-level) anomaly detection, the Area Under the Receiver Operating Characteristic Curve (AUROC) is used in line with established practice. For pixel-level (point-level) anomalies, the same AUROC metric is applied to the accuracy of point segmentation. An AUROC of 0.5 denotes no discriminative capability (equivalent to random guessing), whereas a score of 1.0 indicates perfect discrimination between positive and negative classes.
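As a reminder of what the metric measures, AUROC equals the probability that a randomly chosen anomalous item receives a higher score than a randomly chosen normal one, with ties counted as one half. The pure-Python sketch below computes it by direct pairwise comparison; it is an illustrative helper (the name `auroc` and the toy scores are ours), not the evaluation code used in the experiments, and it applies equally to object-level and point-level scores.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of positive (label 1) and negative (label 0) scores.
    Quadratic in the number of items, which is fine for a small illustration."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative item")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # anomalous item ranked above the normal one
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

perfect = auroc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
uninformative = auroc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
inverted = auroc([0.1, 0.2, 0.8, 0.9], [1, 1, 0, 0])
```

The three toy cases cover the reference points named above: perfect separation, a detector that cannot rank at all, and a detector that ranks every pair the wrong way round.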
4.3 Implementation details
Our method is implemented in PyTorch [27] and trained end-to-end. Optimization uses the Adam optimizer with an initial learning rate of 0.001. Training uses a total batch size of 128 for 40,000 iterations. All input point clouds are randomly downsampled to a fixed size of 4096 points on Real3D-AD and 2048 points on Anomaly-ShapeNet. Additionally, we normalize these point clouds by setting their center of gravity as the origin of coordinates and scaling their dimensions to fall within the range of -1 to 1, which suits the diffusion process.
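The preprocessing described above can be sketched as below. It is a minimal illustration under assumptions the text does not settle: downsampling is done before centering, sampling falls back to drawing with replacement when a cloud has fewer points than requested, and scaling divides by the largest absolute coordinate. The function name and the toy scan are hypothetical.

```python
import numpy as np

def preprocess(points, n_points, rng):
    """Randomly downsample to n_points, centre at the centroid, scale into [-1, 1]."""
    points = np.asarray(points, dtype=float)
    # Sample without replacement when possible; otherwise allow repeats (assumption).
    replace = len(points) < n_points
    idx = rng.choice(len(points), size=n_points, replace=replace)
    sampled = points[idx]

    centred = sampled - sampled.mean(axis=0)   # centre of gravity becomes the origin
    extent = np.abs(centred).max()             # largest absolute coordinate
    if extent == 0.0:
        return centred                         # degenerate cloud: all points identical
    return centred / extent

rng = np.random.default_rng(0)
raw = rng.normal(loc=5.0, scale=3.0, size=(10000, 3))  # toy scan, arbitrary units
cloud = preprocess(raw, 4096, rng)                      # 4096 points, as used for Real3D-AD
```

After this step every coordinate lies in [-1, 1] and the centroid sits at the origin, which keeps the data on a scale comparable to the unit-variance noise added by the forward process.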
Method | BTF[11] | BTF[11] | M3DM[37] | PatchCore[29] | PatchCore[29] | CPMF[3] | Reg3D-AD[19] | IMRNet[18] | Ours |
---|---|---|---|---|---|---|---|---|---|
Feat. | Raw | FPFH | PointMAE | FPFH | PointMAE | ResNet | PointMAE | PointMAE | Raw |
Ashtray | 0.578 | 0.420 | 0.577 | 0.587 | 0.591 | 0.353 | 0.597 | 0.671 | 0.833 |
Bag | 0.410 | 0.546 | 0.537 | 0.571 | 0.601 | 0.643 | 0.706 | 0.660 | 0.719 |
Bottle | 0.558 | 0.404 | 0.584 | 0.614 | 0.588 | 0.469 | 0.569 | 0.631 | 0.750 |
Bowl | 0.470 | 0.581 | 0.579 | 0.558 | 0.547 | 0.679 | 0.548 | 0.676 | 0.751 |
Bucket | 0.469 | 0.517 | 0.405 | 0.510 | 0.577 | 0.542 | 0.681 | 0.676 | 0.719 |
Cap | 0.509 | 0.562 | 0.599 | 0.645 | 0.583 | 0.601 | 0.632 | 0.704 | 0.726 |
Cup | 0.462 | 0.598 | 0.548 | 0.593 | 0.583 | 0.498 | 0.524 | 0.700 | 0.767 |
Eraser | 0.525 | 0.719 | 0.627 | 0.657 | 0.677 | 0.689 | 0.343 | 0.548 | 0.890 |
Headset | 0.447 | 0.505 | 0.597 | 0.610 | 0.609 | 0.551 | 0.574 | 0.698 | 0.767 |
Helmet | 0.508 | 0.569 | 0.488 | 0.465 | 0.495 | 0.532 | 0.491 | 0.603 | 0.704 |
Jar | 0.420 | 0.424 | 0.441 | 0.472 | 0.483 | 0.610 | 0.592 | 0.780 | 0.838 |
Microphone | 0.563 | 0.671 | 0.357 | 0.388 | 0.488 | 0.509 | 0.414 | 0.755 | 0.762 |
Shelf | 0.164 | 0.609 | 0.564 | 0.494 | 0.523 | 0.685 | 0.688 | 0.603 | 0.696 |
Tap | 0.549 | 0.553 | 0.747 | 0.760 | 0.498 | 0.528 | 0.659 | 0.686 | 0.818 |
Vase | 0.517 | 0.464 | 0.534 | 0.554 | 0.582 | 0.514 | 0.576 | 0.629 | 0.734 |
Average | 0.493 | 0.528 | 0.552 | 0.568 | 0.562 | 0.559 | 0.572 | 0.659 | 0.749 |
4.4 Main results
We conduct experiments on Real3D-AD [19] based on real sampling and Anomaly-ShapeNet [18] based on simulation.
As shown in Table 1, we first compare the image-level AUROC metric with current cutting-edge 3D anomaly detection models on Real3D-AD. Our method achieves the best average performance using only raw point cloud data, while most existing methods use the Fast Point Feature Histograms (FPFH) operator [31] or a ShapeNet [4] pre-trained PointMAE [26] as the feature extractor. Due to significant disparities in quantity, size, and distribution among different categories of point clouds in Real3D-AD, scores vary more strongly across categories for other methods. For instance, numerous methods score below 0.5 in certain categories, indicating their inadequacy in extracting meaningful features from challenging samples. In contrast, our method scores at least 0.665 in every category, ranks first in four of the twelve categories (Car, Duck, Shell, and Starfish), and achieves the best overall average of 0.734. This demonstrates the generalizability and robustness of our approach.
We further evaluate our method on Anomaly-ShapeNet in Table 2, which encompasses a broader array of categories and a greater diversity of defect types. Compared to Real3D-AD, Anomaly-ShapeNet significantly increases the diversity of defects, which further raises the complexity of the detection task. Our method achieves the best score in every listed category, and its average AUROC of 0.749 exceeds that of the strongest previous method (IMRNet, 0.659) by 9.0 points.
Model | Diffusion | Condition | Relative | Patch-Gen | I-AUROC | P-AUROC |
---|---|---|---|---|---|---|
A | ✓ | ✗ | ✗ | ✗ | 0.586 | 0.524 |
B | ✓ | ✓ | ✗ | ✗ | 0.667 | 0.513 |
C | ✓ | ✓ | ✓ | ✗ | 0.712 | 0.573 |
D | ✓ | ✓ | ✓ | ✓ | 0.734 | 0.592 |
4.5 Ablation study
To delve into the effect of individual components, we conduct ablation experiments on the Real3D-AD dataset. To fully demonstrate and compare the performance of the models, we report both image-level and pixel-level results with I-AUROC and P-AUROC, respectively.
4.5.1 Main component
Table 3 compares the performance of different variants of R3D-AD, covering the influence of the denoising condition embedding, the displacement-based reconstruction, and the Patch-Gen data augmentation strategy. Model A is our baseline, a vanilla DDPM model for point cloud reconstruction. Introducing a condition into the DDPM (Model B) significantly boosts performance, particularly in terms of I-AUROC, which rises from 0.586 to 0.667, a relative increase of 13.8%. Model C predicts point displacements on top of the conditional DDPM; preserving detailed structural information while accommodating the relative displacement of points yields a notable gain of 6.0 points in P-AUROC over Model B (0.513 to 0.573). Model D additionally trains with the Patch-Gen strategy under the shape embedding condition and reaches the best results, 0.734 I-AUROC and 0.592 P-AUROC. Considering that the defective region covers only a small portion of the original point cloud, we reconstruct the relative displacement in a way that preserves as much detail as possible, which is effective for both 3D anomaly detection and segmentation.
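The two gains quoted for the ablation are easiest to compare when absolute and relative changes are computed side by side. The short calculation below uses only the values listed in Table 3; reading the 13.8% figure as a relative change and the 6.0 figure as an absolute change in AUROC points is our interpretation of the numbers, and the helper name `gains` is ours.

```python
# (I-AUROC, P-AUROC) per model, copied from Table 3.
ablation = {
    "A": (0.586, 0.524),
    "B": (0.667, 0.513),
    "C": (0.712, 0.573),
    "D": (0.734, 0.592),
}

def gains(before, after):
    """Return (absolute change in AUROC, relative change in percent)."""
    absolute = after - before
    relative = absolute / before * 100.0
    return absolute, relative

# Effect of adding the condition (A -> B) on I-AUROC.
abs_i, rel_i = gains(ablation["A"][0], ablation["B"][0])
# Effect of displacement decoding (B -> C) on P-AUROC.
abs_p, rel_p = gains(ablation["B"][1], ablation["C"][1])

print(f"A -> B, I-AUROC: +{abs_i:.3f} absolute, +{rel_i:.1f}% relative")
print(f"B -> C, P-AUROC: +{abs_p:.3f} absolute, +{rel_p:.1f}% relative")
```

The A to B step is +0.081 absolute (13.8% relative) in I-AUROC, while the B to C step is +0.060 absolute (11.7% relative) in P-AUROC.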
4.5.2 Patch-Gen
Table 4 analyzes the influence of two key parameters in Patch-Gen: the selection points ratio and the scaling points factor.
The selection points ratio (Table 4, ratio sub-table) determines the proportion of points in the point cloud that are selected for transformation. Our findings suggest that a selection ratio of 1/32 achieves the best performance. This ratio appears to balance maintaining sufficient structure for anomaly detection against introducing enough variation to simulate anomalies effectively. Notably, once the ratio increases beyond 1/16, both I-AUROC and P-AUROC drop sharply (to 0.683 and 0.528 at 1/8). Since real defects account for only a small portion of the overall point cloud, selecting too many points not only destroys the structure of the original point cloud but also makes the distributions of the training and test sets inconsistent.
The scaling points factor controls the intensity of the random transformation applied to the selected points (Table 4, factor sub-table). The optimal performance is observed at a scaling factor of 0.1, which implies that minor transformations are more effective for simulating anomalies without significantly altering the original data distribution. Larger scaling factors lead to a consistent decline in performance, underscoring the importance of subtle transformations for preserving the utility of the simulated anomalies for detection tasks.
ratio | I-AUROC | P-AUROC |
---|---|---|
1/64 | 0.716 | 0.584 |
1/32 | 0.734 | 0.592 |
1/16 | 0.727 | 0.579 |
1/8 | 0.683 | 0.528 |
factor | I-AUROC | P-AUROC |
---|---|---|
0.1 | 0.734 | 0.592 |
0.2 | 0.727 | 0.572 |
0.4 | 0.715 | 0.554 |
0.8 | 0.661 | 0.517 |
4.5.3 Memory and time cost
As depicted in Figure 4, we evaluate the disparity in both storage consumption and inference time of our model under identical experimental conditions, compared to existing methods. Regarding memory usage, our approach demonstrates a marked superiority by employing raw coordinate features instead of FPFH or PointMAE features, significantly reducing the memory footprint. Since no memory bank exists, our method is also more space-efficient compared to BTF which also uses raw features. Moreover, our method eliminates the necessity to compare all the features in memory, substantially increasing operational efficiency. The implementation of Patch-Gen inherently bestows our model with exceptional robustness, enabling precise reconstruction of point clouds from various angles without the need for the time-intensive RANSAC alignment process required by Reg3D-AD.
4.6 Qualitative results
Figure 5 presents some qualitative results, with varying shades of color indicating different levels of anomaly scores. We select several representative defective samples to demonstrate the robustness of our algorithm. The left four columns display samples from Real3D-AD, while the right four columns display samples from Anomaly-ShapeNet. The illustration shows that R3D-AD precisely reconstructs the defective portions of the point cloud across various samples: the deep sink in the Seahorse sample, the concavity in the Bag sample, and the bulge in the Jar sample. Leveraging the accurately reconstructed point clouds, final point cloud segmentation maps are also produced, further evidencing the efficacy of our approach.
5 Conclusion
In this work, we presented R3D-AD, a novel reconstructive 3D anomaly detection model based on conditional diffusion. Our goal is to overcome the limitations faced by current 3D anomaly detection methods, such as the inefficiency caused by the memory bank module and the low performance caused by incorrect reconstructions with MAE. To address these challenges, we leverage the diffusion process for full reconstruction, followed by a direct comparison between the input and the reconstructed point cloud to obtain the final anomaly score. The embedded latent variable spans the decoding process, generating point-level displacements step by step from the noise to the target anomaly-free sample. We also propose Patch-Gen, a data augmentation strategy tailored for point cloud anomaly simulation. Extensive experiments conducted on 3D anomaly benchmarks validate the superiority of our R3D-AD over state-of-the-art alternatives in terms of both accuracy and versatility.
Acknowledgements
This work was supported in part by the Pioneer and Leading Goose R&D Program of Zhejiang (Grant No. 2022C01051), in part by the National Natural Science Foundation of China (Grant No. 52375271, 52275274), and in part by the Natural Science Foundation of Zhejiang Province (Grant No. LY23E050011).
References
- [1] Bae, J., Lee, J.H., Kim, S.: Pni: Industrial anomaly detection using position and neighborhood information. In: ICCV (2023)
- [2] Bergmann, P., Löwe, S., Fauser, M., Sattlegger, D., Steger, C.: Improving unsupervised defect segmentation by applying structural similarity to autoencoders. In: VISIGRAPP (2019)
- [3] Cao, Y., Xu, X., Shen, W.: Complementary pseudo multimodal feature for point cloud anomaly detection. arXiv preprint (2023)
- [4] Chang, A.X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., Xiao, J., Yi, L., Yu, F.: ShapeNet: An Information-Rich 3D Model Repository. arXiv preprint (2015)
- [5] Charles, R.Q., Su, H., Kaichun, M., Guibas, L.J.: Pointnet: Deep learning on point sets for 3d classification and segmentation. In: CVPR (2017)
- [6] Chu, R., Xie, E., Mo, S., Li, Z., Nießner, M., Fu, C.W., Jia, J.: Diffcomplete: Diffusion-based generative 3d shape completion. In: NeurIPS (2023)
- [7] Dinh, L., Sohl-Dickstein, J., Bengio, S.: Density estimation using real nvp. In: ICLR (2017)
- [8] Gudovskiy, D., Ishizaka, S., Kozuka, K.: Cflow-ad: Real-time unsupervised anomaly detection with localization via conditional normalizing flows. In: WACV (2022)
- [9] He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: CVPR (2022)
- [10] Ho, J., Chan, W., Saharia, C., Whang, J., Gao, R., Gritsenko, A., Kingma, D.P., Poole, B., Norouzi, M., Fleet, D.J., Salimans, T.: Imagen video: High definition video generation with diffusion models (2022)
- [11] Horwitz, E., Hoshen, Y.: Back to the feature: Classical 3d features are (almost) all you need for 3d anomaly detection. In: CVPRW (2023)
- [12] Hu, T., Zhang, J., Yi, R., Du, Y., Chen, X., Liu, L., Wang, Y., Wang, C.: Anomalydiffusion: Few-shot anomaly image generation with diffusion model. In: AAAI (2024)
- [13] Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. In: NeurIPS (2020)
- [14] Kim, D., Park, C., Cho, S., Lee, S.: Fapm: Fast adaptive patch memory for real-time industrial anomaly detection. In: ICASSP (2023)
- [15] Kong, Z., Ping, W., Huang, J., Zhao, K., Catanzaro, B.: Diffwave: A versatile diffusion model for audio synthesis. In: ICLR (2021)
- [16] Li, C.L., Sohn, K., Yoon, J., Pfister, T.: Cutpaste: Self-supervised learning for anomaly detection and localization. In: CVPR (2021)
- [17] Li, M., Duan, Y., Zhou, J., Lu, J.: Diffusion-sdf: Text-to-shape via voxelized diffusion. In: CVPR (2023)
- [18] Li, W., Xu, X., Gu, Y., Zheng, B., Gao, S., Wu, Y.: Towards scalable 3d anomaly detection and localization: A benchmark via 3d anomaly synthesis and a self-supervised learning network. arXiv preprint (2023)
- [19] Liu, J., Xie, G., Li, X., Wang, J., Liu, Y., Wang, C., Zheng, F., et al.: Real3d-ad: A dataset of point cloud anomaly detection. In: NeurIPS (2023)
- [20] Liu, Z., Feng, Y., Black, M.J., Nowrouzezahrai, D., Paull, L., Liu, W.: Meshdiffusion: Score-based generative 3d mesh modeling. In: ICLR (2023)
- [21] Liu, Z., Tang, H., Lin, Y., Han, S.: Point-voxel cnn for efficient 3d deep learning. In: NeurIPS (2019)
- [22] Lu, F., Yao, X., Fu, C., Jia, J.: Removing anomalies as noises for industrial defect localization. In: ICCV (2023)
- [23] Luo, S., Hu, W.: Diffusion probabilistic models for 3d point cloud generation. In: CVPR (2021)
- [24] Van der Maaten, L., Hinton, G.: Visualizing data using t-sne. Journal of machine learning research (2008)
- [25] Mo, S., Xie, E., Chu, R., Hong, L., Nießner, M., Li, Z.: Dit-3d: Exploring plain diffusion transformers for 3d shape generation. In: NeurIPS (2023)
- [26] Pang, Y., Wang, W., Tay, F.E., Liu, W., Tian, Y., Yuan, L.: Masked autoencoders for point cloud self-supervised learning. In: ECCV (2022)
- [27] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: Pytorch: An imperative style, high-performance deep learning library. In: NeurIPS (2019)
- [28] Rezende, D.J., Mohamed, S.: Variational inference with normalizing flows. In: ICML (2015)
- [29] Roth, K., Pemula, L., Zepeda, J., Schölkopf, B., Brox, T., Gehler, P.: Towards total recall in industrial anomaly detection. In: CVPR (2022)
- [30] Rudolph, M., Wandt, B., Rosenhahn, B.: Same same but differnet: Semi-supervised defect detection with normalizing flows. In: WACV (2021)
- [31] Rusu, R.B., Blodow, N., Beetz, M.: Fast point feature histograms (fpfh) for 3d registration. In: ICRA (2009)
- [32] Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E., Ghasemipour, S.K.S., Ayan, B.K., Mahdavi, S.S., Lopes, R.G., Salimans, T., Ho, J., Fleet, D.J., Norouzi, M.: Photorealistic text-to-image diffusion models with deep language understanding. arXiv preprint (2022)
- [33] Schlüter, H.M., Tan, J., Hou, B., Kainz, B.: Natural synthetic anomalies for self-supervised anomaly detection and localization. In: ECCV (2022)
- [34] Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. arXiv preprint (2020)
- [35] Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score-based generative modeling through stochastic differential equations. In: ICLR (2021)
- [36] Tailanian, M., Pardo, Á., Musé, P.: U-flow: A u-shaped normalizing flow for anomaly detection with unsupervised threshold. arXiv preprint (2022)
- [37] Wang, Y., Peng, J., Zhang, J., Yi, R., Wang, Y., Wang, C.: Multimodal industrial anomaly detection via hybrid fusion. In: CVPR (2023)
- [38] Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph cnn for learning on point clouds. TOG (2019)
- [39] Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., Dai, Q., Hu, H.: Simmim: A simple framework for masked image modeling. In: CVPR (2022)
- [40] Yu, J., Zheng, Y., Wang, X., Li, W., Wu, Y., Zhao, R., Wu, L.: Fastflow: Unsupervised anomaly detection and localization via 2d normalizing flows. arXiv preprint (2021)
- [41] Yu, X., Tang, L., Rao, Y., Huang, T., Zhou, J., Lu, J.: Point-bert: Pre-training 3d point cloud transformers with masked point modeling. In: CVPR (2022)
- [42] Zavrtanik, V., Kristan, M., Skočaj, D.: Draem-a discriminatively trained reconstruction embedding for surface anomaly detection. In: ICCV (2021)
- [43] Zavrtanik, V., Kristan, M., Skočaj, D.: Reconstruction by inpainting for visual anomaly detection. Pattern Recognition (2021)
- [44] Zhang, X., Xu, M., Zhou, X.: Realnet: A feature selection network with realistic synthetic anomaly for anomaly detection. In: CVPR (2024)
- [45] Zhou, L., Du, Y., Wu, J.: 3d shape generation and completion through point-voxel diffusion. In: ICCV (2021)
Appendix 0.A Appendix
0.A.1 Additional implementation details
0.A.1.1 Patch-Gen pseudocode
We formulate the proposed 3D anomaly simulation strategy Patch-Gen in Algorithm 1. The procedure takes an initial point cloud as input and produces an augmented point cloud that contains a simulated anomaly. The rotation matrix is obtained by applying arbitrary rotation angles about all the rotation axes. The translation matrix is sampled from a Gaussian distribution and, after normalization and scaling, dictates the displacement of the nearest points towards the viewpoint, while the rest of the point cloud remains unchanged. The anomaly point cloud representing different damage types can be obtained through direct manipulation of the translation matrix, which is derived from random sampling. By sorting this matrix, we can further simulate defects such as bulge and sink. This can be likened to a gravitational force that acts as an anchor and exerts influence on the patch points within its domain.
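The steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the function names, the default viewpoint (placed outside the cloud along the z axis), the patch size, and the displacement scale are all hypothetical choices made for the example.

```python
import numpy as np

def random_rotation(rng):
    """Compose rotations about the x, y and z axes with random angles."""
    ax, ay, az = rng.uniform(0.0, 2.0 * np.pi, size=3)
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rz @ ry @ rx

def patch_gen_sketch(points, patch_size=64, scale=0.05, mode="bulge", rng=None):
    """Return (augmented cloud, boolean mask of displaced points).

    Assumed behavior: rotate the cloud, pick the patch_size points nearest
    to a viewpoint, and move them along the line to the viewpoint.
    """
    rng = np.random.default_rng() if rng is None else rng
    rotated = points @ random_rotation(rng).T

    # Hypothetical viewpoint: outside the cloud, along the +z axis.
    center = rotated.mean(axis=0)
    radius = np.linalg.norm(rotated - center, axis=1).max()
    viewpoint = center + np.array([0.0, 0.0, 2.0 * radius])

    # Patch = nearest points to the viewpoint, ordered nearest first.
    dist = np.linalg.norm(rotated - viewpoint, axis=1)
    patch_idx = np.argsort(dist)[:patch_size]

    # Gaussian magnitudes, normalized to [0, 1] and scaled; sorting them in
    # descending order gives the nearest point the largest displacement.
    mag = np.abs(rng.normal(size=patch_size))
    mag = np.sort(mag / mag.max() * scale)[::-1]

    direction = viewpoint - rotated[patch_idx]
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    sign = 1.0 if mode == "bulge" else -1.0  # sink pushes away from the viewpoint

    augmented = rotated.copy()
    augmented[patch_idx] += sign * mag[:, None] * direction
    mask = np.zeros(len(points), dtype=bool)
    mask[patch_idx] = True
    return augmented, mask
```

Because the rotation is applied before the patch is selected, a fixed viewpoint is enough to place the simulated defect at a random location on the object in this sketch.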
0.A.1.2 R3D-AD pseudocode
To further clarify the overall architecture of the proposed network R3D-AD, we provide the training and testing iteration procedures more compactly in Algorithm 2 and Algorithm 3, respectively.
During training, anomalies are simulated by Patch-Gen, and noise is artificially added following a Gaussian distribution. The model predicts this noise and calculates a displacement to correct for it. The reconstruction loss is measured by comparing the original and corrected point clouds.
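The noising and correction step can be illustrated with the standard DDPM relations from [13]. This is a sketch under assumptions: the linear noise schedule, the number of steps, and all function names are hypothetical, and the exact R3D-AD formulation is the one given in Algorithm 2, which may differ.

```python
import numpy as np

def linear_alpha_bar(num_steps=200, beta_start=1e-4, beta_end=0.05):
    """Cumulative product of (1 - beta_t) for an assumed linear schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def estimate_clean(xt, eps_pred, t, alpha_bar):
    """Invert the forward process using a noise prediction eps_pred."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_bar[t])

def reconstruction_loss(x0, x0_hat):
    """Mean squared point-wise distance between the two clouds."""
    return float(np.mean(np.sum((x0 - x0_hat) ** 2, axis=1)))
```

In this sketch the per-point displacement is the difference between `estimate_clean(...)` and the noisy input; with a perfect noise prediction the corrected cloud coincides with the original and the loss is zero.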
During testing, noise is progressively removed from a noised version of the input point cloud to reconstruct its anomaly-free counterpart. The anomaly score is obtained by comparing the KNN clusters of the original and reconstructed point clouds.
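One simple way to turn the comparison between the input and its reconstruction into scores is a nearest-neighbor distance. The sketch below is an assumption made for illustration, not the scoring rule of Algorithm 3: the point-level score is the mean distance from each input point to its k nearest reconstructed points, and the object-level score is the maximum point-level score. The function names and the default k are hypothetical.

```python
import numpy as np

def point_anomaly_scores(original, reconstructed, k=3):
    """Mean distance from each original point to its k nearest reconstructed points."""
    diff = original[:, None, :] - reconstructed[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)            # shape (N, M)
    knn = np.partition(dist, k - 1, axis=1)[:, :k]  # k smallest distances per row
    return knn.mean(axis=1)

def object_anomaly_score(point_scores):
    """Object-level score: the largest point-level score."""
    return float(point_scores.max())
```

The dense (N, M) distance matrix keeps the example short; for large point clouds a spatial index such as a KD-tree would be the usual choice.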
Anomaly type | Bulge | Sink | Oracle |
---|---|---|---|
Airplane | 1.31 | 1.35 | 1.58 |
Candybar | 2.43 | 2.30 | 2.54 |
Car | 1.15 | 1.23 | 1.37 |
Chicken | 3.50 | 2.92 | 4.02 |
Diamond | 0.84 | 0.83 | 0.97 |
Duck | 1.53 | 1.29 | 1.67 |
Fish | 1.42 | 1.45 | 1.57 |
Gemstone | 2.58 | 5.23 | 5.26 |
Seahorse | 2.37 | 2.35 | 2.45 |
Shell | 1.30 | 1.29 | 1.40 |
Starfish | 2.47 | 2.46 | 2.64 |
Toffees | 1.73 | 1.71 | 1.79 |
0.A.2 Additional experiments
0.A.2.1 Quality of the generated anomalies
The proposed 3D anomaly simulation strategy Patch-Gen is designed to address the problem of the lack of 3D anomalous samples in the training phase.
T-distributed Stochastic Neighbor Embedding (t-SNE) [24] is particularly effective at visualizing high-dimensional samples by giving each data point a corresponding location in a low-dimensional map, allowing complex data to be understood at a glance. We follow [16] and use t-SNE to validate the quality and effectiveness of our generated anomaly samples. As shown in Fig. 6, the generated anomalies are clearly separated from normal samples and overlap with real anomalous samples, which strengthens our model's ability to reconstruct unseen anomalies.
Peak Signal-to-Noise Ratio (PSNR) is an engineering term that quantifies the quality of the reconstruction of a signal. PSNR is typically measured in decibels (dB) and calculated from the mean squared error between the original and the reconstruction. The higher the PSNR value, the better the quality of the reconstruction. In Table 5, the PSNR is computed by comparing the generated samples with real anomalies. To obtain an upper bound for each category, we randomly select two normal samples, calculate their PSNR, and average the values obtained over multiple random draws. This Oracle PSNR serves as a reference for the generation quality.
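For reference, the standard definition is PSNR = 10 log10(peak^2 / MSE). The sketch below applies it to two point clouds with matching point order. How the clouds are put into correspondence and which peak value is used are not stated in the text, so both the `peak` argument and the function name are assumptions of this example.

```python
import numpy as np

def psnr(reference, generated, peak):
    """PSNR in dB between two equally shaped arrays of coordinates."""
    reference = np.asarray(reference, dtype=float)
    generated = np.asarray(generated, dtype=float)
    mse = np.mean((reference - generated) ** 2)
    if mse == 0.0:
        return float("inf")  # identical inputs: no error
    return float(10.0 * np.log10(peak ** 2 / mse))
```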
Training dataset | Training category | Testing dataset | Testing category | I-AUROC | CD | Oracle |
---|---|---|---|---|---|---|
Real3D-AD | Airplane | ShapeNetCore.v2 | Airplane | - | 0.032 | 0.001 |
Real3D-AD | Car | ShapeNetCore.v2 | Car | - | 0.077 | 0.004 |
ShapeNetCore.v2 | Airplane | Real3D-AD | Airplane | 0.614 | - | 0.772 |
ShapeNetCore.v2 | Car | Real3D-AD | Car | 0.601 | - | 0.713 |
Anomaly-ShapeNet | {bowl0..3} | Anomaly-ShapeNet | bowl4 | 0.715 | - | 0.744 |
Method | BTF[11] | BTF[11] | M3DM[37] | PatchCore[29] | PatchCore[29] | CPMF[3] | Reg3D-AD[19] | IMRNet[18] | Ours |
---|---|---|---|---|---|---|---|---|---|
Feat. | Raw | FPFH | PointMAE | FPFH | PointMAE | ResNet | PointMAE | PointMAE | Raw |
ashtray0 | 0.578 | 0.420 | 0.577 | 0.587 | 0.591 | 0.353 | 0.597 | 0.671 | 0.833 |
bag0 | 0.410 | 0.546 | 0.537 | 0.571 | 0.601 | 0.643 | 0.706 | 0.660 | 0.720 |
bottle0 | 0.597 | 0.344 | 0.574 | 0.604 | 0.513 | 0.520 | 0.486 | 0.552 | 0.733 |
bottle1 | 0.510 | 0.546 | 0.637 | 0.667 | 0.601 | 0.482 | 0.695 | 0.700 | 0.737 |
bottle3 | 0.568 | 0.322 | 0.541 | 0.572 | 0.650 | 0.405 | 0.525 | 0.640 | 0.781 |
bowl0 | 0.564 | 0.509 | 0.634 | 0.504 | 0.523 | 0.783 | 0.671 | 0.681 | 0.819 |
bowl1 | 0.264 | 0.668 | 0.663 | 0.639 | 0.629 | 0.639 | 0.525 | 0.702 | 0.778 |
bowl2 | 0.525 | 0.510 | 0.684 | 0.615 | 0.458 | 0.625 | 0.490 | 0.685 | 0.741 |
bowl3 | 0.385 | 0.490 | 0.617 | 0.537 | 0.579 | 0.658 | 0.348 | 0.599 | 0.767 |
bowl4 | 0.664 | 0.609 | 0.464 | 0.494 | 0.501 | 0.683 | 0.663 | 0.676 | 0.744 |
bowl5 | 0.417 | 0.699 | 0.409 | 0.558 | 0.593 | 0.685 | 0.593 | 0.710 | 0.656 |
bucket0 | 0.617 | 0.401 | 0.309 | 0.469 | 0.593 | 0.482 | 0.610 | 0.580 | 0.683 |
bucket1 | 0.321 | 0.633 | 0.501 | 0.551 | 0.561 | 0.601 | 0.752 | 0.771 | 0.756 |
cap0 | 0.668 | 0.618 | 0.557 | 0.580 | 0.589 | 0.601 | 0.693 | 0.737 | 0.822 |
cap3 | 0.527 | 0.522 | 0.423 | 0.453 | 0.476 | 0.551 | 0.725 | 0.775 | 0.730 |
cap4 | 0.468 | 0.520 | 0.777 | 0.757 | 0.727 | 0.553 | 0.643 | 0.652 | 0.681 |
cap5 | 0.373 | 0.586 | 0.639 | 0.790 | 0.538 | 0.697 | 0.467 | 0.652 | 0.670 |
cup0 | 0.403 | 0.586 | 0.539 | 0.600 | 0.610 | 0.497 | 0.510 | 0.643 | 0.776 |
cup1 | 0.521 | 0.610 | 0.556 | 0.586 | 0.556 | 0.499 | 0.538 | 0.757 | 0.757 |
eraser0 | 0.525 | 0.719 | 0.627 | 0.657 | 0.677 | 0.689 | 0.343 | 0.548 | 0.890 |
headset0 | 0.378 | 0.520 | 0.577 | 0.583 | 0.591 | 0.643 | 0.537 | 0.720 | 0.738 |
headset1 | 0.515 | 0.490 | 0.617 | 0.637 | 0.627 | 0.458 | 0.610 | 0.676 | 0.795 |
helmet0 | 0.553 | 0.571 | 0.526 | 0.546 | 0.556 | 0.555 | 0.600 | 0.597 | 0.757 |
helmet2 | 0.602 | 0.542 | 0.623 | 0.425 | 0.447 | 0.462 | 0.614 | 0.641 | 0.633 |
helmet3 | 0.526 | 0.444 | 0.374 | 0.404 | 0.424 | 0.520 | 0.367 | 0.573 | 0.707 |
helmet4 | 0.349 | 0.719 | 0.427 | 0.484 | 0.552 | 0.589 | 0.381 | 0.600 | 0.720 |
jar0 | 0.420 | 0.424 | 0.441 | 0.472 | 0.483 | 0.610 | 0.592 | 0.780 | 0.838 |
microphone0 | 0.563 | 0.671 | 0.357 | 0.388 | 0.488 | 0.509 | 0.414 | 0.755 | 0.762 |
shelf0 | 0.164 | 0.609 | 0.564 | 0.494 | 0.523 | 0.685 | 0.688 | 0.603 | 0.696 |
tap0 | 0.525 | 0.560 | 0.754 | 0.753 | 0.458 | 0.359 | 0.676 | 0.676 | 0.736 |
tap1 | 0.573 | 0.546 | 0.739 | 0.766 | 0.538 | 0.697 | 0.641 | 0.696 | 0.900 |
vase0 | 0.531 | 0.342 | 0.423 | 0.455 | 0.447 | 0.451 | 0.533 | 0.533 | 0.788 |
vase1 | 0.549 | 0.219 | 0.427 | 0.423 | 0.552 | 0.345 | 0.702 | 0.757 | 0.729 |
vase2 | 0.410 | 0.546 | 0.737 | 0.721 | 0.741 | 0.582 | 0.605 | 0.614 | 0.752 |
vase3 | 0.717 | 0.699 | 0.439 | 0.449 | 0.460 | 0.582 | 0.650 | 0.700 | 0.742 |
vase4 | 0.425 | 0.510 | 0.476 | 0.506 | 0.516 | 0.514 | 0.500 | 0.524 | 0.630 |
vase5 | 0.585 | 0.409 | 0.317 | 0.417 | 0.579 | 0.618 | 0.520 | 0.676 | 0.757 |
vase7 | 0.448 | 0.518 | 0.657 | 0.693 | 0.650 | 0.397 | 0.462 | 0.635 | 0.771 |
vase8 | 0.424 | 0.668 | 0.663 | 0.662 | 0.663 | 0.529 | 0.620 | 0.630 | 0.721 |
vase9 | 0.564 | 0.268 | 0.663 | 0.660 | 0.629 | 0.609 | 0.594 | 0.594 | 0.718 |
Average | 0.493 | 0.528 | 0.552 | 0.568 | 0.562 | 0.559 | 0.572 | 0.659 | 0.749 |
0.A.2.2 Generalization on unseen data
To assess the robustness and generalization capabilities of our proposed model, we conduct a series of experiments on different categories from diverse datasets, as outlined in Table 6. The oracle result represents the performance ceiling of our model, which is obtained by training on the category that is identical to the testing.
For known categories, we focus on the well-regarded ShapeNetCore.v2 dataset [4], which includes categories such as Airplane and Car that are also featured in the Real3D-AD dataset [19]. Note that ShapeNetCore.v2 is not an anomaly detection dataset and contains no anomalous samples. Therefore, the AUROC metric cannot be computed for the first and second rows in Table 6. Instead, we evaluate how well models trained on Real3D-AD generalize to the same category in ShapeNetCore.v2 using the Chamfer Distance (CD) metric. The marked decline in performance observed when transferring from ShapeNetCore.v2 to Real3D-AD, and vice versa, illustrates the difficulty posed by inconsistencies between datasets. This highlights the importance of our reconstruction approach, which effectively learns inductive biases, allowing for better generalization across different data distributions.
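As a reminder of what the CD column measures, the sketch below computes a symmetric Chamfer Distance between two point clouds. Several variants exist (squared or unsquared distances, sums or means); this example assumes the mean of squared nearest-neighbor distances in both directions, which may not be the exact variant used for Table 6.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between clouds a (N, 3) and b (M, 3)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared Euclidean distance between every pair of points, shape (N, M).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # Nearest neighbor in b for each point of a, and vice versa.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())
```

A value of zero means every point of each cloud coincides with some point of the other; larger values indicate a larger geometric mismatch.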
For unknown categories, we utilize the Anomaly-ShapeNet dataset [18], as shown in the last row of Table 6. The model was trained on a subset of the Bowl objects (bowl0 to bowl3) and tested on an object it had never encountered during training (bowl4). Despite this lack of prior exposure, our model achieves an image-level AUROC of 0.715. This surpasses all other methods trained and tested exclusively on “bowl4”, demonstrating the superior generalization capability of our method.
These results not only validate the effectiveness of our approach in handling both known and unknown categories but also underscore its potential for real-world applications where data diversity and unseen scenarios are commonplace.
0.A.3 Additional main results
Anomaly-ShapeNet [18] contains a total of 40 categories. In Table 2 of the main text, due to space limitations, we group objects of the same kind but with differing appearances into a single category (e.g., bottle0, bottle1, and bottle3 are categorized as Bottle). Here, we report the image-level AUROC of each individual object in Table 7.
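The grouping of per-object results into categories can be sketched as follows. The text does not state how the grouped scores in Table 2 are aggregated, so the unweighted mean used here is an assumption, as are the function and variable names; the input values are taken from the "Ours" column of Table 7.

```python
import re
from collections import defaultdict
from statistics import mean

def group_by_category(per_object):
    """Average per-object scores over objects sharing a name without its index."""
    groups = defaultdict(list)
    for name, score in per_object.items():
        category = re.sub(r"\d+$", "", name).capitalize()  # "bottle3" -> "Bottle"
        groups[category].append(score)
    return {category: mean(scores) for category, scores in groups.items()}

# "Ours" column of Table 7 for the bottle and cup objects.
ours = {"bottle0": 0.733, "bottle1": 0.737, "bottle3": 0.781,
        "cup0": 0.776, "cup1": 0.757}
by_category = group_by_category(ours)
```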
0.A.4 Additional qualitative results
To further demonstrate and compare the effect of our proposed 3D anomaly simulation strategy Patch-Gen, we conduct additional qualitative analysis on the Real3D-AD dataset and the Anomaly-ShapeNet dataset.
The first row shows the anomaly samples in the testing split, the second row shows the normal samples in the training split, and the third row shows the anomaly samples simulated by Patch-Gen. It can be seen from Fig. 7 that our method simulates defects that vary across classes, indicating that it can compensate well for the domain gap caused by using only normal samples for training in 3D anomaly detection.