FDADNet: Detection of Surface Defects in Wood-Based Panels Based on Frequency Domain Transformation and Adaptive Dynamic Downsampling

Li, Hongli; Yi, Zhiqi; Wang, Zhibin; Wang, Ying; Ge, Liang; Cao, Wei; Mei, Liye; Yang, Wei; Sun, Qin

doi:10.3390/pr12102134

by Hongli Li ^1,2,†, Zhiqi Yi ^1,2,†, Zhibin Wang ^3, Ying Wang ^3, Liang Ge ^4, Wei Cao ^5,6, Liye Mei ^7, Wei Yang ^3 and Qin Sun ^8,*

^1 School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China
^2 Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430205, China
^3 School of Information Science and Engineering, Wuchang Shouyi University, Wuhan 430064, China
^4 Tianjin Institute of Surveying and Mapping Co., Ltd., Tianjin 300381, China
^5 Hubei Geomatics Technology Group Stock Co., Ltd., Wuhan 430075, China
^6 School of Computer Science, China University of Geosciences, Wuhan 430074, China
^7 School of Computer Science, Hubei University of Technology, Wuhan 430068, China
^8 School of Mechatronics and Automation, Wuchang Shouyi University, Wuhan 430064, China
^* Author to whom correspondence should be addressed.
^† These authors contributed equally to this work.

Processes 2024, 12(10), 2134; https://doi.org/10.3390/pr12102134

Submission received: 10 August 2024 / Revised: 26 September 2024 / Accepted: 29 September 2024 / Published: 30 September 2024

(This article belongs to the Special Issue Research on Intelligent Fault Diagnosis Based on Neural Network)

Abstract:

The detection of surface defects on wood-based panels plays a crucial role in product quality control. However, due to the complex background and low contrast of defects in wood-based panel images, features extracted by traditional deep learning methods based on spatial domain processing often contain noise and blurred boundaries, which severely affects detection performance. To address these issues, we have proposed a wood-based panel surface defect detection method based on frequency domain transformation and adaptive dynamic downsampling (FDADNet). Specifically, we designed a Multi-axis Frequency Domain Weighted Information Representation Module (MFDW), which effectively decoupled the indistinguishable low-contrast defects from the background in the transform domain. Gaussian filtering was then employed to eliminate noise and blur between the defects and the background. Additionally, to tackle the issue of scale differences in defects that led to difficulties in accurate capture, we designed an Adaptive Dynamic Convolution (ADConv) module for downsampling. This method flexibly compressed and enhanced features, effectively improving the differentiation of the features of objects of varying scales in the transform space, and ultimately achieved effective defect detection. To compensate for the lack of data, we constructed a dataset of wood-based panel surface defects, WBP-DET. The experimental results showed that the proposed FDADNet effectively improved the detection performance of wood-based panel surface defects in complex scenarios, achieving a solid balance between efficiency and accuracy.

Keywords:

defect detection; frequency domain transformation; feature decoupling; dynamic convolution; WBP-DET dataset

1. Introduction

Wood-based panels are among the most commonly used materials in the manufacturing industry due to their cost-effectiveness, stability, and ease of processing, leading to widespread applications in construction, furniture manufacturing, and decoration. However, defects inevitably arise during industrial production, and product quality control necessitates the accurate detection and localization of surface defects on wood-based panels to minimize raw material waste and reduce production costs. Furthermore, product quality issues directly impact the reputation of factories and market share. Therefore, rapidly and accurately detecting surface defects has become a key research priority in the industrial sector [1].

The collection and transportation of high-quality data, along with its encoding, are crucial for defect detection tasks [2,3,4]. However, because sensors are susceptible to environmental factors such as dust, water mist, and lighting, images of wood-based panel surfaces often fail to accurately reflect the true condition of the panels. Under significant noise interference in particular, image quality may be severely compromised, affecting defect detection and recognition. Specifically, detecting surface defects on wood-based panels faces three challenges. First, there is blur between the defects and the background. In complex industrial production environments, the grayscale values of surface defects on wood-based panels are often very similar to those of the surrounding background areas. During detection, it is difficult for detectors to separate the defects from the background, leading to ambiguity between the two. Second, there is variability in defect size. The size of various defects can differ significantly. Detectors often struggle to balance large and small defects, resulting in feature confusion among multi-scale objects and degrading overall detection performance. Third, there is significant intra-class variance. Defects within the same category may exhibit vastly different appearances. This high intra-class variance makes it harder for detectors to extract and distinguish defect features, reducing detection accuracy.

Initially, surface defect detection was primarily performed manually. However, manual inspection is limited by human capacity, resulting in low accuracy and lengthy processing times, which significantly impacts production efficiency [5]. With the rapid advancement of machine vision technology, various algorithms have been extensively applied in defect detection, broadly categorized into traditional methods and deep-learning-based approaches [6]. Traditional methods generally rely on handcrafted features for defect detection [7,8]. These methods encounter two major issues, as follows: First, handcrafted features often lack robustness, particularly under poor lighting conditions or high noise interference, making it challenging to achieve satisfactory detection results. Second, the reliance on prior knowledge for designing handcrafted features constrains the potential for enhancing detection performance, leading to significant limitations in traditional methods.

In recent years, with the rapid development of machine learning and deep learning technologies, models based on these techniques have been widely applied across various fields [9,10,11,12,13,14]. Convolutional Neural Networks (CNNs), with their powerful feature extraction capabilities, can adaptively identify defect features and achieve high-precision detection, even in complex environments, demonstrating strong robustness [15,16,17]. Dlamini et al. [18] proposed an automatic surface defect detection system based on MobileNetV2 and Feature Pyramid Networks (FPN), which performed exceptionally well on printed circuit boards. Zhang et al. [19] designed a lightweight defect detection network that uses efficient downsampling methods to extract richer defect features and employs multi-scale aggregation networks and efficient attention modules to mitigate interference from complex backgrounds, achieving commendable performance. Jiang et al. [20] introduced a multi-scale attention module that efficiently integrates high-level semantic information with low-level feature information, generating complete feature maps and improving fault recognition accuracy. Song et al. [21] proposed an encoder–decoder network for steel surface defect detection, utilizing a powerful attention mechanism in the encoder to extract rich multi-dimensional features and a channel weighting module in the decoder to integrate feature maps. Su et al. [22] designed a dynamic transformer model based on semantic alignment for steel defect detection. This network introduces a local attention mechanism to eliminate noise blur between defects and backgrounds while dynamically adjusting encoding blocks, shortening inference time while maintaining high accuracy. Dong et al. [23] combined spatial attention with channel attention, achieving good results in pavement defect detection. Su et al. [24] constructed a novel complementary attention network that utilizes spatial and channel features to dynamically suppress background noise, achieving high precision in defect detection for electroluminescence images of solar cells. Cao et al. [25] proposed a deep feature fusion pixel-level segmentation network for surface defect detection that aggregates multi-scale feature maps through multi-level feature fusion and uses a branch decoder to recover defect details, improving defect segmentation accuracy. However, in wood-based panel surface images, these deep learning methods still face challenges due to the ambiguity between defects and backgrounds as well as significant variations in defect sizes. They often encounter misclassification issues when handling blurry and small defects, and achieving an optimal balance between efficiency and accuracy remains difficult.

We re-examined the issue of low separability between low-contrast defects and backgrounds, identifying that the root cause lies in the fact that, when the grayscale values of defects and backgrounds are very close, deep learning methods based on spatial domain processing often struggle to effectively separate the foreground and the background in feature extraction modules. This leads to blurred boundaries of target features, particularly under complex noise interference, making it difficult for the network to distinguish between the target and the background. As a result, the detector fails to accurately extract true defect features, leading to false positives and missed detections. However, when processing images in the frequency domain, the sensitivity to contrast changes is lower, and the frequency domain offers significant advantages over spatial domain processing in terms of noise suppression and global feature extraction. Based on this observation, we propose a feature decoupling and denoising method using multi-axis frequency domain weighting. We convert low-contrast defects and backgrounds to the frequency domain using 2D discrete Fourier transform (2D DFT) and examine their signal intensity differences. As shown in Figure 1a,b, there is a significant difference in signal intensity between the target and the background. This indicates that low-contrast defects, which are difficult to distinguish in the spatial domain, exhibit higher separability in the frequency domain. Additionally, the frequency domain is more effective at removing image noise, enhancing the features of interest, and suppressing background information (as shown in Figure 1c). Leveraging this characteristic, we use frequency domain signal transformation, renowned for its global feature extraction and noise suppression capabilities, to separate the target from the background while suppressing background noise. 
Convolutional Neural Networks (CNNs), known for their local perception and texture feature extraction, are then employed to capture detailed feature information. By combining spatial and frequency domain transformations, we enhance the feature extraction and spatial discriminative capabilities of the backbone network, enabling the model to effectively differentiate subtle differences between the target and the background.
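The separability argument above can be illustrated on synthetic data. The following is a minimal NumPy sketch, not the authors' analysis behind Figure 1: the white-noise "background", the faint Gaussian "defect" blob, the patch size, and the helper name `low_freq_energy_ratio` are all assumptions chosen for illustration. It measures how much of each patch's non-DC spectral energy falls near the origin of the 2D DFT.

```python
import numpy as np

def low_freq_energy_ratio(patch, radius=4):
    """Fraction of non-DC spectral energy within `radius` bins of the origin."""
    # Subtracting the mean removes the DC term before the 2D DFT.
    spectrum = np.fft.fftshift(np.fft.fft2(patch - patch.mean()))
    power = np.abs(spectrum) ** 2
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(yy - h // 2, xx - w // 2)
    return power[dist <= radius].sum() / power.sum()

rng = np.random.default_rng(0)
size = 64
# Hypothetical background: fine-grained texture modelled as white noise.
background = 0.5 + 0.02 * rng.standard_normal((size, size))
# Hypothetical low-contrast defect: a faint, smooth blob on the same texture.
yy, xx = np.mgrid[0:size, 0:size]
blob = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 8.0 ** 2))
defect = background + 0.03 * blob

ratio_bg = low_freq_energy_ratio(background)
ratio_defect = low_freq_energy_ratio(defect)
print(f"background: {ratio_bg:.3f}, defect: {ratio_defect:.3f}")
```

In this toy setting the blob is barely visible in pixel space, yet it concentrates its energy in a handful of low-frequency bins, so the two patches differ clearly in the spectrum. Real panel images will behave less cleanly than this synthetic case.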

To address the challenge of defect detection in wood-based panels under complex interference conditions, we propose a method called FDADNet, which is based on frequency domain transformation and adaptive dynamic downsampling. We have designed a Multi-axis Frequency Domain Weighted Information Representation Module (MFDW) for feature separation and developed an Adaptive Dynamic Convolution (ADConv) module for downsampling, aimed at enhancing the feature representation of small and weak defects. By serially integrating these two modules, we have created a powerful feature extraction framework. We believe that these modules achieve information complementarity and functional synergy in several ways. First, convolutional transformations in the spatial domain are effective at handling local spatial features, while frequency domain signal transformations excel at capturing global frequency characteristics. The combination of these approaches enhances the model’s ability to express features across different dimensions. Second, Adaptive Dynamic Convolution enables the model to flexibly adapt to varying inputs, while the Multi-axis Frequency Domain Weighted Information Representation Module strengthens the expression of specific features. This integration improves the model’s detection accuracy and robustness in complex scenarios. Finally, although Adaptive Dynamic Convolution introduces additional parameters, the Multi-axis Frequency Domain Weighted Information Representation Module (MFDW) divides feature maps into four regions and performs feature transformations across the different domains and axes, thus maintaining a low parameter count and computational demand. This design achieves a good balance between efficiency and accuracy. Additionally, due to the current lack of open-source data, we have constructed a dataset of surface defects on wood-based panels, named WBP-DET. Our main contributions include the following:

1. Due to the limited availability of datasets for wood-based panel defect detection, we established a wood-based panel defect detection dataset (WBP-DET), which includes surface defects such as Glue Spot, Oil Stains, Chalk, and Scratch.
2. We introduced frequency domain signal transformation into the defect detection task and proposed a method for surface defect detection in wood-based panels based on frequency domain transformation and adaptive dynamic downsampling (FDADNet). This approach utilizes frequency domain signal transformation in the feature extraction phase to handle the target and the background, providing strong noise resistance and discrimination capabilities.
3. We developed a Multi-axis Frequency Domain Weighted Information Representation Module (MFDW) for feature extraction. The MFDW focuses on global frequency domain features, applying weighted transformations to different frequency domain signal characteristics to enhance the separability between the target and the background. Gaussian filtering is used to suppress background noise, reduce noise accumulation, and enhance the feature representation of objects.
4. We designed Adaptive Dynamic Convolution (ADConv) for downsampling feature maps. ADConv can flexibly compress and enhance the features of different categories, increasing the semantic gap between the features and thereby reducing feature confusion among multi-scale objects.

2. Related Work

In this section, we will introduce the related work in the following three areas: deep-learning-based object detection methods, the application of Fourier transform in vision, and dynamic convolution.

2.1. Deep-Learning-Based Object Detection Methods

Recently, advancements in object detection algorithms have been rapid and diverse. Among single-stage methods, RTMDet [26] achieves a strong trade-off between speed and accuracy through large-kernel depthwise convolution and dynamic label assignment. YOLOv7 [27] introduces a model scaling approach and dynamic label assignment strategy, demonstrating excellent performance in real-time detection. YOLOv8 [28] employs anchor-free detection heads, eliminating the reliance on predefined anchor settings. YOLOv9 [29] addresses information loss during convolution by proposing programmable gradient information. YOLOv10 [30] utilizes one-to-one and one-to-many detection heads for training and uses only the one-to-one detection head during inference, effectively avoiding NMS operations and improving inference speed. Faster R-CNN [31], a classic two-stage object detection network, continues to be effective in certain domains today. Transformer-based DETR [32] integrates dense feature interactions between the encoder and the decoder, resulting in slower convergence. Deformable-DETR [33] accelerates training convergence through efficient attention mechanisms, while DAB-DETR [34] and DN-DETR [35] enhance detection performance via dynamic anchor-box queries and denoising training, respectively. DINO [36] introduces hybrid query selection to optimize query initialization. RT-DETR [37] designs an efficient hybrid encoder, achieving a dual advantage in speed and accuracy. However, these methods primarily operate in the spatial domain, which limits their effectiveness in addressing low-contrast defects in wood-based panels.

2.2. Applications of Fourier Transform in Computer Vision

In the field of computer vision, the frequency domain has proven to be a powerful approach [38,39]. For instance, Rao et al. [39] replaced the self-attention mechanism in Vision Transformers (ViT) with frequency domain signal transformations, achieving significant results in classification tasks. Ruan et al. [40] demonstrated a robust performance in medical segmentation tasks by substituting the self-attention mechanism in ViT with external weighting in the frequency domain. Zhao et al. [41] employed a frequency domain residual diffusion adjustment module to extract frequency domain information, achieving excellent visual results in underwater image restoration. Although frequency domain signal information provides strong global perceptual capabilities, it is less effective in capturing the edge details of objects, making it challenging to handle complex local features. By combining the detailed feature extraction of the spatial domain with the powerful global perception of the frequency domain, it is possible to comprehensively extract a wider range of multidimensional image features. Currently, the feature extraction methods that combine spatial domain and frequency domain characteristics in the field of defect detection have not been thoroughly explored, and we believe this represents a highly promising area of research.

2.3. Dynamic Convolution

In traditional Convolutional Neural Networks (CNNs), static convolutional kernels share weights across all samples and positions. This fixed-weight approach effectively captures the local features in the input. However, it may lack flexibility when dealing with diverse and complex input data, potentially leading to suboptimal feature extraction. Dynamic convolution, on the other hand, offers greater flexibility and adaptability, better accommodating the diversity of input features. Yang et al. [42] introduced CondConv, which learns specialized convolution kernels for each input example, enhancing network flexibility and improving performance. Chen et al. [43] incorporated dynamic convolution into attention mechanisms, dynamically aggregating multiple parallel kernels and achieving excellent results in classification tasks. Zhang et al. [44] proposed a dynamic convolution method that generates convolution kernels adaptively based on image content, achieving remarkable performance in segmentation tasks. Li et al. [45] endowed convolutional kernels with dynamic properties through multidimensional information, increasing the network’s focus on critical information. Tian et al. [46] developed a heterogeneous dynamic convolution network for image super-resolution, where each module consists of extended dynamic convolution layers, ReLU, and residual operations, enhancing context information acquisition. Han et al. [47] employed dynamic convolution to introduce additional parameters, achieving high classification performance while maintaining low FLOPs.

3. Methods

In this section, we will first introduce the overall architecture and workflow of the network. Then, in Section 3.1 and Section 3.2, we will discuss the operation mechanisms and functions of the two key modules, MFDW and ADConv. Finally, in Section 3.3, we will introduce the loss function that we used.

Based on the integration of frequency domain and spatial domain transformations, we designed the FDADNet. As illustrated in Figure 2, the network is composed of the following three main components: the backbone, the neck, and the heads. In the backbone, we introduce the Multi-axis Frequency Domain Weighted Information Representation Module (MFDW) as an efficient feature aggregation unit to extract and refine multi-scale image features, facilitating both frequency domain and spatial domain information extraction. Subsequently, these features (P3, P4, P5) are passed to the neck for multi-scale feature fusion and information response. The neck enhances global semantic information through a top-down feature pyramid while integrating local texture information via a bottom-up path. Finally, three detection heads perform object classification and bounding box regression on the fused features. This multi-level detection approach enhances the model’s ability to discriminate between the targets of different scales. In the backbone, each stage consists of a combination of one ADConv and one MFDW for progressive feature extraction. Over the four stages, the spatial resolution of the features is gradually reduced to 1/4, 1/8, 1/16, and 1/32 of the original image, while the number of channels increases progressively to 2C₁, 4C₁, 8C₁ and 16C₁. During the feature fusion stage, high-level feature maps are aligned with low-level feature maps via 2× upsampling, while low-level feature maps are fused with high-level feature maps via 2× downsampling. This design helps to fully leverage feature information at multiple scales, thereby enhancing detection accuracy and robustness.
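The stage-by-stage resolutions and channel widths described above can be tabulated with a few lines of Python. This is only a bookkeeping sketch: the base width `base_channels=32` is a hypothetical value for C₁ (the text does not state it), and labelling the 1/4-resolution stage "P2" is an assumption borrowed from common feature-pyramid naming, under which P3, P4, and P5 correspond to strides 8, 16, and 32.

```python
def backbone_feature_shapes(image_size=640, base_channels=32):
    """Return (name, channels, height, width) for the four backbone stages."""
    strides = (4, 8, 16, 32)       # resolutions 1/4 ... 1/32 of the input
    multipliers = (2, 4, 8, 16)    # channel widths 2*C1 ... 16*C1
    shapes = []
    for level, (stride, mult) in enumerate(zip(strides, multipliers), start=2):
        side = image_size // stride
        shapes.append((f"P{level}", mult * base_channels, side, side))
    return shapes

for name, channels, height, width in backbone_feature_shapes():
    print(name, channels, height, width)
# P2 64 160 160
# P3 128 80 80
# P4 256 40 40
# P5 512 20 20
```

With a 640 × 640 input, the three maps handed to the neck would then be 80 × 80, 40 × 40, and 20 × 20 under these assumptions.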

3.1. Multi-Axis Frequency Domain Weighted Information Representation Module

In the field of wood-based panel defect detection, recent methods often emphasize extracting spatial domain information, frequently overlooking the significance of frequency domain information. In the spatial domain, the edges between the objects and the backgrounds are often blurred, and backgrounds frequently contain substantial noise, making it challenging to extract clean and complete object features. In contrast, in the frequency domain, different objects exhibit distinct frequency signals, and the feature spaces of the objects and backgrounds are more dispersed. By extracting both spatial and frequency domain information simultaneously, the model can achieve enhanced perceptual capabilities. However, despite the efforts in previous studies [39,40,41] to utilize frequency domain information in deep learning, if effective denoising is not performed in the frequency domain, significant noise will persist in the subsequent spatial domain feature extraction process, leading to noise accumulation and adversely affecting detection performance. To address this issue, we propose the Multi-axis Frequency Domain Weighted Information Representation Module (MFDW). This module not only extracts multi-axis frequency domain information, but also effectively suppresses noise. By performing denoising in the frequency domain, the backbone can extract more complete object features, thereby reducing misdetections and missed detections caused by blurred boundaries and noise interference.

The structure of the efficient feature aggregation module MFDW and the ELAN [48] module is illustrated in Figure 3. The MFDW module consists of convolution, split, a Signal Extraction Module (SEM), and concatenation operations. For a given input feature map X \in R^{C_{in} \times H \times W}, where C_{in}, H and W denote the number of input channels, height, and width, respectively, the input feature map X is first processed by a convolution operation to adjust the number of channels to C_{out}. The feature map is then split along the channel dimension into two parts. One of these parts is processed by the SEM to extract the frequency domain signal features. After processing, the two split parts of the feature map, along with the output from SEM, are concatenated, and the concatenated feature map is then processed by another convolution to adjust the number of channels. This design ensures that important spatial domain information is preserved during the transformation process, maximizing the retention and utilization of all useful information from the input image.
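The data flow of the MFDW block (convolution, two-way split, SEM on one part, three-way concatenation, convolution) can be sketched as follows. This is an illustrative NumPy outline under several assumptions not stated in the text: both convolutions are taken to be 1 × 1, the split is into equal halves, the second half is the one sent to the SEM, normalization and activation layers are omitted, and the SEM is replaced by an identity placeholder. The names `conv1x1` and `mfdw_block` are hypothetical.

```python
import numpy as np

def conv1x1(x, weight):
    """Pointwise convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def mfdw_block(x, w_in, w_out, sem=lambda t: t):
    """Conv -> split in two -> SEM on one part -> concat three maps -> conv."""
    y = conv1x1(x, w_in)                       # adjust channels to C_out
    part_a, part_b = np.split(y, 2, axis=0)    # split along the channel dimension
    sem_out = sem(part_b)                      # frequency-domain branch (placeholder here)
    merged = np.concatenate([part_a, part_b, sem_out], axis=0)
    return conv1x1(merged, w_out)              # fuse 3*C_out/2 channels back to C_out

rng = np.random.default_rng(0)
c_in, c_out, h, w = 8, 16, 20, 20
x = rng.standard_normal((c_in, h, w))
w_in = 0.1 * rng.standard_normal((c_out, c_in))
w_out = 0.1 * rng.standard_normal((c_out, 3 * c_out // 2))
out = mfdw_block(x, w_in, w_out)
print(out.shape)  # (16, 20, 20)
```

Because both untouched halves are concatenated alongside the SEM output, the spatial-domain features reach the final convolution unchanged, which is the preservation property the paragraph describes.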

Assuming that the feature map input to the SEM is X^{'} \in R^{C_{in} \times H \times W}, to obtain the multi-axis frequency domain information, we split X^{'} along the channel dimension into four equal parts and input each part into a different branch. The specific process is described in Equation (1), as follows:

x_{1}, x_{2}, x_{3}, x_{4} = \mathrm{Split}(X^{'}) \qquad (1)

where Split(·) denotes the operation of splitting along the channel dimension; and x₁, x₂, and x₃ correspond to the first three branches, each of which is passed along the H–W axis, C–W axis, and C–H axis, respectively, into the Adaptive Frequency Filters (AFF) for feature weighting and noise suppression. Multi-branch processing calculates the frequency domain features of each branch in parallel, enhancing the signal perception capability of the module and significantly improving computational efficiency. Next, we convert the feature map to the frequency domain using 2D DFT, transforming the spatial domain pixel features into frequency domain signal features. In the frequency domain, we use learnable multi-axis weights to perform weighted transformations on the signal information. Through weighted processing, we enhance the differences between the target and background signals in the frequency domain, achieving feature optimization and enhancement. Subsequently, we apply Gaussian filtering in the frequency domain to suppress background noise, improving the signal-to-noise ratio and making the target features clearer and more realistic. Finally, we convert the frequency domain information back to the spatial domain using 2D inverse discrete Fourier transform (2D IDFT) for subsequent processing. The AFF process is represented in Equations (2) and (3), as follows:

x_{i(I,J)} = \mathrm{Gauss}(W_{(I,J)} \odot F_{(I,J)}[x_{i}]), \quad i = 1, 2, 3 \qquad (2)

x_{i(I,J)}^{'} = F_{(I,J)}^{-1}[x_{i(I,J)}], \quad i = 1, 2, 3 \qquad (3)

where W_{(I,J)} and F_{(I,J)} represent the learnable weights and the 2D DFT corresponding to the respective axes; \odot denotes the element-wise product; and \mathrm{Gauss}(\cdot) signifies Gaussian filtering for denoising. For the fourth branch, we utilize depthwise separable convolution (DWConv) to capture the local features. Finally, the feature maps of the four branches are concatenated along the channel dimension, restored to the same size as the input feature map, and the final output is generated through residual connection with the input feature map, as shown in Equations (4) and (5).

x_{4}^{'} = \mathrm{DW}(x_{4}) \qquad (4)

Y = \mathrm{Concat}(x_{1}^{'}, x_{2}^{'}, x_{3}^{'}, x_{4}^{'}) + X^{'} \qquad (5)

where \mathrm{DW}(\cdot) represents DWConv and \mathrm{Concat}(\cdot) denotes concatenation of the feature maps along the channel dimension. MFDW efficiently extracts and enhances the multi-axis frequency domain signals of the target while effectively suppressing background noise. Compared to traditional convolution operations, it requires fewer parameters and computational resources.
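Equations (1) to (5) can be traced in a compact NumPy sketch. It should be read as one possible interpretation rather than the authors' implementation: the Gaussian filter is modelled as a low-pass mask multiplied onto the spectrum, the learnable weights are full-size complex arrays, only the depthwise step of DWConv is shown (the pointwise step is omitted), and the real part is taken after the inverse transform. The function names and the `sigma` parameter are hypothetical.

```python
import numpy as np

def gaussian_mask(shape, sigma):
    """Gaussian low-pass mask on the unshifted FFT grid (sigma in cycles/sample)."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.exp(-(fy ** 2 + fx ** 2) / (2.0 * sigma ** 2))

def aff(x, weight, axes, sigma=0.25):
    """Adaptive frequency filter over one axis pair, following Eqs. (2)-(3)."""
    spectrum = np.fft.fft2(x, axes=axes)                   # F_(I,J)[x_i]
    mask_shape = [1, 1, 1]
    mask_shape[axes[0]] = x.shape[axes[0]]
    mask_shape[axes[1]] = x.shape[axes[1]]
    mask = gaussian_mask((x.shape[axes[0]], x.shape[axes[1]]), sigma)
    filtered = mask.reshape(mask_shape) * (weight * spectrum)   # Gauss(W ⊙ F[x])
    return np.fft.ifft2(filtered, axes=axes).real          # back to the spatial domain

def depthwise_conv3x3(x, kernels):
    """Per-channel 3x3 cross-correlation with zero padding; kernels is (C, 3, 3)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += kernels[:, dy, dx, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out

def sem(x, weights, dw_kernels, sigma=0.25):
    """Signal Extraction Module sketch for x of shape (C, H, W), C divisible by 4."""
    x1, x2, x3, x4 = np.split(x, 4, axis=0)                # Eq. (1)
    axis_pairs = [(1, 2), (0, 2), (0, 1)]                  # H-W, C-W, C-H
    branches = [aff(xi, wi, ax, sigma)
                for xi, wi, ax in zip((x1, x2, x3), weights, axis_pairs)]
    branches.append(depthwise_conv3x3(x4, dw_kernels))     # Eq. (4)
    return np.concatenate(branches, axis=0) + x            # Eq. (5), residual

rng = np.random.default_rng(0)
c, h, w = 16, 8, 8
x = rng.standard_normal((c, h, w))
weights = [np.ones((c // 4, h, w), dtype=complex) for _ in range(3)]
dw_kernels = np.zeros((c // 4, 3, 3))
dw_kernels[:, 1, 1] = 1.0                                  # identity kernel
y = sem(x, weights, dw_kernels)
print(y.shape)  # (16, 8, 8)
```

A useful sanity check on this sketch: with unit weights, an identity depthwise kernel, and a very large `sigma` (an all-pass mask), every branch returns its input, so the output reduces to twice the input.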

3.2. Adaptive Dynamic Convolution

Typically, an image containing defects may simultaneously present both large and small defects, and detectors often face the challenge of balancing performance when handling targets of different scales. This challenge arises mainly because, as the image undergoes successive layers of feature extraction and spatial transformation, the spatial resolution of the feature map gradually decreases, causing the features of small defects to be more easily lost. Meanwhile, large defects generally exhibit stronger responses in the image, and these strong response areas can overshadow or merge with the subtle semantic information of small defects, thereby reducing the discernibility of the small defects. To address this issue, we propose the Adaptive Dynamic Convolution Module (ADConv) as the downsampling module of the network. ADConv adaptively enhances the feature representation of small target defects by enlarging the feature distance between different scales in the transformed space, thereby improving the distinction between the features of objects at different scales and alleviating feature confusion among multi-scale targets.

As shown in Figure 4, for the input X \in R^{C_{in} \times H \times W} and the weight tensor W \in R^{C_{out} \times C_{in} \times K \times K}, the traditional convolution operation is expressed in Equation (6) as follows:

Y = X * W \qquad (6)

where Y \in R^{C_{out} \times H' \times W'} is the output and * represents the convolution operation. For simplicity, we omit the bias operation. However, traditional convolution operations use fixed kernels, which limits their adaptability to input features. To allow the kernel weights to dynamically adjust according to specific input features, we create a map to modify the kernel weights, as shown in Equations (7)–(9), as follows:

Y = X * W^{'} \qquad (7)

W^{'} = \alpha W \qquad (8)

\alpha = \mathrm{softmax}(\mathrm{FCL}(\mathrm{Pool}(X))) \qquad (9)

where \alpha represents the dynamic coefficient and \mathrm{FCL}(\cdot) stands for the fully connected layer (FCL). We aim to enhance the representation capability of small defects using ADConv, thereby increasing the semantic gap between the features of different scales and achieving balanced attention to multi-scale targets. To this end, we associate the dynamic coefficient with the input features. Specifically, we first perform global average pooling on the input feature map X along the channel dimension, aggregating the global information into a vector. Then, we use FCL and an activation function to generate the dynamic coefficient. This dynamic coefficient is element-wise multiplied with the convolution kernel weights, and the weighted convolution kernel is then used to convolve the input feature map. ADConv effectively improves the downsampling module’s ability to enhance and retain small defect features while preventing the dominance of large defect features. This method introduces more learnable parameters while maintaining low FLOPs, preserving the integrity of large defect information and significantly improving the feature representation of small defects, thus enhancing overall detection performance.
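Equations (7) to (9) can be made concrete with a small NumPy sketch. The shape of the dynamic coefficient is not specified in the text, so this example assumes one coefficient per output channel; the stride of 2, the 3 × 3 kernel, the padding of 1, and the single fully connected layer are likewise assumptions, and `adconv`, `fc_weight`, and `fc_bias` are hypothetical names.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def conv2d(x, weight, stride=2, padding=1):
    """Strided cross-correlation. x: (C_in, H, W); weight: (C_out, C_in, K, K)."""
    c_out, _, k, _ = weight.shape
    padded = np.pad(x, ((0, 0), (padding, padding), (padding, padding)))
    h_out = (x.shape[1] + 2 * padding - k) // stride + 1
    w_out = (x.shape[2] + 2 * padding - k) // stride + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            window = padded[:, i * stride:i * stride + k, j * stride:j * stride + k]
            out[:, i, j] = np.tensordot(weight, window, axes=([1, 2, 3], [0, 1, 2]))
    return out

def adconv(x, weight, fc_weight, fc_bias, stride=2, padding=1):
    """Input-dependent rescaling of a shared kernel, then a strided convolution."""
    pooled = x.mean(axis=(1, 2))                           # Pool(X): one value per input channel
    alpha = softmax(fc_weight @ pooled + fc_bias)          # Eq. (9), assumed shape (C_out,)
    dynamic_weight = alpha[:, None, None, None] * weight   # Eq. (8): W' = alpha W
    return conv2d(x, dynamic_weight, stride, padding), alpha   # Eq. (7): Y = X * W'

rng = np.random.default_rng(0)
c_in, c_out, k = 8, 16, 3
x = rng.standard_normal((c_in, 16, 16))
weight = 0.1 * rng.standard_normal((c_out, c_in, k, k))
fc_weight = 0.1 * rng.standard_normal((c_out, c_in))
fc_bias = np.zeros(c_out)
y, alpha = adconv(x, weight, fc_weight, fc_bias)
print(y.shape)  # (16, 8, 8)
```

Under this per-output-channel assumption, scaling the kernel is equivalent to scaling the corresponding output channel, so the extra cost over a static convolution is only the pooling and the fully connected layer. This is consistent with the claim that parameters grow while FLOPs stay low.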

3.3. Loss Function

The loss function of the network consists of two main components [28]: the object category loss and the bounding box loss.

L_{total} = \lambda_{1} L_{class} + \lambda_{2} L_{bbox} \qquad (10)

where \lambda_{1} and \lambda_{2} are coefficients used to adjust the weights of these two loss components. The object classification loss is calculated using binary cross-entropy (BCE) loss, while the bounding box loss combines CIoU loss with Distribution Focal Loss (DFL).
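To make Equation (10) tangible, the sketch below evaluates a BCE term and a CIoU term for a single prediction and combines them. It uses the commonly published CIoU definition (IoU minus a centre-distance penalty and an aspect-ratio penalty). The DFL term is omitted for brevity, the λ values of 1.0 are placeholders rather than the settings used in the paper, and all function names are hypothetical.

```python
import math

def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for one predicted probability."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def ciou_loss(box_a, box_b, eps=1e-7):
    """1 - CIoU for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    wa, ha, wb, hb = ax2 - ax1, ay2 - ay1, bx2 - bx1, by2 - by1
    iou = inter / (wa * ha + wb * hb - inter + eps)
    # Squared centre distance over the squared diagonal of the enclosing box.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return 1.0 - (iou - rho2 / c2 - alpha * v)

def total_loss(l_class, l_bbox, lam1=1.0, lam2=1.0):
    """Eq. (10) with placeholder weights (not the paper's values)."""
    return lam1 * l_class + lam2 * l_bbox

pred_box, gt_box = (10.0, 10.0, 50.0, 40.0), (12.0, 8.0, 48.0, 42.0)
loss = total_loss(bce(0.8, 1.0), ciou_loss(pred_box, gt_box))
print(round(loss, 4))
```

The CIoU term stays informative even when boxes do not overlap, because the centre-distance penalty still shrinks as the prediction moves toward the ground truth.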

4. Experiments

In this section, we will introduce the dataset, experimental details, and the results and conclusions of the experiments. Specifically, Section 4.1 will present the WBP-DET dataset, Section 4.2 will detail the experimental parameters and evaluation metrics for our method, Section 4.3 will discuss the hyperparameters of the comparison methods, Section 4.4 will showcase the results of the comparative experiments, Section 4.5 will analyze the generalization of our network, Section 4.6 will display the results of the ablation experiments and discuss the roles of the modules, and Section 4.7 will provide the experimental conclusions.

4.1. WBP-DET Dataset

Currently, there are few publicly available benchmark datasets for defect detection in wood-based panels. To facilitate future research, we introduce a new dataset for surface defect detection in wood-based panels called the WBP-DET dataset. The dataset was collected in 2024 by Sun Qin in Wuhan, featuring infrared thermal imaging with a resolution of 2048 × 4300. The WBP-DET dataset includes the following five common types of surface defect in wood-based panels: Glue Spot (GS), Oil Stains (OS), Chalk (Ch), Scratch (Sc), and Other Defects (OD). Since the original images have a high resolution, they have been cropped to 512 × 512 for research purposes. Additionally, Figure 5 illustrates the distribution of ground truth bounding boxes, showing a variety of defect shapes. The defects in the WBP-DET dataset are randomly distributed, reflecting the real-world conditions of surface defect detection in wood-based panels.

After cropping, we obtained the WBP-DET dataset, which contains 1793 images for wood-based panel surface defect detection. Figure 6 shows some examples of the defects on wood-based panel surfaces. The Glue Spot defects are typically small, black, circular spots with a gray level similar to that of the background; Oil Stains are usually irregular black shapes with varying scales and uneven feature distribution; Chalk defects are more prominent, generally appearing as white arcs; Scratch defects are black diagonal lines with significant differences in length and width and have lower contrast; and Other Defects have no distinct characteristics. These images are divided into training, validation, and test sets in an 8:1:1 ratio. Table 1 provides detailed information about the dataset. The WBP-DET dataset will be made publicly available for researchers at the following address: https://github.com/LazyShark2001/FDADNet (accessed on 3 August 2024).
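For readers reproducing the 8:1:1 partition, a seeded random split is one simple option. The sketch below is only an assumption about how such a split could be made: the file names, the seed, and the rounding rule are hypothetical, and the resulting counts may differ slightly from those reported in Table 1.

```python
import random

def split_811(items, seed=0):
    """Shuffle with a fixed seed and split into 80% / 10% / 10% parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * 0.8)
    n_val = int(len(items) * 0.1)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

# Hypothetical file names standing in for the 1793 cropped images.
names = [f"img_{i:04d}.png" for i in range(1793)]
train, val, test = split_811(names)
print(len(train), len(val), len(test))  # 1434 179 180
```

Fixing the seed keeps the partition identical across runs, which matters when several detectors are compared on the same test set.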

4.2. Implementation Details

Considering that wood-based panel images often contain numerous small defects, we normalized the image size to 640 × 640 to balance detection accuracy and computational complexity. This normalization also eases deployment of the model on edge devices. All of the experiments were conducted on an Nvidia RTX 4060 Ti GPU with 16 GB of memory (Maxsun, Wuhan, China) using PyTorch 2.0.1. For each dataset, we set the training, validation, and test set ratios to 8:1:1. To ensure fairness in model comparisons, all ablation and comparative experiments were conducted without using pre-trained weights, unless otherwise specified. The other training parameters are detailed in Table 2.

In this study, we evaluated the methods with the most common defect detection metrics: precision (P), recall (R), and mean average precision (mAP) [49]. Additionally, we report the number of floating-point operations (FLOPs) and the number of parameters (Params) to assess the model’s computational cost and size, which are crucial for deploying the network on edge devices. The formulas for the relevant evaluation metrics are as follows:

Precision = \frac{TP}{TP + FP}    (11)

Recall = \frac{TP}{TP + FN}    (12)

mAP = \frac{\sum_{i=1}^{c} \int_{0}^{1} P(R) \, dR}{c}    (13)

where True Positives (TP) and True Negatives (TN) represent the correct predictions, False Positives (FP) and False Negatives (FN) represent the incorrect predictions, P stands for precision, R stands for Recall, and c denotes the number of classes.
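Equations (11)–(13) can be checked numerically with a short pure-Python sketch. It approximates the integral of P(R) with the trapezoidal rule over sampled points of the precision-recall curve; evaluation toolkits usually apply an interpolated variant, so this is a simplification, and the function names and sample values are illustrative only.

```python
def precision(tp, fp):
    """Equation (11): fraction of detections that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Equation (12): fraction of ground-truth defects that are detected."""
    return tp / (tp + fn) if tp + fn else 0.0

def average_precision(recalls, precisions):
    """Trapezoidal area under P(R); recalls must be sorted in ascending order."""
    area = 0.0
    for k in range(1, len(recalls)):
        width = recalls[k] - recalls[k - 1]
        area += width * (precisions[k] + precisions[k - 1]) / 2.0
    return area

def mean_average_precision(per_class_curves):
    """Equation (13): mean of the per-class AP values over c classes."""
    aps = [average_precision(r, p) for r, p in per_class_curves]
    return sum(aps) / len(aps)

# Two toy classes with hand-picked precision-recall samples.
curves = [
    ([0.0, 0.5, 1.0], [1.0, 1.0, 0.5]),  # AP = 0.875
    ([0.0, 1.0], [1.0, 1.0]),            # AP = 1.0
]
print(precision(8, 2), recall(8, 8), mean_average_precision(curves))  # 0.8 0.5 0.9375
```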

4.3. Comparison Methodology

The experimental details and hyperparameter settings for the comparison methods are provided in Table 3. Note that Faster R-CNN rescales each image so that its width and height fall within the range [800, 1333], whereas all other networks resize the input to 640 × 640. Because Faster R-CNN and RTDETR have a large number of parameters and the dataset is relatively small, training them from scratch converged slowly. Therefore, we loaded the official pretrained weights for these two models during training, while the other models were trained from their initialized weights. We monitored the accuracy curves on the validation set to ensure that the models did not overfit and, through multiple experiments, determined a suitable number of epochs for each network. For each model, we saved the weights from the final epoch and evaluated them on the test set; these test results were then compared across models. Apart from our method, the other methods were reproduced with the MMDetection and Ultralytics frameworks. When training the comparison methods, we modified only the number of epochs and the batch size, while all other hyperparameters kept the default settings of their respective frameworks.
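One common reading of the [800, 1333] range is the keep-ratio rule used by default Faster R-CNN configurations in MMDetection: the image is scaled as much as possible while the short side stays at most 800 and the long side at most 1333. The sketch below shows that rule under this assumption; the function name is hypothetical, and the source does not state the exact resizing code.

```python
def keep_ratio_size(width, height, short_max=800, long_max=1333):
    """Return the resized (width, height) under a keep-aspect-ratio rule."""
    scale = min(long_max / max(width, height), short_max / min(width, height))
    # Round to the nearest integer pixel size.
    return int(width * scale + 0.5), int(height * scale + 0.5)

print(keep_ratio_size(512, 512))  # (800, 800): a square crop is limited by the short side
```

Under this rule a 512 × 512 crop is enlarged to 800 × 800, which is larger than the 640 × 640 input used for the other detectors.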

4.4. Comparison with State-of-the-Art Models

To evaluate the proposed FDADNet comprehensively, we conducted both quantitative and visual analyses using eight state-of-the-art object detection methods. These models are categorized into CNN-based and Transformer-based models. The CNN-based models include Faster R-CNN [31], YoloV5 [50], YoloX [51], YoloV7 [27], YoloV8 [28], YoloV10 [30], and RTMDet [26], while RTDETR [37] falls under the Transformer-based models.

Visual Analysis: We visualize the detection results of the different models in Figure 7. Specifically, we selected representative complex scenes, including low-contrast targets, complex background targets, and small defect targets. The visual results indicate that the CNN-based methods, such as Faster R-CNN, RTMDet, YoloV8, and YoloV10, perform exceptionally well in local perception, effectively detecting targets with clear edges, such as the Chalk defect shown in Figure 7c. However, these methods struggle to accurately distinguish between targets and backgrounds, especially for weak-featured, low-contrast targets in noisy environments (e.g., Figure 7a,b). For small-scale defect targets (e.g., Figure 7d,f), the Transformer-based RTDETR may overlook the edge details around small defects while capturing global information. Although the self-attention mechanism of Transformer-based models effectively captures global features, it may lack sensitivity to the subtle features of small defects, leading to false positives and missed detections. Additionally, for defects with significant inter-class shape and scale differences and uneven feature distribution (e.g., Figure 7e), Faster R-CNN with anchor-based detection and YoloV10 with dual detection heads may encounter false positives when dealing with dense and uneven defects. In contrast, FDADNet generates more accurate detection boxes, effectively detecting all defects, even in noisy backgrounds, and distinguishing between the foreground and the background with high precision, demonstrating an outstanding performance in detecting small defect targets.

Quantitative Analysis: Table 4 presents the quantitative comparison results. FDADNet not only achieves the best detection performance, with an mAP₅₀ of 79.6%, but also has an advantage in parameter count and FLOPs (4.5 M and 6.2 G) over the other SOTA methods. Because the images contain numerous low-contrast defects (such as Glue Spot and Scratch), most spatial-convolution-based methods detect these defects poorly. Although RTMDet mitigates the large morphological differences in Scratch defects through dynamic label assignment and large-kernel depthwise convolution, its ability to perceive the details of low-contrast small defects (such as Glue Spot) remains insufficient. RTDETR, which is based on spatial transformers, performs well on large-scale targets (such as Oil Stains, Scratch, and Other Defects) but often overlooks local feature information when detecting small defects (such as Glue Spot), leading to missed detections. Transformer models rely on self-attention to capture global information, which is an advantage for identifying large defects. For small defects, however, self-attention may fail to focus adequately on minute details, and the position encoding handles fine-grained spatial information poorly, which reduces detection accuracy. This neglect of local features, together with the imprecise positional information, is likely the main reason for RTDETR’s poor performance in small defect detection. In contrast, FDADNet not only excels at detecting low-contrast targets but also balances the detection performance for large and small objects. Figure 8 plots the confusion matrix of each network. The values for our network are concentrated on the diagonal, with fewer false detections, indicating that it maintains high accuracy for each target class.

4.5. Generalization Analysis

To further validate the generalization ability of our model, we conducted experiments on several defect datasets, including GC10-DET [52], APDDD [53], and NEU-DET [54]. These datasets feature surface defects in steel and aluminum from industrial scenarios, commonly used to assess the performance of defect detection models. Table 5 provides detailed information about these datasets. We compared our model with the current mainstream single-stage lightweight detection models, as shown in Table 6. The results indicate that our model achieved the best detection performance on these three datasets, with mAP₅₀ reaching 56.8%, 66.3%, and 76%, respectively. Additionally, in terms of parameters and FLOPs, our model ranks high among the mainstream single-stage lightweight detectors. Our approach demonstrates strong generalization capability across various industrial defect datasets and effectively addresses complex defect detection tasks.

4.6. Ablation Study

To validate the effectiveness of each module, we designed a series of ablation experiments. The baseline for these experiments is the network with ADConv replaced by conventional convolution and MFDW replaced by ELAN.

Frequency Domain Module Analysis: As shown in Table 7 (comparing the first and second rows of results), the baseline model performs well in detecting high-contrast defects with clear contours (such as Chalk and Other Defects). However, it performs poorly on low-contrast and small defects (like Glue Spot, Oil Stains, and Scratch). Adding the MFDW module significantly improves the detection performance of Glue Spot, Oil Stains, and Scratch. This enhancement is due to the MFDW module’s capability to process image signal information in the frequency domain, allowing for a more precise detection of defects that are close in grayscale to the background, thus reducing missed detections of low-contrast defects. Additionally, this approach significantly reduces the model’s parameter count and computational load (parameters from 3.01 M to 2.78 M and FLOPs from 8.1 G to 7.4 G), lowering deployment and inference costs and making the model more competitive in practical applications.

ADConv Module Analysis: As shown in Table 7 (first and third row results), replacing the downsampling module in the baseline network with ADConv significantly improves the performance in detecting small defects (e.g., Glue Spot), with mAP₅₀ increasing from 61.2% to 71%. This improvement is mainly because Glue Spot, as a small defect, has relatively weak features that can be overshadowed by larger defect features. ADConv, through dynamic convolutional downsampling, provides more refined processing of features with different attributes. This dynamic convolutional downsampling mechanism not only effectively preserves the edge texture details of small defects, but also adapts to the feature attributes, enhancing the expression capability of multi-scale defect information. Although ADConv introduces additional parameters (increasing from 3.01 M to 4.47 M), the overall computation (FLOPs) actually decreases (from 8.1 G to 6.9 G) due to the omission of subsequent computation steps such as BatchNorm. This design optimizes computational complexity, improves inference efficiency, and is advantageous for deploying and running the network on edge devices.

The compatibility between MFDW and ADConv is well demonstrated: As shown in Table 7 (first and fourth row results), when both MFDW and ADConv are used together, there is a significant performance improvement, especially in detecting low-contrast defects (such as Glue Spot, Oil Stains, and Scratch), compared to the baseline model. The detection capabilities for Glue Spot and Oil Stains are significantly enhanced. This indicates that MFDW and ADConv work synergistically, with good compatibility. ADConv and MFDW process features in spatial and frequency domains, respectively, complementing and enhancing the information across different dimensions. This combination provides the model with strong feature representation capabilities across various dimensions, thereby improving defect detection.

The necessity of Gaussian filtering: As shown in Table 7 (results from the fourth and sixth rows), after incorporating Gaussian filtering for denoising, the detection performance for Glue Spot, Oil Stains, Scratch, and Chalk defects has significantly improved, achieving leading levels. This enhancement is primarily due to the complementary advantages and synergistic effects of the MFDW and ADConv methods in feature processing. Gaussian filtering removes the noise around the target before converting the features back to the spatial domain, significantly reducing background noise in the feature maps. This allows ADConv to focus more effectively on extracting the local features of the defects without interference from background noise, thereby improving the accuracy of bounding box predictions. In practice, when applying Gaussian filtering to feature maps, the goal is to preserve as much of the original image information as possible while suppressing background noise. As shown in Figure 9a, for the WBP-DET dataset, with Gaussian filter standard deviations of 20 and 40, the noise removal effect is significant, but the texture information of the image is less preserved. Conversely, with standard deviations of 80 and 100, while more texture information is retained, the denoising effect is insufficient, and noise interference remains. A standard deviation of 60 strikes a balance by preserving rich image texture details while effectively suppressing noise. As illustrated in Figure 9b and Table 8, with a Gaussian filter standard deviation of 60, the resulting images retain rich edge information and effectively remove noise, enabling the subsequent spatial domain convolution operations to focus more precisely on the target features, achieving optimal detection performance.
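The trade-off between noise removal and texture preservation can be reproduced with a small NumPy sketch of Gaussian filtering in the frequency domain. It assumes a low-pass mask H(u, v) = exp(−D² / (2σ²)) applied to the centered 2D spectrum of a single feature plane; the actual MFDW module operates along several axes, and the mask shape, the toy input, and the function name here are assumptions rather than the authors' code.

```python
import numpy as np

def gaussian_lowpass(feature, sigma):
    """Filter one 2D feature plane with a Gaussian mask in the frequency domain."""
    h, w = feature.shape
    # Centered spectrum: the zero-frequency term sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(feature))
    v = np.arange(h) - h // 2
    u = np.arange(w) - w // 2
    dist2 = v[:, None] ** 2 + u[None, :] ** 2       # squared distance to the center
    mask = np.exp(-dist2 / (2.0 * sigma ** 2))      # equals 1 at zero frequency
    # Back to the spatial domain; the imaginary part is only rounding error.
    return np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real

rng = np.random.default_rng(0)
plane = rng.standard_normal((64, 64))               # toy noisy feature plane
filtered = {s: gaussian_lowpass(plane, s) for s in (20, 60, 100)}
variances = [float(filtered[s].var()) for s in (20, 60, 100)]
```

A smaller standard deviation narrows the mask and removes more high-frequency content (noise and fine texture alike), while a larger one keeps more of both, which matches the behavior reported for the values 20 to 100 above.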

Why MFDW is Used Only in the Feature Extraction Stage: As shown in Table 7 (fifth and sixth rows), extending the use of MFDW from the backbone to both the backbone and neck stages results in decreased detection performance for low-contrast defects (such as Glue Spot, Oil Stains, and Scratch). Although Gaussian filtering effectively removes noise, it also slightly loses image details. MFDW in the backbone extracts relatively complete feature information and significantly reduces background noise interference. However, the neck stage processes the output feature maps from the backbone, and using MFDW in the neck stage not only makes it difficult to further denoise, but also increases the risk of losing substantial semantic information due to detail loss, particularly when dealing with high-level feature maps. Therefore, applying MFDW only in the backbone network maximizes noise suppression while preserving object information integrity, thus optimizing detection performance.

4.7. Experimental Conclusions

The results of the comparative experiments and generalization experiments indicate that our method can effectively detect small target defects and low-contrast target defects while still performing excellently when generalized to other tasks. This demonstrates that combining spatial and frequency domains for feature extraction enhances feature encoding and adaptability, as well as providing superior stability. Additionally, the results of the ablation experiments show that the frequency domain module has strong feature extraction capabilities for low-contrast targets, and that Gaussian filtering, when used to eliminate background noise, enables the detector to better detect small defects and blurred boundary targets. Furthermore, the ADConv and MFDW modules complement and enhance information from different dimensions, exhibiting strong feature representation capabilities.

5. Conclusions

Addressing the ambiguity between defects and backgrounds, as well as the variability in defect sizes, is crucial for detecting surface defects in wood-based panels. This paper proposes a defect detection method, FDADNet, based on frequency domain transformation and adaptive dynamic downsampling. Specifically, we designed the MFDW module to tackle noise accumulation and boundary blurring issues in feature extraction. MFDW enhances the separability of signals between targets and backgrounds in the frequency domain. Additionally, Gaussian filtering is used to suppress noise, reducing its impact on feature representation and further enhancing the expression of target features. Furthermore, we introduced the ADConv module for image downsampling to address the variability in defect sizes. ADConv adaptively compresses and reinforces feature maps, enabling the flexible enhancement of targets of different scales. This mechanism allows both large and small defect features to be adaptively enhanced in the transformed space, reducing feature confusion between multi-scale objects and improving the discriminability of small defects, thus achieving a more balanced detection performance. Moreover, we established a new dataset for defect detection, WBP-DET. Compared to the current mainstream object detection methods, our model achieves the highest detection accuracy on the WBP-DET dataset, particularly excelling in detecting small and low-contrast defects. Additionally, our method demonstrates significant advantages in terms of parameter count and computational complexity. Our approach also performs excellently across three other mainstream industrial material defect detection datasets. The outstanding performance of FDADNet makes it highly suitable for practical applications in complex industrial scenarios. In future work, we will further explore the potential of frequency domain techniques and expand their application to other industrial inspection tasks.

Author Contributions

Conceptualization, H.L., Z.Y., W.Y. and L.M.; methodology, L.M.; software, Z.Y. and L.G.; validation, Z.Y., Q.S., Z.W. and W.C.; formal analysis, Y.W.; investigation, Z.Y.; data curation, Q.S.; writing—original draft preparation, Z.Y.; writing—review and editing, L.M.; visualization, H.L.; supervision, W.Y., L.M. and H.L.; project administration, W.Y.; funding acquisition, H.L., L.M. and W.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Open Research Fund Program of LIESMARS (Grant No. 21E02); the Hubei Key Laboratory of Intelligent Robot (Wuhan Institute of Technology) of China (HBIRL 202113); the Hubei Province Young Science and Technology Talent Morning Hight Lift Project (202319); the Natural Science Foundation of Hubei Province (2024AFB1050); the University Student Innovation and Entrepreneurship Training Program Project (202210500028); the Doctoral Starting Up Foundation of Hubei University of Technology (XJ2023007301); the Science and Technology Research Project of Education Department of Hubei Province (B2023362); and the Excellent Young and Middle aged Science and Technology Innovation Team Project for Higher Education Institutions of Hubei Province (T2022054, T2023045).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Author Liang Ge was employed by the Tianjin Institute of Surveying and Mapping Co., Ltd. Author Wei Cao was employed by the Hubei Geomatics Technology Group Stock Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Liu, Y.; Zhang, C.; Dong, X. A survey of real-time surface defect inspection methods based on deep learning. Artif. Intell. Rev. 2023, 56, 12131–12170.
2. Roselinkiruba, R.; Hemalatha, R. Secure video steganography using key frame and region selection technique. Int. J. Inf. Technol. 2023, 15, 1299–1308.
3. Roselinkiruba, R.; Sharmila, T.S. Dynamic optimal pixel block selection data hiding approach using bit plane and image encryption. Int. J. Inf. Technol. 2023, 15, 3441–3448.
4. Roselinkiruba, R.; Sharmila, T.S.; Julina, J.K.J. An efficient Moving object, Encryption, Compression and Interpolation technique for video steganography. Multimed. Tools Appl. 2024, 83, 1–31.
5. Xia, J.; Jeong, Y.; Yoon, J. An automatic machine vision-based algorithm for inspection of hardwood flooring defects during manufacturing. Eng. Appl. Artif. Intell. 2023, 123, 106268.
6. Guo, B.; Wang, Y.; Zhen, S.; Yu, R.; Su, Z. SPEED: Semantic prior and extremely efficient dilated convolution network for real-time metal surface defects detection. IEEE Trans. Ind. Inform. 2023, 19, 11380–11390.
7. Luo, Q.; Su, J.; Yang, C.; Silven, O.; Liu, L. Scale-selective and noise-robust extended local binary pattern for texture classification. Pattern Recognit. 2022, 132, 108901.
8. Liu, W.; Yang, X.; Yang, X.; Gao, H. A novel industrial chip parameters identification method based on cascaded region segmentation for surface-mount equipment. IEEE Trans. Ind. Electron. 2021, 69, 5247–5256.
9. Diwakar, M.; Pandey, N.K.; Singh, R.; Sisodia, D.; Arya, C.; Singh, P.; Chakraborty, C. Low-dose COVID-19 CT image denoising using CNN and its method noise thresholding. Curr. Med. Imaging 2023, 19, 182–193.
10. Fu, Y.; Huang, M.; Gong, D.; Lin, H.; Fan, Y.; Du, W. Dynamic simulation and prediction of carbon storage based on land use/land cover change from 2000 to 2040: A case study of the Nanchang urban agglomeration. Remote Sens. 2023, 15, 4645.
11. Chen, Z.; Huang, M.; Zhu, D.; Altan, O. Integrating remote sensing and a markov-FLUS model to simulate future land use changes in Hokkaido, Japan. Remote Sens. 2021, 13, 2621.
12. Huang, M.; Gong, D.; Zhang, L.; Lin, H.; Chen, Y.; Zhu, D.; Xiao, C.; Altan, O. Spatiotemporal dynamics and forecasting of ecological security pattern under the consideration of protecting habitat: A case study of the Poyang Lake ecoregion. Int. J. Digit. Earth 2024, 17, 2376277.
13. Li, X.; Lin, C.; Zhang, M.; Li, S.; Sun, S.; Liu, J.; Hu, T.; Sun, L. Predicting the rate of forest fire spread toward any directions based on a CNN model considering the correlations of input variables. J. For. Res. 2023, 28, 111–119.
14. Zhang, M.; Xu, B.; Wen, J. Deep Image Segmentation Using a Morphological Edge Operator. Recent Adv. Comput. Sci. Commun. Former. Recent Pat. Comput. Sci. 2023, 16, 96–102.
15. Mei, L.; Shen, H.; Yu, Y.; Weng, Y.; Li, X.; Zahid, K.R.; Huang, J.; Wang, D.; Liu, S.; Zhou, F. High-throughput and high-accuracy diagnosis of multiple myeloma with multi-object detection. Biomed. Opt. Express 2022, 13, 6631–6644.
16. Mei, L.; Ye, Z.; Xu, C.; Wang, H.; Wang, Y.; Lei, C.; Yang, W.; Li, Y. SCD-SAM: Adapting Segment Anything Model for Semantic Change Detection in Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5626713.
17. Xu, C.; Ye, Z.; Mei, L.; Yu, H.; Liu, J.; Yalikun, Y.; Jin, S.; Liu, S.; Yang, W.; Lei, C. Hybrid attention-aware transformer network collaborative multiscale feature alignment for building change detection. IEEE Trans. Instrum. Meas. 2024, 73, 5012914.
18. Dlamini, S.; Kuo, C.-F.J.; Chao, S.-M. Developing a surface mount technology defect detection system for mounted devices on printed circuit boards using a MobileNetV2 with Feature Pyramid Network. Eng. Appl. Artif. Intell. 2023, 121, 105875.
19. Zhang, L.; Chen, J.; Chen, J.; Wen, Z.; Zhou, X. LDD-Net: Lightweight printed circuit board defect detection network fusing multi-scale features. Eng. Appl. Artif. Intell. 2024, 129, 107628.
20. Jiang, W.; Li, T.; Zhang, S.; Chen, W.; Yang, J. PCB defects target detection combining multi-scale and attention mechanism. Eng. Appl. Artif. Intell. 2023, 123, 106359.
21. Song, G.; Song, K.; Yan, Y. EDRNet: Encoder–decoder residual network for salient object detection of strip steel surface defects. IEEE Trans. Instrum. Meas. 2020, 69, 9709–9719.
22. Su, J.; Luo, Q.; Yang, C.; Gui, W.; Silvén, O.; Liu, L. PMSA-DyTr: Prior-Modulated and Semantic-Aligned Dynamic Transformer for Strip Steel Defect Detection. IEEE Trans. Ind. Inform. 2024, 20, 6684–6695.
23. Dong, H.; Song, K.; Wang, Q.; Yan, Y.; Jiang, P. Deep metric learning-based for multi-target few-shot pavement distress classification. IEEE Trans. Ind. Inform. 2021, 18, 1801–1810.
24. Su, B.; Chen, H.; Chen, P.; Bian, G.; Liu, K.; Liu, W. Deep learning-based solar-cell manufacturing defect detection with complementary attention network. IEEE Trans. Ind. Inform. 2020, 17, 4084–4095.
25. Cao, J.; Yang, G.; Yang, X. A pixel-level segmentation convolutional neural network based on deep feature fusion for surface defect detection. IEEE Trans. Instrum. Meas. 2020, 70, 5003712.
26. Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Zhang, S.; Chen, K. Rtmdet: An empirical study of designing real-time object detectors. arXiv 2022, arXiv:2212.07784.
27. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475.
28. Ultralytics/Ultralytics. Available online: https://github.com/ultralytics/ultralytics (accessed on 4 May 2024).
29. Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. Yolov9: Learning what you want to learn using programmable gradient information. arXiv 2024, arXiv:2402.13616.
30. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. Yolov10: Real-time end-to-end object detection. arXiv 2024, arXiv:2405.14458.
31. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv 2015, arXiv:1506.01497.
32. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229.
33. Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable detr: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159.
34. Liu, S.; Li, F.; Zhang, H.; Yang, X.; Qi, X.; Su, H.; Zhu, J.; Zhang, L. Dab-detr: Dynamic anchor boxes are better queries for detr. arXiv 2022, arXiv:2201.12329.
35. Li, F.; Zhang, H.; Liu, S.; Guo, J.; Ni, L.M.; Zhang, L. Dn-detr: Accelerate detr training by introducing query denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022; pp. 13619–13627.
36. Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.-Y. Dino: Detr with improved denoising anchor boxes for end-to-end object detection. arXiv 2022, arXiv:2203.03605.
37. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 16965–16974.
38. Zhong, Y.; Li, B.; Tang, L.; Kuang, S.; Wu, S.; Ding, S. Detecting camouflaged object in frequency domain. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022; pp. 4504–4513.
39. Rao, Y.; Zhao, W.; Zhu, Z.; Lu, J.; Zhou, J. Global filter networks for image classification. Adv. Neural Inf. Process. Syst. 2021, 34, 980–993.
40. Ruan, J.; Gao, J.; Xie, M.; Xiang, S. Learning Multi-axis Representation in Frequency Domain for Medical Image Segmentation. arXiv 2023, arXiv:2312.17030.
41. Zhao, C.; Cai, W.; Dong, C.; Hu, C. Wavelet-based fourier information interaction with frequency diffusion adjustment for underwater image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 8281–8291.
42. Yang, B.; Bender, G.; Le, Q.V.; Ngiam, J. Condconv: Conditionally parameterized convolutions for efficient inference. Adv. Neural Inf. Process. Syst. 2019, 32, 767.
43. Chen, Y.; Dai, X.; Liu, M.; Chen, D.; Yuan, L.; Liu, Z. Dynamic convolution: Attention over convolution kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11030–11039.
44. Zhang, Y.; Zhang, J.; Wang, Q.; Zhong, Z. Dynet: Dynamic convolution for accelerating convolutional neural networks. arXiv 2020, arXiv:2004.10694.
45. Li, C.; Zhou, A.; Yao, A. Omni-dimensional dynamic convolution. arXiv 2022, arXiv:2209.07947.
46. Tian, C.; Zhang, X.; Ren, J.; Zuo, W.; Zhang, Y.; Lin, C.-W. A Heterogeneous Dynamic Convolutional Neural Network for Image Super-resolution. arXiv 2024, arXiv:2402.15704.
47. Han, K.; Wang, Y.; Guo, J.; Wu, E. ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 15751–15761.
48. Wang, C.-Y.; Liao, H.-Y.M.; Yeh, I.-H. Designing network design strategies through gradient path analysis. arXiv 2022, arXiv:2211.04800.
49. Li, H.; Yi, Z.; Mei, L.; Duan, J.; Sun, K.; Li, M.; Yang, W.; Wang, Y. SCFNet: Lightweight Steel Defect Detection Network Based on Spatial Channel Reorganization and Weighted Jump Fusion. Processes 2024, 12, 931.
50. Ultralytics/Yolov5:v5.0. Available online: https://github.com/ultralytics/yolov5/tree/v7.0 (accessed on 4 June 2024).
51. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430.
52. Lv, X.; Duan, F.; Jiang, J.J.; Fu, X.; Gan, L. Deep metallic surface defect detection: The new benchmark and detection network. Sensors 2020, 20, 1562.
53. Sui, T.; Wang, J. PDDD-Net: Defect Detection Network Based on Parallel Attention Mechanism and Dual-Channel Spatial Pyramid Pooling. IEEE Access 2023, 11, 141764–141775.
54. Song, K.; Yan, Y. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Appl. Surf. Sci. 2013, 285, 858–864.

Figure 1. Processing wood-based panel surface images in the frequency domain. In panels (a,b), the 2D DFT is applied to four small regions; green marks Glue Spot, blue marks Oil Stains, and red marks the background. Panel (a) compares the frequency-domain signal intensity of the Glue Spot region with that of the background, and panel (b) makes the same comparison for the Oil Stains region. Panel (c) shows the defect image after Gaussian filtering to suppress noise, with the defect target highlighted in yellow.
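
The two operations named in this caption, a 2D DFT followed by Gaussian filtering, can be sketched roughly as follows. This is not the authors' MFDW implementation: the helper name `gaussian_lowpass`, the mask formula, the synthetic patch, and the use of NumPy are all assumptions made for illustration, and the standard deviation here is measured in frequency bins, which may differ from how σ is parameterized in the paper.

```python
import numpy as np

def gaussian_lowpass(patch: np.ndarray, sigma: float) -> np.ndarray:
    """Filter a 2D patch with a Gaussian mask in the frequency domain.

    Hypothetical helper: sigma is the standard deviation of the mask,
    measured in frequency bins from the centre of the shifted spectrum.
    """
    h, w = patch.shape
    # 2D DFT, with the zero-frequency term shifted to the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(patch))
    # Squared distance of every frequency bin from the centre (h // 2, w // 2).
    v = np.arange(h) - h // 2
    u = np.arange(w) - w // 2
    dist2 = v[:, None] ** 2 + u[None, :] ** 2
    mask = np.exp(-dist2 / (2.0 * sigma ** 2))
    # Attenuate high frequencies, then invert the transform.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real

# Synthetic patch: a flat surface plus random noise (illustrative data only).
rng = np.random.default_rng(0)
patch = np.ones((64, 64)) + 0.1 * rng.standard_normal((64, 64))
smooth = gaussian_lowpass(patch, sigma=8.0)
```

Because the mask equals 1 at the zero-frequency bin, the mean intensity of the patch is preserved while the high-frequency noise is attenuated.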

Figure 2. Overview of the proposed FDADNet. The network is composed of three main components: the backbone (yellow background box in the lower part), the neck (blue background box in the lower part), and the heads (pink background box in the lower part). In MFDW, the blue squares represent feature maps.

Figure 3. MFDW and ELAN structure diagram. Here, (H, W), (C, W), and (C, H) represent feature transformations along the height–width axis, channel–width axis, and channel–height axis of the feature map, respectively.

Figure 4. ADConv structure diagram. The left side shows the heatmap of the input feature map, while the right side displays the heatmap of the output feature map. The 3D plots illustrate the spatial distribution of features for objects of different scales, with the left plot representing the input and the right plot representing the output. The yellow spheres denote small object features, and the blue spheres denote large object features.

Figure 5. Ground-truth (GT) distribution of the WBP-DET dataset. (a) Number of instances for the different categories. (b) Distribution of bounding boxes. (c) Distribution of relative positions of center points. (d) Distribution of relative height-to-width ratios of bounding boxes.

Figure 6. A portion of the WBP-DET dataset, with different types of defects highlighted in red bounding boxes. (a) Original images of the wood-based panel. (b) Glue Spot defects. (c) Oil Stains defects. (d) Scratch defects. (e) Chalk defects. (f) Other Defects.

Figure 7. Visual comparison of FDADNet with other networks. Panels (a,b) show low-contrast defect targets, (c) shows defect targets prone to false detection, (d,f) show small defect targets, and (e) shows defect targets with significant scale variations.

Figure 8. Confusion matrices for the different methods. Diagonal values closer to 1 indicate better performance for the corresponding category.
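
For a diagonal entry to be read on a 0 to 1 scale, the raw counts have to be normalized. The sketch below shows one common convention, dividing each row by its sum with rows taken as true classes; whether the figure normalizes by row or by column is not stated here, so this orientation, the helper name `normalize_rows`, and the counts are assumptions for illustration.

```python
def normalize_rows(counts):
    """Divide each row by its sum so that rows (true classes) sum to 1."""
    normalized = []
    for row in counts:
        total = sum(row)
        # Leave all-zero rows as zeros to avoid dividing by zero.
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized

# Made-up counts for three classes; rows = true class, columns = predicted class.
counts = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 0, 5],
]
matrix = normalize_rows(counts)
# Diagonal entries give the fraction of each true class predicted correctly.
diagonal = [matrix[i][i] for i in range(len(matrix))]
print(diagonal)  # [0.8, 0.6, 1.0]
```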

Figure 9. (a) Visual analysis of denoising with different standard deviations of Gaussian filtering, with the yellow box highlighting Glue Spot. (b) Comparison of model results with Gaussian filtering at different standard deviations.

Table 1. Introduction to the WBP-DET dataset.

WBP-DET	Defect Images						Use
	Total	Glue Spot	Oil Stains	Chalk	Scratch	Other	Train	Val	Test
Number	1793	450	352	465	221	406	1434	179	180
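
The train, validation, and test counts in Table 1 can be checked with a few lines of arithmetic. The snippet below is only a sanity check on the transcribed numbers; the variable names are illustrative.

```python
# Image counts transcribed from Table 1.
total_images = 1793
split = {"train": 1434, "val": 179, "test": 180}

# The three subsets partition the dataset.
assert sum(split.values()) == total_images

# Fraction of images in each subset, rounded to three decimals.
fractions = {name: round(count / total_images, 3) for name, count in split.items()}
print(fractions)  # {'train': 0.8, 'val': 0.1, 'test': 0.1}
```

The counts correspond to an approximately 8:1:1 split.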

Table 2. Parameter settings during training.

Parameters	Setup
Epochs	400
NMS IoU	0.5
Batch Size	16
Standard Deviation (σ)	60
Optimizer	AdamW
Initial Learning Rate	1.111 × 10⁻³
Final Learning Rate	1.111 × 10⁻⁵
Weight Decay	5 × 10⁻⁴
Momentum	0.9
Mosaic	1.0
Close Mosaic	Last 10 epochs
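
For reference, the settings in Table 2 can be gathered into a single configuration object. The sketch below uses a plain Python dictionary; the key names are hypothetical and are not taken from the authors' code or from any particular training framework.

```python
# Settings transcribed from Table 2; the key names are hypothetical and do
# not correspond to any particular framework's configuration schema.
train_config = {
    "epochs": 400,
    "nms_iou": 0.5,
    "batch_size": 16,
    "gaussian_sigma": 60,
    "optimizer": "AdamW",
    "initial_lr": 1.111e-3,
    "final_lr": 1.111e-5,
    "weight_decay": 5e-4,
    "momentum": 0.9,
    "mosaic": 1.0,
    "close_mosaic_epochs": 10,
}

# The final learning rate is 1% of the initial one.
lr_ratio = train_config["final_lr"] / train_config["initial_lr"]
# Mosaic augmentation is closed for the last 10 epochs, so it is active for 390.
mosaic_epochs = train_config["epochs"] - train_config["close_mosaic_epochs"]
```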

Table 3. The hyperparameter settings for the comparative experiments.

Methods	Epochs	Batch Size	Optimizer	Learning Rate	Weight Decay	NMS IoU
Faster R-CNN	12	8	SGD	0.02	0.0001	0.7
YoloV5s	400	16	SGD	0.01	0.0005	0.6
YoloX-Tiny	400	16	SGD	0.01	0.0005	0.65
RTMDet	400	16	AdamW	0.004	0.05	0.65
YoloV7-Tiny	400	16	SGD	0.01	0.0005	0.65
YoloV8n	400	16	AdamW	0.001111	0.0005	0.7
YoloV10n	400	16	AdamW	0.001111	0.0005	0.7
RTDETR	300	16	AdamW	0.001111	0.0005	0.7
FDADNet	400	16	AdamW	0.001111	0.0005	0.5

Table 4. Performance comparison of FDADNet with eight other mainstream methods on the WBP-DET dataset. The table reports per-class average precision (AP, %) for Oil Stains (OS), Glue Spot (GS), Scratch (Sc), Chalk (Ch), and Other Defects (OD), mean average precision at an IoU threshold of 0.5 (mAP₅₀, %), the number of parameters in millions (Params/M), and floating-point operations (GFLOPs). * indicates that pretrained weights were used.

Methods	OS	GS	Sc	Ch	OD	mAP₅₀	Params/M	GFLOPs
Faster R-CNN [31] * (2015)	64.9	65.7	65.5	96	75.8	73.6	41.37	134
YoloV5s [50] (2021)	57.3	56.1	62.2	98	68.3	68.4	7.02	15.8
YoloX-Tiny [51] (2021)	67.9	65.8	45.8	93.9	60.5	66.8	5.03	7.57
RTMDet [26] (2022)	52.9	69.9	86.1	97.6	82.2	77.8	4.87	8.02
YoloV7-Tiny [27] (2023)	61.2	74.6	40.5	97.2	70.1	68.7	6.01	13.1
YoloV8n [28] (2023)	57.9	61.9	59.9	97.4	79	71.2	3.01	8.1
YoloV10n [30] (2024)	51.1	64.8	63.1	97.3	68.9	69.1	2.7	8.2
RTDETR [26] * (2024)	56	56.7	77.9	91.8	74.6	71.4	31.9	103.5
FDADNet	61.3	79.3	85.5	98.4	73.7	79.6	4.5	6.2

Red bold text indicates the top-ranked performance, while blue bold text denotes the second-ranked performance.

Table 5. Details of the GC10-DET, APDDD, and NEU-DET datasets.

Dataset	Defect Types	Resolution	Images	Train	Val	Test
GC10-DET	10	2048 × 1000	2294	1835	229	230
APDDD	10	2560 × 1920	1885	1508	188	189
NEU-DET	6	200 × 200	1800	1440	180	180

Table 6. Comparison of FDADNet with state-of-the-art single-stage object detectors on the APDDD, GC10-DET, and NEU-DET datasets. mAP_50–95 is the mAP averaged over 10 IoU thresholds from 0.50 to 0.95 in steps of 0.05.

Methods	APDDD		GC10-DET		NEU-DET		Params/M	GFLOPs
	mAP₅₀	mAP_50–95	mAP₅₀	mAP_50–95	mAP₅₀	mAP_50–95		
YoloV5n	54.7	30.2	60.7	30.2	71.6	38.8	1.77	4.2
YoloV5s	55.3	31.5	62.8	30.3	74.6	44.2	7.04	15.8
YoloX	50	24.7	63.3	28.7	67	31.7	5.03	7.57
YoloV7	52.5	28.5	63.6	29.1	67.9	36.8	6.03	13.1
YoloV8n	55.6	32.7	62.7	30.7	74.8	43.3	3.01	8.1
YoloV10n	48.1	27.4	59.9	28.9	71.5	40.8	2.7	8.2
FDADNet	56.8	31.3	66.3	31.5	76	44.4	4.5	6.2

Red bold text indicates the top-ranked performance, while blue bold text denotes the second-ranked performance.
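
The mAP_50–95 metric described in the Table 6 caption is an average over 10 thresholds, which can be sketched as below. The function names and the per-threshold values are made up for illustration; computing the mAP at each individual threshold (matching predictions to ground truth) is outside the scope of this sketch.

```python
def iou_thresholds():
    """Return the 10 IoU thresholds 0.50, 0.55, ..., 0.95."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]

def map_50_95(map_at_threshold):
    """Average per-threshold mAP values keyed by IoU threshold."""
    thresholds = iou_thresholds()
    return sum(map_at_threshold[t] for t in thresholds) / len(thresholds)

# Made-up per-threshold mAP values (%), falling linearly from 80 to 35.
example = {t: 80.0 - 100.0 * (t - 0.50) for t in iou_thresholds()}
score = map_50_95(example)
```

Since mAP normally falls as the IoU threshold becomes stricter, mAP_50–95 is typically well below mAP₅₀, as the values in Table 6 show.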

Table 7. Ablation experiments with different structures. A indicates the use of MFDW in the backbone, B indicates the use of DyConv for downsampling in the model, C indicates the addition of Gaussian filtering in MFDW, and D indicates the use of MFDW in both the backbone and the neck.

A	B	C	D	OS	GS	Sc	Ch	OD	mAP₅₀	Params/M	GFLOPs
				57.1	61.2	61.8	97.7	79.7	71.5	3.01	8.1
√				58.1	69.9	75.4	98.6	75.3	75.5	2.78	7.4
	√			59.1	71	66.6	97.1	75.7	73.9	4.74	6.9
√	√			64.5	76.6	70	97.2	70.6	76.5	4.5	6.2
	√	√	√	54	68.4	73.3	98.7	77.4	74.4	4.3	5.7
√	√	√		61.3	79.3	85.5	98.4	73.7	79.6	4.5	6.2

Red bold text indicates the top-ranked performance, while blue bold text denotes the second-ranked performance.

Table 8. Comparison of results with different standard deviations (σ) of Gaussian filtering. P and R denote precision and recall (%).

σ	P	R	mAP₅₀	mAP_50–95	mAP₇₅
10	78.6	72.8	75.4	41.5	37.9
20	73.2	72.5	73.2	41	37.6
30	74.3	75.3	76.8	43.2	41.8
40	76.5	73.2	75.6	41.3	40
50	71.3	75.9	74.3	41.1	39.5
60	83.6	75.7	79.6	43.6	42
70	80.6	67.1	74.3	42.2	41.6
80	75.8	73.3	73.8	40.7	36.3
90	77.2	72.5	76.4	42.8	42.5
100	78.8	72.1	74.7	39.7	37.7

Red bold text indicates the top-ranked performance, while blue bold text denotes the second-ranked performance.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, H.; Yi, Z.; Wang, Z.; Wang, Y.; Ge, L.; Cao, W.; Mei, L.; Yang, W.; Sun, Q. FDADNet: Detection of Surface Defects in Wood-Based Panels Based on Frequency Domain Transformation and Adaptive Dynamic Downsampling. Processes 2024, 12, 2134. https://doi.org/10.3390/pr12102134

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
