Textile Defect Detection Algorithm Based on the Improved YOLOv8
ABSTRACT Automatic detection of textile defects is crucial to improving textile quality, and fast, accurate detection is key to automating the textile industry. However, textile defect detection faces challenges such as small defect targets, low contrast between defects and the background, and large variations in defect aspect ratio. To address these issues, this study proposes DA-YOLOv8s, a textile defect detection method based on an improved version of You Only Look Once Version 8 (YOLOv8). Deep & Cross Network (DCNv2) is introduced into the Backbone network to replace the C2f module, enhancing feature extraction; a self-attention mechanism, Polarized Self-Attention (PSA), is adopted to increase feature fusion capability and reduce feature loss in both the channel and spatial dimensions; finally, a Small Object Detection Head (SOHead) is added to improve feature extraction for small targets. Experimental results show that the improved YOLOv8 algorithm achieves an mAP@0.5 of 44.6% and an mAP of 48.6%, improvements of 4.2% and 3.8% over the original algorithm, and also outperforms the YOLOv9s model and the latest YOLOv11s model on both metrics. Detection speed reaches 257.38 frames per second (FPS) at a computational cost of 36.6 GFLOPS, so the method provides both the accuracy and the speed required for textile defect detection and has practical engineering value.
INDEX TERMS Interest point detection, textile industry, quality management, YOLOv8, textile defect
detection, polarized self-attention, deep & cross network.
curtain cloth, which divides the image into several regions and analyzes the changes in grayscale ratio to determine the defects in the window of textiles. Zhu et al. [4] used the autocorrelation function to determine the pattern period of colored fabrics and hence the size of the detection window, then calculated Gray Level Co-occurrence Matrices (GLCMs) to characterize the original image, and computed the Euclidean distance of GLCMs between the images to be detected and the defect-free template image to achieve defect detection. Guan et al. [5] first highlighted the defect areas using image enhancement techniques, then used the first-order derivative for edge detection while employing the Roberts operator to detect the edges of the defect areas to improve detection accuracy.
Spectral analysis methods treat images as two-dimensional signals with amplitude variations and perform frequency domain analysis through certain transformation algorithms, commonly including the Fourier transform, the wavelet transform, and the Gabor filter transform. Hu et al. [6] proposed an unsupervised method based on the combination of the discrete Fourier transform (DFT) and the discrete wavelet transform (DWT), which performs wavelet shrinkage denoising on the residual image after Fourier recovery, and applies inverse transformation to the approximation coefficients and processed wavelet coefficients separately to achieve defect information segmentation using simple thresholding. Li and Zhou [7] proposed a Defect Direction Projection Algorithm (DDPA) based on fabric defect characteristics, which filters the input image using Gabor filters, performs a Radon transform projection after hard threshold segmentation, and selects the optimal Gabor filter channel, that is, the channel with the maximum defect value, to detect defects. Xiang et al. [8] proposed a defect detection algorithm based on Fourier convolution, which generates image pairs for training using random masking in the training phase and incorporates a Fourier convolution layer into the autoencoder to achieve automatic detection of dyed fabric defects.
The defect detection algorithms based on traditional computer vision have high computational requirements and need to be improved in terms of detection speed and accuracy.
Deep learning is a new framework in computer vision research, which has been widely applied in the field of defect detection with the rapid development of big data and artificial intelligence technologies [9], [10]. Deep learning can automatically extract features and iteratively optimize parameters, thus detecting defects in textile images. Mei et al. [11] designed a Multi-Scale Convolutional Denoising Autoencoder Network (MSCDAE) that achieved unsupervised detection of textile defects. The algorithm trained the Convolutional AutoEncoder (CAE) with positive samples, enabling it to extract fabric features and reconstruct fabric images. Detection is realized by identifying defects based on the difference in features between defective images and normal fabric images. Jing et al. [12] proposed an automatic detection method for fabric defects based on convolutional neural networks. This method decomposes textile images into multiple local patches and labels them, then transmits them to a pre-trained deep CNN for learning, and uses the trained model to detect each patch, thereby obtaining the category and position of each defect. Ma et al. [13] used a VGG16 model with improved parameters to train a classifier for detecting and recognizing defects in denim fabrics and constructed a defect detection algorithm based on a cascading architecture by merging the two models. However, these deep learning methods still fall short in speed and accuracy.
In recent years, with the further development of computer vision technology, research on object detection algorithms has largely focused on deep convolutional neural networks that are either candidate-region-based or regression-based. The Faster R-CNN algorithm is a representative of the candidate region-based object detection algorithms, demonstrating excellent performance in the field of object detection [14], [15], [16], [17]. Wei et al. [18] proposed a Faster R-CNN model based on an improved VGG structure, which adapts to the characteristics of fabric defect images by reducing the number of anchor points in the Faster R-CNN. The VGG16 was modified to include 13 convolutional layers (with ReLU as the activation function) and four pooling layers to extract feature maps. Additionally, the Region Proposal Network (RPN) and the ROI pooling layer were improved to enhance the model. An et al. [19] improved the Faster R-CNN network for textile defect detection by using deep residual networks instead of the traditional VGG-16 for feature extraction, and by adding feature pyramid modules and increasing the number of anchor boxes. Chen et al. [20] designed a Genetic Algorithm Gabor Faster R-CNN (Faster GG R-CNN) model, which embeds Gabor kernels into Faster R-CNN and employs a two-stage training method based on Genetic Algorithm (GA) and backpropagation for textile defect detection. Faster R-CNN is a two-stage object detection algorithm: the first stage generates region box proposals and the second stage performs object recognition within the region boxes, which affects the speed and accuracy of object detection.
Regression-based object detection algorithms directly regress the bounding box coordinates and object categories at multiple positions in the input image, balancing accuracy and speed; YOLO is a typical representative of such algorithms. The diversity of YOLO's applications in industrial defect detection also verifies the effectiveness of the algorithm [21], [22], [23], [24]. Yue et al. [25] proposed an improved YOLOv4 textile defect detection algorithm, which, after expanding the dataset with combined data augmentation methods, improved the head prediction layer and integrated the Convolutional Block Attention Module (CBAM) to achieve accurate classification and localization of tiny target defects. Jin and Di [26] also made improvements to the YOLOv5 network, introducing spatial and channel attention models into the backbone
network and designing a multi-task learning strategy with two detection heads for detecting common defects and identifying specific defects to improve the accuracy of defect recognition. These methods have weaker detection capabilities for irregularly sized defects, and the accuracy can also be further enhanced.
Section I of our paper introduces the background of textile defect detection and the development of detection methods, summarizes the characteristics of these methods, and proposes improvements and innovations to the YOLOv8s benchmark model. Section II describes the principles of convolutional neural networks, the feasibility of incorporating attention mechanisms into neural networks, and introduces the basic network structure of YOLOv8. Section III details the construction of the DA-YOLOv8s model, including the introduction of DCNv2 to enhance the feature extraction capability of the backbone network, the introduction of PSA to improve the feature fusion capability of the network neck, and the addition of the SOHead detection head to enhance the detection capability for small targets. Section IV introduces the dataset and evaluation metrics, conducts comparative experiments and an ablation study, and analyzes the experimental results. Section V summarizes the achievements of this paper and looks forward to future research. The structure of the paper content is shown in Fig 1.
YOLOv8s. We take the currently popular object detection algorithm YOLOv8 as the baseline model and improve the traditional convolutional neural network in the C2f module of YOLOv8's backbone network using the Deep & Cross Network. Additionally, we introduce the PSA attention mechanism to enhance YOLOv8's neck network. At the same time, we incorporate a small object detection head in YOLOv8's head module to improve feature extraction and fusion capabilities, thereby enhancing the accuracy of textile defect detection. The main contributions of our paper are as follows:
(1) We incorporate DCNv2 (Deep & Cross Network) [27] into the Backbone to replace the C2f module, enhancing the model's feature extraction capability.
(2) We introduce a new feature fusion algorithm, PSA (Polarized Self-Attention) [28], which uses a polarized self-attention mechanism to reduce feature loss in the channel and spatial dimensions.
(3) We add a detection head for small objects, SOHead, to prevent the loss of small object features, thereby improving detection performance.
Experiments show that using YOLOv8s as the base network model achieves textile defect detection and recognition, with the improved DA-YOLOv8s increasing mAP@0.5 and mAP@0.5,0.3,0.1 by 4.2% and 3.8%, reaching 44.6% and 48.6%, but there is still room for improvement in terms of model accuracy.
Input Layer: This layer standardizes the input image data and performs data augmentation operations to ensure that the input data meets the requirements of the convolutional layer.
Convolutional Layer: This layer performs convolutional operations and feature extraction on the input image. Each element of the convolutional kernel corresponds to a weight coefficient and a bias vector. When operating, the convolutional kernel systematically sweeps across the input features, performing matrix element multiplication and summation within the receptive field, and adding the bias term.
Activation Function: Its primary role is to apply a non-linear mapping to the output of the convolutional layer, enabling the network to have better learning capabilities. Commonly used activation functions include Sigmoid, Tanh, ReLU, etc.
Pooling Layer: After feature extraction by the convolutional layer, the output feature maps are passed to the pooling layer for feature selection and information filtering. The pooling layer contains pre-defined pooling functions, which replace the result of a single point in the feature map with a statistical measure of its neighboring area. Common pooling functions include average pooling and max pooling strategies.
Fully Connected Layer: While the convolutional neural network is capable of feature extraction from input data, the role of the fully connected layer is to perform non-linear combinations of the features extracted by the convolutional and pooling layers to produce the output.
Output Layer: For image classification problems, the output layer uses a logistic function or a normalized exponential function (softmax function) to output classification labels. In object detection problems, the output layer can be designed to output the center coordinates, size, and classification of the object.
Traditional CNNs extract features in the form of linear models, which have limited extraction capabilities. In contrast, the Cross Network can achieve multi-layer feature interactions, with each layer producing higher-order interactions based on existing ones, and retaining interactions from previous layers. The cross network can be trained jointly with a deep neural network [30]. Here, we introduce the DCNv2 model to improve the C2f in YOLOv8, which includes the classic CNN, as discussed in Section III, Part B.

B. ATTENTION MECHANISM
The Attention Mechanism is an approach that emulates the human visual and cognitive systems. By incorporating attention mechanisms into neural networks, the networks can automatically learn to selectively focus on important information within the input data, thereby enhancing the model's performance and generalization capabilities. The most typical attention mechanisms in the field of computer vision include channel attention, spatial attention, and self-attention mechanisms. Channel attention adaptively recalibrates the weight of each channel, which can be considered an object selection process, thus determining what to pay attention to [31]. Spatial attention can be regarded as an adaptive spatial region selection mechanism: where to pay attention. It emphasizes or suppresses information at these locations by assigning different weights to each pixel or pixel block. Self-attention mechanisms, on the other hand, automatically learn the associations between different positions, thereby capturing richer contextual information [32].
Attention mechanisms have achieved significant progress in computer vision applications and are often used in conjunction with classic object detection models. Therefore, based on the characteristics of textile defects, we introduce the Polarized Self-Attention (PSA) mechanism, which combines channel, spatial, and self-attention mechanisms, to improve the network structure of YOLOv8 and enhance algorithm performance. The specific principles are elaborated in Section III, Part C.

C. THE NETWORK STRUCTURE OF YOLOv8
The network structure of YOLOv8 can be divided into four parts: the input end, the backbone network, the neck network, and the head module, as shown in Fig 2. The input end mainly includes Mosaic data augmentation, adaptive anchor box calculation, and adaptive grayscale padding. The Backbone network contains structures such as Conv, C2f, and SPPF, among which the C2f module is the primary module for learning residual features, with branch connections across layers that enrich the gradient flow of the model and form a neural network module with stronger feature representation capabilities. The Neck network adopts the PAN (Path Aggregation Network) structure, which enhances the network's ability to fuse features of objects at different scales. The Head module is the output end, which decouples the classification and regression processes; the loss calculation process mainly includes positive and negative sample assignment strategies and loss computation, with the assignment strategy using the dynamic assignment method of Task Aligned Assigner [33], which selects positive samples based on the weighted results of classification and regression scores. The loss calculation for the classification branch adopts the BCE Loss method, while the regression branch uses the distribution focal loss [34] and the CIOU (complete intersection over union) loss function.

III. CONSTRUCTION OF DA-YOLOv8s MODEL
Our study adopts YOLOv8s as the baseline model, with the aim of enhancing the precision of textile defect detection while also meeting the speed requirements for industrial-grade applications. To achieve these objectives, we propose an algorithm called DA-YOLOv8s. This section will provide a detailed introduction to the model architecture of DA-YOLOv8s and its innovative points of improvement.
FIGURE 2. YOLOv8 network structure diagram. Here w represents the width of the convolutional kernel, r represents the scale
factor.
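As a point of reference for the CIOU regression loss named in the description of the YOLOv8 head (Section II, Part C), the following is a minimal pure-Python sketch of the standard CIoU definition for a single pair of boxes: 1 − IoU, plus a normalized center-distance term, plus an aspect-ratio consistency term. The function name `ciou_loss` and the (x1, y1, x2, y2) box format are assumptions made for illustration; this is not code taken from YOLOv8 or from this paper.

```python
import math

def ciou_loss(box_p, box_g, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative sketch only; box format and name are assumptions.
    """
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union of the two boxes
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter + eps
    iou = inter / union

    # Squared distance between the two box centers
    rho2 = (((px1 + px2) - (gx1 + gx2)) / 2) ** 2 \
         + (((py1 + py2) - (gy1 + gy2)) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

For identical boxes the loss is essentially zero, while for disjoint boxes it exceeds 1 because the center-distance penalty is added on top of 1 − IoU. That penalty is what lets the loss keep providing a useful signal when a small predicted defect box does not yet overlap the ground truth.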
network structure is shown in Fig 3. The parts marked with an asterisk "*" and in bold italic are the modified or newly added modules.

B. INTRODUCTION OF DCNv2 TO ENHANCE FEATURE EXTRACTION ABILITY OF BACKBONE NETWORK
In conventional image convolution operations, the ability to extract targets of various shapes is limited. Therefore, to enhance the feature extraction capability of the YOLOv8 backbone network for target features, this paper designs the DCNv2 network structure to replace four C2f modules in the backbone. The DCNv2 integrates a deep cross network structure, modeling explicit feature interactions through multiple cross-layer networks, and combines deep networks to model implicit feature interactions, thereby achieving automated feature cross-encoding and improving the efficiency of high-order feature extraction. We designed stacked and parallel structures, and here we adopt the parallel structure, passing the input features through the cross network layer and the deep network layer, and finally connecting them. The network structure of DCNv2 and DCNv2_Block is shown in Fig 4. This not only enables efficient learning of the intersection of sparse and dense features in images but also enhances the model's perception and learning ability for defect details.
Embedding Layer: Classifies the input features into a combination of sparse and dense features. Transforms the sparse features into embedding vectors and normalizes the dense features, with its output being the concatenation of all embedding vectors and normalized dense features: $x_0 = [x_{embed,1}; \cdots; x_{embed,n}; x_{dense}]$.
Cross networks and deep networks: Cross networks are characterized by the features of the $l$-th layer being operated with the learned weight matrix and bias vector, then combined with the first-order original features in the base layer to produce the features of the next layer. The operation rule for a single layer is as shown in Fig 5.
Deep networks, on the other hand, take the features of a certain layer, operate them with the weight matrix and bias vector, then activate them with the ReLU function to serve as the feature input for the next layer, following the operation rule shown in (1).
\[ h_{l+1} = f(W_l h_l + b_l) \tag{1} \]
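The cross-layer and deep-layer rules described for DCNv2 can be illustrated with a small sketch. It assumes the cross-layer form given in the DCN-V2 paper [27], $x_{l+1} = x_0 \odot (W_l x_l + b_l) + x_l$, together with the deep-layer rule in (1) with ReLU as $f$. It works on plain feature vectors rather than image feature maps, and the helper names (`matvec`, `cross_layer`, `deep_layer`, `dcn_parallel`) are hypothetical; it is not the DCNv2_Block implementation used in DA-YOLOv8s.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def cross_layer(x0, xl, W, b):
    """Cross layer: x_{l+1} = x0 * (W xl + b) + xl, with an elementwise product."""
    z = [wx + bi for wx, bi in zip(matvec(W, xl), b)]
    return [x0_i * z_i + xl_i for x0_i, z_i, xl_i in zip(x0, z, xl)]

def deep_layer(h, W, b):
    """Deep layer from (1): h_{l+1} = ReLU(W h + b)."""
    return [max(0.0, wx + bi) for wx, bi in zip(matvec(W, h), b)]

def dcn_parallel(x0, cross_params, deep_params):
    """Parallel structure: run both branches on x0 and concatenate the outputs."""
    xc = x0
    for W, b in cross_params:      # explicit feature interactions
        xc = cross_layer(x0, xc, W, b)
    h = x0
    for W, b in deep_params:       # implicit feature interactions
        h = deep_layer(h, W, b)
    return xc + h                  # list concatenation of the two branch outputs

# Toy usage with a 2-dimensional input and one layer per branch
identity = [[1.0, 0.0], [0.0, 1.0]]
mix = [[1.0, -1.0], [-1.0, 1.0]]
zeros = [0.0, 0.0]
out = dcn_parallel([1.0, 2.0], [(identity, zeros)], [(mix, zeros)])
```

Each cross layer multiplies the current features by the original input $x_0$, so stacking layers raises the order of the explicit interactions, while the residual term $x_l$ keeps the lower-order interactions from earlier layers.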
FIGURE 4. The network structure of DCNv2. The left figure details the specific algorithm of DCNv2_Block.
where $\hat{y}_i$ is the prediction; $y_i$ is the true label; $N$ is the total number of inputs; $\lambda$ is the L2 regularization parameter.
processes it in parallel using channel-wise self-attention and spatial self-attention, performing convolutions, reshaping, and applying the Sigmoid function, among other operations. The results of these computations are then combined and output to the detection head and Conv modules. The enhanced structure of the neck network in YOLOv8 is illustrated in Fig 6. The PSA computation process is as follows:
Channel Dimension Self-Attention $A^{ch}(X) \in \mathbb{R}^{C \times 1 \times 1}$: First, the input features $X$ are transformed into $W_q$ and $W_v$ using 1 × 1 convolutions, where the channels of $W_q$ are fully compressed, while the channel dimension of $W_v$ remains at a relatively high level ($C/2$). Because the channel dimension of $W_q$ is compressed, information enhancement is required through HDR, so the information of $W_q$ is enhanced using Softmax. Then, $W_q$ and $W_v$ are subjected to matrix multiplication, followed by a 1 × 1 convolution and LN to increase the channel dimension to $C$. Finally, the Sigmoid function is used to keep all parameters between 0 and 1. The operation is as shown in (4):
\[ A^{ch}(X) = F_{SG}\left[ W_{z|\theta_1}\left( \sigma_1(W_v(X)) \times F_{SM}\left(\sigma_2(W_q(X))\right) \right) \right] \tag{4} \]
where $W_q$, $W_v$ and $W_z$ are 1 × 1 convolutional layers, $\sigma_1$ and $\sigma_2$ are two tensor reshaping operators, $F_{SM}(\cdot)$ is the SoftMax operator given in (5), and "×" is the matrix dot product operation. The number of internal channels between $W_v | W_q$ and $W_z$ is $C/2$, and the output of channel self-attention is $Z^{ch} = A^{ch}(X) \odot^{ch} X \in \mathbb{R}^{C \times H \times W}$, where $\odot^{ch}$ is the channel multiplication operator.
\[ F_{SM}(X) = \sum_{j=1}^{N_p} \frac{e^{x_j}}{\sum_{m=1}^{N_p} e^{x_m}} x_j \tag{5} \]
Spatial Dimension Self-Attention $A^{sp}(X) \in \mathbb{R}^{1 \times H \times W}$: First, the input features are transformed into $W_q$ and $W_v$ using 1 × 1 convolutions. For the $W_q$ features, Global Pooling is applied to compress the spatial dimension to a size of 1 × 1, while the spatial dimension of the $W_v$ features is maintained at a relatively large level ($H \times W$). Since the spatial dimension of $W_q$ is compressed, Softmax is used to enhance the information of $W_q$. Finally, matrix multiplication is performed on $W_q$ and $W_v$, followed by reshape and Sigmoid to ensure all parameters remain between 0 and 1:
\[ A^{sp}(X) = F_{SG}\left[ \sigma_3\left( F_{SM}\left(\sigma_1(F_{GP}(W_q(X)))\right) \times \sigma_2(W_v(X)) \right) \right] \tag{6} \]
\[ PSA_p(X) = Z^{ch} + Z^{sp} \]
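The two PSA branches can be traced on a tiny example. The sketch below is a simplified pure-Python reading of (4) and (6) under several assumptions: the feature map is flattened to a $C \times N$ list of rows with $N = H \cdot W$; every 1 × 1 convolution is represented by a plain weight matrix without bias; LN is a parameter-free layer normalization; and global pooling is taken to be average pooling. All function names are hypothetical, and this is not the PSA module used in DA-YOLOv8s.

```python
import math

def matmul(A, B):
    """Matrix product of A (m x k) and B (k x n), both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_norm(v, eps=1e-5):
    mu = sum(v) / len(v)
    var = sum((x - mu) ** 2 for x in v) / len(v)
    return [(x - mu) / math.sqrt(var + eps) for x in v]

def channel_attention(X, Wq, Wv, Wz):
    """A_ch(X): one weight per channel. Wq is 1 x C, Wv is C/2 x C, Wz is C x C/2."""
    q = softmax(matmul(Wq, X)[0])                  # softmax over the N positions
    v = matmul(Wv, X)                              # C/2 x N
    ctx = [sum(vi * qi for vi, qi in zip(row, q)) for row in v]   # length C/2
    z = [sum(w * c for w, c in zip(row, ctx)) for row in Wz]      # back to C channels
    return [sigmoid(x) for x in layer_norm(z)]

def spatial_attention(X, Wq, Wv):
    """A_sp(X): one weight per spatial position. Wq and Wv are C/2 x C."""
    q = softmax([sum(row) / len(row) for row in matmul(Wq, X)])   # pooled, length C/2
    v = matmul(Wv, X)                              # C/2 x N
    attn = [sum(qi * v[i][n] for i, qi in enumerate(q)) for n in range(len(X[0]))]
    return [sigmoid(a) for a in attn]

def psa_parallel(X, ch_params, sp_params):
    """PSA_p(X) = Z_ch + Z_sp, with each branch reweighting X."""
    a_ch = channel_attention(X, *ch_params)
    a_sp = spatial_attention(X, *sp_params)
    return [[a_ch[c] * x + a_sp[n] * x for n, x in enumerate(row)]
            for c, row in enumerate(X)]

# Toy usage: C = 2 channels, N = 3 positions
X = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0]]
ch_params = ([[1.0, 0.0]], [[0.5, 0.5]], [[1.0], [-1.0]])
sp_params = ([[1.0, 1.0]], [[1.0, -1.0]])
Y = psa_parallel(X, ch_params, sp_params)
```

The channel branch collapses the spatial axis and keeps a per-channel weight, while the spatial branch collapses the channel axis and keeps a per-position weight; both weights lie strictly between 0 and 1 because of the final Sigmoid.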
A. DATASET INTRODUCTION
The dataset is from the Tianchi 2019 Guangdong Industrial Intelligent Manufacturing Innovation Competition. It is a textile defect dataset created from images collected at a textile workshop. The dataset contains a total of 4774 images, each 2446 × 1000 px, with 3819 images in the training set and 955 images in the validation set. It covers 20 important categories of fabric defects in the textile industry, with a total of 6126 defect instances, and provides detailed annotations. Each image contains one or more defect instances, with some typical defects shown in Fig 9.
FIGURE 10. Statistics of defect points by category.
model; the mean average precision (mAP) metric is used to evaluate the accuracy of the model, with its calculation formula as shown in (10).
\[ P_{mA} = \frac{\sum P_A}{N} \tag{10} \]
In the formula, $P_{mA}$ is the mAP, $N$ represents the total number of categories, and $P_A$ is the area enclosed by the curve formed by recall on the horizontal axis and precision on the vertical axis.
We use mAP@0.5 and mAP as evaluation metrics, where mAP@0.5 is the mean Average Precision at an Intersection over Union (IOU) threshold of 0.5, and mAP is the mean Average Precision at IOU thresholds of 0.5, 0.3, and 0.1.
Simultaneously, we employ FPS (Frames Per Second) to evaluate the detection speed of the model. FPS denotes the number of image frames that can be processed and output within a second. The calculation method is given in (11), where $t_1$ represents the image preprocessing time, $t_2$ the image inference time, and $t_3$ the post-processing time, all in milliseconds.
\[ FPS = \frac{1000\ \text{ms}}{t_1 + t_2 + t_3} \tag{11} \]
The evaluation metrics are shown in Table 4.

E. EXPERIMENTAL RESULTS
1) COMPARATIVE DETECTION EXPERIMENTS WITH DIFFERENT MODELS
To fully evaluate the detection algorithm of the improved YOLOv8s model in this paper, ten algorithms were selected for experimental comparison: the two-stage detectors Faster R-CNN and Cascade R-CNN, and the unimproved YOLOv3-tiny, YOLOv5s, YOLOv6s, YOLOX, YOLOv8s, YOLOv9s, YOLOv10s, and YOLOv11s. The mean Average Precision (mAP) after 100 iterations was used as the evaluation criterion for the different detection algorithms, which allows a fair assessment of their object detection capability. The results of the experimental comparison are shown in Table 5. The mean precision after each training iteration is shown in Fig 12. The results indicate that, considering the balance between detection accuracy and speed, the YOLOv8s, YOLOv9s, and YOLOv11s models significantly outperform the other models. The improved DA-YOLOv8s model is superior to all the models it is compared with, achieving a 4.2% increase in mAP@0.5 and a 3.8% improvement in average mAP relative to the baseline YOLOv8s model, although the detection speed has decreased by 52.2% and the GFLOPS has risen by 12.1. Given YOLO's excellent performance in detection speed, DA-YOLOv8s still achieves a detection speed of 257.38 FPS, which meets the requirements for industrial applications.
TABLE 5. Comparison of detections for different models.
FIGURE 12. Comparison chart of training accuracy across epochs.
The box_loss, cls_loss (Classification Loss), and dfl_loss (Distribution Focal Loss) of the DA-YOLOv8s model gradually decreased during the training process, as shown in Fig 13, indicating that the model's performance in identifying the location, size, and category of textile defects is improving. The consistency between training loss and validation loss, both showing a downward trend, suggests that the DA-YOLOv8s model has good generalization capabilities.
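The evaluation metrics in (10) and (11) reduce to a few lines of arithmetic. The sketch below is illustrative only: it approximates the per-class area under the precision-recall curve with the trapezoidal rule, which is an assumption (common evaluation toolkits use interpolated precision instead), and the function names are hypothetical.

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve by the trapezoidal rule.

    `recalls` must be sorted in increasing order and aligned with `precisions`.
    """
    area = 0.0
    for i in range(1, len(recalls)):
        width = recalls[i] - recalls[i - 1]
        area += width * (precisions[i] + precisions[i - 1]) / 2
    return area

def mean_average_precision(ap_per_class):
    """Equation (10): sum of the per-class AP values divided by the class count N."""
    return sum(ap_per_class) / len(ap_per_class)

def frames_per_second(t_pre_ms, t_infer_ms, t_post_ms):
    """Equation (11): FPS from per-image times given in milliseconds."""
    return 1000.0 / (t_pre_ms + t_infer_ms + t_post_ms)

# Toy usage with made-up values (not results from this paper)
ap_a = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.0])
map_value = mean_average_precision([ap_a, 0.25])
speed = frames_per_second(0.5, 3.0, 0.5)
```

Read in reverse, (11) also gives the time budget implied by a reported speed: 257.38 FPS corresponds to a total of about 1000 / 257.38 ≈ 3.89 ms per image for preprocessing, inference, and post-processing combined.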
2) ABLATION STUDY
In the Backbone, the DCNv2 module is used to replace all original C2f modules of YOLOv8 for the ablation experiments. mAP@0.5 and mAP improve by 2.3% and 2.6% respectively, indicating that this modification effectively enhances the feature extraction capability for small objects. At the same time, the FPS decreases to 351.92 and the GFLOPS increases, but the model still meets the requirements of industrial applications.
FIGURE 14. Comparison chart of training accuracy across different epochs.
V. CONCLUSION AND FUTURE PROSPECTS
A textile defect detection algorithm based on an improved YOLOv8 algorithm is proposed to address the low detection accuracy and poor real-time performance of traditional methods. Experimental results show that replacing the C2f module with DCNv2 in the YOLOv8s baseline network enhances the feature extraction capability of the network, incorporating the self-attention mechanism PSA increases the feature fusion capability in the channel and spatial dimensions, and adding a detection head improves the detection ability for small targets. Through these innovations, compared to the baseline model, DA-YOLOv8s improves mAP@0.5 and mAP by 4.2% and 3.8% respectively. The textile defect detection algorithm designed in this paper has high real-time performance and accuracy, meeting the needs of practical enterprise application scenarios. The detection results are presented in Fig 15.
Textile defects are mostly small objects, so the detection ability for small targets will be the research focus of the next step.

REFERENCES
[1] L. Tong, W. K. Wong, and C. K. Kwong, "Fabric defect detection for apparel industry: A nonlocal sparse representation approach," IEEE Access, vol. 5, pp. 5947–5964, 2017.
[2] A. Rasheed, B. Zafar, A. Rasheed, N. Ali, M. Sajid, S. H. Dar, U. Habib, T. Shehryar, and M. T. Mahmood, "Fabric defect detection using computer vision techniques: A comprehensive review," Math. Problems Eng., vol. 2020, pp. 1–24, Nov. 2020.
[3] W. Y. Zhang, J. Zhang, Y. Hou, and S. Geng, "MWGR: A new method for real-time detection of cord fabric defects," in Proc. Int. Conf. Adv. Mech. Syst., Tokyo, Japan, Sep. 2012, pp. 458–461.
[4] D. Zhu, R. Pan, W. Gao, and J. Zhang, "Yarn-dyed fabric defect detection based on autocorrelation function and GLCM," Autex Res. J., vol. 15, no. 3, pp. 226–232, Sep. 2015.
[5] M. Guan, Z. Zhong, and Y. Rui, "Automatic defect segmentation for plain woven fabric images," in Proc. Int. Conf. Commun., Inf. Syst. Comput. Eng. (CISCE), Haikou, China, Jul. 2019, pp. 465–468, doi: 10.1109/CISCE.2019.00108.
[6] G.-H. Hu, Q.-H. Wang, and G.-H. Zhang, "Unsupervised defect detection in textiles based on Fourier analysis and wavelet shrinkage," Appl. Opt., vol. 54, no. 10, p. 2963, Apr. 2015.
[7] L. Yihong and Z. Xiaoyi, "Fabric defect detection with optimal Gabor wavelet based on radon," in Proc. IEEE Int. Conf. Power, Intell. Comput. Syst. (ICPICS), Shenyang, China, Sep. 2020, pp. 788–793, doi: 10.1109/ICPICS50287.2020.9202242.
[8] J. Xiang, R. Pan, and W. Gao, "Yarn-dyed fabric defect detection based on an improved autoencoder with Fourier convolution," Textile Res. J., vol. 93, nos. 5–6, pp. 1153–1165, Mar. 2023.
[9] A. M. Kamoona, A. K. Gostar, A. Bab-Hadiashar, and R. Hoseinnezhad, "Point pattern feature-based anomaly detection for manufacturing defects, in the random finite set framework," IEEE Access, vol. 9, pp. 158672–158681, 2021, doi: 10.1109/ACCESS.2021.3130261.
[10] F. Alghanim, M. Azzeh, A. El-Hassan, and H. Qattous, "Software defect density prediction using deep learning," IEEE Access, vol. 10, pp. 114629–114641, 2022, doi: 10.1109/ACCESS.2022.3217480.
[11] S. Mei, Y. Wang, and G. Wen, "Automatic fabric defect detection with a multi-scale convolutional denoising autoencoder network model," Sensors, vol. 18, no. 4, p. 1064, Apr. 2018.
[12] J. Jing, H. Ma, and H. Zhang, "Automatic fabric defect detection using a deep convolutional neural network," Coloration Technol., vol. 135, no. 3, pp. 213–223, Mar. 2019.
[13] S. Ma, R. Zhang, Y. Dong, Y. H. Feng, and G. Zhang, "A defect detection algorithm of denim fabric based on cascading feature extraction
[14] F. Xu, Y. Liu, B. Zi, and L. Zheng, "Application of deep learning for defect detection of paint film," in Proc. 6th Int. Conf. Intell. Comput. Signal Process. (ICSP), Xi'an, China, Apr. 2021, pp. 1118–1121, doi: 10.1109/ICSP51882.2021.9408956.
[15] B. Zhao, M.-R. Dai, P. Li, and X.-N. Ma, "Data mining in railway defect image based on object detection technology," in Proc. Int. Conf. Data Mining Workshops (ICDMW), Beijing, China, Nov. 2019, pp. 814–819, doi: 10.1109/ICDMW.2019.00120.
[16] Y. Zhang, Z. Zhang, K. Fu, and X. Luo, "Adaptive defect detection for 3-D printed lattice structures based on improved faster R-CNN," IEEE Trans. Instrum. Meas., vol. 71, 2022, Art. no. 5020509, doi: 10.1109/TIM.2022.3200362.
[17] X. Gao, M. Jian, M. Hu, M. Tanniru, and S. Li, "Faster multi-defect detection system in shield tunnel using combination of FCN and faster RCNN," Adv. Struct. Eng., vol. 22, no. 13, pp. 2907–2921, May 2019.
[18] B. Wei, K. Hao, X. Tang, and L. Ren, "Fabric defect detection based on faster RCNN," in Proc. Artif. Int. Fashion Textiles Conf., Shanghai, China, Oct. 2018, pp. 45–51.
[19] M. An, S. Wang, L. Zheng, and X. Liu, "Fabric defect detection using deep learning: An improved faster R-CNN approach," in Proc. Int. Conf. Comput. Vis., Image Deep Learn. (CVIDL), Chongqing, China, Jul. 2020, pp. 319–324, doi: 10.1109/CVIDL51233.2020.00-78.
[20] M. Chen, L. Yu, C. Zhi, R. Sun, S. Zhu, Z. Gao, Z. Ke, M. Zhu, and Y. Zhang, "Improved faster R-CNN for fabric defect detection based on Gabor filter with genetic algorithm optimization," Comput. Ind., vol. 134, Jan. 2022, Art. no. 103551, doi: 10.1016/j.compind.2021.103551.
[21] Z. Liu, W. Wu, X. Gu, S. Li, L. Wang, and T. Zhang, "Application of combining YOLO models and 3D GPR images in road detection and maintenance," Remote Sens., vol. 13, no. 6, p. 1081, Mar. 2021.
[22] X. Liao, S. Lv, D. Li, Y. Luo, Z. Zhu, and C. Jiang, "YOLOv4-MN3 for PCB surface defect detection," Appl. Sci., vol. 11, no. 24, p. 11701, Dec. 2021, doi: 10.3390/app112411701.
[23] S. Teng, Z. Liu, and X. Li, "Improved YOLOv3-based bridge surface defect detection by combining high-and low-resolution feature images," Buildings, vol. 12, no. 8, p. 1225, Aug. 2022.
[24] Z. Cong, X. Li, and Z. Huang, "Research on brake pad surface defect detection method based on deep learning," in Proc. Int. Conf. Adv. Electr. Eng. Comput. Appl. (AEECA), Dalian, China, Aug. 2023, pp. 813–818, doi: 10.1109/AEECA59734.2023.00149.
[25] X. Yue, Q. Wang, L. He, Y. Li, and D. Tang, "Research on tiny target detection technology of fabric defects based on improved YOLO," Appl. Sci., vol. 12, no. 13, p. 6823, Jul. 2022.
[26] Y. Ji and L. Di, "Textile defect detection based on multi-proportion spatial attention mechanism and channel memory feature fusion network," IET Image Process., vol. 18, no. 2, pp. 412–427, Feb. 2024, doi: 10.1049/ipr2.12957.
[27] R. Wang, R. Shivanna, D. Cheng, S. Jain, D. Lin, L. Hong, and E. Chi, "DCN v2: Improved deep & cross network and practical lessons for Web-scale learning to rank systems," in Proc. Web Conf., New York, NY, USA, Jun. 2021, pp. 1785–1797, doi: 10.1145/3442381.3450078.
[28] H. Liu, F. Liu, X. Fan, and D. Huang, "Polarized self-attention: Towards high-quality pixel-wise regression," 2021, arXiv:2107.00782.
[29] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[30] R. Wang, B. Fu, G. Fu, and M. Wang, "Deep & cross network for ad click predictions," in Proc. ADKDD, New York, NY, USA, Aug. 2017, pp. 2278–2324, doi: 10.1109/5.726791.
[31] M.-H. Guo, T.-X. Xu, J. Liu, Z.-N. Liu, P.-T. Jiang, T. Mu, S. Zhang, R. R. Martin, M. Cheng, and S. Hu, "Attention mechanisms in computer vision: A survey," Comput. Vis. Media, vol. 8, no. 3, pp. 331–368, Mar. 2022, doi: 10.1007/s41095-022-0271-y.
[32] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," 2017, arXiv:1706.03762.
[33] C. Feng, Y. Zhong, Y. Gao, M. R. Scott, and W. Huang, "TOOD: Task-aligned one-stage object detection," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Montreal, QC, Canada, Oct. 2021, pp. 3490–3499, doi: 10.1109/ICCV48922.2021.00349.
[34] X. Li, W. Wang, L. Wu, S. Chen, X. Hu, J. Li, J. Tang, and J. Yang, "Generalized focal loss: Learning qualified and distributed bounding
architecture,’’ J. Inf. Process. Syst., vol. 19, no. 1, pp. 109–117, Feb. 2023, boxes for dense object detection,’’ in Adv. Neural Inf. Proc. Syst., vol. 33,
doi: 10.3745/JIPS.04.0265. Jan. 2020, pp. 21002–21012.
[35] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards
real-time object detection with region proposal networks,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017, doi:
10.1109/TPAMI.2016.2577031.
[36] Z. Cai and N. Vasconcelos, “Cascade R-CNN: Delving into high
quality object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern
Recognit., Salt Lake City, UT, USA, Jun. 2018, pp. 6154–6162.
[37] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,”
2018, arXiv:1804.02767.
[38] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, “YOLOX: Exceeding YOLO
series in 2021,” 2021, arXiv:2107.08430.
[39] C. Li, L. Li, H. Jiang, K. Weng, Y. Geng, L. Li, Z. Ke, Q. Li, M. Cheng,
W. Nie, Y. Li, B. Zhang, Y. Liang, L. Zhou, X. Xu, X. Chu, X. Wei,
and X. Wei, “YOLOv6: A single-stage object detection framework for
industrial applications,” 2022, arXiv:2209.02976.
[40] C.-Y. Wang, I.-H. Yeh, and H.-Y. Mark Liao, “YOLOv9: Learning what
you want to learn using programmable gradient information,” 2024,
arXiv:2402.13616.
[41] A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, J. Han, and G. Ding,
“YOLOv10: Real-time end-to-end object detection,” 2024,
arXiv:2405.14458.
[42] R. Khanam and M. Hussain, “YOLOv11: An overview of the key
architectural enhancements,” 2024, arXiv:2410.17725.
[43] X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, “RepVGG:
Making VGG-style ConvNets great again,” in Proc. IEEE/CVF Conf.
Comput. Vis. Pattern Recognit. (CVPR), Nashville, TN, USA, Jun. 2021,
pp. 13728–13737, doi: 10.1109/CVPR46437.2021.01352.
JIAHUI ZHANG received the B.S. degree in computer science and technology
from Hangzhou Dianzi University, China, in 2021, and the M.S. degree in
computer science and engineering from the University at Buffalo (SUNY),
USA, in 2023. He is currently working as a Teaching Assistant with
Zhejiang Industry Polytechnic College. His research interests include
machine learning, deep learning, and computer vision.
MEILIAN ZHENG received the Ph.D. degree from Zhejiang University, in
June 2008. She is currently working as an Associate Professor with the
School of Management, Zhejiang University of Technology, and the Vice
Director of Zhejiang Hithink RoyalFlush Artificial Intelligence Research
Institute. Her research interests include organizational behavior,
innovation management, digital fusion, and data analytics.
WENFEI SONG received the B.S. and M.S. degrees in information science
from Beijing Normal University, in 2001 and 2004, respectively. Since
2004, she has been working as a Faculty Member with the Department of
Computer Science, Zhejiang Industry Polytechnic College, where she is
currently working as an Associate Professor. She has authored four
textbooks and published more than ten papers. Her research interests
include computer vision technology and intelligent information processing.