Textile Defect Detection Algorithm Based on the Improved YOLOv8

This document presents a textile defect detection algorithm based on an improved version of YOLOv8, named DA-YOLOv8s, which enhances feature extraction and fusion capabilities through the incorporation of a Deep & Cross Network and a Polarized Self-Attention mechanism. Experimental results indicate that the improved algorithm achieves significant performance improvements, with mAP@0.5 and mAP metrics reaching 44.6% and 48.6%, respectively, while maintaining a detection speed of 257.38 FPS. The study highlights the importance of automation in textile quality management and proposes solutions to overcome challenges in detecting small and low-contrast defects.

Received 25 December 2024, accepted 4 January 2025, date of publication 13 January 2025, date of current version 21 January 2025.

Digital Object Identifier 10.1109/ACCESS.2025.3528771

Textile Defect Detection Algorithm Based on the Improved YOLOv8
WENFEI SONG 1, DU LANG 1, JIAHUI ZHANG 1, MEILIAN ZHENG 2, AND XIAOMING LI 3
1 School of Information and Design, Zhejiang Industry Polytechnic College, Shaoxing, Zhejiang 312000, China
2 School of Management, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
3 School of International Business, Zhejiang Yuexiu University, Shaoxing, Zhejiang 312000, China

Corresponding author: Meilian Zheng ([email protected])


This work was supported in part by the National Science Foundation of China under Grant 62272311 and Grant 62102262,
and in part by the 2020 Visiting Scholar Program of Zhejiang Provincial Department of Education.

ABSTRACT Automatic detection of textile defects is a crucial factor in improving textile quality. Fast and accurate detection of these defects is key to achieving automation in the textile industry. However, textile defect detection faces challenges such as small defect targets, low contrast between defects and the background, and significant variations in the aspect ratio of defects. To address these issues, this study proposes a new method for textile defect detection based on an improved version of You Only Look Once Version 8 (YOLOv8), called DA-YOLOv8s. A Deep & Cross Network (DCNv2) is introduced into the backbone network to replace the C2f module, enhancing the extraction of network features; a self-attention mechanism, Polarized Self-Attention (PSA), is adopted to increase feature fusion capability and reduce feature loss in both the channel and spatial dimensions; finally, a Small Object Detection Head (SOHead) is added to improve the feature extraction ability for small targets. Experimental results show that the improved YOLOv8 algorithm achieves mAP@0.5 and mAP of 44.6% and 48.6%, respectively, which is an improvement of 4.2% and 3.8% over the original algorithm, and also outperforms the optimal YOLOv9s model and the latest YOLOv11s model on these two metrics. The detection speed reaches 257.38 frames per second (FPS) with 36.6 GFLOPs of floating-point computation, ensuring both the accuracy and the speed of textile defect detection and giving the method practical engineering application value.

INDEX TERMS Interest point detection, textile industry, quality management, YOLOv8, textile defect
detection, polarized self-attention, deep & cross network.

The associate editor coordinating the review of this manuscript and approving it for publication was Turgay Celik.
2025 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 13, 2025

I. INTRODUCTION
In the transformation and upgrading of the textile industry, the automated detection of defects and minor damages in textiles is an important field of industrial upgrading, as well as a significant factor in improving the quality of textiles. Traditional defect detection in textiles relies on manual visual inspection, that is, naked-eye observation by inspectors, which is very limited in effectiveness and has a high rate of false positives and false negatives. According to statistics, manual inspection can detect only about 40%-60% of defects [1]. Therefore, how to detect textile defects through automated means has become a research hotspot and a challenge in the textile industry.

With the development of visual technology, defect detection based on computer vision has become an important field in the research of automatic fabric defect detection [2]. The most commonly used computer vision methods include statistical feature methods, spectral analysis methods, model-based detection methods, and deep learning methods.

The principle of the statistical feature method is to study the pixel grayscale values in the texture of textile images, compare the differences in statistical features of different regions, and thus determine whether a region is a defect point area. Zhang et al. [3] proposed a multi-window grayscale ratio (MWGR) method for detecting defects in
curtain cloth, which divides the image into several regions and analyzes the changes in grayscale ratio to determine the defects in the window of textiles. Zhu et al. [4] used the autocorrelation function to determine the pattern cycle of colored fabrics and thereby the size of the detection window, then calculated Gray Level Co-occurrence Matrices (GLCMs) to characterize the original image, and computed the Euclidean distance of GLCMs between the images to be detected and the defect-free template image to achieve defect detection. Guan et al. [5] first highlighted the defect areas using image enhancement techniques, then used the first-order derivative for edge detection while employing the Roberts operator to detect the edges of the defect areas to improve detection accuracy.

Spectral analysis methods treat images as two-dimensional signals with amplitude variations and perform frequency domain analysis through certain transformation algorithms, commonly including the Fourier transform, the wavelet transform, and the Gabor filter transform. Hu et al. [6] proposed an unsupervised method based on the combination of the discrete Fourier transform (DFT) and the discrete wavelet transform (DWT), which performs wavelet shrinkage denoising on the residual image after Fourier recovery, and applies inverse transformation to the approximation coefficients and processed wavelet coefficients separately to achieve defect information segmentation using simple thresholding. Li and Zhou [7] proposed a Defect Direction Projection Algorithm (DDPA) based on fabric defect characteristics, which filters the input image using Gabor filters, performs a Radon transform projection after hard threshold segmentation, and selects the optimal Gabor filter channel, that is, the channel with the maximum defect value, to detect defects. Xiang et al. [8] proposed a defect detection algorithm based on Fourier convolution, which generates image pairs for training using random masking in the training phase and incorporates a Fourier convolution layer into the autoencoder to achieve automatic detection of dyed fabric defects.

The defect detection algorithms based on traditional computer vision have high computational requirements and need to be improved in terms of detection speed and accuracy.

Deep learning is a new framework in computer vision research, which has been widely applied in the field of defect detection with the rapid development of big data and artificial intelligence technologies [9], [10]. Deep learning can automatically extract features and iteratively optimize parameters, and can thus detect defects in textile images. Mei et al. [11] designed a Multi-Scale Convolutional Denoising Autoencoder Network (MSCDAE) that achieved unsupervised detection of textile defects. The algorithm trained the Convolutional AutoEncoder (CAE) with positive samples, enabling it to extract fabric features and reconstruct fabric images. Detection is realized by identifying defects based on the difference in features between defective images and normal fabric images. Jing et al. [12] proposed an automatic detection method for fabric defects based on convolutional neural networks. This method decomposes textile images into multiple local patches and labels them, then transmits them to a pre-trained deep CNN for learning, and uses the trained model to detect each patch, thereby obtaining the category and position of each defect. Ma et al. [13] used an improved parameter VGG16 model to train a classifier for detecting and recognizing defects in denim fabrics and constructed a defect detection algorithm based on a cascading architecture by merging the two models. These deep learning methods still fall short in speed and accuracy.

In recent years, with the further development of computer vision technology, research on object detection algorithms has largely focused on those based on candidate regions and regression-based deep convolutional neural networks. The Faster R-CNN algorithm is a representative of the candidate region-based object detection algorithms, demonstrating excellent performance in the field of object detection [14], [15], [16], [17]. Wei et al. [18] proposed a Faster R-CNN model based on an improved VGG structure, which adapts to the characteristics of fabric defect images by reducing the number of anchor points in the Faster R-CNN. The VGG16 was modified to include 13 convolutional layers (with the activation function being ReLU) and four pooling layers to extract feature maps. Additionally, the Region Proposal Network (RPN) and the ROI pooling layer were improved to enhance the model. An et al. [19] improved the Faster R-CNN network for textile defect detection by using deep residual networks instead of the traditional VGG-16 for feature extraction, and by incorporating methods such as adding feature pyramid modules and increasing the number of anchor boxes. Chen et al. [20] designed a Genetic Algorithm Gabor Faster R-CNN (Faster GG R-CNN) model, which embeds Gabor kernels into Faster R-CNN and employs a two-stage training method based on Genetic Algorithm (GA) and backpropagation for textile defect detection. Faster R-CNN is a two-stage object detection algorithm, with the first stage completing region box proposals and the second stage conducting object recognition within the region boxes, which affects the speed and accuracy of object detection.

Regression-based object detection algorithms directly regress the bounding box coordinates and object categories at multiple positions in the input image, balancing accuracy and speed; YOLO is a typical representative of such algorithms. The diversity of YOLO's applications in industrial defect detection also verifies the effectiveness of the algorithm [21], [22], [23], [24]. Yue et al. [25] proposed an improved YOLOv4 textile defect detection algorithm, which, based on the expansion of the dataset using combined data augmentation methods, improved the head prediction layer and integrated the Convolutional Block Attention Module (CBAM) to achieve accurate classification and localization of tiny target defects. Jin and Di [26] also made improvements to the YOLOv5 network, introducing spatial and channel attention models into the backbone
network and designing a multi-task learning strategy with two detection heads for detecting common defects and identifying specific defects to improve the accuracy of defect recognition. These methods have weaker detection capabilities for irregularly sized defects, and their accuracy can also be further enhanced.

Section I of our paper introduces the background of textile defect detection and the development of detection methods, summarizes the characteristics of these methods, and proposes improvements and innovations to the YOLOv8s benchmark model. Section II describes the principles of convolutional neural networks and the feasibility of incorporating attention mechanisms into neural networks, and introduces the basic network structure of YOLOv8. Section III presents the construction of the DA-YOLOv8s model, including the introduction of DCNv2 to enhance the feature extraction capability of the backbone network, the introduction of PSA to improve the feature fusion capability of the network neck, and the addition of the SOHead detection head to enhance the detection capability for small targets. Section IV introduces the dataset and evaluation metrics, conducts comparative experiments and an ablation study, and analyzes the experimental results. Section V summarizes the work achievements of this paper and looks forward to future research. The structure of the paper content is shown in Fig 1.

FIGURE 1. Paper content structure diagram.

YOLOv8 is an open-source object detection model released in January 2023. Our research focuses on improving the accuracy of textile defect detection using deep neural network technology, and we propose a method called DA-YOLOv8s. We take the currently popular object detection algorithm YOLOv8 as the baseline model and improve the traditional convolutional neural network in the C2f module of YOLOv8's backbone network using the Deep and Cross Network. Additionally, we introduce the PSA attention mechanism to enhance YOLOv8's neck network. At the same time, we incorporate a small object detection head in YOLOv8's head module to improve feature extraction and fusion capabilities, thereby enhancing the accuracy of textile defect detection. The main contributions of our paper are as follows:
(1) We incorporate DCNv2 (Deep & Cross Network) [27] into the Backbone to replace the C2f module, enhancing the model's feature extraction capability.
(2) We introduce a new feature fusion algorithm, PSA (Polarized Self-Attention) [28], which uses an extreme self-attention mechanism to reduce feature loss in channel and spatial dimensions.
(3) We add a detection head for small objects, SOHead, to prevent the loss of small object features, thereby improving detection performance.

Experiments show that using YOLOv8s as the base network model achieves textile defect detection and recognition, with the improved DA-YOLOv8s increasing mAP@0.5 and mAP@0.5,0.3,0.1 by 4.2% and 3.8%, reaching 44.6% and 48.6%, but there is still room for improvement in terms of model accuracy.

II. RELATED WORK
In order to enhance the feature extraction capabilities of computer images and improve detection accuracy, extensive research has been conducted. In this section, we introduce the relevant research work and, in combination with various studies, propose improved methods. The methods and their performance are shown in Table 1.

TABLE 1. Related studies and improvement methods.

A. CONVOLUTIONAL NEURAL NETWORKS
Convolutional Neural Networks (CNNs), as one of the most significant algorithms in computer vision, were successfully applied in the LeNet network structure in 1998 for recognizing handwritten digits [29], demonstrating the powerful capabilities of CNNs in image processing. Since then, CNNs have been widely used in various fields such as object detection and classification algorithms. A CNN is a feedforward neural network that includes convolutional computations and deep architectures, with its core concept being the extraction of features through operations such as convolution and pooling. The components of a CNN include:
Input Layer: This layer standardizes the input image data and performs data augmentation operations to ensure that the input data meets the requirements of the convolutional layer.
Convolutional Layer: This layer performs convolutional operations and feature extraction on the input image. Each element of the convolutional kernel corresponds to a weight coefficient and a bias vector. When operating, the convolutional kernel systematically sweeps across the input features, performing matrix element multiplication and summation within the receptive field, and adding the bias term.
Activation Function: Its primary role is to apply a non-linear mapping to the output of the convolutional layer, enabling the network to have better learning capabilities. Commonly used activation functions include Sigmoid, Tanh, ReLU, etc.
Pooling Layer: After feature extraction by the convolutional layer, the output feature maps are passed to the pooling layer for feature selection and information filtering. The pooling layer contains pre-defined pooling functions, which replace the result of a single point in the feature map with a statistical measure of its neighboring area. Common pooling functions include average pooling and max pooling strategies.
Fully Connected Layer: While the convolutional neural network is capable of feature extraction from input data, the role of the fully connected layer is to perform non-linear combinations of the features extracted by the convolutional and pooling layers to produce the output.
Output Layer: For image classification problems, the output layer uses a logistic function or a normalized exponential function (softmax function) to output classification labels. In object detection problems, the output layer can be designed to output the center coordinates, size, and classification of the object.

Traditional CNNs extract features in the form of linear models, which have limited extraction capabilities. In contrast, the Cross Network can achieve multi-layer feature interactions, with each layer producing higher-order interactions based on existing ones, and retaining interactions from previous layers. The cross network can be trained jointly with a deep neural network [30]. Here, we introduce the DCNv2 model to improve the C2f in YOLOv8, which includes the classic CNN, as discussed in Section III, Part B.

B. ATTENTION MECHANISM
The Attention Mechanism is an approach that emulates the human visual and cognitive systems. By incorporating attention mechanisms into neural networks, the networks can automatically learn to selectively focus on important information within the input data, thereby enhancing the model's performance and generalization capabilities. The most typical attention mechanisms in the field of computer vision include channel attention, spatial attention, and self-attention mechanisms. Channel attention adaptively recalibrates the weight of each channel, which can be considered an object selection process, thus determining what to pay attention to [31]. Spatial attention can be regarded as an adaptive spatial region selection mechanism: where to pay attention. It emphasizes or suppresses information at these locations by assigning different weights to each pixel or pixel block. Self-attention mechanisms, on the other hand, automatically learn the associations between different positions, thereby capturing richer contextual information [32].

Attention mechanisms have achieved significant progress in computer vision applications and are often used in conjunction with classic object detection models. Therefore, based on the characteristics of textile defects, we introduce the Polarized Self-Attention (PSA) mechanism, which combines channel, spatial, and self-attention mechanisms, to improve the network structure of YOLOv8 and enhance algorithm performance. The specific principles are elaborated in Section III, Part C.

C. THE NETWORK STRUCTURE OF YOLOv8
The network structure of YOLOv8 can be divided into four parts: the input end, the backbone network, the neck network, and the head module, as shown in Fig 2. The input end mainly includes Mosaic data augmentation, adaptive anchor box calculation, and adaptive grayscale padding. The Backbone network contains structures such as Conv, C2f, and SPPF, among which the C2f module is the primary module for learning residual features, with branch connections across layers that enrich the gradient flow of the model and form a neural network module with stronger feature representation capabilities. The Neck network adopts the PAN (Path Aggregation Network) structure, which enhances the network's ability to fuse features of objects at different scales. The Head module is the output end, which decouples the classification and regression processes; the loss calculation process mainly includes positive and negative sample assignment strategies and loss computation, with the assignment strategy using the dynamic assignment method of Task Aligned Assigner [33], which selects positive samples based on the weighted results of classification and regression scores. The loss calculation for the classification branch adopts the BCE Loss method, while the regression branch uses the distribution focal loss [34] and the CIoU (complete intersection over union) loss function.

III. CONSTRUCTION OF DA-YOLOv8s MODEL
Our study adopts YOLOv8s as the baseline model, with the aim of enhancing the precision of textile defect detection while also meeting the speed requirements for industrial-grade applications. To achieve these objectives, we propose an algorithm called DA-YOLOv8s. This section will provide a detailed introduction to the model architecture of DA-YOLOv8s and its innovative points of improvement.
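For readers unfamiliar with the CIoU regression loss mentioned in Section II, Part C, the sketch below computes it for a single pair of axis-aligned boxes. It follows the commonly published CIoU definition (IoU minus a normalized center-distance term and an aspect-ratio term). The function name `ciou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are illustrative assumptions, not code taken from YOLOv8 or from this paper.

```python
import math


def ciou_loss(box_p, box_g, eps=1e-9):
    """Return 1 - CIoU for a predicted and a ground-truth box.

    Boxes are (x1, y1, x2, y2) tuples; this format is an assumption
    made for the example.
    """
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Plain IoU from the intersection rectangle.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers.
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4.0

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - (iou - rho2 / c2 - alpha * v)
```

Two identical boxes give a loss close to zero, while non-overlapping boxes give a loss above 1 because the center-distance penalty is added on top of the zero IoU.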


FIGURE 2. YOLOv8 network structure diagram. Here w represents the width of the convolutional kernel, and r represents the scale factor.

A. STRUCTURE OF DA-YOLOv8s NETWORK MODEL
To address issues such as small defect targets, low contrast between defects and background, and variable aspect ratios of defects in textile defect detection, this study proposes a new textile defect detection method, DA-YOLOv8s, based on an improved YOLOv8s. The improved network incorporates DCNv2 into the Backbone network to replace the C2f module, adopts the Polarized Self-Attention (PSA) mechanism to improve the Neck network, and introduces the SOHead detection head for small objects. The improved DA-YOLOv8s network structure is shown in Fig 3. The parts marked with an asterisk "*" and in bold italic are the modified or newly added modules.

FIGURE 3. Improved DA-YOLOv8 network structure diagram.

B. INTRODUCTION OF DCNv2 TO ENHANCE FEATURE EXTRACTION ABILITY OF BACKBONE NETWORK
In conventional image convolution operations, the ability to extract targets of various shapes is limited. Therefore, to enhance the feature extraction capability of the YOLOv8 backbone network for target features, this paper designs the DCNv2 network structure to replace four C2f modules in the backbone. The DCNv2 integrates a deep cross network structure, modeling explicit feature interactions through multiple cross-layer networks, and combines deep networks to model implicit feature interactions, thereby achieving automated feature cross-encoding and improving the efficiency of high-order feature extraction. We designed stacked and parallel structures, and here we adopt the parallel structure, passing the input features through the cross network layer and the deep network layer, and finally connecting them. The network structure of DCNv2 and DCNv2_Block is shown in Fig 4. This not only enables efficient learning of the intersection of sparse and dense features in images but also enhances the model's perception and learning ability for defect details.

Embedding Layer: Classifies the input features into a combination of sparse and dense features. Transforms the sparse features into embedding vectors and normalizes the dense features, with its output being the concatenation of all embedding vectors and normalized dense features: $x_0 = [x_{embed,1}; \cdots; x_{embed,n}; x_{dense}]$.

Cross networks and deep networks: Cross networks are characterized by the features of the $l$-th layer being operated with the learned weight matrix and bias vector, then combined with the first-order original features in the base layer to produce the features of the next layer. The operation rule for a single layer is as shown in Fig 5.

Deep networks, on the other hand, take the features of a certain layer, operate them with the weight matrix and bias vector, then activate them with the ReLU function to serve as the feature input for the next layer, following the operation rule shown in (1).

$$h_{l+1} = f(W_l h_l + b_l) \tag{1}$$
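The cross-layer and deep-layer rules described above can be sketched in a few lines of NumPy. The deep layer follows (1) with f = ReLU. The cross-layer formula itself appears only in Fig 5, so the form used here, $x_{l+1} = x_0 \odot (W_l x_l + b_l) + x_l$, is taken from the published Deep & Cross Network V2 formulation and is an assumption about what the figure shows. The function names and the toy dimensions are hypothetical, and the sketch operates on flat feature vectors rather than on the convolutional feature maps used inside YOLOv8.

```python
import numpy as np


def cross_layer(x0, xl, W, b):
    """One cross layer: x_{l+1} = x0 * (W @ xl + b) + xl (assumed DCN-V2 form)."""
    return x0 * (W @ xl + b) + xl


def deep_layer(h, W, b):
    """One deep layer as in Eq. (1), with f chosen as ReLU."""
    return np.maximum(W @ h + b, 0.0)


def parallel_block(x0, cross_params, deep_params):
    """Parallel structure: run both branches on x0, then concatenate outputs."""
    xc, h = x0, x0
    for W, b in cross_params:
        xc = cross_layer(x0, xc, W, b)  # explicit feature crosses
    for W, b in deep_params:
        h = deep_layer(h, W, b)         # implicit feature interactions
    return np.concatenate([xc, h])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4  # toy feature dimension
    x0 = rng.normal(size=d)
    cross = [(rng.normal(size=(d, d)), np.zeros(d)) for _ in range(2)]
    deep = [(rng.normal(size=(d, d)), np.zeros(d)) for _ in range(2)]
    print(parallel_block(x0, cross, deep).shape)  # (8,)
```

Each cross layer multiplies the original input `x0` back in, so stacking layers raises the order of the explicit feature interactions while the residual term `+ xl` keeps the lower-order ones.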


FIGURE 4. The network structure of DCNv2. The left figure details the specific algorithm of DCNv2_Block.

FIGURE 5. Single-layer operation rule for cross networks.

Deep and cross combination: The combination of cross networks and deep networks results in two structures, namely the stacked structure and the parallel structure. The stacked structure feeds the output of the cross network into the deep network as its input. The parallel structure, however, processes the two networks in parallel and finally combines their outputs with a single output layer. In practice, which architecture performs better depends on the data.

The formula for the predictive function is as shown in (2):

$$\hat{y}_i = \sigma\left(W_{logit}^{T} x_{final}\right) \tag{2}$$

where $W_{logit}$ is the weight vector for the logit and $\sigma(x) = 1/(1 + \exp(-x))$. For the final loss, the logarithmic loss function (Log Loss) is used, as shown in (3).

$$loss = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right] + \lambda \sum_{l} \lVert W_l \rVert_2^2 \tag{3}$$

where $\hat{y}_i$ is the prediction; $y_i$ is the true label; $N$ is the total number of inputs; $\lambda$ is the L2 regularization parameter.

C. INTRODUCING PSA TO ENHANCE FEATURE FUSION ABILITY OF THE NECK NETWORK
The Neck network in YOLOv8, through a series of convolutional layers and upsampling layers, fuses low-level feature maps with high-level feature maps, thereby enhancing the accuracy of object detection. However, capturing the details of objects and the feature information within small target regions more effectively requires a focus on feature extraction at the channel and specific spatial levels. In response to this issue, Polarized Self-Attention (PSA) is introduced to enhance the representation fusion capability of the entire network.

Polarized self-attention mechanisms are used to address pixel-level regression tasks, maintaining relatively high resolution in both channel and spatial dimensions (retaining C/2 dimensions in the channel and [H, W] dimensions in the spatial dimension), which can reduce information loss caused by dimensionality reduction; they also synthesize nonlinear functions that directly correspond to the typical fine-grained regression output distribution, making the fitted output more refined and closer to the actual output. This structure includes self-attention mechanisms in both channel and spatial dimensions and fuses the results of these two dimensions to obtain the polarized self-attention output. There are two structures of the polarized attention mechanism, namely the parallel structure and the sequential structure, as shown in the left figure in Fig 6 and in Fig 7, respectively. In this project, the parallel structure is integrated into the YOLOv8 network. Within the Neck network, the PSA module receives the output from the C2f module and
processes it in parallel using channel-wise self-attention and spatial self-attention, performing convolutions, reshaping, and applying the Sigmoid function, among other operations. The results of these computations are then combined and output to the detection head and Conv modules. The enhanced structure of the neck network in YOLOv8 is illustrated in Fig 6. The PSA computation process is as follows:

FIGURE 6. Polarized self-attention module with a parallel structure in YOLOv8.

Channel Dimension Self-Attention $A^{ch}(X) \in \mathbb{R}^{C \times 1 \times 1}$: First, the input features $X$ are transformed into $W_q$ and $W_v$ using 1 × 1 convolutions, where the channels of $W_q$ are fully compressed, while the channel dimension of $W_v$ remains at a relatively high level (C/2). Because the channel dimension of $W_q$ is compressed, information enhancement is required through HDR (high dynamic range), so the information of $W_q$ is enhanced using Softmax. Then, $W_q$ and $W_v$ are subjected to matrix multiplication, followed by a 1 × 1 convolution and LN to increase the dimension on the channels to C. Finally, the Sigmoid function is used to keep all parameters between 0 and 1. The operation is as shown in (4):

$$A^{ch}(X) = F_{SG}\left[ W_{z|\theta_1}\left( \sigma_1\left(W_v(X)\right) \times F_{SM}\left(\sigma_2\left(W_q(X)\right)\right) \right) \right] \tag{4}$$

where $W_q$, $W_v$ and $W_z$ are 1 × 1 convolutional layers, $\sigma_1$ and $\sigma_2$ are two tensor reshaping operators, $F_{SG}(\cdot)$ is the Sigmoid operator, $F_{SM}(\cdot)$ is the SoftMax operator given in (5), and "×" is the matrix dot product operation. The number of internal channels between $W_v | W_q$ and $W_z$ is C/2, and the output of channel self-attention is $Z^{ch} = A^{ch}(X) \odot^{ch} X \in \mathbb{R}^{C \times H \times W}$, where $\odot^{ch}$ is the channel multiplication operator.

$$F_{SM}(X) = \sum_{j=1}^{N_p} \frac{e^{x_j}}{\sum_{m=1}^{N_p} e^{x_m}} x_j \tag{5}$$

Spatial Dimension Self-Attention $A^{sp}(X) \in \mathbb{R}^{1 \times H \times W}$: First, the input features are transformed into $W_q$ and $W_v$ using 1 × 1 convolutions. For the $W_q$ features, Global Pooling is applied to compress the spatial dimension, transforming it into the size of 1 × 1, while the spatial dimension of the $W_v$ features is maintained at a relatively large level (H × W). Since the spatial dimension of $W_q$ is compressed, Softmax is used to enhance the information of $W_q$. Finally, matrix multiplication is performed on $W_q$ and $W_v$, followed by reshape and Sigmoid to ensure all parameters remain
forms a PSA parallel structure, as shown in (8).

PSAP (X ) = Z ch + Z sp
b

= Ach (X ) ⊙ch X + Asp (X ) ⊙sp X (8)


b b

The sequential combination of channel and spatial


self-attention forms a PSA series structure, as shown in (9).
 
PSAP (X ) = Z sp Z ch
 
= Asp Ach (X ) ⊙ch X ⊙sp Ach (X ) ⊙ch X (9)

D. INCREASE SOHead DETECTION HEAD TO IMPROVE


THE ABILITY OF SMALL TARGET DETECTION
The classic YOLOv8 has three detection heads, which can
detect targets at multiple scales, with detection sizes of:
The corresponding detection feature map size for P3/8 is
80 × 80, used for detecting targets larger than 8 × 8.
The corresponding detection feature map size for P4/16 is
40 × 40, used for detecting targets larger than 16 × 16.
The corresponding detection feature map size for P5/32 is
20 × 20, used for detecting targets larger than 32 × 32.
However, there may be missed detections or poor detection
ability for tiny targets, hence adding a detection head for
small objects, maintaining the extraction of small target
features through downsampling.

FIGURE 7. Polarized self-attention module in serial structure.

between 0-1. The operation is as shown in (6).

Asp (X )
= FSG σ3 FSM σ1 FGP Wq (X ) × σ2 (Wv (X )) (6)
  

In the formula, Wq and Wv are standard 1×1 convolutional


layers, σ1 ,σ2 and σ3 are three tensor reshaping operators,
FSM (·) is the SoftMax operator. FGP (·) is a global pooling
operator as (7). ‘‘×’’ denotes the dot product operation
of matrices. The output of spatial self-attention is Zsp =
Asp (X) ⊙sp X ∈ RC×H×W , where sp is the spatial multipli- FIGURE 8. Modules added for small target detection.
cation operator.
H X
W
The newly added detection feature map for 160 × 160 is
1 X used for detecting targets larger than 4 × 4.Upon the addition
FGP (X ) = X (:, i, j) (7)
H ×W of this detection head, the Neck network necessitates the
i=1 j=1
augmentation of a corresponding detection module. In this
Combination of Channel and Spatial Self-Attention: The context, we incorporate the Unsample, Concat, C2f, and
parallel combination of channel and spatial self-attention PSA modules in sequential order to perform computations.

VOLUME 13, 2025 11225


W. Song et al.: Textile Defect Detection Algorithm Based on the Improved YOLOv8

The outcomes of these operations are then fed into a Conv


module for continued feature extraction, and concurrently
into a Detect module to finalize the object detection process.
The network architecture is depicted in Fig 8.
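The relationship between head stride and feature map size described in this subsection can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a square 640 × 640 network input (inferred from the 80 × 80 map at stride 8, not stated explicitly here), and the label "P2/4" for the added head is a hypothetical name that follows the P3/8 naming pattern.

```python
# Assumption: square 640 x 640 input, inferred from the 80 x 80 map at stride 8.
INPUT_SIZE = 640

# Stride of each detection head. "P2/4 (added)" is a hypothetical label for the
# new small-object head; the other three are the standard YOLOv8 heads.
HEAD_STRIDES = {"P2/4 (added)": 4, "P3/8": 8, "P4/16": 16, "P5/32": 32}


def head_grid(input_size: int, stride: int) -> int:
    """Return the side length of the detection feature map for a given stride."""
    if input_size % stride != 0:
        raise ValueError("input size must be divisible by the stride")
    return input_size // stride


for name, stride in HEAD_STRIDES.items():
    side = head_grid(INPUT_SIZE, stride)
    print(f"{name}: {side} x {side} map, one cell covers {stride} x {stride} px")
```

Under this assumption, the stride-4 head yields the 160 × 160 map, four times as many cells per side as the coarsest 20 × 20 map, which is why it is the natural place to look for defects only a few pixels wide.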

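For readers who want to see the two scalar operators from the PSA description in concrete form, the following sketch implements the SoftMax operator in (5) and the global pooling operator in (7) on plain Python lists. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, tensors are represented as nested lists, and the convolutions, reshapes, and Sigmoid in (4) and (6) are omitted.

```python
import math
from typing import List


def softmax_weighted_sum(x: List[float]) -> float:
    """F_SM as written in (5): sum over j of softmax(x)_j * x_j."""
    shift = max(x)  # subtracting the max leaves the weights unchanged and avoids overflow
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return sum((e / total) * v for e, v in zip(exps, x))


def global_pool(x: List[List[List[float]]]) -> List[float]:
    """F_GP in (7): average each channel of a C x H x W nested list over H and W."""
    pooled = []
    for channel in x:
        h, w = len(channel), len(channel[0])
        pooled.append(sum(sum(row) for row in channel) / (h * w))
    return pooled


# Hypothetical 2 x 2 x 2 feature tensor, used only to exercise the functions.
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
print(global_pool(features))
print(softmax_weighted_sum([0.0, math.log(3.0)]))
```

In the second call the SoftMax weights are 0.25 and 0.75, so larger activations dominate the weighted sum; this is the enhancement effect the text attributes to Softmax after the query branch has been compressed.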
IV. RESULTS ANALYSIS
Our study used the Tianchi 2019 Guangdong Industrial Intelligent Manufacturing Innovation Competition dataset to conduct comparative experiments between DA-YOLOv8s and ten other models, including a baseline model. Ablation experiments were also performed. The results indicate that the DA-YOLOv8s model achieves a significant improvement in mAP and a balance between accuracy and speed.

A. DATASET INTRODUCTION
The dataset is from the Tianchi 2019 Guangdong Industrial Intelligent Manufacturing Innovation Competition. It is a textile defect dataset created from images collected at a textile workshop. The dataset contains a total of 4774 images, each 2446 × 1000 px in size. There are 3819 images in the training set and 955 images in the validation set. It covers 20 important categories of fabric defects in the textile industry, with a total of 6126 defect instances, and provides detailed annotations. Each image contains one or more defect instances, with some typical defects shown in Fig. 9.

FIGURE 9. Visual comparisons of original models.

The dataset has an imbalanced distribution of categories: the 'knots' defect category exceeds 1200 instances, while the 'centiped' category has only 79 instances. The statistics of each defect category are shown in Fig. 10. We have normalized the standard size of defect points, and the distribution of their length and width is shown in Fig. 11. The figure shows that defect sizes vary significantly across the dataset and that the detection targets are small relative to the original image. For 86.0% of the defect points, the ratio of defect width to original image width is within 0.1.

FIGURE 10. Statistics of defect points by category.

FIGURE 11. Scatter plot of defect sizes.

B. EXPERIMENTAL ENVIRONMENT
The experimental environment for this paper is set up on a server with the hardware configuration listed in Table 2: an NVIDIA RTX 4090 graphics card with 24 GB of memory and 64 GB of system memory. The server runs Ubuntu 20.04 with CUDA Toolkit 12.0, and the deep learning framework is PyTorch 2.0.1.

TABLE 2. Experimental environment configuration.

In the experiment, parameters are set according to the default hyperparameters of the model: the batch size is 16, the number of training epochs is 100, and the initial learning rate is 0.0001, as shown in Table 3.

C. INTRODUCTION TO COMPARATIVE ALGORITHMS
To verify the effectiveness of the improved model, we compared it with the classical object detection algorithms Faster R-CNN [35], Cascade R-CNN [36], YOLOv3 [37],

YOLOv5, YOLOX [38], YOLOv6 [39], YOLOv9 [40], YOLOv10 [41], and YOLOv11 [42], under an experimental environment consistent with Table 3, using the default parameters of each model.

TABLE 3. Parameter configuration.

Faster R-CNN Algorithm Model: Faster R-CNN is a classical object detection algorithm proposed by Ren et al. [35]. Faster R-CNN uses VGG16 as the network's backbone, first extracting features with a set of basic conv+relu+pooling layers, then passing them through the RPN (Region Proposal Network) to generate candidate boxes based on the anchor mechanism. Feature extraction, candidate box selection, bounding box regression, and classification are thus integrated into one network, providing higher detection accuracy and efficiency.

Cascade R-CNN Algorithm Model: Cascade R-CNN is an enhancement of Faster R-CNN. It proposes a cascaded R-CNN detection and inference architecture composed of a sequence of detectors trained with increasing IoU thresholds. This cascaded resampling progressively enhances the quality of detection.

YOLOv3 Algorithm Model: Introduced in 2018, YOLOv3 utilizes a Feature Pyramid Network (FPN) for feature fusion and incorporates residual connection modules to enable multi-scale training. It outputs feature maps at three scales. In this paper, we adopt YOLOv3-tiny as the comparison model.

YOLOv5 Algorithm Model: YOLOv5 is an object detection model released by Ultralytics in June 2020, which excels in inference speed. This paper adopts the YOLOv5s version, which has fewer parameters and is suitable for resource-constrained devices or scenarios requiring fast inference. The YOLOv5 network structure consists of an input end (Input), a Backbone network, a Neck network, and a detection end (Head). The input end uses data augmentation techniques and adaptive anchor calculation to enrich the dataset and reduce GPU resource usage. The Backbone network utilizes the Focus and CSPDarknet53 structures to optimize the classifier, enhancing the diversity and robustness of features. The Neck network adopts the SPP module and the FPN+PAN structure, which strengthens both semantic and positional information, fuses the feature information extracted by the Backbone network, and further improves the model's performance and accuracy.

YOLOX Algorithm Model: YOLOX is an object detection algorithm proposed by Megvii Technology in 2021. It builds on YOLOv3-SPP and adopts decoupled heads, with classification prediction and position prediction handled by two separate branches. This approach not only reduces information redundancy but also improves detection accuracy. YOLOX introduces an anchor-free design, directly predicting the center point coordinates, width, and height of objects, which enhances detection flexibility and avoids the performance bottlenecks caused by anchor boxes. Moreover, YOLOX employs an advanced label assignment strategy that takes into account factors such as the size, position, and shape of objects, making label assignment more rational and accurate.

YOLOv6 Algorithm Model: YOLOv6 is an object detection framework developed by Meituan's Visual Intelligence Department and released in 2022. YOLOv6 makes numerous improvements to the Backbone, Neck, and Head network structures. In this paper, the YOLOv6s version is used as the comparison algorithm. The Backbone network in YOLOv6s is inspired by the RepVGG [43] style structure. It is composed of RepBlocks in the training phase, and in the inference phase each RepBlock is converted into a stack of 3 × 3 convolutional layers (denoted RepConv) with ReLU activation functions, which reduces inference latency while enhancing representation capability. The Neck network replaces the CSPBlock used in YOLOv5 with RepBlock, i.e., Rep-PAN, and adjusts width and depth accordingly. The Head network adopts a hybrid-channel strategy to construct a more efficient decoupled head, which further reduces computational cost.

YOLOv9 Algorithm Model: YOLOv9 introduces the concept of Programmable Gradient Information (PGI) to address the diverse changes required by deep networks to meet various objectives. Furthermore, YOLOv9 develops a new lightweight network architecture, the Generalized Efficient Layer Aggregation Network (GELAN), which employs gradient path planning to significantly improve detection performance.

YOLOv10 Algorithm Model: YOLOv10 introduces a consistent dual assignment strategy, with dual label assignments and a consistent matching metric, to address the issue of redundant predictions in post-processing without the need for Non-Maximum Suppression (NMS). It proposes a lightweight classification head, spatial-channel decoupled downsampling, and rank-guided block design to reduce explicit computational redundancy and achieve a more efficient architecture.

YOLOv11 Algorithm Model: This is the latest iteration of the YOLO model, which builds upon YOLOv8 by refining the network architecture. It replaces the C2f module with the C3K2 module and adds an attention mechanism, the C2PSA module, following the SPPF layer. Additionally, it improves the structure of the detection head. We have also included this model as a comparative model in our experimental evaluation.

D. EVALUATION METRICS
To validate the effectiveness and execution time of the model, this paper uses GFLOPS (Giga Floating-point Operations Per Second, representing 1 billion floating-point operations per second) as a measure of the execution time of the network


model; the mean average precision (mAP) metric is used to evaluate the accuracy of the model, with its calculation formula shown in (10).

$$P_{mA} = \frac{\sum P_A}{N} \tag{10}$$

In the formula, $P_{mA}$ denotes mAP, $N$ represents the total number of categories, and $P_A$ is the area enclosed by the curve formed by recall on the horizontal axis and precision on the vertical axis.
We use mAP@0.5 and mAP as evaluation metrics, where mAP@0.5 is the mean Average Precision at an Intersection over Union (IoU) threshold of 0.5, and mAP is the mean Average Precision at IoU thresholds of 0.5, 0.3, and 0.1.
We also employ FPS (Frames Per Second) to evaluate the detection speed of the model. FPS denotes the number of image frames that can be processed and output within one second. The calculation method is given in (11), where $t_1$ is the image preprocessing time, $t_2$ is the image inference time, and $t_3$ is the post-processing time.

$$\mathrm{FPS} = \frac{1000\ \mathrm{ms}}{t_1 + t_2 + t_3} \tag{11}$$

The evaluation metrics are shown in Table 4.

TABLE 4. Evaluation metrics functions and objectives.

E. EXPERIMENTAL RESULTS
1) COMPARATIVE DETECTION EXPERIMENTS WITH DIFFERENT MODELS
To fully evaluate the improved YOLOv8s detection algorithm proposed in this paper, ten algorithms were selected for experimental comparison: Faster R-CNN (a typical two-stage object detection algorithm), Cascade R-CNN, and the unimproved YOLOv3-tiny, YOLOv5s, YOLOv6s, YOLOX, YOLOv8s, YOLOv9s, YOLOv10s, and YOLOv11s. The mean Average Precision (mAP) after 100 training epochs was used as the evaluation criterion for the different detection algorithms, which provides a reasonable assessment of their object detection capability. The results of the experimental comparison are shown in Table 5, and the mean precision after each training epoch is shown in Fig. 12. The results indicate that, considering the balance between detection accuracy and speed, the YOLOv8s, YOLOv9s, and YOLOv11s models significantly outperform the other models. The improved DA-YOLOv8s model is superior to all the models it is compared with: relative to the baseline YOLOv8s model, mAP@0.5 increases by 4.2% and mAP by 3.8%, although the detection speed decreases by 52.2% and GFLOPS rises by 12.1. Given YOLO's excellent detection speed, DA-YOLOv8s still achieves 257.38 FPS, which meets the requirements of industrial applications.

TABLE 5. Comparison of detections for different models.

FIGURE 12. Comparison chart of training accuracy across epochs.

The box_loss, cls_loss (Classification Loss), and dfl_loss (Distribution Focal Loss) of the DA-YOLOv8s model gradually decreased during training, as shown in Fig. 13, indicating that the model's performance in identifying the location, size, and category of textile defects was improving. The training loss and validation loss are consistent, both showing a downward trend, which suggests that the DA-YOLOv8s model has good generalization capabilities.
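To make the accuracy metric in (10) concrete, the sketch below computes a per-category average precision as the area under a precision-recall curve and then averages over categories. It is a simplified illustration, not the evaluation code used in the paper: the area is taken with the plain trapezoidal rule (evaluation toolkits typically interpolate the curve first), the function names are hypothetical, and the sample curves are invented for demonstration.

```python
from typing import Sequence


def average_precision(recall: Sequence[float], precision: Sequence[float]) -> float:
    """Area under a precision-recall curve by the trapezoidal rule.

    Points must be ordered by non-decreasing recall.
    """
    if len(recall) != len(precision) or len(recall) < 2:
        raise ValueError("need at least two matching (recall, precision) points")
    area = 0.0
    for i in range(1, len(recall)):
        width = recall[i] - recall[i - 1]
        if width < 0:
            raise ValueError("recall values must be non-decreasing")
        area += width * (precision[i] + precision[i - 1]) / 2.0
    return area


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """Equation (10): sum of per-category AP divided by the number of categories N."""
    if not per_class_ap:
        raise ValueError("at least one category is required")
    return sum(per_class_ap) / len(per_class_ap)


# Hypothetical two-category example; the curves are illustrative, not from the paper.
ap_a = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.5])
ap_b = average_precision([0.0, 1.0], [0.5, 0.5])
print(mean_average_precision([ap_a, ap_b]))
```

The same averaging step would be repeated at each IoU threshold when a multi-threshold mAP is reported; only the matching of predictions to ground truth, and hence the precision-recall points, changes with the threshold.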


FIGURE 13. Box_loss, Cls_loss, and Dfl_loss metrics.
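The speed metric in (11) can be read in both directions: from stage timings to FPS, or from a reported FPS back to the per-image time budget. The following sketch shows both, as a rough aid for interpreting the reported speeds. The function names and the example stage timings are hypothetical; only the 257.38 FPS figure comes from the text.

```python
def fps_from_times(t1_ms: float, t2_ms: float, t3_ms: float) -> float:
    """Equation (11): FPS from preprocessing, inference and post-processing times in ms."""
    total = t1_ms + t2_ms + t3_ms
    if total <= 0:
        raise ValueError("total per-image time must be positive")
    return 1000.0 / total


def latency_ms_from_fps(fps: float) -> float:
    """Invert (11): total per-image time in ms implied by a reported FPS."""
    if fps <= 0:
        raise ValueError("FPS must be positive")
    return 1000.0 / fps


# Hypothetical stage timings (ms), chosen only to illustrate the formula.
print(fps_from_times(0.5, 3.0, 0.5))

# Per-image budget implied by the reported DA-YOLOv8s speed of 257.38 FPS.
print(round(latency_ms_from_fps(257.38), 2))
```

At 257.38 FPS the three stages together take a little under 4 ms per image, which gives a sense of the margin available when the detector is placed in an inspection line.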

2) ABLATION STUDY
In the Backbone, the DCNv2 module is used to replace all original C2f modules of YOLOv8. In this ablation experiment, mAP@0.5 and mAP improve by 2.3% and 2.6%, respectively, indicating that this modification effectively enhances the feature extraction capability for small objects. At the same time, the detection speed decreases to 351.92 FPS and GFLOPS increases, but the model still meets the requirements of industrial applications.

TABLE 6. Ablation study results comparison.

In the Neck network, a PSA self-attention module is added after each C2f module connected to the detection head, reducing information loss in both the channel and spatial dimensions and increasing feature fusion capability. In the corresponding ablation experiment, mAP@0.5 and mAP improve by 1.1% and 1.3%, respectively. The decrease in FPS and the increase in GFLOPS do not affect the feasibility of its application.

In the Head, we added a small object detection head (4 × 4), which requires an additional structure consisting of Conv, Concat, C2f, and PSA in the Neck network to connect to this detection head. The experiment shows that mAP@0.5 and mAP improve by 0.7% and 0.9%, respectively. The detection speed still reaches 391.72 FPS, which does not affect the feasibility of its application.

Subsequently, we conducted pairwise combination experiments of DCNv2, PSA, and the small object detection head SOHead on the YOLOv8s benchmark model, all of which showed improvements over the individual experiments. After applying all three to the model and training for 100 epochs, mAP@0.5 and mAP improved by 4.2% and 3.8%, respectively, as shown in Table 6 and Fig. 14. However, the detection speed (FPS) decreased by 52.2%, and GFLOPS also increased. Given YOLO's excellent detection speed, DA-YOLOv8s can still achieve 257.38 FPS, which meets the requirements of industrial applications. Partial detection results of the final DA-YOLOv8s on the validation set are shown in Fig. 15.

FIGURE 14. Comparison chart of training accuracy across different epochs.

FIGURE 15. Detection results of the DA-YOLOv8s model on the test dataset.
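One way to read the ablation numbers is to compare the sum of the three single-module gains with the gain of the full model. The sketch below does this using only the figures quoted in this subsection; the variable and function names are hypothetical, and the comparison is an interpretive aid rather than part of the reported experiments.

```python
# Reported gains over the YOLOv8s baseline as (mAP@0.5 gain, mAP gain), in percent.
INDIVIDUAL_GAINS = {
    "DCNv2": (2.3, 2.6),
    "PSA": (1.1, 1.3),
    "SOHead": (0.7, 0.9),
}
COMBINED_GAIN = (4.2, 3.8)  # all three modules together (DA-YOLOv8s)


def summed_gains(gains: dict) -> tuple:
    """Add up the single-module gains for each metric, rounded to one decimal."""
    map50 = sum(g[0] for g in gains.values())
    map_avg = sum(g[1] for g in gains.values())
    return round(map50, 1), round(map_avg, 1)


total_map50, total_map = summed_gains(INDIVIDUAL_GAINS)
print(f"mAP@0.5: sum of single gains {total_map50} vs combined {COMBINED_GAIN[0]}")
print(f"mAP:     sum of single gains {total_map} vs combined {COMBINED_GAIN[1]}")
```

For mAP@0.5 the combined gain (4.2) is close to the sum of the single gains (4.1), whereas for mAP the combined gain (3.8) falls below the sum (4.8), so the three modifications should not be assumed to contribute independently on every metric.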


V. CONCLUSION AND FUTURE PROSPECTS
A textile defect detection algorithm based on an improved YOLOv8 algorithm is proposed to address the low detection accuracy and poor real-time performance of traditional methods. Experimental results show that replacing C2f with DCNv2 in the YOLOv8s baseline network enhances the feature extraction capability of the network, that incorporating the PSA self-attention mechanism increases the feature fusion capability in the channel and spatial dimensions, and that adding a detection head improves the detection ability for small targets. Through these innovations, DA-YOLOv8s improves on the baseline model by 4.2% in mAP@0.5 and 3.8% in mAP. The textile defect detection algorithm designed in this paper has high real-time performance and accuracy, meeting the needs of practical enterprise application scenarios. The detection results are presented in Fig. 15.
Textile defect detection is mostly dominated by small objects, so the detection ability for small targets will be the research focus of the next step.

REFERENCES
[1] L. Tong, W. K. Wong, and C. K. Kwong, “Fabric defect detection for apparel industry: A nonlocal sparse representation approach,” IEEE Access, vol. 5, pp. 5947–5964, 2017.
[2] A. Rasheed, B. Zafar, A. Rasheed, N. Ali, M. Sajid, S. H. Dar, U. Habib, T. Shehryar, and M. T. Mahmood, “Fabric defect detection using computer vision techniques: A comprehensive review,” Math. Problems Eng., vol. 2020, pp. 1–24, Nov. 2020.
[3] W. Y. Zhang, J. Zhang, Y. Hou, and S. Geng, “MWGR: A new method for real-time detection of cord fabric defects,” in Proc. Int. Conf. Adv. Mech. Syst., Tokyo, Japan, Sep. 2012, pp. 458–461.
[4] D. Zhu, R. Pan, W. Gao, and J. Zhang, “Yarn-dyed fabric defect detection based on autocorrelation function and GLCM,” Autex Res. J., vol. 15, no. 3, pp. 226–232, Sep. 2015.
[5] M. Guan, Z. Zhong, and Y. Rui, “Automatic defect segmentation for plain woven fabric images,” in Proc. Int. Conf. Commun., Inf. Syst. Comput. Eng. (CISCE), Haikou, China, Jul. 2019, pp. 465–468, doi: 10.1109/CISCE.2019.00108.
[6] G.-H. Hu, Q.-H. Wang, and G.-H. Zhang, “Unsupervised defect detection in textiles based on Fourier analysis and wavelet shrinkage,” Appl. Opt., vol. 54, no. 10, p. 2963, Apr. 2015.
[7] L. Yihong and Z. Xiaoyi, “Fabric defect detection with optimal Gabor wavelet based on radon,” in Proc. IEEE Int. Conf. Power, Intell. Comput. Syst. (ICPICS), Shenyang, China, Sep. 2020, pp. 788–793, doi: 10.1109/ICPICS50287.2020.9202242.
[8] J. Xiang, R. Pan, and W. Gao, “Yarn-dyed fabric defect detection based on an improved autoencoder with Fourier convolution,” Textile Res. J., vol. 93, nos. 5–6, pp. 1153–1165, Mar. 2023.
[9] A. M. Kamoona, A. K. Gostar, A. Bab-Hadiashar, and R. Hoseinnezhad, “Point pattern feature-based anomaly detection for manufacturing defects, in the random finite set framework,” IEEE Access, vol. 9, pp. 158672–158681, 2021, doi: 10.1109/ACCESS.2021.3130261.
[10] F. Alghanim, M. Azzeh, A. El-Hassan, and H. Qattous, “Software defect density prediction using deep learning,” IEEE Access, vol. 10, pp. 114629–114641, 2022, doi: 10.1109/ACCESS.2022.3217480.
[11] S. Mei, Y. Wang, and G. Wen, “Automatic fabric defect detection with a multi-scale convolutional denoising autoencoder network model,” Sensors, vol. 18, no. 4, p. 1064, Apr. 2018.
[12] J. Jing, H. Ma, and H. Zhang, “Automatic fabric defect detection using a deep convolutional neural network,” Coloration Technol., vol. 135, no. 3, pp. 213–223, Mar. 2019.
[13] S. Ma, R. Zhang, Y. Dong, Y. H. Feng, and G. Zhang, “A defect detection algorithm of denim fabric based on cascading feature extraction architecture,” J. Inf. Process. Syst., vol. 19, no. 1, pp. 109–117, Feb. 2023, doi: 10.3745/JIPS.04.0265.
[14] F. Xu, Y. Liu, B. Zi, and L. Zheng, “Application of deep learning for defect detection of paint film,” in Proc. 6th Int. Conf. Intell. Comput. Signal Process. (ICSP), Xi’an, China, Apr. 2021, pp. 1118–1121, doi: 10.1109/ICSP51882.2021.9408956.
[15] B. Zhao, M.-R. Dai, P. Li, and X.-N. Ma, “Data mining in railway defect image based on object detection technology,” in Proc. Int. Conf. Data Mining Workshops (ICDMW), Beijing, China, Nov. 2019, pp. 814–819, doi: 10.1109/ICDMW.2019.00120.
[16] Y. Zhang, Z. Zhang, K. Fu, and X. Luo, “Adaptive defect detection for 3-D printed lattice structures based on improved faster R-CNN,” IEEE Trans. Instrum. Meas., vol. 71, 2022, Art. no. 5020509, doi: 10.1109/TIM.2022.3200362.
[17] X. Gao, M. Jian, M. Hu, M. Tanniru, and S. Li, “Faster multi-defect detection system in shield tunnel using combination of FCN and faster RCNN,” Adv. Struct. Eng., vol. 22, no. 13, pp. 2907–2921, May 2019.
[18] B. Wei, K. Hao, X. Tang, and L. Ren, “Fabric defect detection based on faster RCNN,” in Proc. Artif. Int. Fashion Textiles Conf., Shanghai, China, Oct. 2018, pp. 45–51.
[19] M. An, S. Wang, L. Zheng, and X. Liu, “Fabric defect detection using deep learning: An improved faster R-approach,” in Proc. Int. Conf. Comput. Vis., Image Deep Learn. (CVIDL), Chongqing, China, Jul. 2020, pp. 319–324, doi: 10.1109/CVIDL51233.2020.00-78.
[20] M. Chen, L. Yu, C. Zhi, R. Sun, S. Zhu, Z. Gao, Z. Ke, M. Zhu, and Y. Zhang, “Improved faster R-CNN for fabric defect detection based on Gabor filter with genetic algorithm optimization,” Comput. Ind., vol. 134, Jan. 2022, Art. no. 103551, doi: 10.1016/j.compind.2021.103551.
[21] Z. Liu, W. Wu, X. Gu, S. Li, L. Wang, and T. Zhang, “Application of combining YOLO models and 3D GPR images in road detection and maintenance,” Remote Sens., vol. 13, no. 6, p. 1081, Mar. 2021.
[22] X. Liao, S. Lv, D. Li, Y. Luo, Z. Zhu, and C. Jiang, “YOLOv4-MN3 for PCB surface defect detection,” Appl. Sci., vol. 11, no. 24, p. 11701, Dec. 2021, doi: 10.3390/app112411701.
[23] S. Teng, Z. Liu, and X. Li, “Improved YOLOv3-based bridge surface defect detection by combining high-and low-resolution feature images,” Buildings, vol. 12, no. 8, p. 1225, Aug. 2022.
[24] Z. Cong, X. Li, and Z. Huang, “Research on brake pad surface defect detection method based on deep learning,” in Proc. Int. Conf. Adv. Electr. Eng. Comput. Appl. (AEECA), Dalian, China, Aug. 2023, pp. 813–818, doi: 10.1109/AEECA59734.2023.00149.
[25] X. Yue, Q. Wang, L. He, Y. Li, and D. Tang, “Research on tiny target detection technology of fabric defects based on improved YOLO,” Appl. Sci., vol. 12, no. 13, p. 6823, Jul. 2022.
[26] Y. Ji and L. Di, “Textile defect detection based on multi-proportion spatial attention mechanism and channel memory feature fusion network,” IET Image Process., vol. 18, no. 2, pp. 412–427, Feb. 2024, doi: 10.1049/ipr2.12957.
[27] R. Wang, R. Shivanna, D. Cheng, S. Jain, D. Lin, L. Hong, and E. Chi, “DCN v2: Improved deep & cross network and practical lessons for Web-scale learning to rank systems,” in Proc. Web Conf., New York, NY, USA, Jun. 2021, pp. 1785–1797, doi: 10.1145/3442381.3450078.
[28] H. Liu, F. Liu, X. Fan, and D. Huang, “Polarized self-attention: Towards high-quality pixel-wise regression,” 2021, arXiv:2107.00782.
[29] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[30] R. Wang, B. Fu, G. Fu, and M. Wang, “Deep & cross network for ad click predictions,” in Proc. ADKDD, New York, NY, USA, Aug. 2017, pp. 2278–2324, doi: 10.1109/5.726791.
[31] M.-H. Guo, T.-X. Xu, J. Liu, Z.-N. Liu, P.-T. Jiang, T. Mu, S. Zhang, R. R. Martin, M. Cheng, and S. Hu, “Attention mechanisms in computer vision: A survey,” Comput. Vis. Media, vol. 8, no. 3, pp. 331–368, Mar. 2022, doi: 10.1007/s41095-022-0271-y.
[32] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” 2017, arXiv:1706.03762.
[33] C. Feng, Y. Zhong, Y. Gao, M. R. Scott, and W. Huang, “TOOD: Task-aligned one-stage object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Montreal, QC, Canada, Oct. 2021, pp. 3490–3499, doi: 10.1109/ICCV48922.2021.00349.
[34] X. Li, W. Wang, L. Wu, S. Chen, X. Hu, J. Li, J. Tang, and J. Yang, “Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection,” in Adv. Neural Inf. Proc. Syst., vol. 33, Jan. 2020, pp. 21002–21012.


[35] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017, doi: 10.1109/TPAMI.2016.2577031.
[36] Z. Cai and N. Vasconcelos, “Cascade R-CNN: Delving into high quality object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Salt Lake City, UT, USA, Jun. 2018, pp. 6154–6162.
[37] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” 2018, arXiv:1804.02767.
[38] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, “YOLOX: Exceeding YOLO series in 2021,” 2021, arXiv:2107.08430.
[39] C. Li, L. Li, H. Jiang, K. Weng, Y. Geng, L. Li, Z. Ke, Q. Li, M. Cheng, W. Nie, Y. Li, B. Zhang, Y. Liang, L. Zhou, X. Xu, X. Chu, X. Wei, and X. Wei, “YOLOv6: A single-stage object detection framework for industrial applications,” 2022, arXiv:2209.02976.
[40] C.-Y. Wang, I.-H. Yeh, and H.-Y. Mark Liao, “YOLOv9: Learning what you want to learn using programmable gradient information,” 2024, arXiv:2402.13616.
[41] A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, J. Han, and G. Ding, “YOLOv10: Real-time end-to-end object detection,” 2024, arXiv:2405.14458.
[42] R. Khanam and M. Hussain, “YOLOv11: An overview of the key architectural enhancements,” 2024, arXiv:2410.17725.
[43] X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, “Repvgg: Making VGG-style ConvNets great again,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Nashville, TN, USA, Jun. 2021, pp. 13728–13737, doi: 10.1109/CVPR46437.2021.01352.

WENFEI SONG received the B.S. and M.S. degrees in information science from Beijing Normal University, in 2001 and 2004, respectively. Since 2004, she has been working as a Faculty Member with the Department of Computer Science, Zhejiang Industry Polytechnic College, where she is currently working as an Associate Professor. She has authored four textbooks and published more than ten papers. Her research interests include computer vision technology and intelligent information processing.

DU LANG received the B.S. degree in artificial intelligence from the School of Information Science and Engineering, Northeastern University, China, in 2023. He is currently working with Zhejiang Industry Polytechnic College. His research interests include machine learning, deep learning, and industrial control.

JIAHUI ZHANG received the B.S. degree in computer science and technology from Hangzhou Dianzi University, China, in 2021, and the M.S. degree in computer science and engineering from the University at Buffalo (SUNY), USA, in 2023. He is currently working as a Teaching Assistant with Zhejiang Industry Polytechnic College. His research interests include machine learning, deep learning, and computer vision.

MEILIAN ZHENG received the Ph.D. degree from Zhejiang University, in June 2008. She is currently working as an Associate Professor with the School of Management, Zhejiang University of Technology, and the Vice Director of Zhejiang Hithink RoyalFlush Artificial Intelligence Research Institute. Her research interests include organizational behavior, innovation management, digital fusion, and data analytics.

XIAOMING LI received the master’s degree from Xinjiang University, in 2008, and the Ph.D. degree from Tianjin University, in March 2020. He is currently pursuing the Ph.D. degree with the College of International Business, Zhejiang Yuexiu University. He is currently working as a Professor with the College of International Business, Zhejiang Yuexiu University, and the Deputy Director of Shaoxing Key Laboratory of Intelligent Monitoring and Prevention of Smart City. His research interests include human behavior dynamics and multi-layer network local community detection.
