Sensors 23 06558 v2

The document discusses a new algorithm called LSD-YOLOv5 for detecting surface defects on steel strips. It aims to improve upon existing methods by achieving higher accuracy while maintaining a lightweight model structure with fewer parameters and faster detection speed. The algorithm introduces several modifications to the YOLOv5 model including a shallow feature enhancement module, coordinate attention in the bottleneck layers, and a smaller bidirectional feature pyramid network for efficient feature fusion. An evaluation shows the new model reduces parameters by 61.5% and improves detection speed by 28.7% while also increasing detection accuracy by 2.4% compared to the original YOLOv5s model.

sensors

Article
LSD-YOLOv5: A Steel Strip Surface Defect Detection
Algorithm Based on Lightweight Network and Enhanced
Feature Fusion Mode
Huan Zhao, Fang Wan, Guangbo Lei *, Ying Xiong, Li Xu, Chengzhi Xu and Wen Zhou

School of Computer Science, Hubei University of Technology, Wuhan 430068, China


* Correspondence: [email protected]

Abstract: In the field of metallurgy, the timely and accurate detection of surface defects on metallic
materials is a crucial quality control task. However, current defect detection approaches face
challenges with large model parameters and low detection rates. To address these issues, this paper
proposes a lightweight recognition model for surface damage on steel strips, named LSD-YOLOv5.
First, we design a shallow feature enhancement module to replace the first Conv structure in the
backbone network. Second, the Coordinate Attention mechanism is introduced into the MobileNetV2
bottleneck structure to maintain the lightweight nature of the model. Then, we propose a smaller
bidirectional feature pyramid network (BiFPN-S) and combine it with Concat operation for efficient
bidirectional cross-scale connectivity and weighted feature fusion. Finally, the Soft-DIoU-NMS
algorithm is employed to enhance the recognition efficiency in scenarios where targets overlap.
Compared with the original YOLOv5s, the LSD-YOLOv5 model achieves a reduction of 61.5% in model
parameters and a 28.7% improvement in detection speed, while improving recognition accuracy by
2.4%. This demonstrates that the model achieves an optimal balance between detection accuracy and
speed, while maintaining a lightweight structure.

Keywords: surface defect detection; YOLOv5s; Stem block; MobileNetV2 bottleneck; multi-scale
feature fusion

Citation: Zhao, H.; Wan, F.; Lei, G.; Xiong, Y.; Xu, L.; Xu, C.; Zhou, W. LSD-YOLOv5: A Steel Strip
Surface Defect Detection Algorithm Based on Lightweight Network and Enhanced Feature Fusion
Mode. Sensors 2023, 23, 6558. https://doi.org/10.3390/s23146558

Academic Editor: Gilbert-Rainer Gillich

Received: 15 June 2023
Revised: 6 July 2023
Accepted: 19 July 2023
Published: 20 July 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).

Sensors 2023, 23, 6558. https://doi.org/10.3390/s23146558 https://www.mdpi.com/journal/sensors

1. Introduction
Steel strips are a crucial product in the steel industry and serve as a foundational
material in areas such as bridge engineering, shipbuilding, and automobile manufacturing [1].
The quality of steel strips directly impacts the performance and lifespan of various
infrastructure systems. However, during production and transportation, the surface of
steel strips may present multiple defects. These defects not only compromise the quality
of the steel strips, but also contribute to inaccuracies in subsequent processing steps [2,3].
The timely and accurate identification of defects is an effective way to improve the quality
and efficiency of steel strip production. Therefore, defect detection in industrial production
carries significant practical value. Steel strip surface damage types are complex and diverse,
and these defects may not be readily discernible [4], thereby posing certain challenges
to detection.
The techniques employed for identifying surface damage on steel strips can be categorized
into three groups: manual detection, automated detection, and artificial-intelligence-based
detection. Initially, defect detection was mainly accomplished through manual labor,
which necessitated prolonged periods of high-intensity labor for workers on the production
line. This not only resulted in inefficiencies, but also increased the likelihood of wrong
and missed detections [5]. As a result, the quality of the steel strips could not be
well-guaranteed. With the development of automatic detection technology, eddy current testing,
infrared thermography, magnetic flux leakage testing, ultrasonic testing, etc., have become
commonly used in industrial production. Ghanei et al. [6] utilized eddy current testing to
determine the martensite percentage of dual-phase steels and evaluate their mechanical
properties. Keo et al. [7] employed a microwave excitation system coupled with infrared
thermography for detecting vertical reinforcements in concrete. Zhang et al. [8] utilized an
open-ended rectangular waveguide based on microwave non-destructive testing for detecting
defects in thick-coated steel plates. However, these methods are still limited by the materials
they can inspect and cannot classify defects accurately, which makes accurate and efficient
defect recognition difficult.
In recent years, artificial intelligence techniques have been widely used in defect
detection. Current methods for detecting damage can be generally classified into two main
categories: traditional machine-learning-based object detection [9] and deep-learning-based
object detection [10]. For instance, Hussain et al. [11] proposed an object recognition model
based on intelligent deep learning and an improved whale optimization algorithm (WOA).
In their study, a data augmentation approach was first employed to address the imbalance
in object classes. Then, the DenseNet201 network was enhanced, and an improved WOA
was proposed to select the best features. The application of these methodologies has led to a
substantial enhancement in the accuracy of the model. Traditional methods extract damage
features manually [12,13], using descriptors such as Local Binary Pattern (LBP), Histogram of
Oriented Gradients (HOG), and Gray-Level Co-occurrence Matrix (GLCM), and then classify them,
which improves the efficiency of damage identification to some extent. Gola et al. [14]
extracted textural features and morphological parameters from steel structures, utilizing
a support vector machine algorithm to classify the microstructure of low-carbon steels.
Luo et al. [15] proposed a new generalized completed local binary model framework for feature
extraction. Furthermore, Ashour et al. [16] extracted multidirectional shearlet features
from hot-rolled steel strip images, followed by Gray-Level Co-occurrence Matrix (GLCM)
calculations. However, these methods are susceptible to external environmental influences,
such as lighting conditions and backgrounds. Moreover, the accuracy of recognition heavily
depends on feature engineering, resulting in poor robustness and model generalization.
Compared with traditional methods, deep learning detection methods can automatically
learn features from raw data with higher accuracy and efficiency [17,18], and they
exhibit increased resistance to external interference. Deep learning detection algorithms
can be broadly divided into two categories. One is the two-stage algorithm based on
candidate regions. For example, Zhou et al. [19] improved the Fast R-CNN model’s ability
to detect surface defects on steel strips, combining a novel residual atrous spatial pyramid
pooling module with the feature pyramid network to enhance multiscale feature fusion.
Akhyar et al. [20] introduced deformable convolution, deformable RoI pooling, and guided
anchoring RPN to optimize the Cascade R-CNN algorithm. Selamet et al. [21] proposed a
metal surface defect detection model that combines the Faster R-CNN algorithm with the
shape-from-shading method. The other is the regression-based single-stage algorithm.
For instance, Guo et al. [22] introduced the TRANS structure into the backbone network
and detection head of YOLOv5, aiming to combine features and global information effectively.
Zhang et al. [23] detected and classified damaged hot-rolled steel strips using a
network model based on YOLOv5s. Li et al. [24] improved the performance of the YOLOv5
algorithm by incorporating dense multiscale weighted feature fusion and ASPF modules.
The above studies illustrate the significant advancements in deep learning detection
algorithms and their broad applicability in the industrial production of steel strips.
Surface defect detection on steel strips is a critical application of object detection.
However, current research primarily concentrates on enhancing the precision of surface
defect classification [25,26], and it remains challenging to account for recognition speed
and lightweight model design at the same time. Liu et al. [27] developed an end-to-end
multiscale contextual detection model for identifying steel strip damage with multiple
scales and complex backgrounds. Although this model can achieve real-time detection of
steel strip damage, its effectiveness in detecting defects with irregular shapes and unclear
boundaries is limited. Li et al. [28] proposed a surface defect detection approach based on
YOLOv4. However, the detection model is complex and not ideal for implementation in
devices with limited resources. Tian et al. [29] utilized key point estimation to determine the
defect centers, which optimized the detection speed of the model. Nonetheless, the model’s
performance is suboptimal when detecting ambiguous defects. Zhou et al. [30] proposed a
lightweight detection model that performs well in terms of real-time speed and model size.
However, this structure is prone to wrong and missed detections when detecting steel strip
images with overlapping targets. Liu et al. [31] designed a TruingDet algorithm based on
Fast R-CNN. They strengthened the ability of the detection model to identify and localize
damage, but failed to address the issue of balancing classification and regression effectively.
Although the current methods for object detection have achieved some success, they
still face the following challenges: (1) The speed of classification is limited by the complex
network structure and large computational volume, leading to high latency. Furthermore,
powerful computer hardware is required for model training, making it challenging to
deploy on devices with limited memory and computation resources. (2) The detection
accuracy is generally low for poorly characterized defects and small-scale targets. There
are instances of missed detections for defects that are closely spaced and mutually
occluded. To address the above challenges, we designed a practical defect detection method
with fewer parameters to achieve a good balance between detection accuracy and speed.
YOLOv5 is an open-source object detection model that addresses the issue of recognition
and detection in industrial scenarios, offering both speed and compactness. This study aims
to enhance the YOLOv5 model by reducing its complexity, boosting detection speed, and
simultaneously improving accuracy. In this work, the major contributions are as follows:
• We developed a lightweight steel strip surface defect detection model, LSD-YOLOv5.
• We proposed a new, efficient feature extraction network by integrating the R-Stem
module and CA-MbV2 module into the backbone network. This has led to a significant
reduction in model parameters, while also improving the speed of detection.
• A smaller bidirectional feature pyramid network (BiFPN-S) was implemented in the
model to effectively integrate feature information at multiple scales.
• We improved the recognition efficiency of overlapping targets by employing the
Soft-DIoU-NMS prediction frame screening algorithm.
The rest of this paper is organized as follows. Section 2 details the proposed model LSD-
YOLOv5 for steel strip surface defect detection. Section 3 evaluates the experimental results
and compares our proposed model with state-of-the-art methods. Section 4 concludes this
paper and describes the directions for future work.

2. Materials and Methods


2.1. Overall Framework of Steel Strip Defect Detection
The timely and efficient identification of surface defects is essential for improving the
quality of steel strip production. However, current detection methods are limited in their
ability to balance speed and accuracy while maintaining a lightweight model. To solve
these issues, we proposed a lightweight network model, LSD-YOLOv5, for detecting steel
strip surface damage. First, we found that using a simple convolution operation in the
front end of the backbone network to extract the original image features may result in
the loss of significant surface defect feature information. To enhance feature extraction,
we proposed a shallow feature enhancement module called R-Stem. Second, the original
network model exhibits a high parameter count and slow detection speed. Therefore,
we designed a new CA-MbV2 structure that integrates the Coordinate Attention mechanism
[32] into the lightweight MobileNetV2 bottleneck to achieve real-time detection
while maintaining a lightweight structure. However, we found that the model exhibits low
detection accuracy and high false detection rates for small-scale targets in practical strip
defect identification. Our analysis was that the feature pyramid in the original model using
the PANet structure [33] failed to fuse the features effectively. Consequently, we proposed
a smaller bidirectional feature pyramid network (BiFPN-S), enabling more effective multi-
scale feature fusion. Finally, to enhance the identification of overlapping targets, we used
Soft-DIoU-NMS as a prediction frame screening algorithm.

2.2. Dataset Processing


In this study, we utilize the open-source GC10-DET dataset [34], which is accessible
on GitHub (Website online: https://github.com/lvxiaoming2019/GC10-DET-Metallic-
Surface-Defect-Datasets (accessed on 24 February 2020)). GC10-DET is a dataset of steel
strip surface defects collected in a real-world manufacturing environment, with a total of
2312 gray-level images. The dataset comprises ten types of defects, including Punching
(Pu), Welding line (Wl), Crescent gap (Cg), Water spot (Ws), Oil spot (Os), Silk spot (Ss),
Inclusion (In), Rolled pit (Rp), Crease (Cr), and Waist folding (Wf). In terms of data scale
and defect diversity, the GC10-DET dataset compares favorably with other datasets
focusing on steel strip defects. The size, shape, and location of defects
within this dataset exhibit variability, and there are no fixed patterns for defects of the
same type. Furthermore, the number and types of defects present in each image are not
constrained. Therefore, this dataset is a closer approximation of real production scenes. Sample
images from the GC10-DET dataset are illustrated in Figure 1.

(a) Punching (b) Welding line (c) Crescent gap (d) Water spot (e) Oil spot

(f) Silk spot (g) Inclusion (h) Rolled pit (i) Crease (j) Waist folding

Figure 1. An example of the GC10-DET steel strip surface defect dataset.

The performance of machine vision in defect detection is often hampered by external
environmental factors, such as the shooting angle and distance, as well as lighting irregularities.
These factors impact the quality of acquired images and diminish the overall detection
accuracy. To mitigate the above challenges, we carried out a preprocessing step in which
we eliminated some of the non-conforming defect images in the GC10-DET dataset. A total
of 2136 gray-level steel strip images contained in the GC10-DET dataset were utilized for
analysis. The dataset was split randomly into three subsets: training set, validation set,
and test set in the ratio of 8:1:1 to ensure optimal training results. The dataset includes
labeling information that details both the category and location of defects. To enhance the
efficiency of the defect recognition model training, the 1709 raw images in the training set
were resized to 640 × 640.
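
The random 8:1:1 split described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `split_indices`, the fixed seed, and the rounding rule are assumptions. With 2136 images, rounding the 80% share gives the 1709 training images reported in the text; the validation and test sizes it produces follow only from the assumed rounding.

```python
import random


def split_indices(n_images, ratios=(8, 1, 1), seed=0):
    """Shuffle image indices and split them by integer ratios (train, val, test)."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # seed is a hypothetical choice
    total = sum(ratios)
    n_train = round(n_images * ratios[0] / total)
    n_val = round(n_images * ratios[1] / total)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train, val, test = split_indices(2136)
print(len(train), len(val), len(test))  # 1709 214 213
```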

2.3. The Proposed LSD-YOLOv5


The architecture of the LSD-YOLOv5 model is depicted in Figure 2. The LSD-YOLOv5
network comprises four primary modules: input, backbone, neck, and head. In the backbone
network, the R-Stem module optimizes the capacity to capture low-level features,
while the CBS module assists the CA-MbV2 module in feature extraction. The CA-MbV2
module minimizes the model parameters while boosting detection speed. In the neck
network, the BiFPN-S module fuses the extracted feature information and generates three
feature maps with different scales. Subsequently, the head detects and classifies objects by
utilizing the generated feature maps. The post-processing phase of the model utilizes the
Soft-DIoU-NMS algorithm for refined prediction boxes.

[Figure 2 diagram. Input: 640 × 640. Backbone (layers 0 to 9): R-Stem, four CBS + CA-MbV2 pairs, SPPF.
Neck (layers 10 to 23): CBS, Upsample, Concat_BiFPN-S, and C3 blocks. Head: three Conv outputs at
80 × 80, 40 × 40, and 20 × 20. Legend: CBS = Conv + BN + SiLU; Bottleneck, C3, and SPPF are built
from CBS, Add, Concat, and Max Pool operations.]

Figure 2. The structure of the LSD-YOLOv5 network.

2.3.1. The Backbone Network of LSD-YOLOv5


As depicted in Figure 3, we designed a new R-Stem module with reduced parameters
and computation to replace the initial Conv structure in the backbone network. The R-Stem
module can effectively enrich the features of steel strip defects and boost the overall generalization
performance of the model [35], while the computational cost is almost unchanged.
Based on the Stem module, we added a batch normalization operation and SiLU activation
function after each Convolutional structure. The R-Stem module merges information from
multiple channels for information fusion and splits the downsampling operation of the
feature map into two branches. Specifically, one branch employs convolution while the
other utilizes the maximum pooling operation. The resulting features from both branches
are integrated and then passed through another convolution operation. The utilization
of the R-Stem module accelerates model convergence, greatly mitigates the overfitting
problem, and enhances the feature expression of the model.
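
[Figure 3 diagram. Input (640 × 640 × 3), then CBS (3 × 3, 32, 2), then two parallel branches:
Max Pool (2 × 2, 2), and CBS (1 × 1, 16, 1) followed by CBS (3 × 3, 32, 2). The merged branches pass
through CBS (1 × 1, 32, 1) to give Output (160 × 160 × 32). Labels appear to read (kernel, channels, stride).]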

Figure 3. The structure of the R-Stem.
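
As a rough check on the R-Stem layout, the sketch below traces tensor shapes through the layers named in Figure 3. It assumes the labels mean (kernel, channels, stride), that 3 × 3 convolutions use padding 1, and that the two branches are merged by channel concatenation; none of these details is stated explicitly in the text, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def r_stem_shapes(height=640, width=640):
    """Trace (name, H, W, C) through the R-Stem layout read from Figure 3."""
    trace = [("input", height, width, 3)]
    # First CBS: 3x3, 32 channels, stride 2 (padding 1 is an assumption).
    h, w = conv_out(height, 3, 2, 1), conv_out(width, 3, 2, 1)
    trace.append(("cbs_3x3_s2", h, w, 32))
    # Convolution branch: 1x1 to 16 channels, then 3x3 stride 2 to 32 channels.
    trace.append(("branch_conv_1x1", h, w, 16))
    hc, wc = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    trace.append(("branch_conv_3x3_s2", hc, wc, 32))
    # Pooling branch: 2x2 max pool with stride 2 keeps the 32 channels.
    hp, wp = conv_out(h, 2, 2, 0), conv_out(w, 2, 2, 0)
    trace.append(("branch_max_pool", hp, wp, 32))
    # Merge by channel concatenation (assumed), then fuse with a 1x1 CBS.
    assert (hc, wc) == (hp, wp), "branches must agree spatially before merging"
    trace.append(("concat", hc, wc, 32 + 32))
    trace.append(("fuse_1x1", hc, wc, 32))
    return trace


for step in r_stem_shapes():
    print(step)
```

Under these assumptions the final shape is 160 × 160 × 32, which matches the output size shown in the figure.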


The backbone network of YOLOv5 contains four C3 structures, resulting in a large
number of model parameters and slow detection. Therefore, it is crucial to develop a
lightweight feature extraction network. MobileNetV2 [36] is a convolutional neural network
architecture designed specifically for resource-constrained mobile or embedded devices.
It optimizes memory usage and execution speed, while maintaining high accuracy with
minimal computational costs and parameters. This architecture performs equally well in
resource-limited settings, considerably reducing the number of required operations and
memory demands. The fundamental module of MobileNetV2 is the bottleneck depthwise
separable convolution with residual connections, as depicted in Figure 4. A compressed
representation of low dimensionality is first input and subsequently expanded to a higher
dimension, which is then filtered using a lightweight depthwise convolution. Finally, the
acquired high-dimensional features are remapped to the low-dimensional space through a
linear convolution.
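
To make the expand, filter, project sequence concrete, the following sketch counts convolution weights in one bottleneck and compares the depthwise 3 × 3 filter with a dense 3 × 3 convolution at the same expanded width. The channel count of 32 and the expansion factor of 6 are illustrative assumptions (6 is the factor MobileNetV2 uses in most of its stages); biases and batch normalization parameters are ignored, and the function names are hypothetical.

```python
def dense_conv_params(c_in, c_out, k=3):
    """Weights of a standard k x k convolution (biases and BN ignored)."""
    return k * k * c_in * c_out


def bottleneck_params(c_in, c_out, expansion=6, k=3):
    """Weights of expand -> depthwise -> linear projection (biases and BN ignored)."""
    hidden = c_in * expansion
    return {
        "expand_1x1": c_in * hidden,      # pointwise expansion to the hidden width
        "depthwise_kxk": k * k * hidden,  # one k x k filter per hidden channel
        "project_1x1": hidden * c_out,    # linear projection back to a low dimension
    }


parts = bottleneck_params(c_in=32, c_out=32)
hidden = 32 * 6
print(parts, sum(parts.values()))
print(dense_conv_params(hidden, hidden))
```

The depthwise step filters the 192-channel expanded tensor with far fewer weights than a dense 3 × 3 convolution of the same width would need, which is where the block saves computation.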

[Figure 4 diagram. Bottleneck input, then Conv 1 × 1 with ReLU6, then Dwise 3 × 3 with ReLU6,
then Conv 1 × 1 with a linear activation, then bottleneck output.]

Figure 4. The bottleneck of MobileNetV2 structure.

Coordinate Attention (CA) is an efficient attention mechanism designed for lightweight
networks, providing flexibility and low overhead. It can be easily integrated with various
classical modules. The CA mechanism can capture cross-channel, direction-aware, and
position-aware information simultaneously, which greatly assists in accurately
locating and identifying the target of interest. Therefore, we incorporated the CA mechanism
into the MobileNetV2 bottleneck to better extract relevant information from the
features. The CA mechanism encodes channel relationships and long-range dependencies
by utilizing precise location information in two steps: coordinate information embedding
and coordinate attention generation. The structure of the CA attention mechanism is shown
in Figure 5. Given an input feature map X, we perform two separate one-dimensional
pooling operations along the horizontal and vertical coordinates, denoted by (H, 1) and
(1, W), respectively, to encode each channel. The output of the c-th channel at the height
h is shown in Equation (1). The output of the c-th channel at the width w is shown in
Equation (2).

$$z_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i), \tag{1}$$

$$z_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w), \tag{2}$$

To generate coordinate attention, the feature maps aggregated through the aforementioned
equations are concatenated, and then a shared 1 × 1 convolutional transform function $F_1$ is
applied to the concatenated feature maps to obtain the feature representation $f$, as shown in
Equation (3).

$$f = \delta(F_1([z^h, z^w])), \tag{3}$$

where $f \in \mathbb{R}^{C/r \times (H+W)}$ is the intermediate feature map that captures the spatial information
along the horizontal and vertical coordinates, r denotes the channel reduction ratio, and δ
refers to the Sigmoid activation function. The feature map $f$ is separated into two tensors,
$f^h$ and $f^w$, along the spatial dimension, and these are converted into tensors with
the same number of channels as the input X through two 1 × 1 convolutions, $F_h$ and $F_w$, respectively,
to obtain Equations (4) and (5).

$$g^h = \sigma(F_h(f^h)), \tag{4}$$

$$g^w = \sigma(F_w(f^w)), \tag{5}$$

where $g^h$ and $g^w$ are the attention weights in the height and width directions of the feature
map. The final output of the CA module can be expressed by Equation (6):

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j). \tag{6}$$
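
The data flow of Equations (1), (2), and (6) can be sketched in plain Python. This is a simplified illustration, not the module used in the paper: the learned 1 × 1 convolutions $F_1$, $F_h$, and $F_w$ are replaced by identity mappings (an assumption made only to keep the example self-contained), so only the directional pooling and the re-weighting are shown.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def coordinate_attention(x):
    """Apply simplified coordinate attention to x, indexed as x[c][h][w]."""
    out = []
    for channel in x:
        height, width = len(channel), len(channel[0])
        # Eq. (1): average over the width for every row h.
        z_h = [sum(row) / width for row in channel]
        # Eq. (2): average over the height for every column w.
        z_w = [sum(channel[j][w] for j in range(height)) / height
               for w in range(width)]
        # Eqs. (3)-(5), simplified: the learned transforms are identities here.
        g_h = [sigmoid(v) for v in z_h]
        g_w = [sigmoid(v) for v in z_w]
        # Eq. (6): re-weight each position by its row and column gates.
        out.append([[channel[i][j] * g_h[i] * g_w[j] for j in range(width)]
                    for i in range(height)])
    return out


y = coordinate_attention([[[1.0, 1.0], [1.0, 1.0]]])
print(y[0][0][0])
```

Because the gates depend on the row and the column separately, two positions with the same value can receive different weights, which is how the mechanism retains positional information.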

The structure of the CA-MbV2 module is shown in Figure 6. We proposed a highly
efficient and lightweight network for feature extraction. Specifically, we replaced the first
Conv structure in the backbone network with the R-Stem module, introduced the CA-MbV2
structure, and kept the SPPF module intact. The proposed network not only guarantees high
detection accuracy, but also has a lightweight structure that satisfies real-time requirements
for detecting surface defects on steel strips.
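
[Figure 5 diagram. Input (C × H × W), then X Avg Pool (C × H × 1) and Y Avg Pool (C × 1 × W), then
Concat + Conv2d (C/r × 1 × (W + H)), then BatchNorm + Non-linear (C/r × 1 × (W + H)), then a split into
two Conv2d + Sigmoid branches (C × H × 1 and C × 1 × W), whose outputs re-weight the input to give
Output (C × H × W).]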

Figure 5. The structure of the Coordinate Attention mechanism.

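[Figure 6 diagram. MobileNetV2 bottleneck (Conv 1 × 1 with ReLU6; Dwise 3 × 3 with ReLU6; Conv 1 × 1,
Linear) between Input and Output, combined with a Coordinate Attention block with X and Y branches.]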

Figure 6. The CA-MbV2 module structure.

2.3.2. The Feature Pyramid Network of LSD-YOLOv5


We found that the original model exhibited sub-optimal detection performance for
small targets. The primary cause is the degradation of semantic details associated with
small targets resulting from the multiple convolutions executed by the neural network.
The Path Aggregation Network (PANet) achieves multi-scale feature fusion through upsampling
and downsampling operations. However, PANet’s computational cost is significant,
and its feature fusion approach simply merges different input features without explicit
differentiation or weighting. Bi-directional Feature Pyramid Network (BiFPN) is a lightweight
network architecture designed for application scenarios where computational or memory
resources are limited [37]. It can effectively capture multi-scale feature information and
improve the accuracy of detection. Therefore, to attain a better balance of surface defect
information across different scales of steel strips, we designed an augmented feature fusion
network based on BiFPN, called BiFPN-S. The structure of BiFPN-S is shown in Figure 7.
We reduced the five input levels of BiFPN to three in BiFPN-S so that it integrates
with the YOLOv5 framework. BiFPN shares one weight for all channels in each level
of the feature map, which impedes the network’s ability to acquire multi-scale encoding.
To address this, we introduced a separate CA attention mechanism in each prediction
branch to differentiate the importance of various channels within the same feature level.
We replaced PANet with BiFPN-S and combined Concat with BiFPN-S at layers 16, 20, 24,
and 28 in the network architecture, which we referred to as Concat_BiFPN-S. The utilization
of BiFPN-S results in the improved recognition ability of the model for multi-scale targets
and enhanced recognition rate of small targets with surface defects on steel strips.
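
The weighted fusion that distinguishes BiFPN from a plain merge can be illustrated with the fast normalized fusion rule from the original BiFPN design, in which each input receives a learnable non-negative weight and the weights are normalized by their sum. The text above does not give the exact formula used in BiFPN-S, so treating it as identical to this rule is an assumption, and the function name and `eps` default are illustrative.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors with non-negative, sum-normalized weights."""
    clipped = [max(0.0, w) for w in weights]  # ReLU keeps each weight >= 0
    norm = eps + sum(clipped)                 # eps avoids division by zero
    length = len(features[0])
    return [sum(w * f[k] for w, f in zip(clipped, features)) / norm
            for k in range(length)]


# Two inputs at the same scale; the second is weighted three times as much.
fused = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0], eps=0.0)
print(fused)  # [2.5, 3.5]
```

In a real network the weights are trained parameters and the features are tensors resized to a common resolution before fusion.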
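
[Figure 7 diagram. Three input levels (P3in, P4in, P5in), one intermediate top-down node (P4td), and
three output levels (P3out, P4out, P5out); Coordinate Attention is applied on the branches that feed Predict.]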

Figure 7. The structure of the BiFPN-S.

2.3.3. The Non-Maximum Suppression of LSD-YOLOv5


During the inference stage of object detection, multiple prediction boxes can be generated
for a single detection target, but only one accurate prediction box is required to be
retained. YOLOv5 utilizes the traditional non-maximum suppression (NMS) approach
to eliminate redundant prediction boxes. However, this algorithm exhibits suboptimal
performance when overlapping or closely spaced defects exist on the steel strip surface.
Due to the mutual occlusion of targets resulting in a significant overlapping region of
prediction boxes, NMS may regard the occluded instance bounding boxes as redundant
information and remove them from the final detection results. This results in diminished
recall and missed detection, rendering it unsuitable for steel strip defect identification
models that require precise detection.
The DIoU-NMS algorithm [38] has alleviated the issue of suboptimal detection of
occluded targets through the traditional NMS approach to some extent. However, closely
spaced and mutually occluding defects still can result in missed detections. To address the
issues, we improved the DIoU-NMS algorithm to obtain Soft-DIoU-NMS. The Soft-DIoU-
NMS algorithm can compute the overlap between bounding boxes with greater precision,
thereby more effectively retaining the key information of target detection results. This
enhances the detection performance of occluded targets and diminishes the rate of missed
detection. The Soft-DIoU-NMS algorithm is depicted as follows:
$$S_i = \begin{cases} S_i, & \mathrm{IoU} - R_{DIoU}(M, B_i) < \xi \\ S_i \left(1 - \mathrm{IoU}(M, B_i)\right), & \mathrm{IoU} - R_{DIoU}(M, B_i) \ge \xi \end{cases} \tag{7}$$

$$R_{DIoU} = \frac{\rho^2(b, b^{gt})}{c^2}, \tag{8}$$

where $S_i$ represents the classification score, $R_{DIoU}$ denotes the penalty term as shown in
Figure 8, M signifies the highest scoring prediction box, $B_i$ represents other prediction boxes,
and ξ indicates the threshold value for NMS. Equation (8) defines $R_{DIoU}$, where $d = \rho(b, b^{gt})$
denotes the Euclidean distance between the central points of M and $B_i$, and c is the diagonal
length of the smallest box enclosing both. According to Equation (7), when the IoU between the
highest confidence prediction box and another prediction box, minus the penalty $R_{DIoU}$, is below
the threshold value, the other prediction box remains unaltered; otherwise, its confidence decays
until it falls below the threshold value, at which point the detection box is removed.
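
The procedure defined by Equations (7) and (8) can be sketched as a short routine. This is an illustrative reading, not the authors' implementation: boxes are assumed to be (x1, y1, x2, y2) tuples, and `score_thresh`, the score below which a decayed box is dropped, is a hypothetical parameter whose value the text does not specify.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def r_diou(a, b):
    """Eq. (8): squared center distance over squared enclosing-box diagonal."""
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
        + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
        + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return rho2 / c2 if c2 > 0 else 0.0


def soft_diou_nms(boxes, scores, xi=0.5, score_thresh=0.001):
    """Return kept box indices and their (possibly decayed) scores."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])  # highest-scoring box M
        remaining.remove(m)
        keep.append(m)
        survivors = []
        for i in remaining:
            overlap = iou(boxes[m], boxes[i])
            # Eq. (7): decay the score only when IoU - R_DIoU reaches xi.
            if overlap - r_diou(boxes[m], boxes[i]) >= xi:
                scores[i] *= 1.0 - overlap
            if scores[i] >= score_thresh:  # drop boxes whose score decayed away
                survivors.append(i)
        remaining = survivors
    return keep, [scores[i] for i in keep]


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
keep, kept_scores = soft_diou_nms(boxes, [0.9, 0.8, 0.7], xi=0.5, score_thresh=0.3)
print(keep, kept_scores)
```

Unlike hard NMS, a heavily overlapping box is not discarded outright; it survives with a reduced score unless that score drops below `score_thresh`, which is what lets closely spaced defects be retained.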

Figure 8. The calculation process of $R_{DIoU}$.

3. Experiments and Results


3.1. Experimental Environment and Parameter Setting
The experimental setup in this paper utilized an Intel Core i9-11900K @3.50GHz CPU,
NVIDIA GeForce RTX3060 GPU, and Ubuntu 18.04 64-bit operating system. The experimental
environment was built using the PyTorch framework, cuDNN 8.2, and CUDA
11.3. The model was trained using the Stochastic Gradient Descent (SGD) optimizer with a
weight decay of 0.0005 and a momentum of 0.937. In this study, the initial learning rate
was set to 0.001 and the minimum learning rate was set to 0.01 times the initial learning
rate. The warm-up method is a prevalent learning rate optimization strategy commonly
employed to enhance the stability of the deeper models. In the initial training phase,
warm-up training was conducted for 5 epochs to update the learning rate of the model.
After the warm-up phase, the cosine annealing algorithm was employed to dynamically
regulate the learning rate variation. The batch size was set to 32 and the training process
was conducted for a total of 200 epochs.
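
The learning rate schedule described above (5 warm-up epochs followed by cosine annealing from 0.001 down to 0.01 times that value over 200 epochs) might look like the sketch below. The linear shape of the warm-up and the per-epoch granularity are assumptions, since the text does not specify them, and the function name is illustrative.

```python
import math


def learning_rate(epoch, lr0=0.001, min_ratio=0.01, warmup_epochs=5, total_epochs=200):
    """Learning rate for a given zero-based epoch index."""
    lr_min = lr0 * min_ratio
    if epoch < warmup_epochs:
        # Linear warm-up (assumed shape): reaches lr0 at the last warm-up epoch.
        return lr0 * (epoch + 1) / warmup_epochs
    # Cosine annealing from lr0 down to lr_min over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return lr_min + 0.5 * (lr0 - lr_min) * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 4, 5, 100, 199):
    print(epoch, learning_rate(epoch))
```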
When working with a limited dataset, data augmentation techniques can be employed
to increase the number of training samples as well as the diversity, thereby bolstering the
robustness of the model. We used the Mosaic method to process images of surface damage
on steel strips. By randomly cropping, scaling, and arranging 4 images into a single image
and then inputting it into the network for training, we enriched the dataset and increased
the number of small-target samples.

3.2. Evaluation Metrics


To evaluate the performance of the detection model, precision, recall, mean average
precision (mAP), model parameters (Params), and detection speed (FPS) are commonly
used as metrics. Precision is defined as the ratio of correctly predicted positive samples
to the total number of predicted positive samples [39]. Recall is defined as the
ratio of correctly predicted positive samples to the total number of actual positive samples
[40]. The mathematical expressions for computing precision and recall are presented
in Equations (9) and (10), respectively.

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \tag{9}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \tag{10}$$
where TP represents the number of positive samples predicted as positive samples, TN is
the number of negative samples predicted as negative samples, FP denotes the number of
negative samples predicted as positive samples, and FN indicates the number of positive
samples predicted as negative samples.
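
Equations (9) and (10) translate directly into code. The counts in the example call are made up for illustration and are not results from the paper; returning 0.0 for an empty denominator is a convention chosen here, not something the text prescribes.

```python
def precision_recall(tp, fp, fn):
    """Precision (Eq. 9) and recall (Eq. 10) from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 missed defects.
print(precision_recall(tp=8, fp=2, fn=2))  # (0.8, 0.8)
```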
AP denotes the average precision for each defect class, that is, the area under its
precision-recall curve P(r), while mAP is the mean of the AP values over all n defect
classes. The equations for calculating AP and
mAP are expressed in Equations (11) and (12), respectively. The increase in the number of
model parameters can lead to a corresponding increase in both the size of the model file
and memory usage. The FPS metric indicates the number of images that can be processed
by the model per second and is used to evaluate whether the model meets the real-time
detection requirements.
$$AP = \int_{0}^{1} P(r)\,dr, \tag{11}$$

$$mAP = \frac{1}{n}\sum_{i=1}^{n} AP_i. \tag{12}$$
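
In practice the integral in Equation (11) is approximated from a finite set of precision-recall points. The sketch below uses the all-point interpolation convention (a monotone precision envelope summed over recall steps). This is one common choice and is an assumption here, since the paper does not state which numerical scheme was used. The function names are hypothetical.

```python
import numpy as np

def average_precision(recall, precision):
    """Approximate AP as the area under the P-R curve (recall sorted ascending)."""
    r = np.concatenate(([0.0], np.asarray(recall, float), [1.0]))
    p = np.concatenate(([0.0], np.asarray(precision, float), [0.0]))
    # Precision envelope: make precision non-increasing as recall grows.
    p = np.maximum.accumulate(p[::-1])[::-1]
    # Sum rectangle areas at the points where recall changes.
    idx = np.nonzero(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_average_precision(ap_per_class):
    """Equation (12): mean of the per-class AP values."""
    return float(np.mean(ap_per_class))

if __name__ == "__main__":
    # Hypothetical two-point P-R curve for one class.
    ap = average_precision(recall=[0.5, 1.0], precision=[1.0, 0.5])
    print(ap, mean_average_precision([ap, 0.25]))
```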

3.3. Ablation Experiments and Analysis


To validate the effectiveness of each module incorporated into our proposed LSD-
YOLOv5 model, we conducted ablation experiments using the GC10-DET dataset and
employed YOLOv5s in version 6.1 of YOLOv5 as the baseline algorithm for comparison.

3.3.1. Ablation Experiment of LSD-YOLOv5


In Section 2, we proposed the lightweight steel strip surface defect detection model
LSD-YOLOv5. We conducted an ablation study by incrementally adding improvement modules.
Table 1 presents the ablation results for each module. Bold indicates the best metric.

Table 1. Ablation study on LSD-YOLOv5.

Model                      Precision (%)   mAP (%)   Recall (%)   Params (M)   Inference Time (ms)
YOLOv5s (baseline)         67.6            65.5      62.3         7.04         14.2
+MobileNetV2               64.1            63.8      61.4         1.25         9.3
+CA-MbV2                   66.3            64.7      62.6         1.83         9.9
+R-Stem+CA-MbV2            67.7            65.9      63.5         1.98         10.1
+R-Stem+CA-MbV2+BiFPN-S    68.9            67.2      64.9         2.71         10.7
LSD-YOLOv5                 69.8            67.9      66.8         2.71         11.1

First, the original YOLOv5 backbone network was replaced by the lightweight MobileNetV2
structure. As shown in Table 1, the mAP of the resulting model was 63.8%, a slight decrease
of 1.7 percentage points compared with the original model. However, the number of model
parameters and the inference time were significantly reduced, by 82.2% and 34.5%,
respectively. This indicates that the network has a more lightweight architecture and faster
detection capabilities, making it feasible for deployment on devices with limited memory
and computation resources. Second, a new feature extraction network was constructed
by incorporating the CA mechanism into the lightweight MobileNetV2 architecture. Compared
with using the plain MobileNetV2 architecture as the backbone network, the mAP of the
model improved by 0.9 percentage points. This suggests that incorporating the CA mechanism
makes the backbone network more focused on extracting useful information from
the features. Subsequently, the R-Stem was added to the backbone network, yielding
a 1.2-point improvement in mAP with only a minimal increase in model parameters
and inference time. Moreover, we placed the CA mechanism in each prediction branch of
the BiFPN and combined this new module with the Concat operation in the network. As
presented in Table 1, the mAP of the model improved by 1.3 percentage points, suggesting
that BiFPN-S can better merge feature information for identifying steel strip surface
damage. Finally, we used Soft-DIoU-NMS to refine the prediction boxes in the
post-processing stage of defect detection. This improved the mAP and Recall of the model
by 0.7 and 1.9 percentage points, respectively, reducing the missed detection rate for
overlapping targets and improving the detection accuracy.
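
The per-step figures quoted above can be re-derived from Table 1. The sketch below recomputes the mAP gain of each ablation step and converts inference time to frames per second. The conversion assumes FPS is simply the reciprocal of the per-image inference time, which the paper does not state explicitly. Under that assumption, 14.2 ms and 11.1 ms correspond to about 70.4 and 90.1 FPS, consistent with the FPS values reported for YOLOv5s and LSD-YOLOv5 in Table 2.

```python
def fps_from_latency(ms_per_image):
    """Frames per second, assuming one image is processed every ms_per_image."""
    return 1000.0 / ms_per_image

# mAP (%) and inference time (ms) copied from Table 1, in ablation order.
steps = [
    ("YOLOv5s (baseline)", 65.5, 14.2),
    ("+MobileNetV2", 63.8, 9.3),
    ("+CA-MbV2", 64.7, 9.9),
    ("+R-Stem+CA-MbV2", 65.9, 10.1),
    ("+R-Stem+CA-MbV2+BiFPN-S", 67.2, 10.7),
    ("LSD-YOLOv5", 67.9, 11.1),
]

def map_gains(rows):
    """mAP change, in percentage points, of each step relative to the previous one."""
    return [round(b[1] - a[1], 1) for a, b in zip(rows, rows[1:])]

if __name__ == "__main__":
    print(map_gains(steps))
    for name, _, ms in steps:
        print(f"{name}: {fps_from_latency(ms):.1f} FPS")
```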

3.3.2. Visualization Results on Ablation Experiment


To more intuitively observe the experimental results presented in Table 1, we have
provided some visual representations of the detection results in Figures 9–11. Model 1
refers to the original YOLOv5s model, while Model 2 integrates the R-Stem and CA-MbV2
modules. Model 3 incorporates the BiFPN-S module based on Model 2, and Model 4
represents our proposed LSD-YOLOv5 model. We conducted a comparative analysis of the
detection results with four models in three distinct scenarios.

Figure 9. Detection results of inconspicuous defects. (a) Original image sample; (b) Model 1 detection
results; (c) Model 2 detection results; (d) Model 3 detection results; (e) Model 4 detection results.

From Model 1 to Model 4, the number of detected defects gradually increased, and
Model 4 had almost no missed detections. As shown in Figure 9,
compared with Model 1 detection, Model 2 was optimized by adding the R-Stem and
CA-MbV2 modules to improve the detection accuracy for relatively faint, inconspicuous
defects. As indicated in Figure 10, we performed a series of experiments to evaluate
the performance of the four models in the scenario of small damage detection. Model 3
exhibited a substantial enhancement in the detection performance compared to the two
preceding models. As depicted in Figure 10d, it accurately identified relatively small
defects in the images, effectively improving the detection performance for multi-scale
targets. However, Model 3 was susceptible to confusion when there were overlapping or
closely located defects as shown in Figure 11. By using the Soft-DIoU-NMS algorithm in
Model 4, as indicated in Figure 11e, the model could identify and accurately locate targets
in proximity or overlap, thereby reducing the rate of missed detections.
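
To illustrate why a soft, distance-aware suppression step helps with overlapping defects, the sketch below combines Gaussian Soft-NMS score decay with the DIoU overlap measure. It is a generic combination written for illustration and should not be read as the exact Soft-DIoU-NMS formulation used in LSD-YOLOv5. The values of `sigma` and `score_thresh`, the clamping of negative DIoU to zero, and the function names are assumptions. Boxes are assumed to be in (x1, y1, x2, y2) form with positive area.

```python
import numpy as np

def diou(box, boxes):
    """DIoU between one box and an (N, 4) array of boxes."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area + areas - inter)
    # Squared distance between box centres.
    d2 = ((box[0] + box[2] - boxes[:, 0] - boxes[:, 2]) ** 2
          + (box[1] + box[3] - boxes[:, 1] - boxes[:, 3]) ** 2) / 4.0
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = np.maximum(box[2], boxes[:, 2]) - np.minimum(box[0], boxes[:, 0])
    ch = np.maximum(box[3], boxes[:, 3]) - np.minimum(box[1], boxes[:, 1])
    return iou - d2 / (cw ** 2 + ch ** 2)

def soft_diou_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Return kept indices (in selection order) and the decayed scores."""
    boxes = np.asarray(boxes, float)
    scores = np.asarray(scores, float).copy()
    remaining = np.arange(len(scores))
    keep = []
    while remaining.size:
        best = remaining[np.argmax(scores[remaining])]
        keep.append(int(best))
        remaining = remaining[remaining != best]
        if not remaining.size:
            break
        # Decay neighbours' scores instead of deleting them outright.
        overlap = np.clip(diou(boxes[best], boxes[remaining]), 0.0, None)
        scores[remaining] *= np.exp(-(overlap ** 2) / sigma)
        remaining = remaining[scores[remaining] > score_thresh]
    return keep, scores
```

With hard NMS, a heavily overlapping second box would be discarded. Here it survives with a reduced score, so two adjacent defects can both be reported.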

Figure 10. Detection results of small defects. (a) Original image sample; (b) Model 1 detection results;
(c) Model 2 detection results; (d) Model 3 detection results; (e) Model 4 detection results.

Figure 11. Detection results of overlapping defects. (a) Original image sample; (b) Model 1 detection
results; (c) Model 2 detection results; (d) Model 3 detection results; (e) Model 4 detection results.

3.4. Comparative Experiments and Analysis


To further demonstrate the reliability and validity of our proposed model, we compared
LSD-YOLOv5 with popular object detection algorithms, including SSD [41], Faster
R-CNN [42], YOLOv4, YOLOv5s, and YOLOv7. The experimental results are presented in
Figure 12 and Table 2. In industrial settings, the task of detecting surface defects on steel
strips must consider not only accuracy but also detection efficiency and model size.
Therefore, in this section, Precision, mAP, Recall, the number of model
parameters (Params), floating-point operations (FLOPs), and frames per second (FPS) are
used as comprehensive evaluation metrics for the detection model.

Table 2. Comparison of detection results using different detection algorithms.

Model          Precision (%)   mAP (%)   Recall (%)   Params (M)   FLOPs (G)   FPS
Faster R-CNN   68.6            68.2      67.8         63.57        263.5       12.6
SSD            53.3            51.6      52.1         27.32        39.2        54.7
YOLOv4         60.2            57.5      56.3         52.36        121.3       41.5
YOLOv5s        67.6            65.5      62.3         7.04         15.9        70
YOLOv7         62.0            61.3      60.6         37.21        103.6       56.3
LSD-YOLOv5     69.8            67.9      66.8         2.71         9.1         90.1

Figure 12. Comparison of visualization results using different detection algorithms. (a) SSD;
(b) Faster R-CNN; (c) YOLOv4; (d) YOLOv5s; (e) YOLOv7; (f) LSD-YOLOv5.

Figure 12 shows the visualization of the recognition effects, where the labels on the
anchor boxes indicate the confidence scores of the model detection. The visualization
shows that our proposed model has a high confidence score in the damaged area and
the recognition result is accurate. Furthermore, the model exhibits excellent detection
performance for inconspicuous defects, such as “Rolled pit” (marked as “8_yahen”), and
small-scale defects, such as “Oil spot” (marked as “5_youban”). As shown in Table 2, the
Faster R-CNN algorithm had the highest mAP and Recall, at 68.2% and 67.8%, respectively.
However, its parameter count (63.57 M) and computational cost (263.5 GFLOPs) were
considerably larger than those of the other detection models. Additionally, the Faster
R-CNN model had an FPS of
12.6, which falls short of real-time detection requirements. The mAP achieved by YOLOv7
was lower than that of YOLOv5s, and the model had more parameters and higher computational
demands, making it unsuitable for deployment in resource-limited application scenarios
such as mobile devices. The LSD-YOLOv5 model proposed in this
paper exhibited strong capabilities in terms of detection accuracy and speed. Specifically,
the mAP of our proposed model was 67.9%, an improvement of 2.4 percentage points over
the YOLOv5s model. Moreover, our proposed model achieved an FPS of 90.1, a 28.7% increase
over the 70 FPS of YOLOv5s, which makes it more efficient and better suited to
real-time detection in steel strip production environments. Finally, LSD-YOLOv5 had
significant advantages in terms of model parameters and computational complexity, with
only 2.71 M parameters and 9.1 GFLOPs of computation. Compared with the original model,
the model parameters and computation were reduced by 61.5% and 42.8%, respectively,
highlighting the effectiveness of our approach in achieving a lightweight network
architecture. In summary, LSD-YOLOv5 delivers enhanced accuracy in detecting defects,
with a lightweight model structure designed to meet the real-time
demands of industrial detection.
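
The relative changes quoted in this comparison follow directly from the YOLOv5s and LSD-YOLOv5 rows of Table 2. The short sketch below reproduces them. The dictionary keys and the helper name `relative_change` are hypothetical, chosen only for this illustration. Note that the mAP gain is an absolute difference in percentage points, whereas the other three figures are relative changes.

```python
def relative_change(new, old):
    """Signed relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

# Values copied from Table 2.
yolov5s = {"map": 65.5, "params_m": 7.04, "flops_g": 15.9, "fps": 70.0}
lsd = {"map": 67.9, "params_m": 2.71, "flops_g": 9.1, "fps": 90.1}

if __name__ == "__main__":
    print("mAP gain (points):", round(lsd["map"] - yolov5s["map"], 1))
    for key in ("params_m", "flops_g", "fps"):
        print(key, round(relative_change(lsd[key], yolov5s[key]), 1), "%")
```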

3.5. Performance Analysis


Figure 13 illustrates the comparison of evaluation indexes for different models.
In Figure 13a, the mAP of our proposed model LSD-YOLOv5 was 67.9%, which was
2.4% higher than the original model. The convergence speed and stability of the model can
provide a better assessment of the model’s performance. As shown in Figure 13, the loss
curve had a larger drop at the beginning of the model training, indicating that the initial
hyperparameter setting was reasonable. With all other parameters held constant, the improved
model had a steeper convergence curve in the early stage of training. After a period of training, the
loss curve gradually leveled off and the model converged. It can be seen that the improved
YOLOv5s model has a more stable convergence state and higher convergence efficiency
than the original model. Finally, as depicted in Figure 13b, the bounding box loss curve
of the LSD-YOLOv5 model converged significantly faster than that of the original model,
which can be attributed to the effectiveness of the CA attention mechanism in mitigating
the effects of extraneous features and enhancing the extraction of relevant information.

Figure 13. Comparison of evaluation indexes under different models. (a) mAP curve. (b) Bounding
box loss curve. (c) Confidence loss curve. (d) Classification loss curve.

4. Conclusions
In this paper, we have proposed an optimal LSD-YOLOv5 model for recognizing
surface damage on steel strips in real-world scenarios, which is characterized by fewer
parameters and lower computational requirements. The model strikes an ideal balance
between detection accuracy and speed, while maintaining a lightweight structure. First,
to improve the feature extraction capability of shallow networks, the R-Stem module was
proposed to replace the first Conv structure in the backbone network. We also designed a
new feature extraction network called CA-MbV2, which integrates the Coordinate Attention
mechanism into the bottleneck structure of MobileNetV2. CA-MbV2 significantly
reduced the model parameters and improved the detection speed. Second, we introduced
BiFPN-S into the neck layer and placed a lightweight attention module in each prediction
branch, enhancing the model’s ability to handle damage at various scales. Finally, Soft-
DIoU-NMS was utilized as a prediction frame screening algorithm to minimize the number
of missed detections of overlapping targets.
The effectiveness of the proposed method has been demonstrated by conducting
extensive evaluations and ablation studies on the GC10-DET dataset. In comparison with
the original YOLOv5s model, the proposed LSD-YOLOv5 model has achieved a reduction
of 61.5% in parameters, a decrease of 42.8% in computation, and a 2.4% improvement in
accuracy, making it more suitable for meeting the lightweight requirements of steel strip
surface defect detection. This study has the potential to provide insights into lightweight
and real-time methods for detecting metal surface defects in industrial settings, and may
establish a basis for the implementation of industrial automation. In the future, we will
consider exploring the adoption of more efficient data augmentation techniques, such as
Generative Adversarial Networks (GAN), to enhance the recognition and generalization
capabilities of our models.

Author Contributions: Conceptualization, H.Z. and F.W.; methodology, H.Z.; software, H.Z. and
G.L.; validation, H.Z.; formal analysis, Y.X. and L.X.; investigation, F.W., G.L. and C.X.; resources, L.X.
and W.Z.; data curation, H.Z. and G.L.; writing—original draft preparation, H.Z.; writing—review
and editing, F.W. and G.L.; visualization, H.Z.; supervision, Y.X. and C.X.; project administration,
W.Z.; funding acquisition, L.X. and W.Z. All authors have read and agreed to the published version
of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China (Grant
No. 62202147) and the Science and Technology Research Project of Education Department of Hubei
Province (Grant No. B2021070).
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.


Data Availability Statement: Publicly available datasets were analyzed in this study. These data can
be found here: https://github.com/lvxiaoming2019/GC10-DET-Metallic-Surface-Defect-Datasets
(accessed on 24 February 2020).
Acknowledgments: We extend our sincere appreciation to the funders of this project and all the
teams and individuals who contributed to this study.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Wen, X.; Shan, J.; He, Y.; Song, K. Steel Surface Defect Recognition: A Survey. Coatings 2022, 13, 17. [CrossRef]
2. Luo, Q.; Fang, X.; Su, J.; Zhou, J.; Zhou, B.; Yang, C.; Liu, L.; Gui, W.; Tian, L. Automated visual defect classification for flat steel
surface: A survey. IEEE Trans. Instrum. Meas. 2020, 69, 9329–9349. [CrossRef]
3. Wang, H.; Li, Z.; Wang, H. Few-shot steel surface defect detection. IEEE Trans. Instrum. Meas. 2021, 71, 1–12. [CrossRef]
4. Tang, B.; Chen, L.; Sun, W.; Lin, Z.K. Review of surface defect detection of steel products based on machine vision. IET Image
Process. 2023, 17, 303–322. [CrossRef]
5. Zhao, W.; Song, K.; Wang, Y.; Liang, S.; Yan, Y. FaNet: Feature-aware Network for Few Shot Classification of Strip Steel Surface
Defects. Measurement 2023, 208, 112446. [CrossRef]
6. Ghanei, S.; Kashefi, M.; Mazinani, M. Eddy current nondestructive evaluation of dual phase steel. Mater. Des. 2013, 50, 491–496.
[CrossRef]
7. Keo, S.A.; Brachelet, F.; Breaban, F.; Defer, D. Steel detection in reinforced concrete wall by microwave infrared thermography.
NDT E Int. 2014, 62, 172–177. [CrossRef]
8. Zhang, H.; Gao, B.; Tian, G.Y.; Woo, W.L.; Bai, L. Metal defects sizing and detection under thick coating using microwave NDT.
NDT E Int. 2013, 60, 52–61. [CrossRef]
9. Wang, A.; Sha, M.; Liu, L.; Chu, M. A new process industry fault diagnosis algorithm based on ensemble improved binary-tree
SVM. Chin. J. Electron. 2015, 24, 258–262. [CrossRef]
10. Hussain, N.; Khan, M.A.; Tariq, U.; Kadry, S.; Yar, M.A.E.; Mostafa, A.M.; Alnuaim, A.A.; Ahmad, S. Multiclass Cucumber Leaf
Diseases Recognition Using Best Feature Selection. Comput. Mater. Contin. 2022, 70, 3281–3294. [CrossRef]
11. Hussain, N.; Khan, M.A.; Kadry, S.; Tariq, U.; Mostafa, R.R.; Choi, J.I.; Nam, Y. Intelligent deep learning and improved whale
optimization algorithm based framework for object recognition. Hum. Cent. Comput. Inf. Sci. 2021, 11, 2021.
12. Chu, M.; Gong, R. Invariant feature extraction method based on smoothed local binary pattern for strip steel surface defect. ISIJ
Int. 2015, 55, 1956–1962. [CrossRef]
13. Wang, Y.; Xia, H.; Yuan, X.; Li, L.; Sun, B. Distributed defect recognition on steel surfaces using an improved random forest
algorithm with optimal multi-feature-set fusion. Multimed. Tools Appl. 2018, 77, 16741–16770. [CrossRef]
14. Gola, J.; Webel, J.; Britz, D.; Guitar, A.; Staudt, T.; Winter, M.; Mücklich, F. Objective microstructure classification by support
vector machine (SVM) using a combination of morphological parameters and textural features for low carbon steels. Comput.
Mater. Sci. 2019, 160, 186–196. [CrossRef]
15. Luo, Q.; Sun, Y.; Li, P.; Simpson, O.; Tian, L.; He, Y. Generalized completed local binary patterns for time-efficient steel surface
defect classification. IEEE Trans. Instrum. Meas. 2018, 68, 667–679. [CrossRef]
16. Ashour, M.W.; Khalid, F.; Abdul Halin, A.; Abdullah, L.N.; Darwish, S.H. Surface defects classification of hot-rolled steel strips
using multi-directional shearlet features. Arab. J. Sci. Eng. 2019, 44, 2925–2932. [CrossRef]
17. Zhang, S.; Zhang, Q.; Gu, J.; Su, L.; Li, K.; Pecht, M. Visual inspection of steel surface defects based on domain adaptation and
adaptive convolutional neural network. Mech. Syst. Signal Process. 2021, 153, 107541. [CrossRef]
18. Chen, X.; Lv, J.; Fang, Y.; Du, S. Online detection of surface defects based on improved YOLOV3. Sensors 2022, 22, 817. [CrossRef]
[PubMed]
19. Zhou, X.; Wei, M.; Li, Q.; Fu, Y.; Gan, Y.; Liu, H.; Ruan, J.; Liang, J. Surface Defect Detection of Steel Strip with Double Pyramid
Network. Appl. Sci. 2023, 13, 1054. [CrossRef]
20. Akhyar, F.; Liu, Y.; Hsu, C.Y.; Shih, T.K.; Lin, C.Y. FDD: A deep learning–based steel defect detectors. Int. J. Adv. Manuf. Technol.
2023, 126, 1093–1107. [CrossRef] [PubMed]
21. Selamet, F.; Cakar, S.; Kotan, M. Automatic detection and classification of defective areas on metal parts by using adaptive fusion
of faster R-CNN and shape from shading. IEEE Access 2022, 10, 126030–126038. [CrossRef]
22. Guo, Z.; Wang, C.; Yang, G.; Huang, Z.; Li, G. Msft-yolo: Improved yolov5 based on transformer for detecting defects of steel
surface. Sensors 2022, 22, 3467. [CrossRef] [PubMed]
23. Zhang, Y.; Wang, W.; Li, Z.; Shu, S.; Lang, X.; Zhang, T.; Dong, J. Development of a cross-scale weighted feature fusion network
for hot-rolled steel surface defect detection. Eng. Appl. Artif. Intell. 2023, 117, 105628. [CrossRef]
24. Li, G.; Zhao, S.; Zhou, M.; Li, M.; Shao, R.; Zhang, Z.; Han, D. YOLO-RFF: An Industrial Defect Detection Method Based on
Expanded Field of Feeling and Feature Fusion. Electronics 2022, 11, 4211. [CrossRef]
25. Kou, X.; Liu, S.; Cheng, K.; Qian, Y. Development of a YOLO-V3-based model for detecting defects on steel strip surface.
Measurement 2021, 182, 109454. [CrossRef]
26. Zhao, W.; Chen, F.; Huang, H.; Li, D.; Cheng, W. A new steel defect detection algorithm based on deep learning. Comput. Intell.
Neurosci. 2021, 2021, 5592878. [CrossRef]
27. Liu, R.; Huang, M.; Gao, Z.; Cao, Z.; Cao, P. MSC-DNet: An efficient detector with multi-scale context for defect detection on strip
steel surface. Measurement 2023, 209, 112467. [CrossRef]
28. Li, M.; Wang, H.; Wan, Z. Surface defect detection of steel strips based on improved YOLOv4. Comput. Electr. Eng. 2022,
102, 108208. [CrossRef]
29. Tian, R.; Jia, M. DCC-CenterNet: A rapid detection method for steel surface defects. Measurement 2022, 187, 110211. [CrossRef]
30. Zhou, W.; Hong, J. FHENet: Lightweight Feature Hierarchical Exploration Network for Real-Time Rail Surface Defect Inspection
in RGB-D Images. IEEE Trans. Instrum. Meas. 2023, 72, 1–8. [CrossRef]
31. Liu, Z.; Tang, R.; Duan, G.; Tan, J. TruingDet: Towards high-quality visual automatic defect inspection for mental surface. Opt.
Lasers Eng. 2021, 138, 106423. [CrossRef]
32. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 13713–13722.
33. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
34. Lv, X.; Duan, F.; Jiang, J.J.; Fu, X.; Gan, L. Deep metallic surface defect detection: The new benchmark and detection network.
Sensors 2020, 20, 1562. [CrossRef] [PubMed]
35. Wang, R.J.; Li, X.; Ling, C.X. Pelee: A real-time object detection system on mobile devices. Adv. Neural Inf. Process. Syst. 2018, 31,
5278.
36. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018;
pp. 4510–4520.
37. Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
38. Zhang, D.Y.; Luo, H.S.; Wang, D.Y.; Zhou, X.G.; Li, W.F.; Gu, C.Y.; Zhang, G.; He, F.M. Assessment of the levels of damage caused
by Fusarium head blight in wheat using an improved YoloV5 method. Comput. Electron. Agric. 2022, 198, 107086. [CrossRef]
39. Farady, I.; Kuo, C.C.; Ng, H.F.; Lin, C.Y. Hierarchical Image Transformation and Multi-Level Features for Anomaly Defect
Detection. Sensors 2023, 23, 988. [CrossRef] [PubMed]
40. He, Y.; Su, Y.; Wang, X.; Yu, J.; Luo, Y. An improved method MSS-YOLOv5 for object detection with balancing speed-accuracy.
Front. Phys. 2023, 10, 1349. [CrossRef]
41. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of
the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings,
Part I 14; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
42. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural
Inf. Process. Syst. 2015, 28, 91–99. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
