DDSNet: Deep Dual-Branch Networks for Surface Defect Segmentation
Abstract— Semantic segmentation of surface defects is essential to ensure product quality in intelligent manufacturing. However, due to the diversity and complexity of industrial scenarios and defects, existing defect semantic segmentation methods still suffer from inconsistent intraclass and indistinguishable interclass segmentation results. To overcome these problems, we propose a new dual-branch surface defect semantic segmentation network, DDSNet. First, we integrate semantic and border information to enrich the feature representation of defects and solve the problem of indistinguishable interclass segmentation results. Next, we introduce a global and local feature fusion (GLF) module based on similarity metrics to guide the network in further refining and highlighting the detail features of defects, solving the problem of inconsistent intraclass segmentation results. In addition, to enrich the surface defect segmentation datasets, we collect a dataset of steel foil surface defects, Ste-Seg, and a dataset of aluminum block surface defects, Alu-Seg. Experimental results on five defect semantic segmentation datasets show that DDSNet outperforms the state-of-the-art methods in terms of mIoU (NEU-Seg: 85.12%, MT-Defect: 76.51%, MSD: 91.82%, Ste-Seg: 90.01%, and Alu-Seg: 84.77%). All our experiments were conducted on an NVIDIA RTX 3060Ti. The dataset and code are available at https://github.com/QinLi-STUDY/DDSNet.

Index Terms— Boundary features, deep learning, feature fusion, semantic segmentation, surface defect detection.

Manuscript received 11 April 2024; revised 7 June 2024; accepted 28 June 2024. Date of publication 15 July 2024; date of current version 24 July 2024. This work was supported by the Special Project for Industrial Foundation Reconstruction and High Quality Development of Manufacturing Industry under Grant TC230A076-13. The Associate Editor coordinating the review process was Dr. Ferdinanda Ponci. (Zhenyu Yin and Li Qin contributed equally to this work.) (Corresponding authors: Zhenyu Yin; Guangjie Han.)
Zhenyu Yin, Li Qin, Xiaoqiang Shi, Feiqing Zhang, and Guangyuan Xu are with the Shenyang Institute of Computing Technology, University of Chinese Academy of Sciences, Beijing 100049, China, also with the Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China, and also with the Liaoning Key Laboratory of Domestic Industrial Control Platform Technology on Basic Hardware and Software, Shenyang 110168, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).
Guangjie Han is with the Department of Internet of Things Engineering, Hohai University, Changzhou 213022, China (e-mail: [email protected]).
Yuanguo Bi is with the School of Computer Science and Engineering, Northeastern University, Shenyang 110167, China (e-mail: [email protected]).
Digital Object Identifier 10.1109/TIM.2024.3427806

I. INTRODUCTION

SURFACE defect detection aims to identify abnormal regions on the surface of materials and workpieces. Hence, high precision is often required in many industrial applications. However, traditional defect detection methods mainly rely on manual inspection, which is inefficient and costly. In addition, traditional methods based on texture analysis lack robustness for diverse tasks. With the rapid development of deep learning technologies, convolutional neural networks (CNNs) are widely used in image detection, replacing traditional handcrafted feature extraction methods. CNN-based methods have been extensively applied in various industrial fields, such as railway track defects [1], steel strip defects [2], magnetic tile defects [3], road cracks [4], printed circuit board defects [5], and mobile phone screen defects [6]. However, these methods, mainly based on object detection, cannot precisely highlight the specific location of defects, such as minor scratches or cracks on steel surfaces. Therefore, they cannot satisfy high-precision requirements. Methods based on semantic segmentation have been widely utilized in defect detection to detect minor defects accurately.

Surface defect detection methods based on CNNs can be divided into three main categories: pixel-level defect segmentation methods, region-level defect detection methods, and image-level defect classification methods. Among them, image-level defect classification can classify the defects in the image [7], [8], [9], [10], but the position of the defects is not located. Region-level defect detection can only roughly locate the position of the defects through the bounding box and cannot recognize the size of the defects [11], [12], [13], [14]. Compared with region-level defect detection methods and image-level defect classification methods, pixel-level defect segmentation methods can accurately segment defect regions and provide defect location and type information [1], [15], [16], [17]. Therefore, in this article, we will continue to study pixel-level defect segmentation methods.

Since the appearance of FCN [18], semantic segmentation has been widely applied in various fields. In the field of unmanned surface vehicles (USVs), Yang et al. [19] proposed a method for ship detection and waterway channel segmentation, which achieved significant results. In the field of medicine, Singh et al. [20] proposed a heart segmentation method that achieves effective segmentation of the left atrium. Similarly, in the field of defect detection, a series of pixel-level defect segmentation methods have been proposed to improve the localization accuracy of defects [21], [22], [23]. These networks follow the design rule of UNet [24], as shown in Fig. 1(a), which is a typical encoder–decoder structure. The encoder encodes and compresses the features, and the decoder restores the resolution of the compressed feature map to the size of the input image. The encoding and decoding stages perform feature fusion through concatenation. Although this structure can effectively improve
1557-9662 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: ShanghaiTech University. Downloaded on July 30,2024 at 12:16:14 UTC from IEEE Xplore. Restrictions apply.
2525316 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 73, 2024
Fig. 3. Hard examples in surface defect semantic segmentation. The first row
shows the problem of inconsistent intraclass segmentation results. The second
row shows the problem of indistinguishable interclass segmentation results.
(a) Input images. (b) Ground truth images. (c) BiSeNetV1 [25]. (d) DDSNet.
YIN et al.: DDSNet: DEEP DUAL-BRANCH NETWORKS FOR SURFACE DEFECT SEGMENTATION 2525316
indistinguishable interclass segmentation results and inconsistent intraclass segmentation results during detection. For example, DDRNet [38] and BiSeNet [25] belong to dual-branch network structures. BiSeNet [25] designed a spatial branch to preserve spatial positional information by maintaining a higher resolution and generating high-resolution feature maps. It also designed a fast downsampling semantic branch to obtain a large receptive field. This method achieved a mIoU of 75.8 on the Cityscapes dataset. DDRNet [38] aimed to balance resolution and inference speed, so it designed a deep dual-resolution network. It includes a high-resolution spatial branch with 1/8 resolution and a low-resolution semantic branch with multiple downsampling. The low-resolution and high-resolution branches are paired at different stages to fuse spatial and semantic information comprehensively. This method achieved a mIoU of 80.4 on the Cityscapes dataset. Although the network structures mentioned above have achieved significant results in real-time segmentation, their performance in defect segmentation is relatively poor. This is because it is difficult to effectively supervise the deep network in learning defect edge information. Moreover, when fusing high-resolution and low-resolution features, direct use of upsampling factors greater than two may result in the loss of small-scale defect information. As a result, when dealing with defects with low contrast and high local similarity, there are problems of inconsistent intraclass and indistinguishable interclass segmentation.

Qu et al. [39] found that deep supervision of high-level features at each stage can highlight defective regions and obtain state-of-the-art results on three public datasets. Therefore, we propose a new dual-branch network called DDSNet, consisting of edge and semantic branches. This structure introduces the GLF module to extract and fuse global and local information from features at different scales. We also designed a novel U-shaped semantic branch to ensure it can adapt to multiscale defects and provide more information to the edge branch. In addition, we introduce multiple auxiliary supervision heads in both the edge and semantic branches to enhance the network's ability to perceive edge information and handle defects at different scales.

the pyramid feature fusion module, which allows adequate information to be propagated from the low-resolution fused feature map to the high-resolution fused feature map. Experiments performed on four datasets show state-of-the-art results. Zhu et al. [41] proposed an improved atrous spatial pyramid pooling (ASPP) module to extend the receptive field (RF) of low-level features and then introduced a global attention module for multilevel feature fusion on the multibranch structure of the decoder to enhance the effectiveness of the features. The accuracy can be improved to 98.54%, 99.82%, and 99.79% on three representative defect datasets: CrackForest, Kolektor, and RSDD. Liu et al. [42] proposed a multishuffle-block dilated convolution module to fuse multiscale defect features to capture the feature information of tiny defects. Experimental results show that the method achieves state-of-the-art results on four surface defect datasets. Cheng et al. [43] proposed a multiscale residual fusion module (MSRFM) to generate feature mappings with minimal background mode interference, so as to further extract defect prototypes, and then obtain accurate prediction results by calculating the distance map between the feature mappings and the defect prototypes. The method achieved mIoU of 83.49% and 80.12% in defect transfer detection between two background pattern wafers.

Feature fusion can integrate multiple-scale defect information and effectively improve low-contrast defect recognition. Therefore, in this study, we progressively integrate multiscale features to enrich the representation of defect features. At the same time, we also introduce a method of similarity measurement to allow areas with significant differences between global and local features to obtain a higher activation response, highlighting defect areas and improving defect segmentation accuracy.

III. PROPOSED METHOD

In this section, we first outline the overall architecture of the proposed model for defect semantic segmentation. Then, we present the details of GLF. Finally, we introduce an effective auxiliary training strategy to enhance model segmentation accuracy.
Fig. 5. Overview of the basic architecture of our proposed DDSNet. PAPPM represents the context aggregation module, proposed by PIDNet [44].
semantic and edge branches to enhance the model's ability to extract semantic and edge information. The EdgeHead is connected with features in the edge branch through the edge auxiliary supervision task to learn boundary information. The ground-truth generation step converts the semantic ground truth into binary boundary ground-truth values to guide the learning of boundary features. The boundary loss is calculated using BoundaryCrossEntropy to measure the difference between the predicted boundary values and the boundary ground-truth values. In the semantic branch, we use the GLF module to fuse global and local information between different-scale features and connect the SegHead with features through the semantic auxiliary segmentation task to learn multiscale semantic information. The semantic segmentation loss is calculated using OhemCrossEntropy to measure the difference between the predicted semantic values and the semantic ground-truth values. Hence, the entire framework consists of three loss parts: the final segmentation loss Loss_out, the auxiliary segmentation loss Loss_OCE, and the edge auxiliary supervision loss Loss_BCE. In the following, we will provide detailed explanations of the designs of the edge branch and the semantic branch.

1) Border Branch: In order to preserve more detail information, we let the border branch generate feature maps with a resolution of 1/8 of the input image resolution. It is important to note that the border branch does not involve any downsampling operation and has a one-to-one correspondence with the semantic branch to establish the mutual relationship between border information and semantic information. The corresponding relationship can be written as

    X_{1/8}^{Stage_i} = X_{1/8}^{Stage_i} + bilinear(Conv_{1×1}(X_{1/16}^{j}))    (1)

where X_{1/8}^{Stage_i} represents the output of the ith stage in the border branch and X_{1/16}^{j} represents the output of the jth stage of the first layer network in the semantic branch (3 ≥ i ≥ 1, 3 ≥ j ≥ 1).

2) Semantic Branch: To enhance the expressive ability of the network for defect characterization and address the issue of inconsistent intraclass segmentation results, we design a novel U-shaped architecture inspired by UNet [24]. In this architecture, we design it as a three-stage network, since fewer network layers can effectively retain more detailed information and make the network more lightweight. Then, we replace regular convolution blocks in the encoder and decoder with residual bottleneck blocks, which can improve computational efficiency and enhance feature representation capability. Finally, we employ the GLF to replace concatenation to fuse the feature information from different stages, to utilize global and local features in the feature space effectively. The feature fusion process in the semantic branch network can be formulated as follows:

    X_{1/16}^{2} = Conv(GLF(X_{1/16}^{1}, X_{1/32}^{1}))    (2)
    X_{1/32}^{2} = Conv(GLF(X_{1/32}^{1}, X_{1/64}^{1}))    (3)
    X_{1/16}^{3} = Conv(GLF(X_{1/16}^{2}, X_{1/32}^{2}))    (4)

where X_{r}^{i} represents the output of the ith stage in the layer of resolution r in the semantic branch.

Compared to other dual-branch networks, our designed network can extract and preserve more semantic information and border information, efficiently handle multiscale features, and enhance long-range information transmission. In addition, GLF promotes the fusion of features at different levels and enriches the representation of defect features.
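The stage-wise fusion in (1) aligns the 1/16-resolution semantic features with a 1 × 1 convolution, bilinearly upsamples them to 1/8 resolution, and adds them to the border-branch features. A minimal PyTorch sketch follows; the channel widths are illustrative assumptions, not values stated in the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BorderFusion(nn.Module):
    """Sketch of (1): X_1/8 <- X_1/8 + bilinear(Conv1x1(X_1/16))."""

    def __init__(self, border_ch=64, semantic_ch=128):
        super().__init__()
        # 1x1 convolution aligns semantic-branch channels with the border branch
        self.align = nn.Conv2d(semantic_ch, border_ch, kernel_size=1, bias=False)

    def forward(self, x_border, x_semantic):
        # x_border: (N, border_ch, H/8, W/8); x_semantic: (N, semantic_ch, H/16, W/16)
        up = F.interpolate(self.align(x_semantic), size=x_border.shape[2:],
                           mode="bilinear", align_corners=False)
        return x_border + up

x_b = torch.randn(1, 64, 40, 40)   # 1/8-resolution border features
x_s = torch.randn(1, 128, 20, 20)  # 1/16-resolution semantic features
print(BorderFusion()(x_b, x_s).shape)  # torch.Size([1, 64, 40, 40])
```

Because each pairing upsamples by exactly a factor of two, this avoids the large-factor upsampling that the paper identifies as a cause of small-scale defect information loss.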
TABLE I
BRIEF OVERVIEW OF DIFFERENT FEATURE FUSION METHODS
    l = LocalContext(m) = BN(Conv_{1×1}(ReLU(BN(Conv_{1×1}(m)))))    (14)
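Equation (14) reads as a pointwise bottleneck applied to the fused map m. A direct PyTorch sketch (the channel count is an assumed placeholder):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalContext(nn.Module):
    """Eq. (14): l = BN(Conv1x1(ReLU(BN(Conv1x1(m)))))."""

    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(ch)

    def forward(self, m):
        # BN -> ReLU sandwiched between two 1x1 convolutions, as in (14)
        return self.bn2(self.conv2(F.relu(self.bn1(self.conv1(m)))))

m = torch.randn(2, 64, 25, 25)
print(LocalContext()(m).shape)  # torch.Size([2, 64, 25, 25])
```

Since only 1 × 1 convolutions are involved, the transform reweights channels at each position without changing the spatial resolution.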
segmentation results. Inspired by [51], we add boundary auxiliary supervision tasks at each stage of the border branch network to guide the deep network in extracting border information. This binary segmentation task effectively captures feature information of defect boundaries. As shown in Fig. 7(b), we use the Canny operator to process the semantic ground truth (SG) in semantic segmentation tasks to obtain the boundary ground truth (BG) in the edge detection task. The pixels in BG are only 0 and 1, where 0 represents nonedge positions and 1 represents edge positions. We use the dilation operation in image morphology to thicken the edge positions and enable the network to learn border information better from BG. Finally, the network is supervised by BG to generate feature maps that contain edge details.

During the training stage, the semantic segmentation head utilizes OhemCrossEntropy Loss, while the boundary detection head employs BoundaryCrossEntropy Loss. Hence, the overall loss of the network consists of three components

    Loss = λ1·Loss_out + λ2·Σ_{i=1}^{3} Loss_OCE(P_s^i, SG) + λ3·Σ_{j=1}^{3} Loss_BCE(P_b^j, BG)    (16)

where Loss is the final loss value, Loss_out is the segmentation loss, P_s^i is the semantic prediction value of each layer in the semantic branch, Loss_OCE is the loss value of each semantic auxiliary segmentation task in the semantic branch, P_b^j is the boundary prediction value of each stage in the border branch, and Loss_BCE is the loss value of each auxiliary boundary detection task in the border branch. To optimize the model's performance, we conducted numerous experiments, building upon previous works [38], [44], [52], and finally determined the weight combination of λ1 = 1, λ2 = 0.4, and λ3 = 20.

IV. EXPERIMENTS

In this section, we first introduce the datasets and implementation details. Then, we analyze the impact of the GLF module of our proposed method and the training strategy on the accuracy of the MSD [6] test set. Finally, we compare our algorithm with other methods.

A. Datasets

1) MSD: This dataset [6] consists of 1200 images and contains three types of defects: oil, stain, and scratch. Each image has a resolution of 1920 × 1080.
2) NEU-Seg: This dataset [32] includes three typical defects on the surface of hot-rolled strip steel: inclusion, patch, and scratch. Each image has a resolution of 200 × 200, and each category includes 300 images.
3) MT-Defect: This dataset [3] consists of 392 defect images and 952 nondefect images. It includes five types of defects: blowhole, break, crack, fray, and uneven, with image resolutions ranging from 105 × 283 to 388 × 516.
4) Alu-Seg: Aluminum blocks are representative industrial products, but there is a scarcity of datasets specifically designed for surface defect segmentation tasks in aluminum blocks. Therefore, we have recollected and annotated a dataset of 1600 aluminum block defects with a resolution of 640 × 480. It includes four types of defects: hole, scratch, fold, and dirt. Pixel-level annotations were labeled with labelme.
5) Ste-Seg: Steel foil is one of the important products of the Benxi Steel Group, but defects such as holes, scratches, and folds are prone to occur during the production process. In order to improve the efficiency and accuracy of steel foil defect detection, we collected a dataset for steel foil surface defects. The dataset consists of 1684 images with a resolution of 1920 × 1080 pixels. Due to the limited number of defect samples available from the industrial environment, some defect samples are made by ourselves.

B. Implementation Details

We conducted the experiments for this study based on PyTorch. The training was implemented on an NVIDIA GeForce RTX 3060Ti GPU. During the training process, to mitigate possible over-fitting, we first performed data augmentation on the input images, including random horizontal flipping, random resizing, random cropping, and photometric distortion, to enhance the model's generalization ability. Then, we adopted SGD as the optimizer and introduced a polynomial decay strategy (PolyLR) to suppress model complexity through regularization; the initial learning rate was set to 0.001, the momentum to 0.9, and the weight decay to 0.0005. To address the sample imbalance problem, we adopted a class weight adjustment strategy, in which the loss contributions of different classes are weighted. At the same time, we introduced the online hard example mining (OHEM) strategy, which focuses on the harder-to-classify samples during training by using OhemCrossEntropy as the loss function, further improving the model's ability to adapt to complex scenarios and mitigating the risk of over-fitting. The hyperparameters α and β in (10) and (11) are set to 2. The batch size is set to 8. The total number of iterations for the experiment is set to 80k. The variation curves of mIoU, mAcc, and loss during the training process are shown in Fig. 8.

C. Ablation Study

In this section, we will introduce ablation experiments to validate the effectiveness of our method. Our model is trained on the training set of NEU-Seg [32] and evaluated on the test set. Furthermore, we visualize the segmentation results of some data to provide a more intuitive demonstration of the superiority of our method.
1) Effectiveness of Improvements on UNet: In the semantic branch, to validate the effectiveness of our series of improvements to UNet, we analyzed the impact of each improvement on the network performance, as shown in Table II. With only a slight impact on accuracy, our improvements significantly reduce the number of parameters and computation of the model.
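The BG construction described above can be sketched with NumPy. The paper applies the Canny operator to SG; as a dependency-free stand-in, the sketch below marks label-transition pixels instead and then thickens them with a 3 × 3 morphological dilation:

```python
import numpy as np

def boundary_ground_truth(seg_gt: np.ndarray, dilate_iters: int = 1) -> np.ndarray:
    """Derive a binary boundary ground truth (BG) from a semantic ground
    truth (SG) label map: 1 at label-transition pixels, 0 elsewhere, then
    thickened by dilation. (The paper uses Canny; this neighbor-difference
    test is a simple stand-in for a label map.)"""
    pad = np.pad(seg_gt, 1, mode="edge")
    bg = np.zeros(seg_gt.shape, dtype=np.uint8)
    # Mark pixels whose 4-neighborhood contains a different label.
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        shifted = pad[1 + dy:pad.shape[0] - 1 + dy, 1 + dx:pad.shape[1] - 1 + dx]
        bg |= (shifted != seg_gt).astype(np.uint8)
    # 3x3 dilation, repeated to thicken edges for easier supervision.
    for _ in range(dilate_iters):
        p = np.pad(bg, 1)
        bg = np.zeros_like(bg)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                bg |= p[1 + dy:p.shape[0] - 1 + dy, 1 + dx:p.shape[1] - 1 + dx]
    return bg

sg = np.zeros((8, 8), dtype=np.int64)
sg[2:6, 2:6] = 1                      # a square defect region
bg = boundary_ground_truth(sg)
print(bg.sum())                       # number of (dilated) edge pixels
```

Thickening the edges gives the border branch a wider supervision band, which is the stated purpose of the dilation step.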
Fig. 8. Performance of model training on MSD. (a) Comparisons of mIoU. (b) Comparisons of mAcc. (c) Performance of loss.
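The overall objective (16) weights the main loss, the three semantic auxiliary losses (OhemCrossEntropy), and the three boundary auxiliary losses (BoundaryCrossEntropy) with λ1 = 1, λ2 = 0.4, and λ3 = 20. A simplified PyTorch sketch, in which the top-k OHEM variant and the exact form of BoundaryCrossEntropy are our own assumptions:

```python
import torch
import torch.nn.functional as F

def ohem_ce(logits, target, keep_ratio=0.7):
    """OHEM cross-entropy: average only the hardest pixels (an assumed
    top-k variant; implementations may instead use a loss threshold)."""
    loss = F.cross_entropy(logits, target, reduction="none").flatten()
    k = max(1, int(keep_ratio * loss.numel()))
    return loss.topk(k).values.mean()

def boundary_bce(logits, bg):
    """Binary cross-entropy against the boundary ground truth BG."""
    return F.binary_cross_entropy_with_logits(logits, bg.float())

def total_loss(main_logits, sem_aux, border_aux, sg, bg,
               lambdas=(1.0, 0.4, 20.0)):
    """Eq. (16): Loss = l1*Loss_out + l2*sum_i OCE(P_s^i, SG) + l3*sum_j BCE(P_b^j, BG)."""
    l1, l2, l3 = lambdas
    loss = l1 * ohem_ce(main_logits, sg)
    loss = loss + l2 * sum(ohem_ce(p, sg) for p in sem_aux)
    loss = loss + l3 * sum(boundary_bce(p, bg) for p in border_aux)
    return loss

sg = torch.randint(0, 4, (2, 32, 32))          # semantic GT, 4 classes
bg = torch.randint(0, 2, (2, 1, 32, 32))       # binary boundary GT
main = torch.randn(2, 4, 32, 32)
sem_aux = [torch.randn(2, 4, 32, 32) for _ in range(3)]
border_aux = [torch.randn(2, 1, 32, 32) for _ in range(3)]
print(total_loss(main, sem_aux, border_aux, sg, bg))
```

In the real network, the auxiliary predictions come from intermediate stages at lower resolutions and would be upsampled to the ground-truth size before the losses are computed.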
TABLE II
ABLATION EXPERIMENT ON THE NEU-SEG TEST SET OF IMPROVEMENTS ON UNET. UNET-5STAGES REPRESENTS THE UNET WITH FIVE STAGES, UNET-3STAGES REPRESENTS THE UNET WITH THREE STAGES, AND UNET-3STAGES-RB REPRESENTS THE UNET WITH THREE STAGES AND RESIDUAL BOTTLENECK BLOCKS
Fig. 9. Visualization results of GLF on the NEU-Seg test set. (a) Input image and ground truth. (b) Layer1 without GLF, Layer2 without GLF. (c) Layer1 with GLF, Layer2 without GLF. (d) Layer1 without GLF, Layer2 with GLF. (e) Layer1 with GLF, Layer2 with GLF.
TABLE III
ABLATION EXPERIMENT ON NEU-SEG TEST SET OF GLF MODULE. LAYER1∼LAYER2 REPRESENTS WHETHER TO USE THE GLF MODULE IN THE SEMANTIC BRANCH NETWORK AT THAT LAYER
TABLE IV
ABLATION EXPERIMENT ON NEU-SEG TEST SET OF SEMANTIC AUXILIARY TASKS. LAYER1∼LAYER3 REPRESENTS WHETHER TO USE THE SEMANTIC AUXILIARY SEGMENTATION TASK IN THE SEMANTIC BRANCH NETWORK AT THAT LAYER
TABLE V
ABLATION STUDY ON NEU-SEG TEST SET OF BOUNDARY AUXILIARY TASKS. STAGE1∼STAGE3 REPRESENTS WHETHER TO USE THE BOUNDARY AUXILIARY SEGMENTATION TASK IN THE BOUNDARY BRANCH NETWORK AT THAT STAGE
TABLE VI
COMPARISONS ON THE TEST SET OF NEU-SEG. THOSE MARKED WITH * INDICATE THAT THE ACCURACY VALUES WERE MEASURED BY [6], WHILE THE REST WERE MEASURED ON OUR PLATFORM
TABLE VIII
COMPARISONS ON THE TEST SET OF MT-DEFECT [3]. THOSE MARKED WITH * INDICATE THAT THE ACCURACY VALUES WERE MEASURED BY [6], WHILE THE REST WERE MEASURED ON OUR PLATFORM
Fig. 12. Visualization of feature maps on the NEU-Seg test set. (a) Input images. (b) ConvNext [59]. (c) SegNext [60]. (d) DDSNet.
TABLE VII
COMPARISONS ON THE TEST SET OF MSD [6]. THOSE MARKED WITH * INDICATE THAT THE ACCURACY VALUES WERE MEASURED BY [6], WHILE THE REST WERE MEASURED ON OUR PLATFORM
Fig. 13. Comparison of semantic segmentation results on the MSD test set. (a) Input images. (b) Ground truth images. (c) Results based on ConvNext [59].
(d) Results based on SegNext [60]. (e) Results based on DDSNet.
TABLE IX
COMPARISONS ON THE TEST SET OF ALU-SEG. ALL THE ACCURACY VALUES OF METHODS WERE MEASURED ON OUR PLATFORM
TABLE XI
PERFORMANCE OF DIFFERENT VALUES OF α AND β IN GLF ON NEU-SEG
TABLE XIII
PERFORMANCE OF GLF IN DIFFERENT MODELS ON NEU-SEG
TABLE XIV
PERFORMANCE OF GLF IN DIFFERENT MODELS ON COCO2014
Fig. 17. Visualization of heat maps on NEU-Seg for different values of α and β. (a) Images. (b) α and β are set to 1. (c) α and β are set to 2. (d) α and β are set to 3.
D. Limitations of DDSNet

We propose a new dual-branch semantic segmentation network to extract the semantic contextual information and border information of defects. Although it improves the segmentation accuracy of defects, there are still some limitations. To achieve better segmentation accuracy, a large number of data samples are required for training. Specifically, when the dataset is small, the model can overfit, failing to cover most of the features and variations in the data space. Therefore, DDSNet achieves excellent segmentation accuracy
[25] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, and N. Sang, "BiSeNet: Bilateral segmentation network for real-time semantic segmentation," in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 325–341.
[26] Z. Li et al., "Complementation-reinforced network for integrated reconstruction and segmentation of pulmonary gas MRI with high acceleration," Med. Phys., vol. 51, no. 1, pp. 378–393, Jan. 2024.
[27] Y. Yang, Y. He, H. Guo, Z. Chen, and L. Zhang, "Semantic segmentation supervised deep-learning algorithm for welding-defect detection of new energy batteries," Neural Comput. Appl., vol. 34, no. 22, pp. 19471–19484, Nov. 2022, doi: 10.1007/s00521-022-07474-0.
[28] R. Neven and T. Goedemé, "A multi-branch U-Net for steel surface defect type and severity segmentation," Metals, vol. 11, no. 6, p. 870, May 2021.
[29] X. Li, W. Wang, X. Hu, and J. Yang, "Selective kernel networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2019, pp. 510–519.
[30] H. Zhang et al., "ResNeSt: Split-attention networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2022, pp. 2735–2745.
[31] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, "Squeeze-and-excitation networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 8, pp. 2011–2023, Aug. 2020.
[32] Y. Bao et al., "Triplet-graph reasoning network for few-shot metal generic surface defect segmentation," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–11, 2021.
[33] E. Liu, K. Chen, Z. Xiang, and J. Zhang, "Conductive particle detection via deep learning for ACF bonding in TFT-LCD manufacturing," J. Intell. Manuf., vol. 31, no. 4, pp. 1037–1049, Apr. 2020.
[34] S. Deng, R. Gao, Y. Wang, W. Mao, and W. Zheng, "Structure of a semantic segmentation-based defect detection network for laser cladding infrared images," Meas. Sci. Technol., vol. 34, no. 8, Aug. 2023, Art. no. 085601.
[35] F. Luo, Y. Cui, and Y. Liao, "MVRA-UNet: Multi-view residual attention U-Net for precise defect segmentation on magnetic tile surface," IEEE Access, vol. 11, pp. 135212–135221, 2023.
[36] A. F. Kamanli, "A novel multi-scale cross-patch attention with dilated convolution (MCPAD-UNET) for metallic surface defect detection," Signal, Image Video Process., vol. 18, no. 1, pp. 485–494, Feb. 2024.
[37] Z. Ling, A. Zhang, D. Ma, Y. Shi, and H. Wen, "Deep Siamese semantic segmentation network for PCB welding defect detection," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–11, 2022.
[38] H. Pan, Y. Hong, W. Sun, and Y. Jia, "Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes," IEEE Trans. Intell. Transp. Syst., vol. 24, no. 3, pp. 3448–3460, Mar. 2023.
[39] Z. Qu, C. Cao, L. Liu, and D.-Y. Zhou, "A deeply supervised convolutional neural network for pavement crack detection with multiscale feature fusion," IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 9, pp. 4890–4899, Sep. 2022.
[40] P. Lu, J. Jing, and Y. Huang, "MRD-Net: An effective CNN-based segmentation network for surface defect detection," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, 2022.
[41] J. Zhu, G. He, and P. Zhou, "MFNet: A novel multilevel feature fusion network with multibranch structure for surface defect detection," IEEE Trans. Instrum. Meas., vol. 72, pp. 1–11, 2023.
[42] T. Liu, Z. He, Z. Lin, G.-Z. Cao, W. Su, and S. Xie, "An adaptive image segmentation network for surface defect detection," IEEE Trans. Neural Netw. Learn. Syst., vol. 1, no. 1, pp. 1–14, Jan. 2022.
[43] J. Cheng, G. Wen, X. He, X. Liu, Y. Hu, and S. Mei, "Achieving the
[48] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[49] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 936–944.
[50] C. Yu, C. Gao, J. Wang, G. Yu, C. Shen, and N. Sang, "BiSeNet v2: Bilateral network with guided aggregation for real-time semantic segmentation," Int. J. Comput. Vis., vol. 129, no. 11, pp. 3051–3068, Nov. 2021.
[51] M. Fan et al., "Rethinking BiSeNet for real-time semantic segmentation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 9711–9720.
[52] X. Shi et al., "BSSNet: A real-time semantic segmentation network for road scenes inspired from AutoEncoder," IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 5, pp. 3424–3438, May 2024.
[53] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 6230–6239.
[54] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Computer Vision—ECCV. Cham, Switzerland: Springer, 2018, pp. 833–851.
[55] X. Li, Z. Zhong, J. Wu, Y. Yang, Z. Lin, and H. Liu, "Expectation-maximization attention networks for semantic segmentation," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 9166–9175.
[56] H. Zhao, X. Qi, X. Shen, J. Shi, and J. Jia, "ICNet for real-time semantic segmentation on high-resolution images," in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 405–420.
[57] T. Wu, S. Tang, R. Zhang, J. Cao, and Y. Zhang, "CGNet: A light-weight context guided network for semantic segmentation," IEEE Trans. Image Process., vol. 30, pp. 1169–1179, 2021.
[58] R. P. K. Poudel, S. Liwicki, and R. Cipolla, "Fast-SCNN: Fast semantic segmentation network," in Proc. Brit. Mach. Vis. Conf., Feb. 2019, pp. 1–23.
[59] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, "A ConvNet for the 2020s," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 11966–11976.
[60] M.-H. Guo, C.-Z. Lu, Q. Hou, Z. Liu, M.-M. Cheng, and S.-M. Hu, "SegNeXt: Rethinking convolutional attention design for semantic segmentation," in Proc. Adv. Neural Inf. Process. Syst., vol. 35, 2022, pp. 1140–1156.
[61] W. Yu et al., "MetaFormer is actually what you need for vision," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 10809–10819.
[62] Z. Xu, D. Wu, C. Yu, X. Chu, N. Sang, and C. Gao, "SCTNet: Single-branch CNN with transformer semantic information for real-time segmentation," in Proc. AAAI Conf. Artif. Intell., vol. 38, 2024, pp. 6378–6386.
[63] J. Wang et al., "RTFormer: Efficient design for real-time semantic segmentation with transformer," in Proc. Adv. Neural Inf. Process. Syst., vol. 35, 2022, pp. 7423–7436.
[64] Q. Wan, Z. Huang, J. Lu, G. Yu, and L. Zhang, "SeaFormer++: Squeeze-enhanced axial transformer for mobile visual recognition," 2023, arXiv:2301.13156.
[65] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, "SegFormer: Simple and efficient design for semantic segmentation with transformers," Comput. Vis. Pattern Recognit., vol. 1, no. 1, pp. 1–24, May 2021.
defect transfer detection of semiconductor wafer by a novel prototype [66] J. Zhang, K. Yang, A. Constantinescu, K. Peng, K. Muller, and
learning-based semantic segmentation network,” IEEE Trans. Instrum. R. Stiefelhagen, “Trans4Trans: Efficient transformer for transparent
Meas., vol. 73, pp. 1–12, 2024. object and semantic scene segmentation in real-world navigation
[44] J. Xu, Z. Xiong, and S. P. Bhattacharyya, “PIDNet: A real-time assistance,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 10,
semantic segmentation network inspired by PID controllers,” in Proc. pp. 19173–19186, Oct. 2022.
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2023, [67] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for
pp. 19529–19539. dense object detection,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV),
[45] N. Zhang, J. Li, Y. Li, and Y. Du, “Global attention pyramid network Oct. 2017, pp. 2999–3007.
for semantic segmentation,” in Proc. Chin. Control Conf., Jul. 2019, [68] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards
pp. 1–26. real-time object detection with region proposal networks,” IEEE
[46] W. Yuan, S. Wang, X. Li, M. Unoki, and W. Wang, “A skip attention Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149,
mechanism for monaural singing voice separation,” IEEE Signal Pro- Jun. 2017.
cess. Lett., vol. 26, no. 10, pp. 1481–1485, Oct. 2019. [69] Z. Tian, C. Shen, H. Chen, and T. He, “FCOS: Fully convolutional
[47] R. K. Srivastava, K. Greff, and J. Schmidhuber, “Training very deep one-stage object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis.
networks,” in Proc. Neural Inf. Process. Syst., Dec. 2015, pp. 1–23. (ICCV), Oct. 2019, pp. 9626–9635.
2525316 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 73, 2024
Zhenyu Yin received the B.S., M.Sc., and Ph.D. degrees from Northeastern University, Shenyang, China, in 2001, 2004, and 2007, respectively, all in computer science.
He was a Post-Doctoral Researcher with the Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang, and the Shenyang Institute of Automation, Shenyang, from October 2008 to October 2011. He is currently a Professor with the Shenyang Institute of Computing Technology, Chinese Academy of Sciences. His research interests include the Industrial Internet of Things, machine learning, artificial intelligence, industrial embedded systems, FPGA/SoC, numerical control systems, and functional safety.
Dr. Yin served as the Secretary of Subcommittee 3 on safe control systems for machinery and Technical Committee 231 on electrical systems of industrial machinery of the Standardization Administration of China.

Xiaoqiang Shi is currently pursuing the master’s degree in applied computer science and technology with the University of Chinese Academy of Sciences, Beijing, China.
His main research interests include deep learning, semantic segmentation, object detection, and related fields.

Feiqing Zhang received the bachelor’s degree in engineering from Lanzhou University of Technology, Lanzhou, China, in June 2017, and the doctoral degree in engineering from the University of Chinese Academy of Sciences, Beijing, China, in June 2024.
Her research interests include the IIoT, edge computing, and deep learning.
Li Qin is currently pursuing the master’s degree with the School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China.
His main research interests include deep learning, semantic segmentation, autonomous driving, and related fields.
Guangyuan Xu received the Bachelor of Science degree from Jilin University, Changchun, China, in July 2015. He is currently pursuing the Ph.D. degree in applied computer science and technology with the University of Chinese Academy of Sciences, Beijing, China.
His research interests include the IIoT, edge computing, and deep learning.