DDSNet Deep Dual-Branch Networks For Surface Defect Segmentation

The document presents DDSNet, a novel dual-branch network designed for semantic segmentation of surface defects in industrial manufacturing. It addresses issues of inconsistent intraclass and indistinguishable interclass segmentation results by integrating semantic and border information, along with a global and local feature fusion module. Experimental results demonstrate that DDSNet outperforms existing methods across multiple datasets, including newly introduced datasets for steel foil and aluminum block defects.


IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 73, 2024, 2525316

DDSNet: Deep Dual-Branch Networks for Surface Defect Segmentation
Zhenyu Yin, Li Qin, Guangjie Han, Fellow, IEEE, Xiaoqiang Shi, Feiqing Zhang, Guangyuan Xu, and Yuanguo Bi, Member, IEEE

Abstract— Semantic segmentation of surface defects is essential to ensure product quality in intelligent manufacturing. However, industrial scenarios and defects are diverse and complex, so existing defect semantic segmentation methods still suffer from inconsistent intraclass and indistinguishable interclass segmentation results. To overcome these problems, we propose a new dual-branch surface defect semantic segmentation network, DDSNet.

- First, we integrate semantic and border information to enrich the feature representation of defects. This solves the problem of indistinguishable interclass segmentation results.
- Next, we introduce a global and local feature fusion (GLF) module based on similarity metrics. It guides the network to further refine and highlight the detailed features of defects, which solves the problem of inconsistent intraclass segmentation results.
- In addition, to enrich the available surface defect segmentation datasets, we collect two new datasets: Ste-Seg for steel foil surface defects and Alu-Seg for aluminum block surface defects.

Experimental results on five defect semantic segmentation datasets show that DDSNet outperforms the state-of-the-art methods in terms of mIoU (NEU-Seg: 85.12%, MT-Defect: 76.51%, MSD: 91.82%, Ste-Seg: 90.01%, and Alu-Seg: 84.77%). All our experiments were conducted on an NVIDIA GeForce RTX 3060 Ti. The dataset and code are available at https://github.com/QinLi-STUDY/DDSNet.

Index Terms— Boundary features, deep learning, feature fusion, semantic segmentation, surface defect detection.

Manuscript received 11 April 2024; revised 7 June 2024; accepted 28 June 2024. Date of publication 15 July 2024; date of current version 24 July 2024. This work was supported by the Special Project for Industrial Foundation Reconstruction and High Quality Development of Manufacturing Industry under Grant TC230A076-13. The Associate Editor coordinating the review process was Dr. Ferdinanda Ponci. (Zhenyu Yin and Li Qin contributed equally to this work.) (Corresponding authors: Zhenyu Yin; Guangjie Han.)

Zhenyu Yin, Li Qin, Xiaoqiang Shi, Feiqing Zhang, and Guangyuan Xu are with the Shenyang Institute of Computing Technology, University of Chinese Academy of Sciences, Beijing 100049, China, also with the Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China, and also with the Liaoning Key Laboratory of Domestic Industrial Control Platform Technology on Basic Hardware and Software, Shenyang 110168, China (e-mail: congmy@163.com; [email protected]; [email protected]; [email protected]; [email protected]).

Guangjie Han is with the Department of Internet of Things Engineering, Hohai University, Changzhou 213022, China (e-mail: hanguangjie@gmail.com).

Yuanguo Bi is with the School of Computer Science and Engineering, Northeastern University, Shenyang 110167, China (e-mail: biyuanguo@mail.neu.edu.cn).

Digital Object Identifier 10.1109/TIM.2024.3427806

1557-9662 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

SURFACE defect detection aims to identify abnormal regions on the surface of materials and workpieces. Many industrial applications therefore require high precision. However, traditional defect detection methods mainly rely on manual inspection, which is inefficient and costly. In addition, traditional methods based on texture analysis lack robustness across diverse tasks.

With the rapid development of deep learning technologies, convolutional neural networks (CNNs) are widely used in image detection, replacing traditional handcrafted feature extraction methods. CNN-based methods have been extensively applied in various industrial fields, such as:

- railway track defects [1];
- steel strip defects [2];
- magnetic tile defects [3];
- road cracks [4];
- printed circuit board defects [5];
- mobile phone screen defects [6].

However, these methods are mainly based on object detection. They cannot precisely highlight the specific location of defects such as minor scratches or cracks on steel surfaces, so they cannot satisfy high-precision requirements. Methods based on semantic segmentation have therefore been widely utilized in defect detection to detect minor defects accurately.

Surface defect detection methods based on CNNs can be divided into three main categories: pixel-level defect segmentation methods, region-level defect detection methods, and image-level defect classification methods.

- Image-level defect classification can classify the defects in an image [7], [8], [9], [10], but it does not locate them.
- Region-level defect detection can only roughly locate defects through bounding boxes and cannot recognize their size [11], [12], [13], [14].
- Pixel-level defect segmentation methods, by comparison, can accurately segment defect regions and provide both defect location and type information [1], [15], [16], [17].

Therefore, in this article, we continue the study of pixel-level defect segmentation methods.

Since the appearance of FCN [18], semantic segmentation has been widely applied in various fields. In the field of unmanned surface vehicles (USVs), Yang et al. [19] proposed a method for ship detection and waterway channel segmentation that achieved significant results. In the field of medicine, Singh et al. [20] proposed a heart segmentation method that effectively segments the left atrium. Similarly, in the field of defect detection, a series of pixel-level defect segmentation methods have been proposed to improve the localization accuracy of defects [21], [22], [23].

These networks follow the design rule of UNet [24], shown in Fig. 1(a), which is a typical encoder–decoder structure. The encoder encodes and compresses the features, and the decoder restores the compressed feature map to the resolution of the input image. The encoding and decoding stages perform feature fusion through concatenation. Although this structure can effectively improve

Authorized licensed use limited to: ShanghaiTech University. Downloaded on July 30,2024 at 12:16:14 UTC from IEEE Xplore. Restrictions apply.

Fig. 1. Illustration of architectures of UNet [24], BiSeNetV1 [25], and our proposed approach. (a) UNet [24], which introduced a typical encoder–decoder structure to capture more semantic context information. (b) BiSeNetV1 [25], which is a classical dual-branch network. (c) Our proposed method.

Fig. 3. Hard examples in surface defect semantic segmentation. The first row
shows the problem of inconsistent intraclass segmentation results. The second
row shows the problem of indistinguishable interclass segmentation results.
(a) Input images. (b) Ground truth images. (c) BiSeNetV1 [25]. (d) DDSNet.

Fig. 2. Small defect examples in surface defect semantic segmentation. (a) Input images. (b) Ground truth images. (c) UNet [24]. (d) DDSNet.

the accuracy of defect segmentation, it is prone to losing some detail information during training, and this information cannot easily be recovered [25]. As a result, the structure tends to miss tiny defects, as shown in Fig. 2, and it is less capable of recognizing small inclusions and shallow scratches.

Therefore, to obtain more defect information, researchers have proposed a series of dual-branch networks [26], [27], [28]. A classical structure is BiSeNetV1 [25], shown in Fig. 1(b). It improves the network's ability to recognize subtle defects by using multiple branches to extract multiscale feature information. Although this structure better preserves the detail information of defects, it still struggles with defects that have high local similarity and low contrast with the background. In such cases, the defect segmentation task may suffer from inconsistent intraclass and indistinguishable interclass segmentation results.

- Inconsistent intraclass segmentation results occur when regions with significant appearance differences share the same semantic label, as shown in the first row of Fig. 3. Because scratches have different depths, their appearance can change significantly. Parts of the scratch defects are therefore incorrectly categorized as background.
- Indistinguishable interclass segmentation results occur when visually similar regions have different semantic labels, as shown in the second row of Fig. 3. Certain background regions are highly similar to inclusions and are incorrectly categorized as patch defects.

To address the above problems, we propose a new surface defect segmentation network, DDSNet. It aims to solve the problems of inconsistent intraclass and indistinguishable interclass segmentation results by enhancing the model's ability to express features of the same defect class and to perceive defect borders.

Specifically, DDSNet is a dual-branch network, as shown in Fig. 1(c). The network starts with a backbone network and then splits into two subnetworks: a border branch and a semantic branch.

- To solve the problem of indistinguishable interclass segmentation results, we fuse semantic features and border features in the border branch. This enriches the feature information of defects and helps distinguish different defects.
- To solve the problem of inconsistent intraclass segmentation results, we propose a global and local feature fusion (GLF) module based on similarity metrics. Compared with existing feature fusion methods (SKNet [29], ResNeSt [30], UNet [24], and SENet [31]), GLF highlights abnormal areas through similarity measurement when fusing local and global features. This enhances the model's ability to represent defect features.

Aluminum blocks and steel foil are both important industrial products of Benxi Steel Group, but defects such as holes, scratches, and folds are prone to occur in production. We aimed to improve the segmentation accuracy for aluminum blocks and steel foils and to enrich the datasets available for surface defect segmentation. To this end, we collected and annotated the surface defect dataset Alu-Seg for aluminum blocks. We also collected the surface defect dataset Ste-Seg for steel foils from an actual industrial environment. For this we used our image acquisition instrument, shown in Fig. 4, which consists of an industrial camera, a light source, the material, and a server.

To validate the effectiveness of our method, we evaluated DDSNet on the Alu-Seg and Ste-Seg datasets. We also conducted extensive experiments on public datasets, including MSD [6], MT-Defect [3], and NEU-Seg [32], and performed ablation experiments to better understand each module of DDSNet. The main contributions are summarized as follows.
1) We propose a steel foil surface defect dataset, Ste-Seg, and an aluminum block surface defect dataset, Alu-Seg. These datasets are expected to further advance research on surface defect segmentation techniques.

YIN et al.: DDSNet: DEEP DUAL-BRANCH NETWORKS FOR SURFACE DEFECT SEGMENTATION 2525316

2) We propose a novel dual-branch surface defect semantic segmentation network, DDSNet. It addresses the issue of inconsistent intraclass and indistinguishable interclass segmentation results in surface defect semantic segmentation tasks.
3) We propose an efficient feature fusion module, GLF. It effectively fuses the global and local features of defects in the feature space and can be easily integrated into other models.
4) Extensive experiments on the MSD [6], MT-Defect [3], NEU-Seg [32], Alu-Seg, and Ste-Seg datasets show that DDSNet achieves state-of-the-art performance compared to other well-known pixel-level segmentation methods.

Fig. 4. Image recognition instrument. (1) Camera. (2) Light source. (3) Material. (4) Server.

The rest of this article is structured as follows:

- Section II reviews previous work related to our proposed method.
- Section III introduces the proposed model for defect semantic segmentation in detail.
- Section IV analyzes the experimental results and compares our proposed framework with the state-of-the-art methods.
- Section V discusses the selection of parameters in GLF, the generalization of GLF, and the limitations of DDSNet.
- Section VI concludes this article and presents some future studies of this work.

II. RELATED WORK

This section reviews related work in defect segmentation, including UNet-based methods, dual-branch-based methods, and feature fusion modules.

A. UNet-Based Methods

UNet [24] is a classical encoder–decoder structure for semantic segmentation and is also widely used in defect semantic segmentation.

Liu et al. [33] proposed a deep residual network based on UNet. It addresses the challenging problem of automatically detecting conductive particle quality by learning particle features, and it achieves detection accuracy and recall far better than previous work.

Deng et al. [34] addressed three problems: the original network cannot produce end-to-end output, it segments complex objects poorly, and it cannot efficiently segment defective regions of the fused cladding layer. Excellent detection results were achieved on two datasets.

Luo et al. [35] proposed a multiview residual attention UNet-based method for segmenting defects on the surface of magnetic tiles. It introduces Gaussian residual attention convolution (GRAC) blocks for low-contrast environments, which lets it recognize defects with similar features and segment tiny defects. This addresses the low contrast of defective regions and the difficulty of detecting them under background noise. The method achieves an mIoU of 87.7% on the tile surface defect dataset.

Kamanli [36] proposed a dilated convolution-based multiscale cross-patch attention UNet (MCPAD-UNet) for accurate and automatic defect segmentation of industrial product surfaces. It uses a dilated convolutional cross-patch attention module to capture channel and spatial information and integrates multiscale features to enhance semantic information. It achieves a Dice score of 95.3% on the steel defect detection dataset.

Although these UNet-based methods can effectively improve defect segmentation accuracy, they are prone to losing detail information during training, and this information cannot easily be recovered [25]. Therefore, to retain more detail information, we propose a dual-branch network structure for the semantic segmentation of defects.

B. Dual-Branch-Based Methods

In the defect segmentation task, defects usually vary in scale. A dual-branch network can better adapt to these differences by learning features of different scales separately, which improves the model's ability to detect and segment various complex defects.

Yang et al. [27] proposed a dual-branch network for welding defect detection. It adapts to the size variation of defects by acquiring their spatial and contextual information, and it realizes accurate pixel-level segmentation of welding defects. On the weld image dataset, this method achieved an mIoU of 86.704%.

Zhang et al. [6] proposed a dual-branch network that enhances the encoding of boundary details and semantic context by introducing two auxiliary tasks. It handles the local similarity problem of surface defects with the global context up-sampling (GCU) module. Excellent detection results were achieved on three datasets.

Ling et al. [37] realized semantic segmentation of minor defects in PCB welding by combining similarity metrics with an encoder–decoder semantic segmentation network. The network uses two weight-sharing encoders in a two-branch structure to address overfitting and the detection of small, irregularly shaped defects in joint defect detection.

Yang et al. [15] proposed a dual-branch network that uses multiscale features to recognize and segment small scratches. Experiments conducted on three scratch datasets show that the method achieves higher accuracy than the current mainstream methods.

However, for defects with high similarity and low contrast, the existing methods are prone to the problems of


indistinguishable interclass segmentation results and inconsistent intraclass segmentation results during detection.

For example, DDRNet [38] and BiSeNet [25] are dual-branch network structures.

BiSeNet [25] designed a spatial branch that preserves spatial positional information by maintaining a higher resolution and generating high-resolution feature maps. It also designed a fast-downsampling semantic branch to obtain a large receptive field. This method achieved an mIoU of 75.8% on the Cityscapes dataset.

DDRNet [38] aimed to balance resolution and inference speed, so it designed a deep dual-resolution network. The network includes a high-resolution spatial branch at 1/8 resolution and a low-resolution semantic branch with multiple downsampling stages. The two branches are paired at different stages to fuse spatial and semantic information comprehensively. This method achieved an mIoU of 80.4% on the Cityscapes dataset.

Although the network structures mentioned above have achieved significant results in real-time segmentation, their performance in defect segmentation is relatively poor. One reason is that it is difficult to effectively supervise the deep network in learning defect edge information. Another is that, when fusing high-resolution and low-resolution features, directly using upsampling factors greater than two may lose small-scale defect information. As a result, defects with low contrast and high local similarity suffer from inconsistent intraclass and indistinguishable interclass segmentation.

Qu et al. [39] found that deep supervision of high-level features at each stage can highlight defective regions, obtaining state-of-the-art results on three public datasets. Therefore, we propose a new dual-branch network called DDSNet, consisting of edge and semantic branches.

- This structure introduces the GLF module to extract and fuse global and local information from features at different scales.
- We also designed a novel U-shaped semantic branch so that it can adapt to multiscale defects and provide more information to the edge branch.
- In addition, we introduce multiple auxiliary supervision heads in both the edge and semantic branches. These enhance the network's ability to perceive edge information and handle defects at different scales.

C. Feature Fusion Modules

Most defect semantic segmentation methods initially focus on high-level features and ignore low-level feature information. However, low-level features are also meaningful, and this line of work investigates how to fuse high-level and low-level feature information.

Lu et al. [40] proposed the MFEF module, which enhances and fuses deep-level feature maps in the backbone network. It handles surface defects of different shapes, sizes, and locations across different products and enables real-time end-to-end defect segmentation. Experimental results on three datasets (14 industrial products) show that the method outperforms other methods in generalization and accuracy.

Dong et al. [2] proposed a pyramid feature fusion and global contextual attention network to address large interclass defect differences in industrial surface defect detection. It extracts multiscale features and fuses them at five resolutions using the pyramid feature fusion module. This module allows adequate information to propagate from the low-resolution fused feature map to the high-resolution fused feature map. Experiments performed on four datasets show state-of-the-art results.

Zhu et al. [41] proposed an improved atrous spatial pyramid pooling (ASPP) module to extend the receptive field (RF) of low-level features. They then introduced a global attention module for multilevel feature fusion on the multibranch structure of the decoder to enhance the effectiveness of the features. The accuracy reaches 98.54%, 99.82%, and 99.79% on three representative defect datasets: CrackForest, Kolektor, and RSDD.

Liu et al. [42] proposed a multishuffle-block dilated convolution module that fuses multiscale defect features to capture the feature information of tiny defects. Experimental results show that the method achieves state-of-the-art results on four surface defect datasets.

Cheng et al. [43] proposed a multiscale residual fusion module (MSRFM) that generates feature maps with minimal background pattern interference, from which defect prototypes are extracted. It then obtains accurate prediction results by computing the distance map between the feature maps and the defect prototypes. The method achieved mIoUs of 83.49% and 80.12% in defect transfer detection between two background pattern wafers.

Feature fusion can integrate multiscale defect information and effectively improve low-contrast defect recognition. Therefore, in this study, we progressively integrate multiscale features to enrich the representation of defect features. At the same time, we introduce a similarity measurement that gives areas with significant differences between global and local features a higher activation response. This highlights defect areas and improves defect segmentation accuracy.

III. PROPOSED METHOD

In this section, we first outline the overall architecture of the proposed model for defect semantic segmentation. Then, we present the details of GLF. Finally, we introduce an effective auxiliary training strategy to enhance model segmentation accuracy.

A. Overall Architecture for DDSNet

Border information and semantic information are crucial for improving the accuracy of semantic segmentation of surface defects. However, existing networks cannot fully extract these types of information. Therefore, we propose a new dual-branch network, DDSNet, to extract both types of information effectively. The overall architecture of DDSNet for surface defect segmentation is shown in Fig. 5.

In the proposed framework, we use the residual bottleneck block for all convolution operations to improve the framework's efficiency. DDSNet consists mainly of a semantic branch and an edge branch:

- the semantic branch extracts multiscale semantic information of defects;
- the edge branch extracts and preserves the edge information of defects.

The edge information and semantic information are fused through PAPPM [44]. During the training phase, multiple auxiliary supervision tasks are added to both the


Fig. 5. Overview of the basic architecture of our proposed DDSNet. PAPPM represents the context aggregation module, proposed by PIDNet [44].

semantic and edge branches to enhance the model's ability to extract semantic and edge information.

In the edge branch, the EdgeHead is connected to edge-branch features through the edge auxiliary supervision task to learn boundary information. The semantic ground-truth generation step produces binary boundary ground-truth values that guide the learning of boundary features. The boundary loss is calculated using BoundaryCrossEntropy, which measures the difference between the predicted boundary values and the boundary ground-truth values.

In the semantic branch, we use the GLF module to fuse global and local information between features of different scales. The SegHead is connected to these features through the semantic auxiliary segmentation task to learn multiscale semantic information. The semantic segmentation loss is calculated using OhemCrossEntropy, which measures the difference between the predicted semantic values and the semantic ground-truth values.

Hence, the entire framework has three loss terms: the final segmentation loss $\mathrm{Loss}_{\mathrm{out}}$, the auxiliary segmentation loss $\mathrm{Loss}_{\mathrm{OCE}}$, and the edge auxiliary supervision loss $\mathrm{Loss}_{\mathrm{BCE}}$. In the following, we explain the designs of the edge branch and the semantic branch in detail.

1) Border Branch: To preserve more detail information, we let the border branch generate feature maps at 1/8 of the input image resolution. Note that the border branch does not involve any downsampling operation. It has a one-to-one correspondence with the semantic branch, which establishes the mutual relationship between border information and semantic information. The corresponding relationship can be written as

$$X_{1/8}^{\mathrm{Stage}_i} = X_{1/8}^{\mathrm{Stage}_i} + \mathrm{Bilinear}\left(\mathrm{Conv}_{1\times 1}\left(X_{1/16}^{j}\right)\right) \tag{1}$$

where $X_{1/8}^{\mathrm{Stage}_i}$ represents the output of the $i$th stage in the border branch, and $X_{1/16}^{j}$ represents the output of the $j$th stage of the first layer network in the semantic branch ($1 \le i \le 3$, $1 \le j \le 3$).

2) Semantic Branch: To enhance the network's ability to express defect characteristics and address the issue of inconsistent intraclass segmentation results, we design a novel U-shaped architecture inspired by UNet [24]. We build it as a three-stage network, since fewer network layers can effectively retain more detailed information and make the network more lightweight. Then, we replace the regular convolution blocks in the encoder and decoder with residual bottleneck blocks, which improve computational efficiency and enhance feature representation capability. Finally, we use GLF instead of concatenation to fuse the feature information from different stages, so that global and local features in the feature space are used effectively. The feature fusion process in the semantic branch network can be formulated as follows:

$$X_{2}^{1/16} = \mathrm{Conv}\left(\mathrm{GLF}\left(X_{1}^{1/16}, X_{1}^{1/32}\right)\right) \tag{2}$$

$$X_{2}^{1/32} = \mathrm{Conv}\left(\mathrm{GLF}\left(X_{1}^{1/32}, X_{1}^{1/64}\right)\right) \tag{3}$$

$$X_{3}^{1/16} = \mathrm{Conv}\left(\mathrm{GLF}\left(X_{2}^{1/16}, X_{2}^{1/32}\right)\right) \tag{4}$$

where $X_{i}^{r}$ represents the output of the $i$th stage in the layer of resolution $r$ in the semantic branch.

Compared to other dual-branch networks, our network can extract and preserve more semantic and border information, efficiently handle multiscale features, and enhance long-range information transmission. In addition, GLF promotes the fusion of features at different levels and enriches the representation of defect features.
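A small NumPy sketch can illustrate the tensor bookkeeping behind Eq. (1) and Eqs. (2)–(4). It is not the authors' implementation. It assumes the following:

- single-image (C, H, W) arrays;
- a random 1 × 1 convolution weight;
- bilinear resizing with half-pixel centers (the convention PyTorch uses with `align_corners=False`);
- a hypothetical 64 × 64 input with 8 channels.

Because GLF is only detailed in the next subsection, the sketch replaces it with a hypothetical `fuse_placeholder` that upsamples and adds. It also leaves the Conv in Eqs. (2)–(4) as an identity. Neither choice comes from the paper.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution on a (C_in, H, W) map: mixes channels, keeps H and W."""
    return np.einsum("oc,chw->ohw", weight, x)


def bilinear_resize(x, out_h, out_w):
    """Bilinear resize of a (C, H, W) map with half-pixel centers (align_corners=False)."""
    _, h, w = x.shape
    # Map output pixel centers back to input coordinates, clamped to the valid range.
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]  # vertical interpolation weights
    wx = (xs - x0)[None, None, :]  # horizontal interpolation weights
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bottom = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


def border_update(x_border, x_sem, weight):
    """Eq. (1): add the projected, upsampled 1/16 semantic map to the 1/8 border map."""
    _, h, w = x_border.shape
    return x_border + bilinear_resize(conv1x1(x_sem, weight), h, w)


def fuse_placeholder(low, high):
    """Hypothetical stand-in for GLF: upsample `high` to the size of `low` and add."""
    _, h, w = low.shape
    return low + bilinear_resize(high, h, w)


def semantic_fusion(x1_16, x1_32, x1_64, conv=lambda t: t):
    """Fusion schedule of Eqs. (2)-(4); `conv` defaults to identity (an assumption)."""
    x2_16 = conv(fuse_placeholder(x1_16, x1_32))  # Eq. (2)
    x2_32 = conv(fuse_placeholder(x1_32, x1_64))  # Eq. (3)
    return conv(fuse_placeholder(x2_16, x2_32))   # Eq. (4): X_3^{1/16}


# Toy usage with a hypothetical 64x64 input and 8 channels.
rng = np.random.default_rng(0)
C, H, W = 8, 64, 64
x_border = rng.standard_normal((C, H // 8, W // 8))     # border branch, 1/8 resolution
x_sem = rng.standard_normal((2 * C, H // 16, W // 16))  # semantic branch, 1/16 resolution
weight = 0.1 * rng.standard_normal((C, 2 * C))          # hypothetical 1x1 conv weight
out = border_update(x_border, x_sem, weight)            # shape (8, 8, 8)

x3_16 = semantic_fusion(
    rng.standard_normal((C, H // 16, W // 16)),
    rng.standard_normal((C, H // 32, W // 32)),
    rng.standard_normal((C, H // 64, W // 64)),
)                                                       # shape (8, 4, 4)
```

Applying the 1 × 1 convolution before upsampling, as Eq. (1) does, projects channels while the map is still at 1/16 resolution. This is cheaper than convolving the already-upsampled 1/8 map.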
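The three-part objective described earlier in this subsection ($\mathrm{Loss}_{\mathrm{out}}$, $\mathrm{Loss}_{\mathrm{OCE}}$, and $\mathrm{Loss}_{\mathrm{BCE}}$) can likewise be sketched in NumPy. This is a simplified illustration, not the paper's training code, and it rests on several assumptions:

- The binary boundary target marks a pixel when any 4-neighbor carries a different class id. The paper does not state its edge rule here.
- OHEM keeps every pixel whose true-class probability is below a hypothetical threshold of 0.7, but never fewer than a hypothetical `min_kept` pixels.
- BoundaryCrossEntropy is approximated by plain binary cross-entropy.
- $\mathrm{Loss}_{\mathrm{out}}$ is assumed to use the same OHEM criterion as $\mathrm{Loss}_{\mathrm{OCE}}$.
- The weights `w_aux` and `w_edge` are placeholders.

```python
import numpy as np


def boundary_from_labels(label):
    """Binary boundary target: 1 where any 4-neighbor has a different class id."""
    edge = np.zeros(label.shape, dtype=bool)
    diff_v = label[1:, :] != label[:-1, :]  # vertically adjacent pixels differ
    diff_h = label[:, 1:] != label[:, :-1]  # horizontally adjacent pixels differ
    edge[1:, :] |= diff_v
    edge[:-1, :] |= diff_v
    edge[:, 1:] |= diff_h
    edge[:, :-1] |= diff_h
    return edge.astype(np.float64)


def softmax(logits):
    """Softmax over the class axis of a (K, H, W) logit map."""
    z = np.exp(logits - logits.max(axis=0, keepdims=True))
    return z / z.sum(axis=0, keepdims=True)


def ohem_cross_entropy(logits, label, thresh=0.7, min_kept=16):
    """Cross-entropy averaged over hard pixels only (simplified OHEM)."""
    prob = softmax(logits)
    rows, cols = np.indices(label.shape)
    p_true = prob[label, rows, cols].ravel()  # probability of the correct class
    nll = -np.log(np.clip(p_true, 1e-12, 1.0))
    n_hard = int((p_true < thresh).sum())
    n_keep = min(max(n_hard, min_kept), p_true.size)
    hardest = np.argsort(p_true)[:n_keep]      # lowest-confidence pixels first
    return nll[hardest].mean()


def binary_cross_entropy(edge_logits, edge_target):
    """Mean binary cross-entropy between sigmoid(edge_logits) and a 0/1 target."""
    p = np.clip(1.0 / (1.0 + np.exp(-edge_logits)), 1e-12, 1.0 - 1e-12)
    return -(edge_target * np.log(p) + (1 - edge_target) * np.log(1 - p)).mean()


def total_loss(out_logits, aux_logits, edge_logits, label, w_aux=0.4, w_edge=1.0):
    """Loss_out + w_aux * Loss_OCE + w_edge * Loss_BCE (weights are hypothetical)."""
    loss_out = ohem_cross_entropy(out_logits, label)  # final segmentation output
    loss_oce = ohem_cross_entropy(aux_logits, label)  # SegHead auxiliary output
    loss_bce = binary_cross_entropy(edge_logits, boundary_from_labels(label))  # EdgeHead
    return loss_out + w_aux * loss_oce + w_edge * loss_bce


# Toy example: an 8x8 label map with a 4x4 defect of class 1 on background class 0.
label = np.zeros((8, 8), dtype=int)
label[2:6, 3:7] = 1
edge = boundary_from_labels(label)  # 28 boundary pixels
onehot = np.stack([label == 0, label == 1]).astype(np.float64)
good = total_loss(10 * onehot, 10 * onehot, 20 * (2 * edge - 1), label)  # near zero
```

OHEM restricts the average to low-confidence pixels. This keeps the many easy background pixels in a defect image from dominating the segmentation loss.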


B. Global and Local Feature Fusion Module


The feature fusion module is crucial for improving the accuracy of defect segmentation. Besides the commonly used element-wise summation and concatenation methods, researchers have proposed several other practical approaches:

- SKNet [29] introduces a dynamic selection mechanism that adaptively chooses convolution kernels of the corresponding scales for multiscale feature fusion.
- ResNeSt [30] fuses cross-channel feature information by applying channel attention mechanisms on different network branches.
- SENet [31] learns and utilizes the global information of features through SE blocks. When performing feature fusion, it selectively emphasizes useful features and suppresses irrelevant ones.
- Global attention upsample (GAU) [45] and skip attention (SA) [46] use high-level features as guidance to modulate the low-level features in long skip connections.
- Highway networks first introduced a selection mechanism in short skip connections [47].

However, these methods do not effectively utilize global and local features to enhance the expressive capability of the model, and they ignore the similarities and differences between features. Thus, we propose a GLF module based on similarity metrics.

Fig. 6. (a) GLF module, where ⊕ represents the Add operation and ⊙ represents the Mul and Add operation. (b) Design of the global attention module. (c) Design of the local attention module. 1 × 1 Conv is a 1 × 1 standard convolution kernel, which changes only the number of channels of the feature map, not its spatial size.

As shown in Fig. 6(a), the GLF consists of three submodules: the global attention module, the local attention module, and the similarity metric module. Fig. 6(b) and (c) show the designs of the global and local attention modules. The GLF proceeds as follows:

- the global attention module generates global features;
- the local attention module generates local features;
- the similarity metric module generates the difference features between the global and local features;
- the difference features are fused with the input features using the Mul and Add operations.

The input features consist of the output of the shallow network, referred to as low, and the output of the deep network, referred to as high. Specifically, the GLF first upsamples the high features to the size of the low features by bilinear interpolation and then fuses them. The fused result is passed into the global attention module and the local attention module to obtain the global feature g and the local feature l. The similarity metric module then processes g and l to generate the activated difference features. Finally, the difference features are fused with the input features by Mul and Add operations to enhance the feature representation. The overall process can be written as follows:

low $\in \mathbb{R}^{B\times C\times H\times W}$; $m$ is the preliminary fusion feature map, $m \in \mathbb{R}^{B\times C\times H\times W}$; $g$ is the global attention weight, $g \in \mathbb{R}^{B\times C\times 1\times 1}$; $l$ is the local attention weight, $l \in \mathbb{R}^{B\times C\times H\times W}$; $high_{enhance}$ is the enhanced deep feature map, $high_{enhance} \in \mathbb{R}^{B\times C\times H\times W}$; $low_{enhance}$ is the enhanced shallow feature map, $low_{enhance} \in \mathbb{R}^{B\times C\times H\times W}$; $\omega$ represents the fusion result of $g$ and $l$; and $\alpha$ and $\beta$ scale the range of feature values. This scaling helps the network handle these features and alleviates vanishing or exploding gradients during training. The values of $\alpha$ and $\beta$ and the portability of GLF are discussed in Section V.

The global attention module computes a weight for each channel of the feature map, reflecting the importance of that channel in the input feature. The computation proceeds in four steps:

- adaptive average pooling squeezes the spatial dimensions of the input feature;
- a point-wise (1 × 1) convolution is applied to the squeezed feature;
- BN and ReLU follow;
- a second 1 × 1 convolution and BN yield the weight of each channel.

The computation of the global attention module can be written as

$$g = \mathrm{GlobalContext}(\mathrm{Global}(m)) = \mathrm{BN}\left(\mathrm{Conv}_{1\times 1}\left(\mathrm{ReLU}\left(\mathrm{BN}\left(\mathrm{Conv}_{1\times 1}\left(\mathrm{Global}(m)\right)\right)\right)\right)\right) \tag{13}$$

The local attention module is designed to extract
high′ = bilinear(high) (5) fine-grained features from the preliminary fused feature map
m = low + high′ (6) m. This module is necessary because when extracting global
g = GlobalContext(m) (7) context features, an extreme method is employed, primarily
focusing on the dominant features within the entire feature
l = LocalContext(m) (8)
map, neglecting numerous fine-grained features. Therefore,
ω = metric(g, l) (9) relying solely on the global attention module is not optimal
highenhance = α · high′ · (1 − ω) (10) for feature extraction. To address this issue, we propose the
lowenhance = β · low · ω (11) local attention module based on the global attention module,
aiming to alleviate the difficulty in extracting fine-grained
out = highenhance + lowenhance (12)
feature information. In this regard, we utilize a 1 × 1 con-
′ ′
where high is the deep input feature map, high ∈ R B×C×H ×W ; volution kernel to perform point-wise convolution operations
high′ is the feature map upsampled by bilinear interpolation, on the preliminary fused feature map m, followed by BN
high′ ∈ R B×C×H ×W ; low is the shallow input feature map, and ReLU operations to obtain the local attention weights.

Authorized licensed use limited to: ShanghaiTech University. Downloaded on July 30,2024 at 12:16:14 UTC from IEEE Xplore. Restrictions apply.
YIN et al.: DDSNet: DEEP DUAL-BRANCH NETWORKS FOR SURFACE DEFECT SEGMENTATION 2525316

TABLE I
Brief Overview of Different Feature Fusion Methods

The computation process of the local attention module can be written as

$$l = \mathrm{LocalContext}(m) = \mathrm{BN}\big(\mathrm{Conv}_{1\times1}(\mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv}_{1\times1}(m))))\big). \tag{14}$$
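To make the data flow of (5)–(14) concrete, the following is a minimal pure-Python sketch of one GLF forward pass for a single image (the batch dimension B is dropped). It is an illustration under stated assumptions, not the authors' implementation: the BN layers of (13) and (14) are omitted; `high_up` stands for high′ and is assumed to be already upsampled to the size of low; `g_params` and `l_params` are hypothetical weight tuples for the global and local branches whose output channel count must equal C; and the sum in the similarity metric (15), given in the next paragraph, is assumed to run over the channel dimension at each pixel, with the C × 1 × 1 global weight g broadcast over H × W.

```python
import math

# Minimal sketch of one GLF forward pass (eqs. (6)-(12)) for a single image.
# Feature maps are nested lists indexed [channel][row][col]; the batch dim is dropped.
# Assumptions (not from the paper): BN layers are omitted, high_up is high' already
# upsampled to the size of low, and the sum in (15) runs over channels at each pixel.

def shape(x):
    return len(x), len(x[0]), len(x[0][0])

def add(a, b):
    """Element-wise sum of two equally sized maps, as in (6)."""
    return [[[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]

def relu(x):
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]

def conv1x1(x, weight, bias):
    """Point-wise convolution: mixes channels per pixel; H and W are unchanged."""
    _, h, w = shape(x)
    return [[[sum(weight[o][i] * x[i][r][c] for i in range(len(x))) + bias[o]
              for c in range(w)] for r in range(h)] for o in range(len(weight))]

def global_pool(x):
    """Adaptive average pooling to 1 x 1, i.e. Global(.) in (13)."""
    return [[[sum(map(sum, ch)) / (len(ch) * len(ch[0]))]] for ch in x]

def context(x, w1, b1, w2, b2):
    """Conv1x1 -> ReLU -> Conv1x1: the structure of (13)/(14) without BN."""
    return conv1x1(relu(conv1x1(x, w1, b1)), w2, b2)

def metric(gv, lv, eps=1e-12):
    """Eq. (15): one minus the cosine similarity of two channel vectors."""
    dot = sum(a * b for a, b in zip(gv, lv))
    norm = math.sqrt(sum(a * a for a in gv) * sum(b * b for b in lv))
    return 1.0 - dot / (norm + eps)

def glf(low, high_up, g_params, l_params, alpha=2.0, beta=2.0):
    m = add(low, high_up)                               # (6)
    g = context(global_pool(m), *g_params)              # (7), (13): C x 1 x 1
    l = context(m, *l_params)                           # (8), (14): C x H x W
    C, H, W = shape(m)
    gv = [g[k][0][0] for k in range(C)]                 # g is broadcast over H x W
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for y in range(H):
        for x in range(W):
            omega = metric(gv, [l[k][y][x] for k in range(C)])          # (9)
            for k in range(C):
                out[k][y][x] = (alpha * high_up[k][y][x] * (1.0 - omega)  # (10)
                                + beta * low[k][y][x] * omega)           # (11), (12)
    return out
```

With identity weights and spatially constant positive inputs, g and l coincide at every pixel, so ω ≈ 0 and the output reduces to α · high′, which is a quick sanity check of (10)–(12). The default α = β = 2 matches the value reported in the implementation details.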

The similarity metric module measures the semantic difference between global and local features. Because global features describe the image as a whole, they are less sensitive than local features to the features of certain minor defects. To highlight anomalies and defective regions, we use this module to assign smaller weights to regions where the global and local features are highly similar and, conversely, larger weights to regions where they are less similar. With this module, the network obtains high activation responses for regions where the global and local features differ or change markedly, and low responses elsewhere, which facilitates the detection of anomalous regions or defects. The similarity metric operation is defined as follows:

$$\mathrm{metric}(g, l) = 1 - \frac{\sum_{i=0}^{D} g_i \times l_i}{\sqrt{\sum_{i=0}^{D} g_i^2 \times \sum_{i=0}^{D} l_i^2}} \tag{15}$$

where g is the global feature, l is the local feature, and D is the number of channels of the feature maps.

Table I further highlights the differences between GLF and other fusion methods. A and B are feature maps from different network layers with the same dimensions and sizes, and G is the global attention module. Apart from the different implementations of G, these methods differ in two main respects.
1) Level of Attention Perception: Linear methods do not perceive context. Feature refinement and modulation are nonlinear but only partially context-aware, so most of them rely on high-level feature maps. Fully context-aware methods use both input feature maps for guidance, but at the cost of initial-integration problems such as vanishing gradients, exploding gradients, and network instability.
2) Fusion Strategies: Compared with refinement and modulation, the soft selection method constrains the weights of the two feature maps to sum to 1 and adjusts them dynamically, which gives it high flexibility and generalization ability. In contrast, refinement and modulation require more prior knowledge at design time to specify the optimization objective of the network.

Fig. 7. (a) Semantic auxiliary segmentation head. (b) Edge segmentation head. 3 × 3Conv is a 3 × 3 convolution kernel used to extract semantic information. 1 × 1Conv is a 1 × 1 convolution kernel; according to the different tasks of the two heads, it is used to adjust the number of channels.

C. Auxiliary Training Strategy
During the training phase, suitable auxiliary supervised tasks can improve the segmentation accuracy of the model without compromising inference speed. BiSeNetV2 [50] and STDC [51] optimize their models by integrating auxiliary supervised tasks at various stages of the network. In our dual-branch network, the semantic branch extracts semantic contextual information, and the border branch encodes border information. Thus, to strengthen the model's ability to extract semantic and border information, we add semantic auxiliary supervised tasks to the semantic branch and boundary auxiliary supervised tasks to the border branch, as shown in Fig. 5.
To encode more semantic information and enhance the ability of the model to express features of the same defect class, we insert semantic auxiliary supervised tasks at different stages of the semantic branch network to guide the network to encode semantic contextual information at various scales. The design of the semantic auxiliary segmentation head is illustrated in Fig. 7(a).
To enhance the perceptibility of the model for defect borders and address the problem of indistinguishable interclass


segmentation results, we add boundary auxiliary supervision tasks, inspired by [51], at each stage of the border branch network to guide the deep network in extracting border information. This binary segmentation task effectively captures feature information of defect boundaries. As shown in Fig. 7(b), we apply the Canny operator to the semantic ground truth (SG) of the semantic segmentation task to obtain the boundary ground truth (BG) for the edge detection task. The pixels in BG take only the values 0 and 1, where 0 represents nonedge positions and 1 represents edge positions. We use the morphological dilation operation to thicken the edge positions in BG so that the network can learn border information better. Finally, the network is supervised by BG to generate feature maps that contain edge details.

During the training stage, the semantic segmentation head uses OhemCrossEntropy Loss, while the boundary detection head uses BoundaryCrossEntropy Loss. Hence, the overall loss of the network consists of three components:

$$\mathrm{Loss} = \lambda_1 \mathrm{Loss}_{\mathrm{out}} + \lambda_2 \sum_{i=1}^{3} \mathrm{Loss}_{\mathrm{OCE}}(P_{s_i}, \mathrm{SG}) + \lambda_3 \sum_{j=1}^{3} \mathrm{Loss}_{\mathrm{BCE}}(P_{b_j}, \mathrm{BG}) \tag{16}$$

where Loss is the final loss value; $\mathrm{Loss}_{\mathrm{out}}$ is the segmentation loss; $P_{s_i}$ is the semantic prediction of the $i$th layer in the semantic branch; $\mathrm{Loss}_{\mathrm{OCE}}$ is the loss of each semantic auxiliary segmentation task in the semantic branch; $P_{b_j}$ is the boundary prediction of the $j$th stage in the border branch; and $\mathrm{Loss}_{\mathrm{BCE}}$ is the loss of each auxiliary boundary detection task in the border branch. To optimize the model's performance, we conducted numerous experiments building on previous works [38], [44], [52] and finally chose the weights λ1 = 1, λ2 = 0.4, and λ3 = 20.

IV. EXPERIMENTS

In this section, we first introduce the datasets and implementation details. Then, we analyze the impact of the GLF module and the training strategy of our proposed method on accuracy on the NEU-Seg [32] test set. Finally, we compare our algorithm with other methods.

A. Datasets
1) MSD: This dataset [6] consists of 1200 images and contains three types of defects: oil, stain, and scratch. Each image has a resolution of 1920 × 1080.
2) NEU-Seg: This dataset [32] includes three typical defects on the surface of hot-rolled strip steel: inclusion, patch, and scratch. Each image has a resolution of 200 × 200, and each category includes 300 images.
3) MT-Defect: This dataset [3] consists of 392 defect images and 952 nondefect images. It includes five types of defects: blowhole, break, crack, fray, and uneven, with image resolutions ranging from 105 × 283 to 388 × 516.
4) Alu-Seg: Aluminum blocks are representative industrial products, but datasets specifically designed for surface defect segmentation of aluminum blocks are scarce. Therefore, we collected and annotated a dataset of 1600 aluminum block defect images with a resolution of 640 × 480. It includes four types of defects: hole, scratch, fold, and dirt. Pixel-level annotations were made with labelme.
5) Ste-Seg: Steel foil is one of the important products of the Benxi Steel Group, but defects such as holes, scratches, and folds are prone to occur during production. To improve the efficiency and accuracy of steel foil defect detection, we collected a dataset of steel foil surface defects. The dataset consists of 1684 images with a resolution of 1920 × 1080 pixels. Because the number of defect samples available from the industrial environment is limited, some defect samples were made by ourselves.

B. Implementation Details
We conducted the experiments for this study in PyTorch, with training performed on an NVIDIA GeForce RTX 3060Ti GPU. To mitigate possible overfitting, we first applied data augmentation to the input images, including random horizontal flipping, random resizing, random cropping, and photometric distortion, to enhance the model's generalization ability. We then adopted SGD as the optimizer with an initial learning rate of 0.001, a momentum of 0.9, and a weight decay of 0.0005, which acts as a regularizer that suppresses model complexity, and decayed the learning rate with a polynomial decay strategy (PolyLR). To address sample imbalance, we adopted a class weight adjustment strategy in which the loss contributions of different classes are weighted. We also introduced the online hard example mining (OHEM) strategy, which uses OhemCrossEntropy as the loss function to focus on harder-to-classify samples during training; this further improves the model's ability to adapt to complex scenarios and mitigates the risk of overfitting. The hyperparameters α and β in (10) and (11) are both set to 2, the batch size is set to 8, and the total number of iterations is set to 80k. The variation curves of mIoU, mAcc, and loss during training are shown in Fig. 8.

C. Ablation Study
In this section, we present ablation experiments to validate the effectiveness of our method. Our model is trained on the training set of NEU-Seg [32] and evaluated on its test set. Furthermore, we visualize the segmentation results on some samples to demonstrate the superiority of our method more intuitively.
1) Effectiveness of Improvements on UNet: To validate the effectiveness of our series of improvements to UNet in the semantic branch, we analyzed the impact of each improvement on network performance, as shown in Table II. With only a slight impact on accuracy, our improvements significantly reduce the number of parameters and the computation of the model.
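The boundary ground-truth construction described in the auxiliary training strategy (SG, then an edge map, then dilation, giving BG) can be sketched in pure Python as follows. As a plainly labeled substitution, the sketch does not implement the Canny operator the authors use; it marks a pixel as an edge when any 4-neighbor carries a different class label, which yields a comparable 0/1 edge map for clean label images. The square structuring element of size 3 × 3 (`radius=1`) is also an assumption, because the dilation kernel size is not stated.

```python
# Sketch of building the binary boundary ground truth BG from a label map SG.
# Substitution: a 4-neighbour label-change test stands in for the Canny operator.

def boundary_from_labels(sg):
    """Return a 0/1 map: 1 where any 4-neighbour has a different class label."""
    h, w = len(sg), len(sg[0])
    bg = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and sg[ny][nx] != sg[y][x]:
                    bg[y][x] = 1
                    break
    return bg

def dilate(bg, radius=1):
    """Binary dilation with a (2 * radius + 1) x (2 * radius + 1) square element."""
    h, w = len(bg), len(bg[0])
    return [[1 if any(bg[ny][nx]
                      for ny in range(max(0, y - radius), min(h, y + radius + 1))
                      for nx in range(max(0, x - radius), min(w, x + radius + 1)))
             else 0
             for x in range(w)]
            for y in range(h)]

# Usage: a single-pixel defect (class 1) on background (class 0).
sg = [[0] * 5 for _ in range(5)]
sg[2][2] = 1
bg = dilate(boundary_from_labels(sg))
for row in bg:
    print(row)
```

Here the single defect pixel produces a plus-shaped edge of five pixels, which the 3 × 3 dilation thickens to 21 pixels; the thicker band is what makes the sparse edge class easier to learn.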

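The overall objective (16) and the hard-pixel selection behind OhemCrossEntropy can be illustrated with the short sketch below. Only the weights λ1 = 1, λ2 = 0.4, and λ3 = 20 come from the text; the function names, the threshold `thresh=0.7`, and `min_kept` are hypothetical choices. The selection rule (average the cross-entropy only over pixels whose predicted probability for the true class falls below a threshold, raising the threshold when too few pixels qualify) follows a common OHEM formulation and is not a confirmed detail of DDSNet. Class weighting is omitted.

```python
import math

# Loss weights reported for (16).
LAMBDA1, LAMBDA2, LAMBDA3 = 1.0, 0.4, 20.0

def total_loss(loss_out, sem_aux_losses, bnd_aux_losses):
    """Eq. (16): main loss plus three semantic and three boundary auxiliary losses."""
    if len(sem_aux_losses) != 3 or len(bnd_aux_losses) != 3:
        raise ValueError("expected three semantic and three boundary auxiliary losses")
    return (LAMBDA1 * loss_out
            + LAMBDA2 * sum(sem_aux_losses)
            + LAMBDA3 * sum(bnd_aux_losses))

def ohem_cross_entropy(true_class_probs, thresh=0.7, min_kept=2):
    """Cross-entropy averaged over hard pixels only (hypothetical OHEM variant).

    true_class_probs: softmax probability of the ground-truth class, one per pixel.
    """
    if not true_class_probs:
        return 0.0
    probs = sorted(true_class_probs)                 # hardest (least confident) first
    idx = min(min_kept, len(probs) - 1)
    # Never use a threshold below the (min_kept + 1)-th smallest probability,
    # so roughly min_kept pixels survive even when few fall below `thresh`.
    threshold = max(probs[idx], thresh)
    hard = [p for p in probs if p < threshold]
    if not hard:
        return 0.0
    return sum(-math.log(p) for p in hard) / len(hard)
```

Because λ3 = 20 is much larger than λ2 = 0.4, small boundary losses still contribute noticeably to the total, which is consistent with the sparse, hard-to-learn nature of the BG targets.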

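The optimizer settings in the implementation details can be sketched as follows: a PolyLR schedule that starts from the stated initial learning rate of 0.001 and runs over 80k iterations, plus one SGD update with momentum 0.9 and weight decay 0.0005 in the form used by PyTorch's `torch.optim.SGD` (weight decay added to the gradient as an L2 term, no dampening, no Nesterov). The decay exponent `power=0.9` and the minimum learning rate of 0 are assumptions, because the text does not state them.

```python
def poly_lr(it, base_lr=1e-3, max_iters=80_000, power=0.9, min_lr=0.0):
    """PolyLR: decays the learning rate from base_lr to min_lr over max_iters iterations."""
    frac = min(it, max_iters) / max_iters
    return (base_lr - min_lr) * (1.0 - frac) ** power + min_lr

def sgd_step(w, grad, buf, lr, momentum=0.9, weight_decay=5e-4):
    """One scalar SGD-with-momentum update; weight decay acts as an L2 penalty on w."""
    d = grad + weight_decay * w          # L2 regularization term added to the gradient
    buf = momentum * buf + d             # momentum buffer (start it at 0.0)
    return w - lr * buf, buf

# Usage: learning rate at a few points of the 80k-iteration schedule.
for it in (0, 40_000, 80_000):
    print(it, poly_lr(it))
```

Keeping the schedule and the update separate mirrors the correction in the text: PolyLR only shapes the learning rate, while the regularizing weight decay belongs to the optimizer.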
Fig. 8. Performance of model training on MSD. (a) Comparisons of mIoU. (b) Comparisons of mAcc. (c) Performance of loss.

TABLE II
Ablation Experiment on the NEU-Seg Test Set of Improvements on UNet. UNet-5Stages Represents the UNet With Five Stages, UNet-3Stages Represents the UNet With Three Stages, and UNet-3Stages-RB Represents the UNet With Three Stages That Uses Residual Bottleneck Blocks

TABLE III
Ablation Experiment on NEU-Seg Test Set of GLF Module. Layer1∼Layer2 Represents Whether to Use the GLF Module in the Semantic Branch Network at That Layer

Fig. 9. Visualization results of GLF on the NEU-Seg test set. (a) Input image and ground truth. (b) Layer1 without GLF, Layer2 without GLF. (c) Layer1 with GLF, Layer2 without GLF. (d) Layer1 without GLF, Layer2 with GLF. (e) Layer1 with GLF, Layer2 with GLF.

TABLE IV
Ablation Experiment on NEU-Seg Test Set of Semantic Auxiliary Tasks. Layer1∼Layer3 Represents Whether to Use the Semantic Auxiliary Segmentation Task in the Semantic Branch Network at That Layer

2) Effectiveness of GLF: To verify the effectiveness of the GLF module, we conducted corresponding ablation experiments on the semantic branch network, as shown in Table III. The GLF module improves mIoU from 83.87% to 84.32%, which demonstrates that it fuses global and local information in the feature space more effectively. To further showcase the superiority of the GLF module, we visualized the semantic segmentation results with and without the GLF module on the NEU-Seg test set. As shown in Fig. 9, introducing the GLF enhances the model's ability to recognize defective areas.
3) Effectiveness of Semantic Auxiliary Segmentation Tasks: To confirm the effectiveness of semantic auxiliary segmentation tasks in surface defect segmentation, we inserted them at different stages of the semantic branch network to guide the network in encoding more semantic context information at various scales. As shown in Table IV, we conducted corresponding ablation experiments on the auxiliary segmentation heads of each layer in the semantic branch network; they improve mIoU from 84.32% to 84.68%, which further validates the benefit of introducing semantic auxiliary segmentation tasks.
4) Effectiveness of Boundary Auxiliary Supervision Tasks: To guide deep network learning and preserve the border information of defects for better defect border perception, we introduced boundary auxiliary detection heads into each stage of the border branch network. As shown in Table V, we conducted corresponding ablation experiments for boundary auxiliary supervision tasks in stages 1–3 of the border branch, which improve mIoU from 84.68% to 85.12%. To further validate the effect of guiding deep and shallow network learning of defect border information, we visualized partial semantic segmentation results on the NEU-Seg dataset, as shown in Fig. 10. Introducing boundary auxiliary segmentation tasks effectively improves the model's ability to detect defect edges.

D. Compare With State-of-the-Arts
We compare the proposed method with other state-of-the-art methods on three typical surface defect segmentation


TABLE V
Ablation Study on NEU-Seg Test Set of Boundary Auxiliary Tasks. Stage1∼Stage3 Represents Whether to Use the Boundary Auxiliary Segmentation Task in the Boundary Branch Network at That Stage

TABLE VI
Comparisons on the Test Set of NEU-Seg. Methods Marked With * Indicate That Their Accuracy Was Measured by [6], While the Rest of the Accuracy Values Were Measured on Our Platform

Fig. 10. Visualization results of boundary auxiliary tasks on the NEU-Seg test set. (a) Input image and ground truth. (b) Without boundary auxiliary tasks. (c) With one boundary auxiliary task. (d) With two boundary auxiliary tasks. (e) With three boundary auxiliary tasks.

datasets. These methods include FCN [18], PSPNet [53], DeepLabV3+ [54], EMANet [55], FPN [49], ICNet [56], CGNet [57], STDC1 [51], STDC2 [51], BiSeNetV1 [25], BiSeNetV2 [50], Fast-SCNN [58], DDRNet [38], FDSNet [6], ConvNext [59], SegNext [60], PoolFormer [61], PIDNet [44], SCTNet [62], RTFormer [63], SeaFormer [64], SegFormer [65], and Trans4Trans [66]. The comparison results on the NEU-Seg dataset are listed in Table VI. Our method achieves an mIoU of 85.12% with 0.97G FLOPs. Compared with DeepLabV3+ [54], DDSNet improves mIoU by 2.16% with less computation. Some qualitative results of DDSNet are shown in Figs. 11 and 12. Compared with other methods, DDSNet recognizes inclusions, patches, and scratches well, and its segmentation results are closer to the ground truth.

Fig. 11. Comparison of semantic segmentation results on the NEU-Seg test set. (a) Input images. (b) Ground truth images. (c) Results based on ConvNext [59]. (d) Results based on SegNext [60]. (e) Results based on DDSNet.

Regarding the MSD dataset, the comparison results are listed in Table VII. The proposed DDSNet also achieves better segmentation accuracy than the other state-of-the-art methods, reaching 91.82% mIoU, 1.59% higher than SegNext [60]. Some qualitative results of DDSNet are shown in Fig. 13. Compared with other methods, DDSNet still detects low-contrast oil and small scratches well and does not suffer from inconsistent intraclass or indistinguishable interclass segmentation results.

Regarding the MT-Defect dataset, the comparison results are listed in Table VIII. On this challenging dataset, the proposed DDSNet also achieves the best segmentation accuracy of 76.51% mIoU, outperforming the other state-of-the-art methods. Compared with ConvNext [59], DDSNet improves mIoU by 2.1% at a lower computational cost. Fig. 14 shows some qualitative results of DDSNet. On the MT-Defect dataset, DDSNet also shows good generalization ability compared with other methods, especially for uneven and break defects, which are more difficult to detect.

Regarding the Alu-Seg dataset, the comparison results are listed in Table IX. The proposed DDSNet achieves 84.77% mIoU, 0.23% higher than FPN [49]. Some qualitative detection results are shown in Fig. 15.

Regarding the Ste-Seg dataset, the comparison results are listed in Table X. The proposed DDSNet achieves 90.01% mIoU, 4.23% higher than PoolFormer [61]. Some qualitative detection results are shown in Fig. 16.
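The comparisons above are reported in mIoU (and Fig. 8 also tracks mAcc), but neither metric is defined in this section. The sketch below assumes the standard definitions: from a K × K pixel confusion matrix `conf[true][pred]`, the IoU of class k is TP / (TP + FP + FN) and its accuracy is TP / (TP + FN), each averaged over classes. Whether the background class is included and how classes absent from a split are handled are benchmark-specific choices that this sketch does not attempt to match; here such classes are simply skipped.

```python
def per_class_stats(conf):
    """Yield (tp, fp, fn) for each class of a K x K confusion matrix conf[true][pred]."""
    k = len(conf)
    for c in range(k):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                          # class-c pixels predicted as something else
        fp = sum(conf[r][c] for r in range(k)) - tp     # other pixels predicted as class c
        yield tp, fp, fn

def mean_iou(conf):
    ious = [tp / (tp + fp + fn) for tp, fp, fn in per_class_stats(conf) if tp + fp + fn > 0]
    return sum(ious) / len(ious)

def mean_acc(conf):
    accs = [tp / (tp + fn) for tp, fp, fn in per_class_stats(conf) if tp + fn > 0]
    return sum(accs) / len(accs)

# Usage: two classes, counts in pixels.
conf = [[50, 10],
        [5, 35]]
print(f"mIoU = {mean_iou(conf):.4f}, mAcc = {mean_acc(conf):.4f}")
```

Because IoU also penalizes false positives, mIoU is always at most mAcc for the same confusion matrix, which is why the two curves in Fig. 8 need not coincide.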


TABLE VIII
Comparisons on the Test Set of MT-Defect [3]. Methods Marked With * Indicate That Their Accuracy Was Measured by [6], While the Rest of the Accuracy Values Were Measured on Our Platform

Fig. 12. Visualization of feature maps on the NEU-Seg test set. (a) Input
images. (b) ConvNext [59]. (c) SegNext [60]. (d) DDSNet.

TABLE VII
Comparisons on the Test Set of MSD [6]. Methods Marked With * Indicate That Their Accuracy Was Measured by [6], While the Rest of the Accuracy Values Were Measured on Our Platform

V. DISCUSSIONS

A. Values of α and β in GLF
In (10) and (11), α and β are mainly used to amplify the proportion of features so that the network obtains a better response. To determine the values of α and β, we conducted several sets of experiments, as shown in Table XI. We found that if α and β are smaller, the amplification effect is weaker, which causes the model to behave more flatly and conservatively when fusing the original features and residuals. This may help the model remain stable when feature differences are minor, but it may also limit its performance when it strongly needs to differentiate between feature sources. If α and β are large, the effect of the differences is further amplified, and the model behaves more aggressively in deciding which feature channel to rely on, which may help improve performance when feature differences are significant. However, excessive weight amplification may reduce training stability. Some visualization results are shown in Fig. 17.

B. Different Attention-Type in DDSNet
To further explore the impact of different attention modules on the performance of DDSNet, we replaced GLF with different attention-type modules and conducted comparative experiments on NEU-Seg. As shown in Table XII, our proposed GLF module performs better than the other attention modules.

C. Generalization of the GLF
As shown in Fig. 5, GLF has a relatively simple structure, does not impose strict size restrictions on the fused features, and can handle feature fusion at different scales. To integrate it into other networks, only the number of channels needs to be adjusted, as shown in Fig. 18. GLF not only effectively improves the segmentation accuracy of the model but also reduces its number of parameters, as shown in Table XIII.


Fig. 13. Comparison of semantic segmentation results on the MSD test set. (a) Input images. (b) Ground truth images. (c) Results based on ConvNext [59].
(d) Results based on SegNext [60]. (e) Results based on DDSNet.

TABLE IX
Comparisons on the Test Set of Alu-Seg. All the Accuracy Values of Methods Were Measured on Our Platform

Fig. 14. Comparison of semantic segmentation results on the MT-Defect test set. (a) Input images. (b) Ground truth images. (c) Results based on ConvNext [59]. (d) Results based on SegNext [60]. (e) Results based on DDSNet.

Fig. 15. Comparison of semantic segmentation results on the Alu-Seg test set. (a) Input images. (b) Ground truth images. (c) Results based on ConvNext [59]. (d) Results based on SegNext [60]. (e) Results based on DDSNet.

Fig. 16. Comparison of semantic segmentation results on the Ste-Seg test set. (a) Input images. (b) Ground truth images. (c) Results based on ConvNext [59]. (d) Results based on SegNext [60]. (e) Results based on DDSNet.

In addition, to validate the effectiveness of GLF in other scenarios, we integrated it into several classical object detection methods (RetinaNet [67], Faster-RCNN [68], and FCOS [69]) and tested them on the COCO2014 dataset. We used SGD as the optimizer and set the number of epochs to 24, the batch size to 2, the learning rate to 0.01, the momentum to 0.9, and the weight decay to 0.0005. As demonstrated in Table XIV, GLF still performs well in these more complex scenarios.


TABLE X
Comparisons on the Test Set of Ste-Seg. All the Accuracy Values of Methods Were Measured on Our Platform

TABLE XII
Performance of Different Attention Types in DDSNet on NEU-Seg

TABLE XI
Performance of Different Values of α and β in GLF on NEU-Seg

Fig. 18. Migration of GLF to other networks. (a) BiSeNetV1. (b) BiSeNetV1_GLF. (c) STDC1. (d) STDC1_GLF.

TABLE XIII
Performance of GLF in Different Models on NEU-Seg

TABLE XIV
Performance of GLF in Different Models on COCO2014
Fig. 17. Visualization of heat maps on NEU-Seg for different values of α
and β. (a) Images. (b) α and β are set to 1. (c) α and β are set to 2. (d) α
and β are set to 3.

D. Limitations of DDSNet
We propose a new dual-branch semantic segmentation network to extract the semantic contextual information and border information of defects. Although it improves defect segmentation accuracy, some limitations remain. To achieve better segmentation accuracy, a large number of training samples is required. Specifically, when the dataset is small, the model can overfit and fail to cover most of the features and variations in the data space. Therefore, DDSNet achieves excellent segmentation accuracy


on the MSD, Alu-Seg, NEU-Seg, and Ste-Seg datasets but performs relatively poorly on the MT-Defect dataset, which has the fewest defect images (392) of the five datasets, as shown in Fig. 19.

Fig. 19. Examples of some failure cases. (a) Input images. (b) Ground truth images. (c) DDSNet.

VI. CONCLUSION
In this article, we collect and annotate an aluminum block dataset, Alu-Seg, and a steel foil dataset, Ste-Seg, to advance the field of surface defect segmentation. In addition, to address inconsistent intraclass and indistinguishable interclass segmentation results in practical surface defect detection, we propose a novel surface defect segmentation network called DDSNet. Specifically, to enhance the model's ability to express features of defects of the same class and thereby resolve inconsistent intraclass segmentation results, we design a new semantic branch based on UNet and propose a GLF module based on similarity metrics that guides the network to extract richer semantic information about defects. To improve the model's perception of defect borders and thereby address indistinguishable interclass segmentation results, we integrate semantic and border information of defects and use a more effective auxiliary training strategy that guides the network to extract more comprehensive defect border information. Moreover, extensive experiments on the MSD, MT-Defect, NEU-Seg, Alu-Seg, and Ste-Seg datasets show that DDSNet achieves the best segmentation accuracy among the compared methods. In the future, we will apply DDSNet to further defect scenarios such as roads and bridges and explore its application in other domains, such as image classification and object detection.

REFERENCES
[1] D. Zhang, K. Song, J. Xu, Y. He, M. Niu, and Y. Yan, “MCnet: Multiple context information segmentation network of no-service rail surface defects,” IEEE Trans. Instrum. Meas., vol. 70, pp. 1–9, 2021.
[2] H. Dong, K. Song, Y. He, J. Xu, Y. Yan, and Q. Meng, “PGA-Net: Pyramid feature fusion and global context attention network for automated surface defect detection,” IEEE Trans. Ind. Informat., vol. 16, no. 12, pp. 7448–7458, Dec. 2020.
[3] Y. Huang, C. Qiu, Y. Guo, X. Wang, and K. Yuan, “Surface defect saliency of magnetic tile,” in Proc. IEEE 14th Int. Conf. Autom. Sci. Eng., Aug. 2018, pp. 612–617.
[4] W. Zhai, J. Zhu, Y. Cao, and Z. Wang, “A generative adversarial network based framework for unsupervised visual surface inspection,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Apr. 2018, pp. 1283–1287.
[5] W. Shi, Z. Lu, W. Wu, and H. Liu, “Single-shot detector with enriched semantics for PCB tiny defect detection,” J. Eng., vol. 2020, no. 13, pp. 366–372, Jul. 2020.
[6] J. Zhang, R. Ding, M. Ban, and T. Guo, “FDSNeT: An accurate real-time surface defect segmentation network,” in Proc. ICASSP-IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2022, pp. 3803–3807.
[7] I. Aydin, E. Akin, and M. Karakose, “Defect classification based on deep features for railway tracks in sustainable transportation,” Appl. Soft Comput., vol. 111, Nov. 2021, Art. no. 107706.
[8] Y. He, K. Song, H. Dong, and Y. Yan, “Semi-supervised defect classification of steel surface based on multi-training and generative adversarial network,” Opt. Lasers Eng., vol. 122, pp. 294–302, Nov. 2019.
[9] Y. Zhao, K. Hao, H. He, X. Tang, and B. Wei, “A visual long-short-term memory based integrated CNN model for fabric defect image classification,” Neurocomputing, vol. 380, pp. 259–270, Mar. 2020.
[10] B. Lu, M. Zhang, and B. Huang, “Deep adversarial data augmentation for fabric defect classification with scarce defect data,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–13, 2022.
[11] N. Zeng, P. Wu, Z. Wang, H. Li, W. Liu, and X. Liu, “A small-sized object detection oriented multi-scale feature fusion approach with application to defect detection,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–14, 2022.
[12] Y. Gao, J. Lin, J. Xie, and Z. Ning, “A real-time defect detection method for digital signal processing of industrial inspection applications,” IEEE Trans. Ind. Informat., vol. 17, no. 5, pp. 3450–3459, May 2021.
[13] J. Zhang, J. Jing, P. Lu, and S. Song, “Improved MobileNetV2-SSDLite for automatic fabric defect detection system based on cloud-edge computing,” Measurement, vol. 201, Sep. 2022, Art. no. 111665.
[14] Y. He, K. Song, Q. Meng, and Y. Yan, “An end-to-end steel surface defect detection approach via fusing multiple hierarchical features,” IEEE Trans. Instrum. Meas., vol. 69, no. 4, pp. 1493–1504, Apr. 2020.
[15] L. Yang, F. Zhou, and L. Wang, “A scratch detection method based on deep learning and image segmentation,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, 2022.
[16] Q. Zhou, Z. Situ, S. Teng, H. Liu, W. Chen, and G. Chen, “Automatic sewer defect detection and severity quantification based on pixel-level semantic segmentation,” Tunnelling Underground Space Technol., vol. 123, May 2022, Art. no. 104403.
[17] M. Mehta and C. Shao, “Federated learning-based semantic segmentation for pixel-wise defect detection in additive manufacturing,” J. Manuf. Syst., vol. 64, pp. 197–210, Jul. 2022.
[18] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 3431–3440.
[19] X. Yang et al., “A joint ship detection and waterway segmentation method for environment-aware of USVs in canal waterways,” IEEE Trans. Autom. Sci. Eng., vol. 1, no. 1, pp. 1–13, Jul. 2024.
[20] K. R. Singh, A. Sharma, and G. K. Singh, “MADRU-Net: Multiscale attention-based cardiac MRI segmentation using deep residual U-Net,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–13, 2024.
[21] J. Chen, Y. Wen, Y. A. Nanehkaran, D. Zhang, and A. Zeb, “Multiscale attention networks for pavement defect detection,” IEEE Trans. Instrum. Meas., vol. 72, pp. 1–12, 2023.
[22] M. Li, B. Peng, J. Liu, and D. Zhai, “RBNet: An ultrafast rendering-based architecture for railway defect segmentation,” IEEE Trans. Instrum. Meas., vol. 72, pp. 1–8, 2023.
[23] J. Zhou et al., “Toward TR-PCB bubble detection via an efficient attention segmentation network and dynamic threshold,” IEEE Trans. Instrum. Meas., vol. 72, pp. 1–12, 2023.
[24] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. 18th Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2015, pp. 234–241.

[25] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, and N. Sang, “BiSeNet: Bilateral segmentation network for real-time semantic segmentation,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 325–341.
[26] Z. Li et al., “Complementation-reinforced network for integrated reconstruction and segmentation of pulmonary gas MRI with high acceleration,” Med. Phys., vol. 51, no. 1, pp. 378–393, Jan. 2024.
[27] Y. Yang, Y. He, H. Guo, Z. Chen, and L. Zhang, “Semantic segmentation supervised deep-learning algorithm for welding-defect detection of new energy batteries,” Neural Comput. Appl., vol. 34, no. 22, pp. 19471–19484, Nov. 2022, doi: 10.1007/s00521-022-07474-0.
[28] R. Neven and T. Goedemé, “A multi-branch U-Net for steel surface defect type and severity segmentation,” Metals, vol. 11, no. 6, p. 870, May 2021.
[29] X. Li, W. Wang, X. Hu, and J. Yang, “Selective kernel networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2019, pp. 510–519.
[30] H. Zhang et al., “ResNeSt: Split-attention networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2022, pp. 2735–2745.
[31] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, “Squeeze-and-excitation networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 8, pp. 2011–2023, Aug. 2020.
[32] Y. Bao et al., “Triplet-graph reasoning network for few-shot metal generic surface defect segmentation,” IEEE Trans. Instrum. Meas., vol. 70, pp. 1–11, 2021.
[33] E. Liu, K. Chen, Z. Xiang, and J. Zhang, “Conductive particle detection via deep learning for ACF bonding in TFT-LCD manufacturing,” J. Intell. Manuf., vol. 31, no. 4, pp. 1037–1049, Apr. 2020.
[34] S. Deng, R. Gao, Y. Wang, W. Mao, and W. Zheng, “Structure of a semantic segmentation-based defect detection network for laser cladding infrared images,” Meas. Sci. Technol., vol. 34, no. 8, Aug. 2023, Art. no. 085601.
[35] F. Luo, Y. Cui, and Y. Liao, “MVRA-UNet: Multi-view residual attention U-Net for precise defect segmentation on magnetic tile surface,” IEEE Access, vol. 11, pp. 135212–135221, 2023.
[36] A. F. Kamanli, “A novel multi-scale cross-patch attention with dilated convolution (MCPAD-UNET) for metallic surface defect detection,” Signal, Image Video Process., vol. 18, no. 1, pp. 485–494, Feb. 2024.
[37] Z. Ling, A. Zhang, D. Ma, Y. Shi, and H. Wen, “Deep Siamese semantic segmentation network for PCB welding defect detection,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–11, 2022.
[38] H. Pan, Y. Hong, W. Sun, and Y. Jia, “Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes,” IEEE Trans. Intell. Transp. Syst., vol. 24, no. 3, pp. 3448–3460, Mar. 2023.
[39] Z. Qu, C. Cao, L. Liu, and D.-Y. Zhou, “A deeply supervised convolutional neural network for pavement crack detection with multiscale feature fusion,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 9, pp. 4890–4899, Sep. 2022.
[40] P. Lu, J. Jing, and Y. Huang, “MRD-Net: An effective CNN-based segmentation network for surface defect detection,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, 2022.
[41] J. Zhu, G. He, and P. Zhou, “MFNet: A novel multilevel feature fusion network with multibranch structure for surface defect detection,” IEEE Trans. Instrum. Meas., vol. 72, pp. 1–11, 2023.
[42] T. Liu, Z. He, Z. Lin, G.-Z. Cao, W. Su, and S. Xie, “An adaptive image segmentation network for surface defect detection,” IEEE Trans. Neural Netw. Learn. Syst., vol. 1, no. 1, pp. 1–14, Jan. 2022.
[43] J. Cheng, G. Wen, X. He, X. Liu, Y. Hu, and S. Mei, “Achieving the defect transfer detection of semiconductor wafer by a novel prototype learning-based semantic segmentation network,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–12, 2024.
[44] J. Xu, Z. Xiong, and S. P. Bhattacharyya, “PIDNet: A real-time semantic segmentation network inspired by PID controllers,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2023, pp. 19529–19539.
[45] N. Zhang, J. Li, Y. Li, and Y. Du, “Global attention pyramid network for semantic segmentation,” in Proc. Chin. Control Conf., Jul. 2019, pp. 1–26.
[46] W. Yuan, S. Wang, X. Li, M. Unoki, and W. Wang, “A skip attention mechanism for monaural singing voice separation,” IEEE Signal Process. Lett., vol. 26, no. 10, pp. 1481–1485, Oct. 2019.
[47] R. K. Srivastava, K. Greff, and J. Schmidhuber, “Training very deep networks,” in Proc. Neural Inf. Process. Syst., Dec. 2015, pp. 1–23.
[48] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[49] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 936–944.
[50] C. Yu, C. Gao, J. Wang, G. Yu, C. Shen, and N. Sang, “BiSeNet v2: Bilateral network with guided aggregation for real-time semantic segmentation,” Int. J. Comput. Vis., vol. 129, no. 11, pp. 3051–3068, Nov. 2021.
[51] M. Fan et al., “Rethinking BiSeNet for real-time semantic segmentation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 9711–9720.
[52] X. Shi et al., “BSSNet: A real-time semantic segmentation network for road scenes inspired from AutoEncoder,” IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 5, pp. 3424–3438, May 2024.
[53] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 6230–6239.
[54] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Computer Vision—ECCV. Cham, Switzerland: Springer, 2018, pp. 833–851.
[55] X. Li, Z. Zhong, J. Wu, Y. Yang, Z. Lin, and H. Liu, “Expectation-maximization attention networks for semantic segmentation,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 9166–9175.
[56] H. Zhao, X. Qi, X. Shen, J. Shi, and J. Jia, “ICNet for real-time semantic segmentation on high-resolution images,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 405–420.
[57] T. Wu, S. Tang, R. Zhang, J. Cao, and Y. Zhang, “CGNet: A light-weight context guided network for semantic segmentation,” IEEE Trans. Image Process., vol. 30, pp. 1169–1179, 2021.
[58] R. P. K. Poudel, S. Liwicki, and R. Cipolla, “Fast-SCNN: Fast semantic segmentation network,” in Proc. Brit. Mach. Vis. Conf., Feb. 2019, pp. 1–23.
[59] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A ConvNet for the 2020s,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 11966–11976.
[60] M.-H. Guo, C.-Z. Lu, Q. Hou, Z. Liu, M.-M. Cheng, and S.-M. Hu, “SegNeXt: Rethinking convolutional attention design for semantic segmentation,” in Proc. Adv. Neural Inf. Process. Syst., vol. 35, 2022, pp. 1140–1156.
[61] W. Yu et al., “MetaFormer is actually what you need for vision,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 10809–10819.
[62] Z. Xu, D. Wu, C. Yu, X. Chu, N. Sang, and C. Gao, “SCTNet: Single-branch CNN with transformer semantic information for real-time segmentation,” in Proc. AAAI Conf. Artif. Intell., vol. 38, 2024, pp. 6378–6386.
[63] J. Wang et al., “RTFormer: Efficient design for real-time semantic segmentation with transformer,” in Proc. Adv. Neural Inf. Process. Syst., vol. 35, 2022, pp. 7423–7436.
[64] Q. Wan, Z. Huang, J. Lu, G. Yu, and L. Zhang, “SeaFormer++: Squeeze-enhanced axial transformer for mobile visual recognition,” 2023, arXiv:2301.13156.
[65] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “SegFormer: Simple and efficient design for semantic segmentation with transformers,” Comput. Vis. Pattern Recognit., vol. 1, no. 1, pp. 1–24, May 2021.
[66] J. Zhang, K. Yang, A. Constantinescu, K. Peng, K. Muller, and R. Stiefelhagen, “Trans4Trans: Efficient transformer for transparent object and semantic scene segmentation in real-world navigation assistance,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 10, pp. 19173–19186, Oct. 2022.
[67] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2999–3007.
[68] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[69] Z. Tian, C. Shen, H. Chen, and T. He, “FCOS: Fully convolutional one-stage object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 9626–9635.

Zhenyu Yin received the B.S., M.Sc., and Ph.D. degrees in computer science from Northeastern University, Shenyang, China, in 2001, 2004, and 2007, respectively.
He was a Post-Doctoral Researcher with the Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang, and the Shenyang Institute of Automation, Shenyang, from October 2008 to October 2011. He is currently a Professor with the Shenyang Institute of Computing Technology, Chinese Academy of Sciences. His research interests include the Industrial Internet of Things, machine learning, artificial intelligence, industrial embedded systems, FPGA/SoC, numerical control systems, and functional safety.
Dr. Yin served as the Secretary of Subcommittee 3 on safe control systems for machinery and of Technical Committee 231 on electrical systems of industrial machinery of the Standardization Administration of China.

Li Qin is currently pursuing the master’s degree with the School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China.
His main research interests include deep learning, semantic segmentation, autonomous driving, and related fields.

Guangjie Han (Fellow, IEEE) received the Ph.D. degree from Northeastern University, Shenyang, China, in 2004.
In February 2008, he was a Post-Doctoral Researcher with the Department of Computer Science, Chonnam National University, Gwangju, South Korea. From October 2010 to October 2011, he was a Visiting Research Scholar with Osaka University, Suita, Japan. From January 2017 to February 2017, he was a Visiting Professor with the City University of Hong Kong, Hong Kong, China. From July 2017 to July 2020, he was a Distinguished Professor with the Dalian University of Technology, Dalian, China. He is currently a Professor with the Department of Internet of Things Engineering, Hohai University, Changzhou, China. He has published over 500 peer-reviewed journal and conference papers and holds 160 granted and pending patents. Currently, his H-index is 62 and his i10-index is 260 on Google Scholar, and his papers have been cited more than 14 000 times. His research interests include the Internet of Things, Industrial Internet, machine learning, artificial intelligence, mobile computing, and security and privacy.
Dr. Han is a Fellow of the U.K. Institution of Engineering and Technology (FIET). He received the 2020 IEEE Systems Journal Annual Best Paper Award and the 2017–2019 IEEE Access Outstanding Associate Editor Award. He has served as the chair of organizing and technical committees at many international conferences. He has served on the editorial boards of up to ten international journals, including IEEE Transactions on Industrial Informatics, IEEE Transactions on Cognitive Communications and Networking, and IEEE Systems Journal. He has guest-edited several special issues in IEEE journals and magazines, including IEEE Journal on Selected Areas in Communications, IEEE Communications Magazine, IEEE Wireless Communications, and Computer Networks.

Xiaoqiang Shi is currently pursuing the master’s degree in applied computer science and technology with the University of Chinese Academy of Sciences, Beijing, China.
His main research interests include deep learning, semantic segmentation, object detection, and related fields.

Feiqing Zhang received the bachelor’s degree in engineering from Lanzhou University of Technology, Lanzhou, China, in June 2017, and the doctoral degree in engineering from the University of Chinese Academy of Sciences, Beijing, in June 2024.
Her research interests include the IIoT, edge computing, and deep learning.

Guangyuan Xu received the Bachelor of Science degree from Jilin University, Changchun, China, in July 2015. He is currently pursuing the Ph.D. degree in applied computer science and technology with the University of Chinese Academy of Sciences, Beijing, China.
His research interests include the IIoT, edge computing, and deep learning.

Yuanguo Bi (Member, IEEE) received the Ph.D. degree in computer science and technology from Northeastern University, Shenyang, China, in 2010.
He was a Visiting Ph.D. Student with the Broadband Communications Research (BBCR) Laboratory, Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada, from 2007 to 2009. He is currently a Professor with the School of Computer Science and Engineering, Northeastern University. He has authored or co-authored more than 80 journal and conference papers. These include papers in high-quality journals, such as IEEE Journal on Selected Areas in Communications, IEEE Transactions on Wireless Communications, IEEE Transactions on Intelligent Transportation Systems, IEEE Transactions on Vehicular Technology, IEEE Internet of Things Journal, IEEE Communications Magazine, IEEE Wireless Communications, and IEEE Network. They also include papers at mainstream conferences, such as IEEE Global Communications Conference and IEEE International Conference on Communications. His research interests include medium access control, QoS routing, multihop broadcast, and mobility management in vehicular networks; service deployment, service migration, and task offloading in mobile edge computing; federated and distributed learning; software-defined networking; and space-air-ground integrated networks.
Dr. Bi served as a TPC Co-Chair for IEEE/CIC ICCC 2023, the General Co-Chair for IEEE ICCSN 2023, and the Publication Co-Chair for IEEE MSN 2018. He has served as an Editor/Guest Editor for IEEE Communications Magazine, IEEE Wireless Communications, and IEEE Network.