
mathematics

Article
A Weld Surface Defect Recognition Method Based on Improved
MobileNetV2 Algorithm
Kai Ding 1, Zhangqi Niu 1, Jizhuang Hui 1, Xueliang Zhou 2 and Felix T. S. Chan 3,*

1 Institute of Smart Manufacturing Systems, Chang’an University, Xi’an 710064, China


2 School of Mechanical Engineering, Hubei University of Automotive Technology, Shiyan 442002, China
3 Department of Decision Sciences, Macau University of Science and Technology, Macao SAR, China
* Correspondence: [email protected]; Tel.: +853-65921222

Abstract: Traditional welding quality inspection methods for pipelines and pressure vessels are
time-consuming, labor-intensive, and suffer from false and missed inspection problems. With
the development of smart manufacturing, there is a need for fast and accurate in-situ inspection
of welding quality. Therefore, detection models with higher accuracy and lower computational
complexity are required for technical support. On this basis, this paper proposes an in-situ weld surface defect
recognition method based on an improved lightweight MobileNetV2 algorithm. It
builds a defect classification model with MobileNetV2 as the backbone of the network, embeds
a Convolutional Block Attention Module (CBAM) to refine the image feature information, and
reduces the network width factor to cut down the number of model parameters and computational
complexity. The experimental results show that the proposed weld surface defect recognition method
has advantages in both recognition accuracy and computational efficiency. In summary, the method
in this paper overcomes the limitations of traditional methods and achieves the goal of reducing
labor intensity, saving time, and improving accuracy. It meets the actual needs of in-situ weld surface
defect recognition for pipelines, pressure vessels, and other industrial complex products.

Keywords: weld surface; defect recognition; MobileNetV2; attention mechanism

MSC: 90B25

Citation: Ding, K.; Niu, Z.; Hui, J.; Zhou, X.; Chan, F.T.S. A Weld Surface Defect Recognition Method Based on Improved MobileNetV2 Algorithm. Mathematics 2022, 10, 3678. https://doi.org/10.3390/math10193678

Academic Editors: Ioannis G. Tsoulos and Frank Werner
Received: 7 August 2022; Accepted: 6 October 2022; Published: 8 October 2022

1. Introduction

Affected by the welding procedure [1,2], the welding method [3], the environment, and the operator's technical level, various welding defects often occur during the welding of pipelines and pressure vessels, such as crack, blowhole, slag inclusion, undercut, incomplete penetration, and incomplete fusion [4], which directly affect the sealing and strength of the products. To ensure the safety of products such as pipelines and pressure vessels, it is necessary to carry out strict welding quality inspection during the manufacturing process, so that the causes of welding defects can be found and corrective measures taken in a targeted manner [5]. However, current weld surface defect recognition is still dominated by manual inspection, which is not only time-consuming and labor-intensive but also suffers from false and missed inspections. Therefore, it is particularly important to realize efficient and accurate recognition of weld surface defects.
The key technology for intelligent detection of weld surface defects is to use machine vision instead of human vision to complete the weld surface image classification task. In the field of computer vision image recognition, the Convolutional Neural Network (CNN) is one of the core algorithms. LeNet [6], proposed in 1998, is one of the earliest CNNs; its structure is simple, yet it successfully solved the problem of handwritten digit recognition. Subsequently, several classic networks, AlexNet [7], InceptionNet [8], and ResNet [9], were successively proposed, which reduced the error rate on the ImageNet [10] dataset year by year.

However, the training difficulty, the number of model parameters, and the computational complexity also grow with the increasing number of network layers. Meanwhile, it is difficult to deploy the above deep CNN algorithms on resource-constrained devices. In this paper, an improved lightweight MobileNetV2 model is constructed to achieve efficient and high-accuracy in-situ recognition of weld surface defects in pipelines and pressure vessels. The advantages of the proposed method can be reflected in two aspects: (1) recognition accuracy; (2) recognition speed. On the one hand, to improve recognition accuracy, an attention mechanism is embedded, which focuses on the important features of the image and suppresses the interference of irrelevant information; on the other hand, to improve recognition speed, the width factor of MobileNetV2 is narrowed to reduce the number of model parameters and the computational complexity. An experiment is conducted to verify the proposed method; the results show that the improved MobileNetV2 achieves good recognition accuracy with a small number of model parameters.
The remainder of this paper is organized as follows: Section 2 reviews deep CNNs and lightweight CNNs for surface defect detection and, further, weld defect detection; Section 3 constructs the improved MobileNetV2-based weld surface defect recognition model; Section 4 describes an experiment and its results to verify the proposed method; Section 5 discusses the advantages and further improvement of the proposed method; and Section 6 draws the conclusion.

2. Literature Review

As shown in Figure 1, the related work on CNN-based surface defect detection or weld defect detection can be reviewed from two aspects: (1) deep CNNs, which place more emphasis on recognition accuracy; (2) lightweight CNNs, which place more emphasis on recognition speed.

Figure 1. Analytic flow of related work.

2.1. Applications of Deep CNNs

Because the end-to-end [11] recognition method addresses the issues involved in complex hand-crafted processing pipelines, it has been applied in several fields such as image processing, speech recognition [12], medical imaging [13], natural language processing [14], and biomedical signal processing [15]. Many scholars have done related research in these fields. Mustaqeem et al. [16] designed four local feature learning blocks (LFLBs) to solve the problem of low prediction performance of intelligent speech emotion recognition systems.
For the early detection of COVID-19 from chest X-ray images, Khishe et al. [17] proposed a framework that automatically designs classifiers, repeatedly making use of a heuristic for optimization. Aiming at the problem of unreasonable attention weight allocation in aspect-level sentiment analysis, Han et al. [18] proposed an Interactive Graph ATtention network (IGAT) model.
(1) Surface defect detection
When detecting surface defects of products in industrial applications, many scholars have proposed research methods based on deep CNNs and achieved good experimental results. Tsai et al. [19] proposed SurfNetv2 to recognize surface defects of the Calcium Silicate Board (CSB) using visual image information; the experimental results show that the proposed SurfNetv2 outperforms five state-of-the-art methods. Wan et al. [20] proposed a strip steel defect detection method that achieved rapid surface screening, category balance of the sample dataset, defect detection, and classification; the detection rate of the improved VGG19 was greatly improved on few-sample and imbalanced datasets. Lei et al. [21] proposed a Segmented Embedded Rapid Defect Detection Method for Surface Defects (SERDD), which realizes the two-way fusion of image processing and defect detection. This method can provide practical machine vision support for bearing surface defect detection.
(2) Weld defect detection
When applying deep learning technology to the field of weld defect detection, scholars
actively explore solutions for different problems and verify the application effect in experi-
ments. In order to boost productivity and quality of welded joints by accurate classification
of good and bad welds, Sekhar et al. [22] presented a transfer learning approach for the
accurate classification of tungsten inert gas (TIG) welding defects. Transfer learning can
also be used to overcome the limitation that neural networks trained with small datasets
produce less accurate results. Kumaresan et al. [23] adopted transfer learning using Pre-
trained CNNs and extracted the features of the weld defect dataset using VGG16 and
ResNet50. Experiments showed that transfer learning improves performance and reduces
training time. In order to improve the accuracy of CNN in weld defect identification,
Jiang et al. [24] introduced an improved pooling strategy that considers the distribution of
the pooling region and feature map, and proposed an enhanced feature selection method
integrating the ReliefF algorithm with the CNN. Aiming to make the best use of unannotated
image data, Dong et al. [25] proposed a novel unsupervised local deep feature learning
method based on image segmentation, built a network that can extract useful features from
an image, and demonstrated the approach on two aerospace weld inspection tasks. Aiming
at the poor robustness of existing methods on diverse industrial weld image data, Deng
et al. [26] collected a series of asymmetric laser weld images for study. A median filter was
used to remove noise, a deep CNN was employed for feature extraction, and the activation
function and the adaptive pooling approach were improved.

2.2. Applications of Lightweight CNNs


Although the application effect of deep CNNs is getting better and better, the training
difficulty, the number of model parameters, and computational complexity also grow with
the increasing number of network layers. However, fast and in-situ [27] detection of the
welding surface quality is often required at the welding workstation, so as to facilitate the
discovery and repair of welding defects and provide a reference for subsequent welding
operations. Therefore, weld surface defect recognition needs to take into account the two
indicators of recognition accuracy and recognition speed.
The limitations of deep CNNs have prompted the development of lightweight CNNs.
Subsequently, a series of lightweight CNNs appeared, such as ShuffleNet [28], Xception [29],
and MobileNet [30]. They have fewer model parameters while ensuring accuracy, which greatly
reduces the computational complexity and makes the model run faster. The emergence of
these lightweight models makes it possible to run deep learning models directly on mobile
and embedded devices.
Lightweight CNNs have been used in many fields, especially in image recognition
tasks. Scholars have proposed a lot of model improvement methods for specific problems
to achieve better results. In the field of aerial image detection, Joshi et al. [31] proposed
an ensemble of DL-based multimodal land cover classification (EDL-MMLCC) models
using remote sensing images, namely VGG-19, Capsule Network, and MobileNet for
feature extraction. Junos et al. [32] proposed a feasible and lightweight aerial images object
detection model and adopted an enhanced spatial pyramid pooling to increase the receptive
field in the network by concatenating the multi-scale local region features. In the field
of garbage classification, Chen et al. [33] proposed a lightweight garbage classification
model GCNet (Garbage Classification Network), which contains three improvements
to ShuffleNetv2. The experimental results show that the average accuracy of GCNet
on the self-built dataset is 97.9%, and the amount of model parameters is only 1.3 M.
Wang et al. [34] proposed an improved garbage identification algorithm based on YOLOv3,
introduced the MobileNetV3 network to replace Darknet53, and a spatial pyramid pooling
structure is added to reduce the computational complexity of the network model. In the
field of medical image recognition, Rangarajan et al. [35] developed a novel fused model
combining SqueezeNet and ShuffleNet to evaluate with CT scan images. The fused model
outperformed the two base models with an overall accuracy of 97%. Natarajan et al. [36]
presented a two-stage deep learning framework, UNet-SNet, for glaucoma detection, in which a
lightweight SqueezeNet is fine-tuned with deep features of the optic discs (ODs) to discriminate
fundus images into glaucomatous or normal.
Although lightweight CNNs have great application potential in many areas, there are
few studies discussing their applications in the field of weld defect recognition. Actually,
in the in-situ weld defect detection scenario, the lightweight CNNs can be well applied
to balance recognition accuracy and recognition speed. In this paper, we propose an
improved MobileNetV2 algorithm to deal with the weld defect detection problem.

3. Weld Surface Defect Recognition Model


3.1. Weld Surface Defect Dataset
Weld defects include internal defects and surface defects. This paper focuses on weld
surface defect detection. The weld surface defect images used in this study are mainly taken
from the workstation and partially collected from the Internet as a supplement to form the
original image dataset. Because the defect area is small relative to the entire weld image,
and some weld images contain two or more types of weld defects, it is difficult to train the
model by directly using the original images as the input of the neural network. Therefore,
the original weld images need to be preprocessed. First, uniform grayscale processing is
performed on all original weld images. Second, a 224 × 224 area containing only one type
of weld defect is cropped as the region of interest (ROI), and the ROI image is used as the
input of the neural network. There are 610 weld surface defect images after preprocessing,
including 198 images of crack, 186 images of blowhole, 26 images of incomplete fusion,
and 200 images of normal. Some of the four types of weld surface defect images after
preprocessing are shown in Figure 2. In the figure, the specific location of the defect is
marked with a red circle.
Because the number of original sample images is small and their distribution is unbalanced,
the ROI weld images are subjected to data enhancement processing [37], such as flip
transformation, random rotation transformation, and enhancement of brightness and contrast,
to increase the amount of training data and improve the generalization ability of the model.
Taking the blowhole defect image as an example, the image comparison before and after
enhancement is shown in Figure 3. The sample dataset after enhancement has 2845 weld
images, and the weld defects include four categories: crack, blowhole, incomplete fusion,
and normal. The detailed number of each class of defect images is shown in Table 1.
All these images were divided into a training dataset, a validation dataset, and a testing
dataset at a ratio of 7:2:1. A total of 1995 images are obtained for training, 567 images for
validation, and 283 images for testing. In order to maximize the effect of using this model
for defect detection in the workshop, the testing dataset images are not included in the
model training process.
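A minimal sketch of this style of augmentation, written against the TensorFlow 2.x API used for the experiments in Section 4 (the parameter ranges and the restriction to 90-degree rotations are illustrative assumptions, not the authors' settings):

```python
import tensorflow as tf

def augment(image):
    """Random flips, rotation, and brightness/contrast jitter for one grayscale ROI."""
    image = tf.image.random_flip_left_right(image)              # flip transformation
    image = tf.image.random_flip_up_down(image)
    k = tf.random.uniform([], 0, 4, dtype=tf.int32)             # rotation by a random multiple of 90 degrees
    image = tf.image.rot90(image, k)
    image = tf.image.random_brightness(image, max_delta=0.2)    # brightness enhancement
    image = tf.image.random_contrast(image, 0.8, 1.2)           # contrast enhancement
    return tf.clip_by_value(image, 0.0, 1.0)

# train_ds = train_ds.map(augment)  # applied to the training split only
```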

Figure 2. Sample images in the self-built dataset (partial).

Figure 3. Comparison of defect images before and after enhancement.

Table 1. Sample distribution of weld surface defect dataset.

Sample Class Before Enhancement After Enhancement
Crack 198 753
Blowhole 186 810
Incomplete fusion 26 576
Normal 200 706
Total 610 2845

3.2. Algorithm Design

To solve the problem of in-situ recognition of weld surface defects, the lightweight MobileNetV2 [30] is used as the backbone of the network to build a weld surface defect recognition model. There is certain room for optimization when using MobileNetV2 to recognize weld surface defects; the improvements are as follows: (1) embed the Convolutional Block Attention Module (CBAM) [38]; (2) reduce the width factor α.
The structure diagram of the improved MobileNetV2 is shown in Figure 4, including the main part of the network and the fully connected layer. The main part of the network includes 17 bottleneck blocks; the expansion factors are all 6, except that of bottleneck 0, which is 1. Bottleneck blocks located on the same row have the same number of output channels (denoted by c). The bottleneck in the blue box is an inverted residual structure without the shortcut connection; the bottleneck in the red box is an inverted residual structure with the shortcut connection; and s represents the stride of the DW convolution. The CBAM module is embedded in the bottlenecks in the red box. The structure of the bottleneck is shown on the left side of the figure. PW stands for pointwise convolution, and DW stands for depthwise convolution. M_c and M_s represent the channel attention mechanism and the spatial attention mechanism, respectively.

Figure 4. Model structure of improved MobileNetV2.

3.2.1. Lightweight MobileNetV2


MobileNetV2 is a lightweight CNN proposed by the Google team in 2018, and it
is a network structure specially tailored for mobile terminals and resource-constrained
environments [30]. While maintaining the same accuracy, it significantly reduces the
number of operations and memory requirements. Its advantages are listed as follows:
(1) Depthwise separable convolution is the core of MobileNetV2 to achieve lightweight
performance.
The basic idea is to decompose the entire convolution process into two parts. The
first part is called depthwise (DW) convolution, which performs lightweight convolution
by applying a single convolution kernel to each channel of the input feature map, so the
number of channels of the output feature matrix is equal to the input feature matrix. The
second part is called pointwise (PW) convolution, and the convolution kernel size is 1 × 1,
which constructs new features by linearly combining each channel of the input feature map. The principle of PW convolution is roughly the same as that of standard convolution. Since the number of channels of the output feature matrix is determined by the number of convolution kernels, PW convolution can both raise and reduce the dimensionality. The schematic diagram of depthwise separable convolution is shown in Figure 5.

Figure 5. The principle of depthwise separable convolution.
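As a concrete illustration, a depthwise separable block can be written with stock Keras layers (a minimal sketch; the channel count and input shape are arbitrary examples, not the paper's configuration):

```python
import tensorflow as tf
from tensorflow.keras import layers

def depthwise_separable_block(x, out_channels):
    """3 x 3 depthwise (per-channel) convolution followed by a 1 x 1 pointwise convolution."""
    x = layers.DepthwiseConv2D(3, padding="same", use_bias=False)(x)  # DW: one kernel per input channel
    x = layers.BatchNormalization()(x)
    x = layers.ReLU(max_value=6.0)(x)                                 # ReLU6, as in the MobileNet family
    x = layers.Conv2D(out_channels, 1, use_bias=False)(x)             # PW: linearly combines channels
    x = layers.BatchNormalization()(x)
    return x

inputs = tf.keras.Input(shape=(224, 224, 1))   # grayscale ROI (example shape)
features = depthwise_separable_block(inputs, 32)
```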

Assuming that the width of the input feature map is w_in, its height is h_in, and its number of channels is M; that the convolution kernel size is k × k; that the width and height of the output feature map before and after convolution remain unchanged; and that its number of channels is N, let the computational complexity of standard convolution and depthwise separable convolution be P_1 and P_2, respectively. Then P_1 and P_2 are calculated as follows:

P_1 = w_in × h_in × M × N × k × k  (1)

P_2 = w_in × h_in × M × k × k + w_in × h_in × M × N × 1 × 1  (2)

The ratio of P_2 to P_1 is:

P_2 / P_1 = (w_in × h_in × M × k × k + w_in × h_in × M × N × 1 × 1) / (w_in × h_in × M × N × k × k) = 1/N + 1/k²  (3)

In summary, depthwise separable convolution reduces the computation of standard convolution by a factor of 1/N + 1/k². The convolution kernel size used in MobileNetV2 is 3 × 3, so the computational cost is 8 to 9 times smaller than that of standard convolution.
(2) The inverted residual structure effectively solves gradient vanishing.
The depth of a CNN affects the recognition accuracy of weld surface defects to a large extent, and a deeper network means stronger feature expression ability. Therefore, deepening the network is a common method to improve image recognition accuracy. However, simply stacking more layers leads to gradient vanishing: the recognition accuracy reaches a stable state and then drops sharply after passing its highest point. The residual module of some models (e.g., ResNet), which adds an identity mapping, allows a neural network with more layers, and the recognition effect is effectively improved at the same time. However, this residual structure undergoes a process of "dimension reduction-feature extraction-dimension raising", which causes the extractable image features to be compressed.
The inverted residual structure in MobileNetV2 first uses PW convolution with a kernel
size of 1, then uses DW convolution with a kernel size of 3, and then uses a PW convolution
with a kernel size of 1. It has gone through the process of “dimension raising-feature
extraction-dimension reduction”, as shown in Figure 6. Compared with the traditional
residual structure, the inverted residual structure avoids image compression before feature
extraction and increases the number of channels through PW convolution to enhance the
expressiveness of features. At the same time, another advantage of this structure is that it
allows the use of smaller input and output dimensions, which can reduce the number of
network parameters and computational complexity, reduce the running time, and realize a lightweight model.
Figure 6. The schematic diagram of the bottleneck structure. (a) With the shortcut connection; (b) without the shortcut connection.

Note that, when the stride is 1 and the output feature map has the same shape as the input feature map, the shortcut connection is performed, as shown in Figure 6a; when the stride is 2, there is no shortcut connection, as shown in Figure 6b. The purpose of introducing the shortcut connection is to improve the ability of gradient propagation and solve the problem of gradient vanishing caused by the deepening of network layers.
As shown in Figure 6, assuming that the input feature map is F, then F_1, F_2, and F_3 can be expressed as:

F_1 = f_RL(f_PWc(F))  (4)

F_2 = f_RL(f_DWc(F_1))  (5)

F_3 = f_Ln(f_PWc(F_2))  (6)
In the formulas above, f_PWc and f_DWc are the PW convolution and DW convolution operations, respectively; f_RL is the ReLU6 activation function; and f_Ln is the linear activation function.
Therefore, when there is a shortcut connection, the operation process of the bottleneck structure can be expressed as:

F_out = F + F_3  (7)

When there is no shortcut connection, the operation process of the bottleneck structure can be expressed as:

F_out = F_3  (8)
In Equations (7) and (8), F_out represents the output feature map.
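Putting Equations (4)-(8) together, the bottleneck's forward pass can be sketched with stock Keras layers (a minimal illustration under stated assumptions: batch normalization is omitted for brevity, and the expansion factor of 6 follows Figure 4; this is not the authors' exact implementation):

```python
from tensorflow.keras import layers

def bottleneck(x, out_channels, stride, expansion=6):
    """Inverted residual bottleneck following Equations (4)-(8)."""
    in_channels = x.shape[-1]
    f1 = layers.ReLU(max_value=6.0)(
        layers.Conv2D(in_channels * expansion, 1, use_bias=False)(x))   # Eq. (4): PW, dimension raising
    f2 = layers.ReLU(max_value=6.0)(
        layers.DepthwiseConv2D(3, strides=stride, padding="same",
                               use_bias=False)(f1))                     # Eq. (5): DW, feature extraction
    f3 = layers.Conv2D(out_channels, 1, use_bias=False)(f2)             # Eq. (6): PW, linear projection
    if stride == 1 and in_channels == out_channels:
        return layers.Add()([x, f3])                                    # Eq. (7): shortcut connection
    return f3                                                           # Eq. (8): no shortcut
```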
3.2.2. Improved MobileNetV2
(1) Embed the Convolutional Block Attention Module
In this paper, the CBAM is paralleled in bottleneck blocks that have shortcut connections, as shown in Figure 7. The CBAM module integrates the channel attention mechanism and the spatial attention mechanism [38], which can simultaneously focus on the feature map information in both the channel and the space dimensions, thus focusing on the important features of the image and suppressing the interference of irrelevant information. Therefore, the CBAM module is introduced into MobileNetV2 when extracting the features of weld surface defect images, to better focus on the defect area and analyze the feature information more efficiently.

Figure 7. Bottleneck structure embedded in CBAM module.

The operation process of CBAM is divided into two parts. The first part is the channel attention operation. At first, the input feature map F is subjected to global average-pooling and global max-pooling operations to obtain two 1D feature vectors, realizing the compression of the space dimension. The average-pooling function focuses on the information of each pixel in the feature map, while the max-pooling function focuses on the region with the largest response during the gradient propagation process. Secondly, these two feature vectors are sent into a shared multi-layer perceptron (MLP) network for calculation. Finally, the corresponding elements of the two resulting vectors are added and activated through the sigmoid function to obtain the channel attention feature map M_c. The calculation formula is as follows:

M_c(F) = σ(f_MLP(f_avg(F)) + f_MLP(f_max(F)))  (9)
where F is the input feature map; f_avg and f_max are the average-pooling and max-pooling functions, respectively; f_MLP is the MLP function; and σ is the sigmoid activation function.
The second part is the spatial attention operation process. First, the average-pooling
and max-pooling operations are performed on the input feature map F along the channel dimension, and the two generated 2D maps are concatenated. Then, the concatenated feature map is convolved and activated through the sigmoid function to output the spatial attention feature map M_s. The calculation formula is as follows:

M_s(F) = σ(f_c([f_avg(F); f_max(F)]))  (10)

where f_c is the convolution calculation.


Therefore, the operation process of CBAM can be expressed as:

F′ = M_c(F) ⊗ F  (11)

F″ = M_s(F′) ⊗ F′  (12)
From Equations (8), (9), and (12), it can be known that the output feature map F′_out of the bottleneck structure after embedding the CBAM module can be expressed as:

F′_out = F + F″ + F_out  (13)

In summary, this paper embeds the CBAM modules in the inverted residual structures
that have the shortcut connections of lightweight MobileNetV2. The embedding method
is to parallel CBAM in each bottleneck. The purpose is to enable the model to focus on
important features in both channel and space dimensions when extracting weld defect
features, so as to generate better defect feature description information and achieve more
accurate in-situ recognition of weld surface defects.
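The attention operations in Equations (9)-(12) can be sketched as follows (a simplified Keras illustration; the reduction ratio and the 7 × 7 spatial kernel are assumptions based on common CBAM configurations, not the authors' exact settings):

```python
import tensorflow as tf
from tensorflow.keras import layers

def cbam(x, reduction=8):
    """Channel attention (Eqs. (9), (11)) then spatial attention (Eqs. (10), (12))."""
    channels = x.shape[-1]
    shared_mlp = tf.keras.Sequential([
        layers.Dense(channels // reduction, activation="relu"),
        layers.Dense(channels),
    ])
    avg = layers.GlobalAveragePooling2D()(x)            # f_avg: global average-pooling
    mx = layers.GlobalMaxPooling2D()(x)                 # f_max: global max-pooling
    mc = tf.sigmoid(shared_mlp(avg) + shared_mlp(mx))   # M_c(F), Eq. (9)
    x = x * mc[:, None, None, :]                        # F′ = M_c(F) ⊗ F, Eq. (11)
    avg_sp = tf.reduce_mean(x, axis=-1, keepdims=True)  # channel-wise average map
    max_sp = tf.reduce_max(x, axis=-1, keepdims=True)   # channel-wise max map
    ms = layers.Conv2D(1, 7, padding="same", activation="sigmoid")(
        tf.concat([avg_sp, max_sp], axis=-1))           # M_s(F′), Eq. (10)
    return x * ms                                       # F″ = M_s(F′) ⊗ F′, Eq. (12)
```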
(2) Reduce the width factor α
The width factor α is a hyperparameter in the MobileNet series of models, which can be used to modify the number of convolution kernels in each layer, thereby controlling the number of parameters and the computational complexity of the network. Taking a 224 × 224 input size as an example, the performance of MobileNetV2 on the ImageNet dataset under three common width factors, 1.0, 0.75, and 0.5, is shown in Table 2.

Table 2. Performance of MobileNetV2 under different width factors.

Width Factor MACs (M) Parameters (M) Top 1 Accuracy Top 5 Accuracy
1.0 300 3.47 71.8% 91.0%
0.75 209 2.61 69.8% 89.6%
0.5 97 1.95 65.4% 86.4%

It can be seen from Table 2 that if the width factor is reduced from the initial value of 1.0 all the way to 0.5, the computational cost and the number of parameters are lower than with 0.75, but the loss in recognition accuracy is also larger. On balance, this paper therefore reduces the width factor to 0.75 to achieve a lightweight model while ensuring accuracy. To sum up, in the image recognition task of weld surface defects, the width factor α is adjusted to 0.75 to reduce the number of convolution kernels in each layer, thereby reducing the inference cost on mobile devices and achieving faster in-situ recognition of weld surface defects.
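For reference, the width factor corresponds to the `alpha` argument of the stock Keras MobileNetV2 constructor (shown only to illustrate the hyperparameter; the authors build a customized variant with CBAM rather than the stock model):

```python
import tensorflow as tf

# alpha is the width factor: it scales the number of convolution kernels in every layer.
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    alpha=0.75,          # the reduced width factor chosen in this paper
    include_top=False,
    weights=None,        # train from scratch on the weld dataset
)
print(backbone.count_params())
```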

4. Experiment and Results

4.1. Experiment Environment

The industrial scenario of weld surface defect recognition is shown in Figure 8, which shows the entire process of weld quality detection. It mainly includes the machine-vision-based detection platform, the dataset construction process, the creation of the recognition model, and the defect prediction system.

Figure 8. Industrial scenario of weld surface defect recognition.

The experiment was performed on a Dell® 5820T workstation with the Windows 10 operating system, using an Intel(R) Xeon(R) W-2245 CPU at 3.90 GHz and an NVIDIA Quadro RTX 4000 GPU, with the PyCharm integrated development environment based on Python 3.7 and Google's open-source TensorFlow 2.5.0 deep learning framework.
The Adam optimizer was selected for training and the learning rate was set to 0.001,
the batch size was set to 32, cross-entropy was used as the loss function, and the model
was trained for 500 epochs. After training, the testing dataset was input into the model to
verify the weld surface defect recognition accuracy.
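The training configuration just described maps directly onto the Keras API (a sketch; `model`, `train_ds`, `val_ds`, and `test_ds` are placeholders for the compiled network and the batched dataset splits, with a batch size of 32, and the sparse cross-entropy variant assumes integer class labels):

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",  # cross-entropy over the 4 defect classes
    metrics=["accuracy"],
)
history = model.fit(train_ds, validation_data=val_ds, epochs=500)
test_loss, test_acc = model.evaluate(test_ds)  # accuracy on the held-out testing dataset
```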

4.2. Algorithm Comparison and Analysis


4.2.1. Comparison among Algorithms on the Self-Built Dataset
In order to prove the feasibility and superiority of the improved algorithm in this
study, it is compared with MobileNetV2 and ResNet50, and they are trained respectively
using the self-built weld surface defect dataset of this research. In the training process,
the recognition accuracy and loss value on the training dataset and validation dataset are
recorded after each epoch. In this way, the training situation of the model can be observed
to ensure that each model completes the training under the convergence condition. Plot the
training results of each model on the validation dataset as a curve, as shown in Figure 9.
Since the generated curves are noisy, they are smoothed to reduce the interference of noise, which makes it more intuitive to compare the recognition effects of the models. The enlarged part of the figure shows the curves after smoothing.

Figure 9. The training curve of each model on the self-built dataset. (a) The recognition accuracy curve. (b) The loss value curve.

The experimental results of each model on the self-built weld surface defect dataset are analyzed in detail, as shown in Table 3. In the table, A_max, A_avg, and A′_avg are the maximum recognition accuracy, the average recognition accuracy, and the average recognition accuracy after stabilization, respectively; E represents the number of epochs at the beginning of convergence.
Table 3. Comparison of experimental results of each model on the self-built dataset.

Model A_max (%) A_avg (%) A′_avg (%) E Parameters (M)
Improved model 99.08 96.45 97.16 25 1.40
MobileNetV2 98.53 95.30 96.10 36 2.26
ResNet50 98.90 96.31 97.19 47 23.57

From the experimental results, it can be seen that the A_max of the improved algorithm in this paper is the highest at 99.08%, which is 0.55% and 0.18% higher than that of MobileNetV2 and ResNet50, respectively. Its A_avg is also the highest at 96.45%, which is 1.15% and 0.14% higher than that of MobileNetV2 and ResNet50, respectively. It becomes stable after 25 epochs, has the fastest convergence speed, and its A′_avg is 1.06% higher than that of MobileNetV2. In general, the recognition accuracy of the improved algorithm is roughly the same as that of ResNet50 and higher than that of MobileNetV2, while the number of parameters of the improved algorithm is only about 3/5 that of MobileNetV2 and 3/50 that of ResNet50.
Analyzing the reasons: first of all, MobileNetV2 and ResNet50 are both excellent CNNs.
However, because the depthwise separable convolution used by MobileNetV2 greatly
reduces the number of parameters and computational complexity compared with tradi-
tional convolutions, this operation achieves a lightweight model and only slightly reduces
the recognition accuracy, which is enough to reflect the superiority of the lightweight
MobileNetV2. Secondly, the improved algorithm in this paper introduces the CBAM mod-
ule integrating the channel and spatial attention mechanism, so that it can focus on the
important features of the weld surface defect image in the two dimensions of the chan-
nel and space, and the effective feature refinement improves the recognition accuracy of
the algorithm. Finally, the adjustment of the hyperparameter width factor α enables the
improved algorithm in this paper to have fewer parameters and faster convergence than
MobileNetV2.
4.2.2. Comparison among Algorithms on the GDX-ray Dataset
In order to verify that the improved algorithm in this study is also competent for other image classification tasks, further experiments were performed using the weld X-ray images in the public GDX-ray dataset [39]. The training process curve on the validation dataset is shown in Figure 10. It can be seen from Figure 10a that after the curve becomes stable, the recognition accuracy remains above 98%, with a trend of continuous increase. As can be seen in Figure 10b, the loss value converges quickly and tends to zero.

Figure 10. The training curve of the improved algorithm on the GDX-ray dataset. (a) The recognition accuracy curve. (b) The loss value curve.

Then, the trained model was tested using the X-ray weld images in the testing dataset, and the classification accuracy on the testing dataset reached 99.28%. Since this is an open dataset, many scholars have also conducted research on it. Ferguson et al. [40] proposed a system for the identification of defects in X-ray images based on the Mask Region-based CNN architecture; the proposed system simultaneously performs defect detection and segmentation on input images and reached a detection accuracy of 85.0% on the GDX-ray welds testing dataset. Nazarov et al. [41] used the convolutional neural network VGG-16 to build a weld defect classification model and used transfer learning for training. The resulting model was applied in a specially created program to detect and classify welding defects; it classifies welding defects into 5 categories with an average accuracy of about 86%. Hu et al. [42] used an improved pooling method based on grayscale adaptation and the ELU activation function to construct an improved convolutional neural network (ICNN) model for defect recognition in weld flaw detection images, and the overall recognition rate reached 98.13%. Fagehi et al. [43] designed a feature extraction and classification framework to classify three common welding defects: crack, porosity, and lack of penetration. They used a combination of image processing and a support vector machine to optimize the model, bringing the total accuracy of the classifier to 98.8%. In contrast, the method in this study has the highest recognition accuracy for X-ray weld defect images. In short, the improved algorithm maintains a high recognition accuracy on the X-ray dataset, its overall performance is excellent, and this shows that the improved algorithm in this paper generalizes beyond the self-built dataset.
4.3. Model Testing


In order to further verify the recognition performance of the weld surface defect classi-
fication model, the model was tested with the testing dataset images, and the classification
model evaluation metrics were used to indicate the recognition effect of various defects.
Then a set of weld defect images were input into the model for prediction one by one to
simulate the actual industrial environment of welding defect detection.
(1) Model Performance Evaluation Metrics
The Confusion Matrix is an error matrix, a visual tool for judging model accuracy,
and is often used to evaluate the performance of supervised learning algorithms. In image
classification tasks, it is used to reflect the accuracy of image classification by comparing
the classification results with the actual label values.
Taking binary classification as an example, when the true value is Positive and the
predicted value is Positive, it is expressed as True Positive (TP); When the true value is
Positive and the predicted value is Negative, it is expressed as False Negative (FN); When
the true value is Negative and the predicted value is Positive, it is expressed as False
Positive (FP); When the true value is Negative and the predicted value is Negative, it is
expressed as True Negative (TN). Common model performance evaluation metrics are
Accuracy, Precision, Recall, and Specificity. The calculation formulas can be expressed as:

TP + TN
Accuracy = (14)
TP + FN + FP + TN
TP
Precision = (15)
TP + FP
TP
Recall = (16)
TP + FN
TN
Specificity = (17)
FP + TN
(2) Recognition accuracy test and defect prediction
The model was tested with the testing dataset to verify the recognition ability of the
improved model trained in this paper on weld surface defect images, and the testing result
was visualized with the confusion matrix, as shown in Figure 11. It can be calculated from
the testing result that the recognition accuracy of the model on the testing dataset reaches
98.23%, which is sufficient to meet the high-precision detection requirements for weld
surface defects in the manufacturing process.
To more clearly show the testing results of various defects in the self-built weld surface
defect dataset in this paper, the performance evaluation indicators of precision, recall,
and specificity corresponding to the crack, blowhole, incomplete fusion, and normal were
calculated respectively. The results are shown in Table 4. It can be seen from the table that
the improved MobileNetV2 in this paper has excellent performance for the four types of
defects: crack, blowhole, incomplete fusion, and normal. The three evaluation metrics
corresponding to various defects are all above 96.55%, especially the precision of the normal,
the recall rate of the crack, and the specificity of the normal have reached 100.00%.
improved model trained in this paper on weld surface defect images, and the testing result
was visualized with the confusion matrix, as shown in Figure 11. It can be calculated from
the testing result that the recognition accuracy of the model on the testing dataset reaches
98.23%, which is sufficient to meet the high-precision detection requirements for weld
surface defects in the manufacturing process.
Mathematics 2022, 10, 3678 14 of 18

Mathematics 2022, 10, 3678 15 of 19

11. Testing
Figureresult
Figure 11. Testing result of
of the model inthe
thismodel
study.in this study.
Table 4. Model performance evaluation metrics for each defect class.
Table 4. Model performance evaluation metrics for each defect class.
To more clearly show the testing results of various defects in the self-built weld sur-
Defect Class Precision (%) Recall (%) Specificity (%)
face defect dataset in this
Defect paper, the performance
Class evaluation indicators
Precision (%) of precision, Specificity
Recall (%) recall, (%)
Crack
and specificity corresponding to the crack, 97.40
blowhole, incomplete 100.00
fusion, and normal 99.04
were
Crack 97.40 100.00 99.04
Blowhole
calculated respectively. The results are shown 98.73
in Table 4. It can be96.30
seen from the table that 99.50
Blowhole 98.73 96.30 99.50
Incompletefusion
the improved MobileNetV2
Incomplete fusion
in this paper has 96.55
excellent performance
96.55 98.25
98.25for the four types99.12 of
99.12
Normal
Normal
defects: crack, blowhole, incomplete fusion, 100.00
100.00 98.57
and normal. The three98.57 evaluation metrics cor- 100.00
100.00
responding to various defects are all above 96.55%, especially the precision of the normal,
To
the recall rate of To simulate
thesimulate
crack, andthethe
the weld surfacedefect
specificity
weld surface defect recognition
of therecognition
normal havescene ininthe
reached
scene the workstation
100.00%.
workstation toto the
the great-
greatest
est extent, a group of weld surface defect images was randomly searched
extent, a group of weld surface defect images was randomly searched on the Internet for on the Internet
for model
model prediction.
prediction. TheThe prediction
prediction results
results ofofweld
weldsurface
surfacedefect
defectpictures
picturesareare shown
shown in in
Figure 12.
Figure 12. In
In the
the figure,
figure, the
the predicted
predicted class
class and
and the
the confidence
confidence of of the
the predicted
predicted class
class are
are
displayed above each defect picture. Obviously, the model in this study
displayed above each defect picture. Obviously, the model in this study can accurately can accurately
identify the
identify the defect
defect category
category inin these
these pictures.
pictures.

Figure 12. Prediction results of weld surface defect pictures. (a): The predicted defect type is crack, and the confidence is 0.933. (b): The predicted defect type is blowhole, and the confidence is 0.96. (c): The predicted defect type is incomplete fusion, and the confidence is 0.996. (d): The predicted defect type is normal, and the confidence is 1.0.
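A single-image prediction of this kind can be sketched in PyTorch as below. The model handle, the 224 × 224 input size, the ImageNet normalization constants, and the image path are illustrative assumptions; the preprocessing actually used in this paper may differ.

import torch
from PIL import Image
from torchvision import transforms

CLASS_NAMES = ["crack", "blowhole", "incomplete fusion", "normal"]

# assumed preprocessing: 224x224 input with ImageNet normalization
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def predict(model, image_path):
    """Return (predicted class name, confidence) for one image."""
    model.eval()
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)[0]  # logits -> confidences
    conf, idx = probs.max(dim=0)
    return CLASS_NAMES[idx.item()], conf.item()

# e.g. predict(model, "weld.jpg") -> ("crack", 0.933); both "model" and
# the image path are placeholders, not released assets.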
5. Discussion
(1) Advantages
The algorithm in this paper solves the problem of in-situ recognition of weld surface defects, and its recognition accuracy on the testing dataset reaches 98.23%. Moreover, the model is very small, with only 1.4 M parameters. The improved algorithm outperforms MobileNetV2 on the self-built dataset and basically matches ResNet50, while its number of parameters is only about 3/50 of that of ResNet50.
(2) Limitations
First, the defect classes covered by the self-built weld surface defect dataset in this paper are not comprehensive enough, and the number of original sample images is small. In particular, there are far fewer incomplete fusion images than images of the other three classes, and this unbalanced distribution may degrade the generalization ability of the model. Second, for weld surface defect detection, this paper solves the problem of "what is the defect", that is, the recognition of weld surface defect images, but the problem of "where is the defect" remains to be solved. Finally, the trained model has not yet been deployed to mobile devices for testing in an actual application scene.
(3) Extension
Based on the limitations above, subsequent research will focus on the following aspects. First, the defect categories of the self-built weld surface defect dataset need to be enriched with classes such as undercut, burn through, and spatter. In the meantime, the number of original sample images of each defect class needs to be expanded to avoid overfitting caused by insufficient data. Further improvement of the self-built dataset will strengthen the generalization ability of the model and thus meet the actual needs of accurate weld surface defect recognition. Second, the problem of "where is the defect", that is, the target detection task in weld surface defect detection, will be addressed. This work will build on the improved self-built weld surface defect dataset, and the first step of object detection is labeling each defect image: the LabelMe [44] annotation tool will be used to manually mark the defect location and defect class in each image (see the conversion sketch after this paragraph). Next, the YOLOv3 one-stage target detection algorithm will be used to complete the weld surface defect detection task [45]. Considering the requirement of model lightweighting, the improved MobileNetV2 in this study will serve as the backbone of the YOLOv3 network. The network model will then be trained and optimized following the same process as in this paper to achieve high-precision and high-efficiency weld surface defect detection based on improved YOLOv3. Third, the trained model will be deployed to an embedded device with limited memory for real-time, in-situ prediction of weld surface defects, and the two evaluation indicators of recognition accuracy and recognition efficiency will be used to verify the feasibility of the improved algorithm proposed in this paper.
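As a sketch of the planned annotation step, converting one LabelMe rectangle annotation into YOLO-style labels could look like the following; the class list extends the current four classes with the planned ones and is an assumption about the eventual label set.

import json

# planned label set: the current four classes plus the classes to be added;
# this list is an assumption about the eventual dataset
CLASSES = ["crack", "blowhole", "incomplete fusion", "normal",
           "undercut", "burn through", "spatter"]

def labelme_to_yolo(json_path):
    """Convert one LabelMe JSON file (rectangle shapes) into YOLO-style
    'class cx cy w h' lines, normalized to the image size."""
    with open(json_path, encoding="utf-8") as f:
        ann = json.load(f)
    img_w, img_h = ann["imageWidth"], ann["imageHeight"]
    lines = []
    for shape in ann["shapes"]:
        if shape.get("shape_type") != "rectangle" or shape["label"] not in CLASSES:
            continue
        (x1, y1), (x2, y2) = shape["points"]
        cx, cy = (x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h  # box center
        bw, bh = abs(x2 - x1) / img_w, abs(y2 - y1) / img_h    # box size
        lines.append(f"{CLASSES.index(shape['label'])} "
                     f"{cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
    return lines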
In addition to the weld surface defects studied in this paper, weld quality detection also involves the detection of weld internal defects and the measurement of weld quality parameters [46].
For the detection of weld internal defects, the commonly used methods are X-ray inspection, ultrasonic flaw detection, and magnetic flaw detection. The internal defect categories mainly include internal cracks, internal blowholes, slag inclusion, and incomplete penetration. The next step is to carry out research on the detection of weld internal defects.
For the measurement of weld quality parameters [47], it is planned to use active vision technology based on machine vision: a line laser projects a laser line perpendicular to the weld to obtain a laser fringe image, and the image is then processed to extract features that yield the three-dimensional information of the weld surface, such as weld width, depth of penetration, and excess weld metal. Feature extraction from weld laser fringe images is therefore the most critical part of quality parameter measurement research. Laser image feature extraction mainly comprises two parts: centerline extraction [48] and feature point extraction [49]. Centerline extraction methods mainly include the gray centroid method, curve fitting method, morphological refinement method, and the Steger algorithm (a sketch of the gray centroid method follows this paragraph). Feature point extraction methods can be grouped into traditional methods, such as the slope analysis method, windowing analysis method, curve fitting method, and corner detection method, and deep learning-based methods. Subsequent research plans to adopt deep learning-based feature point extraction, which regresses directly on image pixel positions and offers strong applicability and anti-interference ability.
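As an illustration of the first of these methods, a minimal gray centroid sketch is given below; the fixed intensity threshold and the assumption of a roughly horizontal stripe scanned column by column are simplifications for illustration.

import numpy as np

def gray_centroid_centerline(img, thresh=50):
    """img: 2D grayscale array with a roughly horizontal laser stripe.
    Returns, per column, the sub-pixel row of the stripe center
    (NaN where no pixel exceeds the threshold)."""
    h, w = img.shape
    rows = np.arange(h, dtype=np.float64)
    # keep only bright stripe pixels as weights
    weights = np.where(img >= thresh, img.astype(np.float64), 0.0)
    mass = weights.sum(axis=0)
    center = np.full(w, np.nan)
    valid = mass > 0
    # intensity-weighted mean row per column = gray centroid
    center[valid] = (rows[:, None] * weights).sum(axis=0)[valid] / mass[valid]
    return center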
6. Conclusions
Aiming at the in-situ detection of welding quality in the manufacturing process of pipelines and pressure vessels, this paper studies the recognition and classification of weld surface defects, taking MobileNetV2 as the network backbone and improving it in two ways. First, the CBAM module, which integrates channel and spatial attention mechanisms, is embedded in the bottleneck structure; this lightweight structure effectively improves recognition accuracy while only slightly increasing the number of model parameters (a compact sketch of the CBAM block is given below). Second, the width factor of the network is reduced; this adjustment costs only a small loss in recognition accuracy but effectively reduces the number of model parameters and the computational complexity. The improved MobileNetV2 has 1.40 M parameters, and its recognition accuracy on the testing dataset reaches 98.23%. The improved model performance provides a basis for in-situ recognition of weld surface defects during production.
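For reference, a compact PyTorch sketch of a CBAM block in the spirit of Woo et al. [38] follows; the exact placement inside each inverted-residual bottleneck and any implementation details specific to this paper are not reproduced here.

import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(                      # shared channel MLP
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, kernel, padding=kernel // 2, bias=False)

    def forward(self, x):
        # channel attention: MLP over global avg- and max-pooled descriptors
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True))
                           + self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # spatial attention: 7x7 conv over channel-wise avg and max maps
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa

As one point of reference for the second improvement, torchvision's MobileNetV2 class exposes a width_mult argument that scales channel counts in the same spirit as the width factor reduction used here.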

Author Contributions: K.D.: Conceptualization; Validation; Writing—review and editing; Supervision. Z.N.: Conceptualization; Methodology; Validation; Data curation; Writing—original draft preparation. J.H.: Validation; Investigation. X.Z.: Investigation. F.T.S.C.: Conceptualization; Writing—review and editing. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by China Postdoctoral Science Foundation (grant numbers:
2021M700528 and 2022T150073), Major Science and Technology Project of Shaanxi Province (grant
number: 2018zdzx01-01-01), and Chunhui Plan Joint Research Project of Ministry of Education.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author. The data are not publicly available due to privacy.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Gao, X.S.; Wu, C.S.; Goecke, S.F.; Kuegler, H. Effects of process parameters on weld bead defects in oscillating laser-GMA hybrid
welding of lap joints. Int. J. Adv. Manuf. Tech. 2017, 93, 1877–1892. [CrossRef]
2. Fang, D.; Liu, L. Analysis of process parameter effects during narrow-gap triple-wire gas indirect arc welding. Int. J. Adv. Manuf.
Tech. 2017, 88, 2717–2725. [CrossRef]
3. Liu, G.; Tang, X.; Xu, Q.; Lu, F.; Cui, H. Effects of active gases on droplet transfer and weld morphology in pulsed-current
NG-GMAW of mild steel. Chin. J. Mech. Eng. 2021, 34, 66. [CrossRef]
4. He, Y.; Tang, X.; Zhu, C.; Lu, F.; Cui, H. Study on insufficient fusion of NG-GMAW for 5083 Al alloy. Int. J. Adv. Manuf. Tech. 2017,
92, 4303–4313. [CrossRef]
5. Feng, Q.S.; Li, R.; Nie, B.H.; Liu, S.C.; Zhao, L.Y.; Zhang, H. Literature Review: Theory and application of in-line inspection
technologies for oil and gas pipeline girth weld defection. Sensors 2017, 17, 50. [CrossRef]
6. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86,
2278–2324. [CrossRef]
7. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017,
60, 84–90. [CrossRef]
8. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with
convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA,
USA, 7–12 June 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1–9.
9. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR) 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [CrossRef]
10. Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Li, F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ,
USA, 2009; pp. 248–255.
11. Lin, I.; Tang, C.; Ni, C.; Hu, X.; Shen, Y.; Chen, P.; Xie, Y. A Novel, Efficient Implementation of a Local Binary Convolutional
Neural Network. IEEE Trans. Circuits Syst. II Express Briefs 2021, 68, 1413–1417. [CrossRef]
12. Nassif, A.B.; Shahin, I.; Attili, I.; Azzeh, M.; Shaalan, K. Speech recognition using deep neural networks: A systematic review.
IEEE Access 2019, 7, 19143–19165. [CrossRef]
13. Razzak, M.I.; Naz, S.; Zaib, A. Deep learning for medical image processing: Overview, challenges and the future. In Classification
in BioApps: Automation of Decision Making; Dey, N., Ashour, A.S., Borra, S., Eds.; Springer International Publishing: Cham,
Switzerland, 2018; pp. 323–350, ISBN 978-3-319-65981-7.
14. Otter, D.W.; Medina, J.R.; Kalita, J.K. A survey of the usages of deep learning for natural language processing. IEEE Trans. Neur.
Net. Lear. 2021, 32, 604–624. [CrossRef]
15. Li, Y.; Huang, C.; Ding, L.Z.; Li, Z.X.; Pan, Y.J.; Gao, X. Deep learning in bioinformatics: Introduction, application, and perspective
in the big data era. Methods 2019, 166, 4–21. [CrossRef]
16. Mustaqeem; Kwon, S. CLSTM: Deep feature-based speech emotion recognition using the hierarchical ConvLSTM network.
Mathematics 2020, 8, 2133. [CrossRef]
17. Khishe, M.; Caraffini, F.; Kuhn, S. Evolving deep learning convolutional neural networks for early COVID-19 detection in chest
X-ray images. Mathematics 2021, 9, 1002. [CrossRef]
18. Han, H.; Wu, Y.H.; Qin, X.Y. An interactive graph attention networks model for aspect-level sentiment analysis. J. Electron. Inf.
Technol. 2021, 43, 3282–3290. [CrossRef]
19. Tsai, C.Y.; Chen, H.W. SurfNetv2: An improved real-time SurfNet and its applications to defect recognition of calcium silicate
boards. Sensors 2020, 20, 4356. [CrossRef]
20. Wan, X.; Zhang, X.; Liu, L. An improved VGG19 transfer learning strip steel surface defect recognition deep neural network
based on few samples and imbalanced datasets. Appl. Sci. 2021, 11, 2606. [CrossRef]
21. Lei, L.; Sun, S.; Zhang, Y.; Liu, H.; Xie, H. Segmented embedded rapid defect detection method for bearing surface defects.
Machines 2021, 9, 40. [CrossRef]
22. Sekhar, R.; Sharma, D.; Shah, P. Intelligent classification of tungsten inert gas welding defects: A transfer learning approach. Front.
Mech. Eng. 2022, 8, 824038. [CrossRef]
23. Kumaresan, S.; Aultrin, K.S.J.; Kumar, S.S.; Anand, M.D. Transfer learning with CNN for classification of weld defect. IEEE Access
2021, 9, 95097–95108. [CrossRef]
24. Jiang, H.; Hu, Q.; Zhi, Z.; Gao, J.; Gao, Z.; Wang, R.; He, S.; Li, H. Convolution neural network model with improved pooling
strategy and feature selection for weld defect recognition. Weld. World 2021, 65, 731–744. [CrossRef]
25. Dong, X.; Taylor, C.J.; Cootes, T.F. Automatic aerospace weld inspection using unsupervised local deep feature learning. Knowl.
Based Syst. 2021, 221, 106892. [CrossRef]
26. Deng, H.G.; Cheng, Y.; Feng, Y.X.; Xiang, J.J. Industrial laser welding defect detection and image defect recognition based on deep
learning model developed. Symmetry 2021, 13, 1731. [CrossRef]
27. Madhvacharyula, A.S.; Pavan, A.V.S.; Gorthi, S.; Chitral, S.; Venkaiah, N.; Kiran, D.V. In situ detection of welding defects: A
review. Weld. World 2022, 66, 611–628. [CrossRef]
28. Zhang, X.; Zhou, X.Y.; Lin, M.X.; Sun, R. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In
Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA,
18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 6848–6856.
29. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the 30th IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp.
1800–1807.
30. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted residuals and linear bottlenecks. In
Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA,
18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 4510–4520.
31. Joshi, G.P.; Alenezi, F.; Thirumoorthy, G.; Dutta, A.K.; You, J. Ensemble of deep learning-based multimodal remote sensing image
classification model on unmanned aerial vehicle networks. Mathematics 2021, 9, 2984. [CrossRef]
32. Junos, M.H.; Khairuddin, A.; Dahari, M. Automated object detection on aerial images for limited capacity embedded device
using a lightweight CNN model. Alex. Eng. J. 2022, 61, 6023–6041. [CrossRef]
33. Chen, Z.C.; Yang, J.; Chen, L.F.; Jiao, H.N. Garbage classification system based on improved ShuffleNet v2. Resour. Conserv. Recy.
2022, 178, 106090. [CrossRef]
34. Wang, Z.; Zhang, R.; Liu, Y.; Huang, J.; Chen, Z. Improved YOLOv3 garbage classification and detection model for edge
computing devices. Laser Optoelectron. Prog. 2022, 59, 0415002. [CrossRef]
35. Rangarajan, A.K.; Ramachandran, H.K. A fused lightweight CNN model for the diagnosis of COVID-19 using CT scan images.
Automatika 2022, 63, 171–184. [CrossRef]
36. Natarajan, D.; Sankaralingam, E.; Balraj, K.; Karuppusamy, S. A deep learning framework for glaucoma detection based on robust
optic disc segmentation and transfer learning. Int. J. Imag. Syst. Tech. 2022, 32, 230–250. [CrossRef]
37. Ma, D.; Tang, P.; Zhao, L.; Zhang, Z. Review of data augmentation for image in deep learning. J. Image Graph. 2021, 26, 487–502.
38. Woo, S.H.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Computer Vision—ECCV 2018, PT VII,
15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu,
C., Weiss, Y., Eds.; Springer: Berlin, Germany, 2018; Volume 11211, pp. 3–19.
39. Mery, D.; Riffo, V.; Zscherpel, U.; Mondragon, G.; Lillo, I.; Zuccar, I.; Lobel, H.; Carrasco, M. GDXray: The Database of X-ray
Images for Nondestructive Testing. J. Nondestruct. Eval. 2015, 34, 42. [CrossRef]
40. Ferguson, M.; Ak, R.; Lee, Y.T.; Law, K.H. Detection and segmentation of manufacturing defects with convolutional neural
networks and transfer learning. Smart Sustain. Manuf. Syst. 2018, 2, 137–164. [CrossRef] [PubMed]
41. Nazarov, R.M.; Gizatullin, Z.M.; Konstantinov, E.S. Classification of Defects in Welds Using a Convolution Neural Network.
In Proceedings of the 2021 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (Elconrus),
Moscow, Russia, 26–29 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1641–1644.
42. Hu, A.D.; Wu, L.J.; Huang, J.K.; Fan, D.; Xu, Z.Y. Recognition of weld defects from X-ray images based on improved convolutional
neural network. Multimed. Tools Appl. 2022, 81, 15085–15102. [CrossRef]
43. Faghihi, R.; Faridafshin, M.; Movafeghi, A. Patch-based weld defect segmentation and classification using anisotropic diffusion
image enhancement combined with support-vector machine. Russ. J. Nondestruct. Test. 2021, 57, 61–71. [CrossRef]
44. Torralba, A.; Russell, B.C.; Yuen, J. LabelMe: Online image annotation and applications. Proc. IEEE 2010, 98, 1467–1484. [CrossRef]
45. Wang, Z.H.; Zhu, H.Y.; Jia, X.Q.; Bao, Y.T.; Wang, C.M. Surface defect detection with modified real-time detector YOLOv3. J. Sens.
2022, 2022, 8668149. [CrossRef]
46. Liu, T.; Zheng, H.; Bao, J.; Zheng, P.; Wang, J.; Yang, C.; Gu, J. An explainable laser welding defect recognition method based on
multi-scale class activation mapping. IEEE Trans. Instrum. Meas. 2022, 71, 5005312. [CrossRef]
47. Han, J.; Zhou, J.; Xue, R.; Xu, Y.; Liu, H. Surface morphology reconstruction and quality evaluation of pipeline weld based on line
structured light. Chin. J. Lasers-Zhongguo Jiguang 2021, 48, 1402010. [CrossRef]
48. Yang, Y.; Yan, B.; Dong, D.; Huang, Y.; Tang, Z. Method for extracting the centerline of line structured light based on quadratic
smoothing algorithm. Laser Optoelectron. Prog. 2020, 57, 101504. [CrossRef]
49. Zhang, B.; Chang, S.; Wang, J.; Wang, Q. Feature points extraction of laser vision weld seam based on genetic algorithm. Chin. J.
Lasers-Zhongguo Jiguang 2019, 46, 0102001. [CrossRef]
