A New Image Recognition and Classification Method Combining Transfer Learning Algorithm and MobileNet Model For Welding Defects
Corresponding author: LIN CHEN (e-mail: [email protected]). * These authors contributed equally to this work.
The project is supported by the National Natural Science Foundation of China (No.51465005), Guangxi Science and Technology Major Project (No.
AA18118002), Nanning Key Research and Development Project (No. 20181018-1 and 20181018-3)
ABSTRACT Welding quality directly affects the service performance and life of a welded structure. Hence, effective monitoring of welding defects is essential to ensure the quality of the welded structure. Owing to the non-uniformity of the shape, position and size of welding defects, it is a complicated task to analyze and evaluate the acquired welding defect images manually. Fortunately, deep learning has been successfully applied to image analysis and target recognition. However, using deep learning to identify welding defects is time-consuming and less accurate when adequate training samples are lacking, which easily introduces redundancy into the classifier. To address this, we propose a new transfer learning model based on MobileNet as a welding defect feature extractor. A MobileNet model is pre-trained on the ImageNet dataset (non-welding defect data) and then migrated to the welding defects classification field. This article proposes a new TL-MobileNet structure by adding a new fully connected layer (FC-128) and a Softmax classifier to the conventional MobileNet model. The training process of the TL-MobileNet model is optimized with the DropBlock technique and the global average pooling (GAP) method, which accelerate convergence and improve the generalization of the classification network. Tested on the welding defects dataset, the proposed TL-MobileNet reaches a prediction accuracy of 97.69%. The experimental results show that, in several aspects, TL-MobileNet performs better than other transfer learning models and traditional neural network methods.
INDEX TERMS Welding Defects Classification, Feature Extraction, Deep Learning, DropBlock, Transfer
Learning, MobileNet.
reliable classifier to distinguish different types of defects, presently many researchers have studied and discussed the development of different classification algorithms. Machine learning methods such as artificial neural network (ANN), support vector machine (SVM) and fuzzy system are the most widely used in the field of X-ray image defect recognition. The prime application of fuzzy theory in the field of welding defect detection was in the late 1990s [3]. Liao et al. [8] studied a fuzzy expert system method for the classification of X-ray defect types, which has better classification accuracy than fuzzy k-nearest neighbor and multi-layer perceptron. Baniukiewicz [9] investigated a new type of compound classifier composed of a fuzzy system and an ANN. However, there is a compromise between accuracy and interpretability in fuzzy defect detection. SVM and ANN are the most commonly used methods in defect detection. Abderrazak et al. [10] established an ANN-based welding quality evaluation method by simulating welding parameters (welding time, current, voltage, thickness, etc.). Zapata et al. [11] modified the ANN to improve the detection accuracy of individual and overall defect characteristics. Yuan et al. [12] studied adaptive organization and adaptive feed-forward neural networks to figure out the essential features of defects and effectively reduce identification errors. In order to obtain high accuracy and improve the efficiency of classification, Mu et al. [13] proposed an automatic classification algorithm combining principal component analysis (PCA) and SVM for selecting the optimal dataset. Inspired by this, Chen et al. [14] applied the bees algorithm (BA) to extract defect features and used a hierarchical multi-class SVM to obtain an accuracy of up to 95%. Qi and Manasa et al. [4, 15] provided an idea on how to optimize the feature redundancy process and improve classification efficiency and accuracy. Also, the extreme learning machine (ELM) is often used in image classification research because of its advantages in learning rate and generalization ability [16]. Su et al. [17] established an automatic defect identification system for solder joints by extracting texture features of welding defects. Han et al. [18] combined M-estimation with ELM and proposed a new ME-ELM algorithm, which can effectively improve the anti-interference and robustness of the model and has high accuracy in the prediction of welding defects. Usually, these shallow machine learning methods are combined with the feature extraction process, which ultimately affects the machine learning prediction results. However, it is difficult to know which features should be extracted. Consequently, it is necessary to design efficient Deep Learning (DL) methods to realize automatic feature learning and welding defects prediction.
As a new field of machine learning, DL shows great potential in the field of defect detection, by continuously reducing the dimension in the process of feature learning to avoid the influence of feature extraction on the identification results, effectively improving the accuracy of defect detection. DL methods have been applied to defect detection, including convolutional neural network (CNN) [19], deep belief network (DBN) [20] and sparse auto-encoder (SAE) [21]. Wang et al. [22] proposed a deep learning-based algorithm for X-ray image multi-defect type classification and automatic position recognition. Zhang et al. [21] studied SAE and the particle swarm optimization (PSO) algorithm to realize real-time detection of welding defects. In addition, Hou et al. [19] adopted random oversampling, random under-sampling and synthetic minority over-sampling techniques to solve the unbalanced-sample problem of the defect dataset, and used a deep convolutional neural network to identify porosity, cracks, slag inclusion and lack of penetration defects with an accuracy rate of 97.2%. Zhang et al. [23] achieved a high prediction accuracy on relatively small datasets of welding defects based on the VGG-16 full convolution neural network. Nevertheless, in some areas the sample size is relatively small, which affects the prediction results. Thus, many researchers use transfer learning to overcome the problems of small samples, and use the deep CNN model trained on ImageNet as a feature extractor to migrate to the small dataset in another field and obtain good results [24]. It is worth noting that these small datasets are completely different from ImageNet. Zhang et al. [25] studied medical images by transfer learning methods and obtained an identification accuracy of 97.041%. Ren et al. [26] researched automatic surface detection with the Decaf model based on deep transfer learning. Compared with other methods, the accuracy of Ren’s method was improved by 0.66%-25.5% in the classification task and 2.29%-9.86% in seven-minute defect detection. Yang et al. [27] used the mixed layer strategy to extract features at different scales and obtained a high recognition accuracy on a small military target recognition dataset. Since the DL method achieves good effect in feature learning and avoids the influence on the prediction result, it has shown great potential in welding defects classification.
It is difficult to train the deep CNN model without a good training dataset, especially for the small sample size of welding defect labels. Thus, we proposed a new image recognition and classification method for welding defects, which combines the transfer learning algorithm and the MobileNet model, namely the TL-MobileNet model. This TL-MobileNet model has three advantages. (1) It can solve the problems of low prediction accuracy and high time consumption, which are induced by insufficient welding defect learning samples, because the model combines transfer learning theory with a trained MobileNet model to form a welding defect feature extractor. (2) It has an enhanced feature extraction capability, since a new Fully Connected layer (FC-128) and a Softmax classifier are added after the MobileNet. As the network layer structure gets deeper and the feature extraction level increases, the final classification accuracy will be improved. (3) It can prevent the occurrence of over-fitting, and has a good generalization ability. Because the Global average pooling (GAP) and DropBlock are integrated for the
utilization of optimizing the entire training process in the TL-MobileNet model. This proposed TL-MobileNet will be tested on a welding defects dataset, and its good effect in welding defect recognition will also be demonstrated by comparison with other methods (such as traditional MobileNet, Xception, VGG-16, VGG-19 and ResNet-50).
The rest of this paper is organized as follows: Section II proposes the welding defects classification model TL-MobileNet and the DropBlock optimization algorithm; Section III presents the experimental research on the classification of welding defects based on the TL-MobileNet model; and Section IV presents the conclusion and future research work.
II. ARCHITECTURE OF THE PROPOSED APPROACH
It is difficult to train a deep network structure with a small number of labeled samples in the welding defect recognition field, compared with a model well trained on the ImageNet dataset with 14 million labeled images. Hence, the proposed TL-MobileNet model integrates transfer learning and MobileNet to improve the classification accuracy of welding defects. The training process of TL-MobileNet is addressed, and the performance evaluation method of the welding defect classification model is presented.
A. TL-MobileNet welding defects classification model
1) TL-MOBILENET STRUCTURES
FIGURE 1. TL-MobileNet based on transfer learning model
The weights and features of the MobileNet model are pre-trained in the source domain, the ImageNet [28] dataset (non-welding dataset), and are then transferred to the target domain for welding defects classification (Fig 1). The target domain does not use random initialization to start the learning process from the beginning, and the model parameters are shared between the source domain and the target domain, so this method helps to improve the learning efficiency. In order to achieve the best classification effect of model training, the weights and feature parameters of the migrated model are fine-tuned on the welding defect image dataset.
The structure of TL-MobileNet is shown in Fig 2 and includes three parts: 1) Data preprocessing, 2) Pre-trained MobileNet model initialization and 3) Defect classifier. The pre-trained MobileNet model is composed of many convolutional layers, pooling layers and FC-1024, where the FC layer has 1024 neurons in its hidden layer. It has 28 layers (1+2*13+1=28) and is taken as the feature extraction layer for welding defects. The defect classifier has a fully connected layer FC-128 (new layer) and a Softmax classifier for improving the accuracy of welding defects classification. Thus, the TL-MobileNet has a depth of 29 layers. The Residual Connection Block (RCB) is the most important element of MobileNet. The RCB-1 and RCB-2 structures are used in the pre-trained MobileNet model (Fig 2b) to prevent gradient explosion. For welding defects classification, multiple RCB-1 and RCB-2 blocks are stacked after the first convolution layer Conv3-32.
In Fig 2, Conv3-128 indicates that the filter size in the convolutional layer is 3×3 and its depth is 128. Conv1-128 indicates that the filter size in the convolutional layer is 1×1. FC-128 represents 128 neurons in the fully connected layer. It is worth noting that the structure of Conv1-512 has 5 layers.
2) RESIDUAL CONNECTION BLOCK
The RCB is based on the idea of a shortcut connection that skips convolutional layers, which helps optimize the parameters of the training process and avoids the problem of gradient explosion in the back propagation of errors.
The RCB consists of multiple convolutional layers, batch normalization [29] and the Rectified linear unit [30] (ReLU) function. Two different structures, RCB-1 and RCB-2, are shown in Fig 3. In RCB-1 the stride is 1 and the input and output feature sizes are the same, so the input and output are directly added. With F(x) denoting the non-linear function of the convolution path, the output of RCB-1 can be expressed as equation (1). In RCB-2 the stride is 2 and the input and output feature sizes are different, so the output of RCB-2 can be expressed by equation (2).
$y = F(x) + x$ (1)
$y = F(x)$ (2)
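Equations (1) and (2) can be illustrated with a minimal sketch. The convolution path F is replaced here by two hypothetical stand-in functions (`conv_path_same` and `conv_path_stride2`, a scaling followed by ReLU); they are assumptions for illustration only and are not the layers of the actual RCB.

```python
from typing import Callable, List

Vector = List[float]


def relu(x: Vector) -> Vector:
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in x]


def rcb1(x: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """Equation (1): y = F(x) + x, valid when F keeps the feature size."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("RCB-1 needs F(x) and x to have the same size")
    return [a + b for a, b in zip(fx, x)]


def rcb2(x: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """Equation (2): y = F(x), no shortcut because the sizes differ."""
    return f(x)


def conv_path_same(x: Vector) -> Vector:
    """Hypothetical stride-1 path: keeps the size (stand-in for F)."""
    return relu([0.5 * v for v in x])


def conv_path_stride2(x: Vector) -> Vector:
    """Hypothetical stride-2 path: halves the size (stand-in for F)."""
    return relu([0.5 * v for v in x[::2]])


if __name__ == "__main__":
    print(rcb1([1.0, -2.0, 4.0], conv_path_same))
    print(rcb2([1.0, -2.0, 4.0, 6.0], conv_path_stride2))
```

The size check in `rcb1` mirrors the condition stated in the text: the shortcut addition is only defined when the input and output feature sizes match.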
Fig.2 (b). The performance of TL-MobileNet is compared with different transfer learning models (MobileNet, Xception, VGG-16, VGG-19, ResNet-50). During the training process, the adjusted parameters shown in Table 2 were used. We use the "step" learning strategy. The basic learning rate of the other transfer learning models is set to 0.001~0.0001 and decreases every 5 epochs with a learning rate decay factor of 0.5. DropBlock is set to 0.8 in the TL-MobileNet model, whereas the other transfer learning models do not use such an optimization technique in the convolution process. Unless otherwise stated, all models are trained using Adaptive Moment Estimation (Adam)
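The "step" learning strategy can be sketched as follows. This is an illustration under an assumption: the range 0.001~0.0001 is read here as a starting rate of 0.001 with a lower bound of 0.0001, which the text does not state explicitly. The function name and parameter names are hypothetical.

```python
import math


def step_learning_rate(epoch: int,
                       base_lr: float = 1e-3,
                       min_lr: float = 1e-4,
                       step_size: int = 5,
                       decay: float = 0.5) -> float:
    """Halve the learning rate every `step_size` epochs, never below `min_lr`.

    base_lr and min_lr follow one reading of "0.001~0.0001" (an assumption);
    step_size=5 and decay=0.5 are the values given in the text.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    lr = base_lr * decay ** (epoch // step_size)
    return max(lr, min_lr)


if __name__ == "__main__":
    for epoch in (0, 5, 10, 15, 20):
        print(epoch, step_learning_rate(epoch))
    assert math.isclose(step_learning_rate(5), 5e-4)
```

Under this reading the rate reaches the lower bound after 20 epochs and stays there.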
0 PO 1248
1 LOP 1496
2 CR 1104
3 SI 1168
4 ND 1192
The training parameters of the experiment were set as in Table 2, and the mean accuracy, running time and model size of the entire classification process for the different models are presented in Table 4. The prediction accuracy of TL-MobileNet is 0.94% and 2.7% higher than that of MobileNet and VGG-19, respectively. TL-MobileNet achieved 96.88% accuracy with a running time of 182.46 s. The accuracy of TL-MobileNet is 0.08% higher than that of Xception, but its model size is only 12.5 MB and its running time is about 2/3 of that of Xception, because TL-MobileNet employs depthwise separable convolution to compress and accelerate the training process, which greatly reduces the number of model parameters. Compared with the other transfer learning models, TL-MobileNet has the smallest model size (shared with MobileNet) and a shorter running time than all but MobileNet, while achieving the highest mean accuracy. This indicates that TL-MobileNet has potential in welding defects detection.
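The parameter saving of depthwise separable convolution can be made concrete with a short count. The sketch below uses the standard weight-count formulas and ignores biases and batch normalization parameters; the layer shape (3×3 kernel, 128 input and 128 output channels) is chosen only as an example.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a standard convolution: k * k * C_in * C_out."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise separable convolution.

    Depthwise part: one k x k filter per input channel (k * k * C_in).
    Pointwise part: a 1 x 1 convolution mixing channels (C_in * C_out).
    """
    return kernel * kernel * c_in + c_in * c_out


if __name__ == "__main__":
    std = standard_conv_params(3, 128, 128)
    sep = separable_conv_params(3, 128, 128)
    print(std, sep, round(std / sep, 2))
```

For this example shape the separable form needs roughly one eighth of the weights of the standard convolution.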
TABLE 3 Comparison results of the TL-MobileNet model and transfer learning models at 32×32 input size (accuracy, %)
Run No.  TL-MobileNet  MobileNet  Xception  ResNet-50  VGG-16  VGG-19
1        96.70         96.62      97.50     62.08      92.03   92.91
2        96.05         96.62      97.26     76.09      95.33   93.40
3        97.02         97.10      96.20     89.05      95.25   93.48
4        97.18         94.12      96.46     62.88      96.46   95.33
5        96.22         95.25      96.50     75.28      96.78   91.55
6        97.42         97.10      96.42     74.88      95.89   95.81
7        96.70         95.65      97.18     75.28      96.94   94.93
8        96.86         94.77      97.34     85.83      96.86   95.89
9        96.86         96.62      96.94     66.99      95.57   93.16
10       97.75         95.49      96.22     73.19      94.85   95.94
Mean     96.88         95.94      96.80     74.16      95.59   94.18
Std      0.48          0.98       0.47      8.31       1.38    1.40
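The Mean and Std rows of Table 3 can be checked from the ten runs. The sketch below does this for the TL-MobileNet column using the Python standard library; the use of the population standard deviation is an assumption, made because it is the variant that matches the tabulated value for this column.

```python
from statistics import mean, pstdev

# Ten TL-MobileNet runs at 32x32, copied from Table 3 (accuracy in %).
tl_mobilenet_runs = [96.70, 96.05, 97.02, 97.18, 96.22,
                     97.42, 96.70, 96.86, 96.86, 97.75]

run_mean = mean(tl_mobilenet_runs)
run_std = pstdev(tl_mobilenet_runs)  # population standard deviation

if __name__ == "__main__":
    print(f"mean = {run_mean:.2f} %, std = {run_std:.2f} %")
```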
TABLE 4 Detection results of the TL-MobileNet model and transfer learning models (32×32×3)
Models        Mean Accuracy (%)  Running time (s)  Model size (MB)
TL-MobileNet  96.88              182.46            12.5
MobileNet     95.94              175.29            12.5
Xception      96.80              277.53            79.9
ResNet-50     74.16              335.35            90.4
VGG-16        95.59              186.99            58.9
VGG-19        94.18              193.96            76.4
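The comparisons drawn from Table 4 (accuracy gaps and the running-time ratio to Xception) follow from simple arithmetic on the tabulated values. A small sketch, with the table entered by hand as a dictionary:

```python
# (mean accuracy %, running time s, model size MB), copied from Table 4.
table4 = {
    "TL-MobileNet": (96.88, 182.46, 12.5),
    "MobileNet": (95.94, 175.29, 12.5),
    "Xception": (96.80, 277.53, 79.9),
    "ResNet-50": (74.16, 335.35, 90.4),
    "VGG-16": (95.59, 186.99, 58.9),
    "VGG-19": (94.18, 193.96, 76.4),
}


def accuracy_gap(model: str, reference: str = "TL-MobileNet") -> float:
    """Accuracy of `reference` minus accuracy of `model`, in percentage points."""
    return round(table4[reference][0] - table4[model][0], 2)


def time_ratio(model: str, reference: str = "TL-MobileNet") -> float:
    """Running time of `reference` divided by running time of `model`."""
    return round(table4[reference][1] / table4[model][1], 2)


if __name__ == "__main__":
    for name in ("MobileNet", "VGG-19", "Xception"):
        print(name, accuracy_gap(name))
    print("time ratio vs Xception:", time_ratio("Xception"))
```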
2) THE INFLUENCE OF IMAGE SIZE M ON PREDICTION ACCURACY
The influence of the input size m on the prediction accuracy of the model was studied. The value of m was set to 32×32, 64×64, 96×96 and 128×128, respectively, to suit the identification of defects of different sizes. The statistics of 10 runs for each model, namely the maximum (Max), minimum (Min), mean accuracy and standard deviation (Std), are shown in Table 5. As the input defect image size increases, the prediction accuracy of the model improves to a certain extent. The mean accuracy of TL-MobileNet is higher than that of VGG-16, VGG-19 and ResNet-50 at every size, and its standard deviation is lower at every size except 128×128.
When m is 96×96, TL-MobileNet achieves its best mean prediction accuracy (97.69%), while the highest single-run accuracy, 98.95%, is obtained by the Xception model. Moreover, the mean prediction accuracy of TL-MobileNet is 3.96%, 0.84% and 4.01% higher than that of the VGG-16, VGG-19 and ResNet-50 models, respectively. The prediction accuracy of Xception is similar to that of TL-MobileNet. When m is 128×128, the maximum prediction accuracy of TL-MobileNet increases, but the mean accuracy decreases by 0.25% compared with 96×96, and the standard deviations of both TL-MobileNet and Xception increase. The results in Table 5 indicate that the image size affects the defect identification and prediction accuracy. The mean accuracy of TL-MobileNet stays above that of MobileNet at every image size, which demonstrates that the proposed TL-MobileNet is effective in welding defects identification.
According to the experimental results in Table 5, when the welding defect image size is increased from 32×32 pixels to 96×96 pixels, the prediction accuracy of most of the transfer learning models improves (VGG-16 is the exception). This is because, when the size of the picture is increased, the details extracted from the image that describe the target feature are enlarged, so the constructed TL-MobileNet model can obtain and learn more welding defect features from the enlarged picture and improve the accuracy of defect prediction. Nevertheless, when the welding defect picture continues to increase from 96×96 pixels to 128×128 pixels, the prediction accuracy of the TL-MobileNet model drops. This is precisely because the image resolution is inversely proportional to the picture size (resolution = pixel / size): when the pixel count is fixed, a larger size reduces the resolution of the picture, makes the picture blurry, causes some defects to overlap, and thereby affects the prediction accuracy of the model. In other words, a larger welding defect image does not necessarily give a higher prediction accuracy for small defects. When the size of the welding defect image exceeds a certain size (96×96 in our test), the accuracy of welding defect prediction begins to decrease.
The confusion matrix is the most basic and intuitive method to measure the accuracy of a classification model. According to equation (5) ~ equation (7), the confusion matrix of the TL-MobileNet model is calculated, which represents the recognition results of welding defects for images of different sizes. In Fig 5, each row of the confusion matrix represents the actual weld defect type and each column the predicted defect type. When the size of the input image is 32×32, the accuracy of the PO prediction is 97.30% and the probability of PO being misclassified as SI is 2.7% (Fig.5a). When the input image size is 96×96, the best prediction accuracy is achieved for the different welding defects (Fig.5c): the prediction accuracy of PO, LOP, CR and ND is 99.61%, 99.65%, 100% and 100%, respectively. However, when the input image size is 128×128, the probability of SI being misclassified as PO is as high as 8.29%, and the feature details of PO and SI begin to overlap, which affects the model recognition. It is obvious that LOP, CR and ND are easily identified, whereas PO and SI are difficult to identify and easily lead to classification errors at any size.
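The row-wise reading of the confusion matrix described above can be sketched in a few lines. Equations (5) to (7) are not reproduced in this section, so the sketch uses the usual definitions (per-class accuracy as the diagonal entry divided by the row sum, overall accuracy as the trace divided by the total), which is an assumption. The counts are hypothetical and are not the values of Fig 5.

```python
from typing import List

LABELS = ["PO", "LOP", "CR", "SI", "ND"]

# Hypothetical counts for illustration only (rows: actual, columns: predicted).
confusion = [
    [97, 0, 0, 3, 0],    # actual PO
    [0, 100, 0, 0, 0],   # actual LOP
    [0, 0, 100, 0, 0],   # actual CR
    [8, 0, 0, 92, 0],    # actual SI
    [0, 0, 0, 0, 100],   # actual ND
]


def per_class_accuracy(matrix: List[List[int]]) -> List[float]:
    """Fraction of each actual class that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix: List[List[int]]) -> float:
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    for label, acc in zip(LABELS, per_class_accuracy(confusion)):
        print(f"{label}: {acc:.2%}")
    print(f"overall: {overall_accuracy(confusion):.2%}")
```

An off-diagonal entry such as `confusion[3][0]` is read as the number of actual SI samples predicted as PO.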
TABLE 5 Detection results of the TL-MobileNet model and transfer learning models under different input sizes (%)
Size       Stat  TL-MobileNet  MobileNet  Xception  VGG-16  VGG-19  ResNet-50
m=32×32    Max   97.75         97.10      97.50     96.94   95.94   89.05
           Min   96.05         94.12      96.20     92.03   91.55   62.08
           Mean  96.88         95.94      96.80     95.59   94.18   74.16
           Std   0.48          0.98       0.47      1.38    1.40    8.31
m=64×64    Max   98.39         98.31      98.15     95.73   94.77   95.89
           Min   96.94         91.06      96.78     92.35   90.66   84.46
           Mean  97.64         96.59      97.39     94.40   92.80   90.32
           Std   0.47          2.03       0.46      1.06    1.30    3.81
m=96×96    Max   98.63         98.15      98.95     95.97   97.18   96.30
           Min   97.26         97.23      96.05     90.58   96.05   91.38
           Mean  97.69         97.68      97.52     93.73   96.85   93.68
           Std   0.45          0.46       0.90      1.99    0.52    1.62
m=128×128  Max   99.03         98.23      98.47     95.65   97.50   96.62
           Min   96.05         93.64      93.96     94.69   95.65   87.04
           Mean  97.44         97.23      97.33     94.94   96.68   93.18
           Std   0.94          1.38       1.28      0.79    0.55    2.87
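The effect of input size can be read off the Mean rows of Table 5 programmatically. The sketch below enters those rows by hand and reports, for each model, the input size with the highest mean accuracy; the helper name `best_size` is hypothetical.

```python
SIZES = [32, 64, 96, 128]

# Mean accuracy (%) per input size, copied from the Mean rows of Table 5.
mean_accuracy = {
    "TL-MobileNet": [96.88, 97.64, 97.69, 97.44],
    "MobileNet": [95.94, 96.59, 97.68, 97.23],
    "Xception": [96.80, 97.39, 97.52, 97.33],
    "VGG-16": [95.59, 94.40, 93.73, 94.94],
    "VGG-19": [94.18, 92.80, 96.85, 96.68],
    "ResNet-50": [74.16, 90.32, 93.68, 93.18],
}


def best_size(model: str) -> int:
    """Input size (pixels per side) with the highest mean accuracy."""
    values = mean_accuracy[model]
    return SIZES[values.index(max(values))]


if __name__ == "__main__":
    for name in mean_accuracy:
        print(f"{name}: best mean accuracy at {best_size(name)}x{best_size(name)}")
    tl = mean_accuracy["TL-MobileNet"]
    print("TL-MobileNet drop from 96 to 128:", round(tl[2] - tl[3], 2))
```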
(a) Confusion matrix of input size=32×32×3 (b) Confusion matrix of input size=64×64×3
(c) Confusion matrix of input size=96×96×3 (d) Confusion matrix of input size=128×128×3
FIGURE 5. Confusion matrix of TL-MobileNet results
In order to verify the prediction accuracy of the proposed TL-MobileNet model, it was compared with other models for welding defect detection. The other models include: back propagation (BP) [4], K-nearest neighbors (KNN) [4], Extreme Learning Machine [18], Histogram of Oriented Gradients (HOG) [19], Convolutional neural networks (CNN) [31], artificial neural network (ANN) [11], support vector machines with principal component analysis (PCA-SVM) [32] and Extreme learning machine [17]. Table 6 compares the accuracy of the proposed method with that of other researchers in the prediction of welding defects (input image size 96×96). The TL-MobileNet and ELM models are more robust than the KNN, ANN and BP models. There is a large difference (14.69%) between the ANN and CNN models. It is worth mentioning that the prediction accuracy of the Single-ELM model is 95.45%, higher than that of the other traditional neural network methods. However, the prediction accuracy of TL-MobileNet is 97.69%, ranking second highest among the compared methods and close to that of Ensemble-ELM, with a difference of only 0.24%.
The visualized prediction results of welding defects by TL-MobileNet are shown in Fig 6. In this test identification process, the TL-MobileNet model recognizes each welding defect type well, and the predicted type is the same as the actual type without misjudgment. The features of PO and LOP are significantly distinguishable. However, sometimes the defect images for SI and CR are similar and difficult to distinguish (Fig 6), which can lead to misjudgments (Fig 5).
Table 3~Table 6 indicate that the proposed TL-MobileNet model can obtain good results for different types of welding defects, even on small sample datasets. Because the TL-MobileNet model combines transfer learning, FC-128 (new layer) and a Softmax classifier, it has advantages in feature learning. Furthermore, TL-MobileNet is better than MobileNet, Xception, VGG-16, VGG-19, ResNet-50 and traditional methods (such as BP, KNN, HOG, CNN, ANN, PCA-SVM, ELM) in terms of average prediction accuracy and standard deviation, which further shows the potential of this model in welding defects detection.
TABLE 6 Mean accuracy of different methods
Category             Method             Mean Accuracy
Traditional Methods  BP [4]             89.00%
                     KNN [4]            93.00%
                     HOG [19]           81.60%
                     CNN [31]           94.69%
                     ANN [11]           80.00%
                     PCA-SVM [33]       90.75%
ELM Method           Single-ELM [17]    95.45%
                     Ensemble-ELM [17]  97.93%
Proposed Method      TL-MobileNet       97.69%
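The ranking and the differences quoted from Table 6 can be reproduced with a short sketch. The accuracies are entered by hand from the table; the helper names are hypothetical.

```python
# Mean accuracy (%) of each method, copied from Table 6.
table6 = {
    "BP": 89.00,
    "KNN": 93.00,
    "HOG": 81.60,
    "CNN": 94.69,
    "ANN": 80.00,
    "PCA-SVM": 90.75,
    "Single-ELM": 95.45,
    "Ensemble-ELM": 97.93,
    "TL-MobileNet": 97.69,
}

# Methods sorted from highest to lowest mean accuracy.
ranking = sorted(table6, key=table6.get, reverse=True)


def rank_of(method: str) -> int:
    """1-based position of `method` in the accuracy ranking."""
    return ranking.index(method) + 1


def gap(a: str, b: str) -> float:
    """Accuracy of `a` minus accuracy of `b`, in percentage points."""
    return round(table6[a] - table6[b], 2)


if __name__ == "__main__":
    print(ranking)
    print("TL-MobileNet rank:", rank_of("TL-MobileNet"))
    print("Ensemble-ELM minus TL-MobileNet:", gap("Ensemble-ELM", "TL-MobileNet"))
    print("CNN minus ANN:", gap("CNN", "ANN"))
```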