
2018 14th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)

WDXI: The Dataset of X-ray Image for Weld Defects

Wenming Guo, Huifan Qu
School of Software Engineering
Beijing University of Posts and Telecommunications
Beijing, China

Lihong Liang
China Special Equipment Inspection and Research Institute
Beijing, China

Abstract-We establish WDXI, a dataset of X-ray images for weld defects. WDXI consists of 13,766 X-ray images covering seven significant types of weld defects. The data set includes images of metal plates, metal fittings and other welded parts. Qualified nondestructive testing (NDT) inspectors annotate every image, after which the image is audited by a higher-level inspector, who gives the final result based on the image and its test report. From the reports, we have sorted out the defect type, defect size, quality level of each weld, and other detailed information recorded during the NDT process. We remove the letters and numbers that were placed manually during irradiation and extract the weld area from the image. The window width and window level method is used to enhance the image quality of the weld area and to highlight details of defects. On this data set, we trained a 5-layer convolutional neural network (CNN) to identify the type of defect. The top-1 accuracy reached 46.6%. Although this result is not very high, it provides a useful example of how to use the data set, and it shows that there is still extensive room for research on this task. The WDXI dataset will be made publicly available for academic research purposes, which may promote the development of industrial radiographic NDT. WDXI will be a foundation for the study of weld defect classification and localization, and it will provide data support for researchers to test their models.

Keywords-Data sets; X-ray image; Computer vision; Weld Defects; Convolutional neural networks

I. INTRODUCTION

Welding technology is used in aerospace, construction, defense and other fields. Many defects inevitably occur during the welding process. To ensure the quality of a project, the welding quality must be checked according to the welding standard, so the accuracy of welding inspection is particularly important. At present, radiography is widely used in the field of non-destructive testing (NDT). Without damaging the test object, uniform X-rays or gamma rays pass through the metal or other material. The radiation is attenuated according to the material of the object, and the weld image is generated based on the radiation intensity. Due to subjective judgments and the impact of the film printing process, manual testing is unstable. High-quality public data sets will promote the research and development of nondestructive testing and improve the accuracy of weld inspection.

For this reason, we collected a large number of X-ray images of weld defects and created WDXI. In industrial applications, WDXI will help people establish a library of weld defect features and explore methods for the automatic identification of defects. These can be applied to the quality inspection of industrial automation products, automated welding quality control and so on. In academic research, it will provide data resources for researchers, who can build their models on WDXI and compare results with other researchers. WDXI will promote the development of computer vision and defect detection.
All images come from a series of non-destructive testing experiments. We collected a total of 16,950 images, of which 13,766 have valid annotations. The parameters related to radiation detection are also preserved. During image collection, we used a professional film reproducer to transform X-ray films into digital images. Images are saved in 16-bit TIF format. The standards and grades of weld defects are evaluated according to the national standards of the People's Republic of China [1]. The whole dataset is divided into eight groups by defect type: round defects, bar defects, gas pores, lack of fusion, lack of penetration, slag inclusion, cracks and no defects, covering all the major defects. Figure 1 shows three original example images.

Figure 1. Original weld images of no defects (left), lack of fusion (middle) and lack of penetration (right).
978-1-5386-8097-1/18/$31.00 ©2018 IEEE


Studies of weld defect classification have been based on traditional methods applied to private data sets. Kasban et al. [2] proposed a feature extraction method that provides the basis for classification and identification. Domingo Mery and Miguel Angel Berti [3] detected weld defects based on texture features. Valavanis and Kosmopoulos [4] presented a method for detection and classification using both texture features and geometrical features. Gang Wang and T. Warren Liao [5] proposed a procedure for the automatic classification of welding defects comprising three processes: image processing, feature extraction, and pattern recognition. They used the BSM method to extract the defect area, and classified with a fuzzy K-nearest neighbor (K-NN) algorithm and with multilayer perceptron (MLP) neural networks, separately. Artificial neural networks (ANN) have been used for classification after feature extraction [6-7]. A different method, which builds the classifier from the image histogram and a neural network, is described by Abouelatta et al. [8]. Principal component analysis (PCA) has been combined with an ANN to reduce the number of features [9]. An SVM classifier based on texture features and morphological features has been used for more efficient calculation [10]. The combination of PCA and SVM enables the accuracy to be further improved [11]. Current methods rely on manually designed features tailored to the characteristics of the data, so specific methods only adapt to particular data sets.

In this paper, we design a pipeline to extract the weld area and use the window width and window level method to enhance the image quality. We trained a 5-layer convolutional neural network to extract features and classify the type of weld defect.

II. WDXI DATASET

A. Data Collection

Our laboratory spent a lot of time and effort collecting data at the equipment manufacturer and the quality inspection laboratory. The original weld images were saved as X-ray film, and the quality inspection reports were saved on paper. We converted the film to digital images with a professional film reproducer. At the same time, all the data in the paper reports were entered into the computer, including the type of defect, rating, and experimental parameters. Parameter details and sample data are shown in Table I.

TABLE I. SAMPLE DATA
Parameter name                          Parameter value
Weld number                             B7-8
Defect type                             Bar defects
Defect description                      6mm
Rating                                  2
Penetrated thickness (mm)               22
Image quality indicator (IQI) index     11
Transillumination mode                  Center
F1 (mm)                                 800
X-ray tube voltage range (kV)           230
X-ray tube current range (mA)           5
Exposure time (min)                     4
Device model                            300EGS2
Film model                              AGFA-C7
Film processing mode                    Auto
Radioactive sources type                X-ray
Image quality indicator (IQI) model     2,3

A total of 16,950 weld images were collected, of which 13,766 were labeled. The data is divided into eight groups according to defect type, and the data distribution is shown in Table II. Because the defects occur with different probabilities, round defects and bar defects appear often, while other defects appear less frequently. Slight round defects do not affect the welding quality, whereas a cracked weld must be re-welded. Such defects are therefore avoided as much as possible during the welding operation, resulting in an unbalanced data collection. All images are saved in 16-bit TIF format, and the images vary in resolution and aspect ratio. Of the 16 bits (two bytes) per pixel, 12 bits carry real data, so each image has 4096 gray levels. A long weld is covered by multiple transilluminations and therefore yields multiple images, each representing one part of the weld.

TABLE II. DATA DISTRIBUTION STATISTICS
Defect type             Number of images
Round defects           8,960
Bar defects             1,759
Gas pores               300
Lack of fusion          450
Lack of penetration     284
Slag inclusion          969
Cracks                  916
No defects              128
No labeling             3,184
Total                   16,950

Figure 2. Mean line chart

B. Data Annotation

We organized and transcribed the data from handwritten paper reports. We mainly collected defective weld data and also obtained a small amount of defect-free weld data. Records whose handwriting is illegible are not included. Data describing defects of similar type are treated as the same type. For example, gas pores and bar gas pores are both recorded as gas pores, and slag inclusion and bar slag are both recorded as slag.
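The merging of similar defect descriptions, and the totals reported in Table II, can be illustrated with a minimal Python sketch. The raw report strings, the `MERGED_LABELS` mapping and the helper name `normalize_label` are hypothetical and only cover the two merges named in the text; the counts are copied from Table II.

```python
from collections import Counter

# Hypothetical mapping from report wording to the merged label. Only the two
# merges named in the text are covered; the key strings are assumptions.
MERGED_LABELS = {
    "bar gas pores": "gas pores",
    "slag inclusion": "slag",
    "bar slag": "slag",
}


def normalize_label(raw):
    """Lower-case, collapse whitespace, then apply the merge table."""
    key = " ".join(raw.lower().split())
    return MERGED_LABELS.get(key, key)


# Image counts per group, copied from Table II.
TABLE_II = {
    "Round defects": 8960,
    "Bar defects": 1759,
    "Gas pores": 300,
    "Lack of fusion": 450,
    "Lack of penetration": 284,
    "Slag inclusion": 969,
    "Cracks": 916,
    "No defects": 128,
    "No labeling": 3184,
}

# Made-up report entries, used only to exercise the merge.
reports = ["Gas pores", "bar gas pores", "Slag inclusion", "Bar  slag", "Cracks"]
merged_counts = Counter(normalize_label(r) for r in reports)

labeled_total = sum(n for name, n in TABLE_II.items() if name != "No labeling")
grand_total = sum(TABLE_II.values())

print(merged_counts)
print("labeled:", labeled_total, "total:", grand_total)
```

Summing the eight labeled groups gives the 13,766 annotated images quoted in the text, and adding the unlabeled images gives the 16,950 collected in total.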
Some weld images do not have a corresponding test report; we collected these as well, as unlabeled data. Each image corresponds to one type of defect, except for 200 images that have multiple defects. Each image is marked with a serial number and has a corresponding record.

C. Image Preprocessing

A professional film reproducer transforms the film into a digital image, which leaves a blank area on each side of the image. During the ray inspection, numbers, letters, and arrows were manually placed to identify the weld number and the useful region. Our primary concern is the weld area, and trimming the numbers and arrows increases the effective area ratio. Because of the large number of images, manual cropping is not efficient. The weld images we collected are vertical, as shown in Figure 1. Based on this, we designed a method for automatic cropping.

1) The steps of our cropping method are as follows.
a) We extract the left and right arrows as templates and match the two arrows in the original image [12]. When the matching similarity is greater than 0.9, the arrow position is adopted. If two matches have a similarity higher than 0.9, we choose the point whose column coordinate is closer to 375 (the average weld center axis). Assuming that the width of the image is w, the height is h, and the matching point is (x, y), the rectangular area [x : x + 400, 0 : h] is cropped.
b) If no match has a similarity higher than 0.9, we compute the average pixel value of each column in the image, find the maxima of this sequence, select the maximum closest to the 375-pixel central axis, and crop 200-pixel-wide rectangular areas on its left and right sides. The mean line chart is shown in Figure 2.
c) If the point is too far away from the standard weld center axis, the rectangular area [175 : 575, 0 : h] is cropped as the final result. Eight randomly chosen example results are shown in Figure 3.

Figure 3. Eight example results. Each pair is one sample: the original image is on the left and the cropped image is on the right.

After cropping, we use Gaussian blur to reduce image noise. Images are saved in 16-bit TIF format, of which 12 bits are valid data, giving 4096 gray levels in total. An ordinary display cannot show all of these gray levels; it can show only 256. Moreover, weld defects are often hard to see directly because their grayscale range is very similar to that of the background. We therefore use the window width and window level method to adjust the image and make the defects clearer. If the selected window width is W and the window level is L, the gray levels in the range L ± W are displayed: all gray values smaller than L - W are set to 0, all gray values larger than L + W are set to 255, and the remaining gray values are linearly converted to the range 0 to 255. The effect is shown in Figure 4.

Figure 4. The enhanced image (window width 128, window level 1828) is on the left and the original image is on the right. A crack is clearly visible on the left but cannot be seen on the right.

III. MODEL

Our model addresses a multi-class classification problem with eight categories: seven defect types and no defects. The model takes each weld image (a grayscale image) as input, and the neural network calculates the probability that the image belongs to each class. The class with the maximum probability is used as the final prediction.

A. Network Detail

Our model uses a convolutional neural network that contains four convolutional layers. A pooling layer follows each convolutional layer, and an LRN normalization layer [13] follows each of the first three pooling layers. The last layer is a fully connected layer with eight output nodes. We use 40% dropout to prevent overfitting. The details of the network structure are shown in Table III. Softmax is used to normalize the output.
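The layer sizes can be checked with a short Python walk-through of the kernel sizes, strides and channel counts listed in Table III. This is a sketch under stated assumptions, not the authors' implementation: it assumes a single-channel 400 x 400 input, "same"-style padding (output size equals the input size divided by the stride, rounded up), and counts weights only, without biases. The names `LAYERS` and `describe` are illustrative.

```python
import math

# (name, kernel, stride, out_channels) following Table III. LRN and dropout
# neither change shapes nor add weights, so they are left out of the walk.
LAYERS = [
    ("conv1", 5, 2, 32),
    ("pool1", 2, 2, None),
    ("conv2", 3, 2, 64),
    ("pool2", 2, 2, None),
    ("conv3", 3, 1, 128),
    ("pool3", 2, 2, None),
    ("conv4", 3, 1, 256),
    ("avgpool", 2, 2, None),
]
NUM_CLASSES = 8


def describe(input_size=400, in_channels=1):
    """Return (name, output shape, weight count) for every layer."""
    size, channels = input_size, in_channels
    rows = []
    for name, kernel, stride, out_channels in LAYERS:
        size = math.ceil(size / stride)  # assumed "same" padding
        weights = 0
        if out_channels is not None:  # convolution; pooling has no weights
            weights = kernel * kernel * channels * out_channels  # no biases
            channels = out_channels
        rows.append((name, (size, size, channels), weights))
    # Fully connected layer over the flattened 7 x 7 x 256 feature map.
    rows.append(("fc", (NUM_CLASSES,), size * size * channels * NUM_CLASSES))
    return rows


rows = describe()
total_weights = sum(weights for _, _, weights in rows)
for name, shape, weights in rows:
    print(f"{name:8s} {str(shape):18s} {weights:>8d}")
print("total weights:", total_weights)
```

Under these assumptions the output sizes match Table III, and the per-layer weight counts (800; 18,432; 73,728; 294,912; 100,352) agree with the table entries to the precision shown there. The exact sum is 488,224, slightly above the listed total of 485K, which equals the sum of the rounded per-layer values.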

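The output stage of the classifier can be sketched in plain Python: softmax normalization of the eight class scores, selection of the most probable classes, and the per-sample cross-entropy, which for a one-hot label reduces to the negative log-probability of the true class. The scores and the label in the example are made up for illustration, and the function names are not taken from the paper.

```python
import math


def softmax(scores):
    """Normalize raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k):
    """Indices of the k most probable classes, most probable first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]


def cross_entropy(probs, label):
    """Cross-entropy of one sample whose true class index is `label`."""
    return -math.log(probs[label])


def top_k_accuracy(batch_probs, labels, k):
    """Fraction of samples whose true class is among the top-k predictions."""
    hits = sum(1 for p, y in zip(batch_probs, labels) if y in top_k(p, k))
    return hits / len(labels)


# Made-up scores for the eight classes; the true class is index 1.
scores = [2.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
probs = softmax(scores)
print("prediction:", top_k(probs, 1)[0])
print("top-3:", top_k(probs, 3))
print("loss:", cross_entropy(probs, 1))
```

In this example the top-1 prediction misses the true class while the top-3 set contains it, which is the situation the top-3 accuracy is meant to capture.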
The goal of the final optimization is the cross-entropy loss function

$$L(\{x, y\}_1^N) = -\sum_n \sum_i y_i^{(n)} \log f(x^{(n)})_i$$

where $x$ is the image and $y$ is the label of the image. $y_i^{(n)}$ indicates whether the $n$-th sample belongs to the $i$-th category; its value is 0 or 1. $f(x^{(n)})_i$ is the probability predicted by the classification model that the $n$-th sample image belongs to the $i$-th category. The predicted probability is the output of the softmax layer, and the predicted probabilities over all classes add up to 1. We use the top-1 and top-3 classification accuracy to measure the quality of the classification.

TABLE III. NETWORK STRUCTURE
Type            Patch size/stride   Output size        #Params
Convolution     5x5/2               200 x 200 x 32     0.8K
Max pool        2x2/2               100 x 100 x 32
LRN
Convolution     3x3/2               50 x 50 x 64       18K
Max pool        2x2/2               25 x 25 x 64
LRN
Convolution     3x3/1               25 x 25 x 128      73K
Max pool        2x2/2               13 x 13 x 128
LRN
Convolution     3x3/1               13 x 13 x 256      294K
Avg pool        2x2/2               7 x 7 x 256
Dropout (40%)                       7 x 7 x 256
Fc                                  8                  100K
Total                                                  485K

B. Training Detail

The numbers of samples in the classes are imbalanced: the ratio between round defects and no defects reaches 70:1. This imbalance would lead to a decrease in the final accuracy [14]. We extracted 1000 samples from each class as the training set, 200 samples from each class as the validation set, and 200 samples from each class as the test set.

We use over-sampling for categories with insufficient samples and under-sampling for categories that exceed the required number of samples. The training, validation, and test sets do not overlap. In the end, there are 8000 samples in the training set, 1600 in the validation set, and 1600 in the test set.

All images were normalized before training: each image is linearly scaled to have zero mean and unit norm. Each image is randomly flipped left to right. We then scaled the variable-sized images to 400 x 400.

We use a Gaussian distribution to initialize the weights in the network. The initial learning rate is 0.005 and the learning rate decay factor is 0.1. We use the Adam algorithm to optimize the loss function with β1 = 0.9 and β2 = 0.999. The batch size is set to 8. We validated after each epoch and finally chose the model with the highest accuracy.

We built the model on the TensorFlow platform. Our experimental platform is equipped with two CPUs (Intel Xeon E5-2620, 2.10 GHz), 64 GB of memory, and a Quadro M2000 graphics card.

C. Results

After 100K training steps, we used the test set to measure the training results. We achieved a top-1 accuracy of 46.6% and a top-3 accuracy of 77% in the multiclass classification of weld defects. Because of the low signal-to-noise ratio and the small number of feature points, the classification accuracy is lower than that of methods with manually designed features. In our previous experiment, a similar network structure was used, but with two fully connected layers (384 and 192 units), so that the number of parameters reached 5,000K. In addition, the original images were used directly without image enhancement. The top-1 accuracy was around 28% and the top-3 accuracy around 53%.

For convolutional neural networks, the quality of the data has a significant influence on the final result. It can be concluded from the results above that enhancing the data with the window width and window level method produced a measurable improvement in the final result. The original image is large and the defect area occupies a small proportion of it, so it is difficult to observe defects in the original image. Image enhancement makes the defect easier to distinguish from the background. The previous training also suffered from overfitting: a small amount of data cannot drive a large network. We therefore reduced the number of fully connected layers to one, which also decreased the number of parameters significantly.

IV. CONCLUSION

This paper introduces a new weld defect image data set, WDXI. Based on this data set, a convolutional neural network model was designed for the defect classification task. Template matching in image preprocessing automatically removes the numbers and arrows in the image while keeping the weld area of the original picture. The window width and window level method was used to enhance the image, making defects more visible. Finally, the model achieved a top-1 accuracy of 46.6% on the task of identifying weld defect types. The WDXI data set will be available for academic and research purposes.

The prediction accuracy of the model in this paper is not very high. To improve the accuracy of model prediction and the quality of the data set, future work can be divided into the following parts. First, because there are too few positive (defect-free) examples in the data set, it is impossible to build models to predict whether there are defects or not. We need to collect more positive examples to enrich the dataset. Second, the film negatives were not all exposed under the same conditions, and their gray-scale distributions differ. At present, the parameters of the window width and window level method are set in batches. It is necessary to build a model that learns these parameters automatically so that each image is enhanced more precisely. Third, the proportion of the defect area after image scaling is tiny, and a significant amount of detail is lost. If we can directly extract the defect area, we can reduce the size of the image while retaining more information about the defect. In the future, we will make more efforts to improve the quality of the data set and the classification accuracy.
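The two preprocessing ideas used in this work, the window width and window level enhancement and the column-mean fallback for locating the weld, can be illustrated with a small pure-Python sketch on nested lists. It is a sketch under stated assumptions rather than the authors' code: the window is taken as the range from L - W to L + W, the "maximum point" is read as the column with the largest mean, and the `max_offset` threshold that decides when a peak is "too far" from the nominal axis is hypothetical, since the text gives no value.

```python
def apply_window(image, width, level):
    """Map gray values to 0..255 using the window [level - width, level + width]."""
    low, high = level - width, level + width
    result = []
    for row in image:
        new_row = []
        for value in row:
            if value < low:
                new_row.append(0)
            elif value > high:
                new_row.append(255)
            else:
                # Linear rescaling of the window onto 0..255 (integer floor).
                new_row.append((value - low) * 255 // (high - low))
        result.append(new_row)
    return result


def crop_bounds_from_column_means(image, center=375, half_width=200,
                                  max_offset=100):
    """Column range to keep, from the column-mean peak nearest the axis.

    `max_offset` is a hypothetical threshold for "too far from the axis".
    """
    height = len(image)
    n_columns = len(image[0])
    means = [sum(row[c] for row in image) / height for c in range(n_columns)]
    peak_value = max(means)
    peaks = [c for c, m in enumerate(means) if m == peak_value]
    axis = min(peaks, key=lambda c: abs(c - center))
    if abs(axis - center) > max_offset:
        axis = center  # fall back to the fixed crop 175..575
    return axis - half_width, axis + half_width


# Gray values around the Figure 4 setting (width 128, level 1828).
enhanced = apply_window([[1600, 1700, 1828, 1956, 4095]], 128, 1828)
print(enhanced)

# Synthetic 2 x 800 image with a bright column near the nominal axis.
demo = [[0] * 800 for _ in range(2)]
demo[0][390] = 10
print(crop_bounds_from_column_means(demo))
```

With the Figure 4 setting, the window covers gray levels 1700 to 1956, so 256 of the 4096 levels are spread over the full display range. Learning W and L per image, as proposed in the future work, would replace the fixed arguments passed to `apply_window`.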
ACKNOWLEDGMENT

This research was supported by the National key foundation for exploring scientific instrument of China (Project No.2013YQ240803).

REFERENCES
[1] GB/T 3323-2005, Radiographic examination of fusion welded joints in metallic materials.
[2] Kasban, H., et al. "Welding defect detection from radiography images with a cepstral approach." NDT & E International 44.2 (2011): 226-231.
[3] Mery, D., and M. A. Berti. "Automatic detection of welding defects by using texture features." Insight - Non-Destructive Testing and Condition Monitoring 45.10 (2003): 676-681.
[4] Valavanis, Ioannis, and D. Kosmopoulos. "Multiclass defect detection and classification in weld radiographic images using geometric and texture features." Expert Systems with Applications 37.12 (2010): 7606-7614.
[5] Wang, Gang, and T. W. Liao. "Automatic identification of different types of welding defects in radiographic images." NDT & E International 35.8 (2002): 519-528.
[6] Kumar, Jayendra, R. S. Anand, and S. P. Srivastava. "Flaws classification using ANN for radiographic weld images." International Conference on Signal Processing and Integrated Networks, IEEE, 2014: 145-150.
[7] Zapata, Juan, R. Vilar, and R. Ruiz. "Automatic Inspection System of Welding Radiographic Images Based on ANN Under a Regularisation Process." Journal of Nondestructive Evaluation 31.1 (2012): 34-45.
[8] Abouelatta, Ossama, et al. "Classification of Welding Defects Using Gray Level Histogram Techniques via Neural Network." 39 (2014): M1-M13.
[9] Vilar, Rafael, J. Zapata, and R. Ruiz. "An automatic system of classification of weld defects in radiographic images." NDT & E International 42.5 (2009): 467-476.
[10] Wang, Xin. "Recognition of Welding Defects in Radiographic Images Using Support Vector Machine Classifier." Research Journal of Applied Sciences Engineering & Technology 2.3 (2010): 295-301.
[11] Mu, Weilei, et al. "Automatic classification approach to weld defects based on PCA and SVM." Insight - Non-Destructive Testing and Condition Monitoring 55.10 (2013): 535-539.
[12] Hanebeck, Uwe D. "Template matching using fast normalized cross correlation." Optical Pattern Recognition XII, International Society for Optics and Photonics, 2001: 95-102.
[13] Krizhevsky, Alex, I. Sutskever, and G. E. Hinton. "ImageNet classification with deep convolutional neural networks." International Conference on Neural Information Processing Systems, Curran Associates Inc., 2012: 1097-1105.
[14] Masko, D., and P. Hensman. "The impact of imbalanced training data for convolutional neural networks." Bachelor thesis, KTH, School of Computer Science and Communication, 2015.
