Attention-Guided Multitask Learning For Surface Defect Identification
Inception [3], DenseNet [4], and EfficientNet [5] have been proposed in the literature for the classification task. Object detection is a task that localizes an object using a bounding box. Notable object detection algorithms include Fast R-CNN [6], Faster R-CNN, Mask R-CNN [7], single shot detection (SSD) [8], and You Only Look Once (YOLO) [9]. Segmentation is the task of performing pixel-by-pixel classification. Several segmentation algorithms have been proposed in the literature, including fully convolutional networks, encoder–decoder-based approaches [10], and multiscale and pyramid architectures [11].
However, industrial visual inspection systems have barely utilized the potential of these complex architectures, for several reasons [12]. One of the main reasons is that the continuous improvement of industrial processes yields fewer and fewer defective samples, so the number of defective samples available for training is very limited [13]. This problem of learning from a limited number of samples is usually referred to as the small sample problem, which can easily lead to poor generalization ability of the trained model [14]. In addition, the target surface defects have different scales, which makes it even more challenging for deep learning models to identify small-sized defects. The visual appearance of real-world surface defects also varies with the type of material, imaging conditions, and camera position, and it is challenging to distinguish tiny defects from the noise or non-defect components within an image (as shown in Fig. 1). Hence, false positives on defect-free images are inevitable. Furthermore, real-time applications of complex CNN models are extremely limited due to their long inference time and the resulting higher consumption of computational resources and power.
Fig. 1. Magnetic particle inspection on threaded fasteners of different surface finish (TekErreka dataset). Surface defects are marked by red circles and noise due to magnetic particle depositions is marked in yellow.

To address these limitations, we present a novel universal architecture that integrates classification, segmentation, and detection of surface defects in a single network. Our architecture, Defect-Aux-Net, is primarily motivated by a multitask learning (MTL) scheme that exploits useful information from related learning tasks to help mitigate the problem of data scarcity. The proposed architecture is based on FPN-semantic-segmentation [11] with the additional tasks of defect classification and detection to improve the generalization ability by utilizing the image-level information as an inductive bias. Specifically, we developed a new MTL network based on FPN, where the classification task is carried out in the bottom-up pathway of the network and segmentation is performed in the top-down pathway of the network. To create a bounding box, we employ two subnetworks in the top-down pathway, where one subnet determines the class associated with the bounding box and the other performs the regression to adjust the bounding box position.

The FPN-based feature extractor in the proposed network allows surface defects to be recognized at vastly different scales by efficiently sharing features between image regions. We further introduce the positional and the channel attention mechanisms that focus on learning the features of small surface defects to improve the robustness of detecting small defects surrounded by a complex background.

We evaluate our model on the TekErreka and Severstal [15] surface defect datasets, with defect classification, segmentation, and detection tasks. Experimental results demonstrate that jointly learning features of related tasks can improve the performance of all tasks.

Overall, the contributions of our work are as follows.
1) First, we propose a Defect-Aux-Net model architecture, which can perform classification, segmentation, and detection of surface defects in a single network. Compared with the existing state-of-the-art CNN models, this architecture is lightweight and compact in terms of model parameters. From the model training point of view, employing fewer parameters in the architecture enables the model to efficiently learn potential surface defects from a smaller number of labeled examples.
2) In contrast to existing single-task learning, our proposed MTL in surface defect detection facilitates the model to learn useful representations of the data by exploiting shared information from related tasks.
3) Considering surface defect detection with a complex background, the positional and the channel attention mechanisms are incorporated to amplify target features and to reduce the influence of background noise.
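To make the idea of channel attention concrete, the following minimal sketch implements a squeeze-and-excitation style gate in the spirit of [24] using plain Python lists. It is not the authors' implementation: the function name `channel_attention`, the tiny weight matrices `w1` and `w2`, and the toy feature map are all assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    """Logistic function used to turn a score into a gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style channel gate (illustrative only).

    feature_map: list of C channels, each an H x W list of lists.
    w1: hidden x C weights, w2: C x hidden weights (hypothetical values).
    Returns the per-channel gates and the rescaled feature map.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation: fully connected layer + ReLU, then fully connected + sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every value of a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_map)
    ]
    return gates, scaled


# Toy input: two 2x2 channels and hand-picked (hypothetical) weights.
feature_map = [
    [[2.0, 2.0], [2.0, 2.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
w1 = [[1.0, -1.0]]       # one hidden unit
w2 = [[2.0], [-2.0]]     # one output gate per channel
gates, scaled = channel_attention(feature_map, w1, w2)
```

With these hand-picked weights, the first channel receives a gate close to 1 and the second a gate close to 0, which is the amplify-or-suppress behavior that the third contribution describes for target features and background noise.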
SAMPATH et al.: ATTENTION-GUIDED MULTITASK LEARNING FOR SURFACE DEFECT IDENTIFICATION 9715
Fig. 6. Overview of the proposed Defect-Aux-Net architecture. It is mainly composed of classification, segmentation, and detection modules that incorporate a multitask loss function.
B. Loss Function
Our proposed method combines three loss functions from the
classification, segmentation, and detection tasks, which provide
mutual sources of inductive bias for each task. Specifically,
the segmentation and detection losses propagate gradients to
the entire model (bottom-up and top-down pathways), while the
classification loss propagates gradients only to the bottom-up pathway. We
combine and weight the three losses into a multitask loss L_M
to leverage the heterogeneous annotations and jointly optimize
multiple tasks as follows:
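As an illustration only, a weighted combination of three task losses, together with the gradient routing described in this subsection, can be sketched in Python as follows. The weighted-sum form, the default weights, and the names `multitask_loss` and `GRADIENT_ROUTING` are assumptions made for this sketch and are not taken from the paper.

```python
def multitask_loss(loss_cls, loss_seg, loss_det, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three scalar task losses.

    The weights are hypothetical; the paper's actual weighting is not
    reproduced here.
    """
    w_cls, w_seg, w_det = weights
    return w_cls * loss_cls + w_seg * loss_seg + w_det * loss_det


# Which pathways each loss updates, as stated in the text: classification
# reaches only the bottom-up pathway, the other two reach both pathways.
GRADIENT_ROUTING = {
    "classification": ("bottom_up",),
    "segmentation": ("bottom_up", "top_down"),
    "detection": ("bottom_up", "top_down"),
}


def losses_updating(pathway):
    """Return the sorted names of the losses whose gradients reach a pathway."""
    return sorted(
        name for name, paths in GRADIENT_ROUTING.items() if pathway in paths
    )


# Example with made-up loss values for one training step.
total = multitask_loss(0.2, 0.5, 0.3)
weighted = multitask_loss(0.2, 0.5, 0.3, weights=(0.5, 2.0, 1.0))
```

In this sketch the bottom-up pathway is shaped by all three losses, whereas the top-down pathway is shaped only by segmentation and detection, which is how the shared backbone receives the image-level inductive bias.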
IV. EXPERIMENTS
A. Datasets
In this article, we evaluate our framework on real-world surface defect identification problems. We use two challenging datasets with increasing resolutions and complexities: 1) the Severstal steel sheet dataset [15] and 2) the TekErreka steel fastener defect dataset. Severstal, the largest steel and steel-related mining company, has recently published the largest industrial steel sheet surface defect dataset, which contains pixelwise masks annotated by their technical experts. The dataset contains 12 568 grayscale images of size 1600×256. Each image in the dataset may have no defects, a single defect, or multiple defects divided into four classes. Fig. 7 shows examples of steel defect images from the Severstal dataset. We randomly select 10% and 20% of the 12 568 original images as the validation and test data, respectively. The main challenge with this dataset is that the interclass similarities between defective and defect-free examples are very high.

Fig. 7. Sample images of Severstal steel with four classes of defect.

The TekErreka dataset is a self-collected steel fastener surface defect dataset based on a magnetic particle inspection procedure. Magnetic particle inspection is an excellent method to investigate near-surface defects in steel fasteners. The basic principle is to magnetize a steel fastener parallel to its surface. If the fastener is free from defects, the magnetic field lines run within the fastener and parallel to its surface. In case of magnetic inhomogeneity, for instance, near cracks, the magnetic field lines will locally leave the surface and a leakage field occurs. When a suspension of ferromagnetic particles is applied to the test piece surface, the magnetic particles will run off at defect-free areas. In the places of leakage fields, the magnetic particles are attracted and clustered together, thus indicating the location of the defect. The surface defects become visible under ultraviolet light. We acquired the TekErreka dataset from a magnetic particle inspection apparatus located at the Erreka fastening solutions. The defects in the TekErreka dataset differ in their size, shape, location, and material type and thus cover several scenarios in real-time defect detection. The difficulty in this dataset lies in the similarity of defects and noise due to magnetic particle deposition on the defect-free surface of the fasteners. Many factors are responsible for the noise component, including magnetic particle size, the amount of magnetic particles used, and the ultraviolet light present. The original examples are directly stored in a database as RGB images of size 2464 × 2056. The dataset has 450 positive and 1200 negative examples. We split the TekErreka dataset into training and testing sets: 80% for training and 20% for evaluation of the model performance.

9718 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 19, NO. 9, SEPTEMBER 2023

B. Preprocessing
We resized the images of the Severstal dataset to 128×800 and the TekErreka dataset to 600×600. To keep the pixel values on the same scale, we normalized the images using min–max normalization, which rescales raw pixel values to the range from 0 to 1. This helps the optimizer not get stuck taking steps that are too large in one dimension, or too small in another.

C. Data Augmentation
To improve the diversity of the training set, we apply random but realistic data augmentation such as rotation, vertical/horizontal flips, zoom, shear, and channel shifts.

D. Training Details
The Defect-Aux-Net is implemented using the TensorFlow framework. All the experiments are run on Google Cloud TPU v2 infrastructure, which contains 8 cores with 64 GB memory. The network is optimized with the Adam optimizer and trained with a batch size of 128 for 50 epochs. We adopt the one-cycle policy [27] to find an optimal learning rate.

E. Evaluation Metrics
The classification results are evaluated using precision, recall, F1-score, and binary accuracy:

$$\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \tag{7}$$

$$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} \tag{8}$$

$$\mathrm{F1Score} = \frac{2 \cdot (\mathrm{Precision} \cdot \mathrm{Recall})}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}$$

$$\mathrm{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{FP} + \mathrm{TN} + \mathrm{FN}} \tag{10}$$

where TP, TN, FP, and FN denote true positive (correctly identified surface defects), true negative (correctly identified non-defect images), false positive (erroneously classified images as surface defect), and false negative (erroneously classified images as non-defect). Precision measures the percentage of images classified as defective that actually contain surface defects, while recall is the ratio of correctly classified images with surface defects to all images with surface defects. F1-score can be interpreted as a harmonic mean of precision and recall. The overall performance of the classification task is measured by its accuracy.

The segmentation results are evaluated using the Dice score and Intersection-over-Union (IoU), which quantify the percentage overlap between the predicted and target binary masks. To evaluate defect detection results, we used the mean average precision (mAP), which compares the detected bounding box to the ground truth bounding box and returns a score.

F. Experiments on Defect Segmentation
We performed a series of experiments on the TekErreka dataset to test the effectiveness of different loss functions. First, we trained Defect-Aux-Net using binary cross-entropy (BCE) loss alone and Dice loss alone as the segmentation loss. Then, it was trained using a combination of loss functions. The results are shown in Table I.

TABLE I
PERFORMANCE OF THE PROPOSED APPROACH ON LOSS VARIANTS FOR THE DEFECT SEGMENTATION TASK

Using Dice loss alone yielded more accurate results than using a combination of losses. Additionally, the Dice loss function helped our model converge faster. We use the Dice loss function throughout the rest of the experiments.

To verify the effectiveness of the segmentation task using the MTL strategy, we compared the proposed MTL network (Defect-Aux-Net) against the following networks with the same bottom-up backbone (Resnet50 + SE + SA attention module).
1) FPN [11]: This is the original FPN architecture without the MTL strategy and serves as our baseline.
2) UNet [10]: This network uses an encoder for multilevel feature extraction and a decoder that scales them up and combines multilevel features through stacking.
3) LinkNet [28]: This is similar to UNet with the difference of replacing the stacking operation with addition in skip connections.
4) PSPNet [29]: Pyramid scene parsing network uses a pyramid pooling module for multiscale feature extraction.

Based on the experimental results, we observed that the proposed multitask learning strategy achieves better segmentation performance as compared to the state-of-the-art segmentation models. The Dice and IoU scores of the various segmentation models on the Severstal dataset are depicted in Figs. 8 and 9.
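The evaluation metrics of Section IV-E can be computed with a few lines of Python. The sketch below evaluates (7)-(10) from confusion counts and the Dice and IoU scores from two binary masks; the function names, the handling of empty denominators, and the example counts and masks are assumptions made for illustration and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Recall, precision, F1-score, and accuracy as in (7)-(10).

    Empty denominators are mapped to 0.0, which is a convention chosen
    for this sketch.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1, "accuracy": accuracy}


def dice_and_iou(pred, target):
    """Dice score and IoU between two binary masks given as 2-D lists of 0/1."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    pred_sum = sum(sum(row) for row in pred)
    target_sum = sum(sum(row) for row in target)
    union = pred_sum + target_sum - inter
    # Two empty masks agree perfectly, so both scores are defined as 1.0 here.
    dice = 2 * inter / (pred_sum + target_sum) if (pred_sum + target_sum) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


# Made-up confusion counts and toy masks, for illustration only.
metrics = classification_metrics(tp=8, tn=10, fp=2, fn=0)
pred_mask = [[1, 1, 0], [0, 1, 0]]
true_mask = [[1, 0, 0], [0, 1, 1]]
dice, iou = dice_and_iou(pred_mask, true_mask)
```

For the toy masks, two pixels overlap out of three predicted and three target pixels, so the Dice score is 2·2/(3+3) and the IoU is 2/4. A Dice loss is commonly defined as one minus a differentiable version of this score; the exact formulation used for training is not shown in this excerpt.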
TABLE III
COMPARISON OF PERFORMANCE OF DEFECT-AUX-NET AND STATE-OF-THE-ART CLASSIFICATION MODELS

TABLE IV
EFFECT OF USING ATTENTION MECHANISMS ON TEKERREKA DATASET

TABLE V
SYSTEM SPECIFICATION

TABLE VI
COMPARISON OF THE INFERENCE TIME OF DEFECT-AUX-NET AND BASELINE MODEL
multiscale feature learning and thus improve the performance as opposed to the single-task algorithms. Also, the MTL framework can save computational inference time as only a single network needs to be evaluated for three different tasks. The experimental results show that our proposed algorithm greatly improves the performance of the surface defect identification tasks compared to other state-of-the-art deep learning algorithms.

VI. CONCLUSION
In this article, we described an attention-guided MTL scheme, which combines classification, segmentation, and detection for automated surface defect detection. Specifically, we proposed an extended FPN architecture with Resnet-50 incorporated as the encoder section of the model. The hybrid loss function is introduced to enhance the performance of the model. An overall accuracy of 97.1%, Dice score of 0.926, and mAP of 0.762 on classification, segmentation, and detection tasks of the TekErreka dataset were achieved with Defect-Aux-Net.

ACKNOWLEDGMENT
This work was undertaken in the context of DIGIMAN4.0 project (“Digital Manufacturing Technologies for Zero-Defect,” https://www.digiman4-0.mek.dtu.dk/). DIGIMAN4.0 is a European Training Network supported by Horizon 2020, the EU Framework Programme for Research and Innovation under Project 814225.

REFERENCES
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017, doi: 10.1145/3065386.
[2] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
[3] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2818–2826, doi: 10.1109/CVPR.2016.308.
[4] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2261–2269, doi: 10.1109/CVPR.2017.243.
[5] M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in Proc. 36th Int. Conf. Mach. Learn., vol. 97, 2019, pp. 6105–6114.
[6] R. Girshick, “Fast R-CNN,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 1440–1448, doi: 10.1109/ICCV.2015.169.
[7] K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask R-CNN,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 386–397, Feb. 2020, doi: 10.1109/TPAMI.2018.2844175.
[8] W. Liu et al., “SSD: Single shot multiBox detector,” in Proc. Eur. Conf. Comput. Vis. (Lecture Notes in Computer Science Series), B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham, Switzerland: Springer, 2016, pp. 21–37.
[9] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 779–788, doi: 10.1109/CVPR.2016.91.
[10] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervention, 2015, pp. 234–241.
[11] S. Seferbekov, V. Iglovikov, A. Buslaev, and A. Shvets, “Feature pyramid network for multi-class land segmentation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops, 2018, pp. 272–2723, doi: 10.1109/CVPRW.2018.00051.
[12] X. Ni, Z. Ma, J. Liu, B. Shi, and H. Liu, “Attention network for rail surface defect detection via consistency of intersection-over-union (IoU)-guided center-point estimation,” IEEE Trans. Ind. Inform., vol. 18, no. 3, pp. 1694–1705, Mar. 2022, doi: 10.1109/TII.2021.3085848.
[13] D. Zhang, K. Song, Q. Wang, Y. He, X. Wen, and Y. Yan, “Two deep learning networks for rail surface defect inspection of limited samples with line-level label,” IEEE Trans. Ind. Inform., vol. 17, no. 10, pp. 6731–6741, Oct. 2021, doi: 10.1109/TII.2020.3045196.
[14] L. Wen, Y. Wang, and X. Li, “A new cycle-consistent adversarial networks with attention mechanism for surface defect classification with small samples,” IEEE Trans. Ind. Inform., vol. 18, no. 12, pp. 8988–8998, Dec. 2022, doi: 10.1109/TII.2022.3168432.
[15] Kaggle, “Severstal: Steel defect detection. Can you detect and classify defects in steel?,” 2019.
[16] M. S. Kim, T. Park, and P. Park, “Classification of steel surface defect using convolutional neural network with few images,” in Proc. IEEE 12th Asian Control Conf., 2019, pp. 1398–1401.
[17] H. Lin, B. Li, X. Wang, Y. Shu, and S. Niu, “Automated defect inspection of LED chip using deep convolutional neural network,” J. Intell. Manuf., vol. 30, no. 6, pp. 2525–2534, Aug. 2019, doi: 10.1007/s10845-018-1415-x.
[18] X. Tao, D. Zhang, W. Ma, X. Liu, and D. Xu, “Automatic metallic surface defect detection and recognition with convolutional neural networks,” Appl. Sci., vol. 8, no. 9, 2018, Art. no. 1575, doi: 10.3390/app8091575.
[19] J. Ren and X. Huang, “Defect detection using combined deep autoencoder and classifier for small sample size,” in Proc. IEEE 6th Int. Conf. Control Sci. Syst. Eng., 2020, pp. 32–35, doi: 10.1109/ICCSSE50399.2020.9171953.
[20] J. Lian et al., “Deep-learning-based small surface defect detection via an exaggerated local variation-based generative adversarial network,” IEEE Trans. Ind. Inform., vol. 16, no. 2, pp. 1343–1351, Feb. 2020, doi: 10.1109/TII.2019.2945403.
[21] D. Zheng et al., “A defect detection method for rail surface and fasteners based on deep convolutional neural network,” Comput. Intell. Neurosci., vol. 2021, Jul. 2021, Art. no. 2565500, doi: 10.1155/2021/2565500.
[22] P. Xu, Z. Guo, L. Liang, and X. Xu, “MSF-Net: Multi-scale feature learning network for classification of surface defects of multifarious sizes,” Sensors, vol. 21, no. 15, Jul. 2021, Art. no. 5125, doi: 10.3390/s21155125.
[23] D. Amin and S. Akhter, “Deep learning-based defect detection system in steel sheet surfaces,” in Proc. IEEE Region 10 Symp., 2020, pp. 444–448, doi: 10.1109/TENSYMP50017.2020.9230863.
[24] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2018, pp. 7132–7141, doi: 10.1109/CVPR.2018.00745.
[25] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal loss for dense object detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 318–327, Feb. 2020, doi: 10.1109/TPAMI.2018.2858826.
[26] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal loss for dense object detection,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2999–3007, doi: 10.1109/ICCV.2017.324.
[27] L. Smith, “A disciplined approach to neural network hyper-parameters: Part 1 – learning rate, batch size, momentum, and weight decay,” 2018, arXiv:1803.09820.
[28] A. Chaurasia and E. Culurciello, “LinkNet: Exploiting encoder representations for efficient semantic segmentation,” in Proc. IEEE Vis. Commun. Image Process., 2017, pp. 1–4, doi: 10.1109/VCIP.2017.8305148.
[29] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 6230–6239, doi: 10.1109/CVPR.2017.660.
[30] Z. Cai and N. Vasconcelos, “Cascade R-CNN: Delving into high quality object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2018, pp. 6154–6162, doi: 10.1109/CVPR.2018.00644.