Article
Metal Surface Defect Detection Using Modified YOLO
Yiming Xu, Kai Zhang and Li Wang *
Abstract: Aiming at the problems of inefficient detection caused by traditional manual inspection and unclear features in metal surface defect detection, an improved metal surface defect detection technology based on the You Only Look Once (YOLO) model is presented. Building on the network structure of YOLOv3, the shallow features of the 11th layer of Darknet-53 are combined with the deep features of the neural network to generate a new scale feature layer, with the goal of extracting more features of small defects. In addition, K-Means++ is used to reduce the sensitivity to the initial cluster centers when analyzing the size information of the anchor boxes, so that the optimal anchor boxes are selected and the positioning becomes more accurate. The performance of the modified metal surface defect detection technology is compared with other detection methods on the Tianchi dataset. The results show that the average detection accuracy of the modified YOLO model is 75.1%, which is higher than that of YOLOv3. It also has a clear detection speed advantage over the faster region-based convolutional neural network (Faster R-CNN) and other detection algorithms. The improved YOLO model can locate small defect targets with high accuracy and has strong real-time performance.
YOLO is higher. Even with small sample data, the paper [24] showed that YOLOv3 still worked well. In reference [25], a comparative analysis was carried out in terms of precision, recall, accuracy, and F1 score; the results indicated the usefulness of auto-detecting convolutional networks. Reference [26] improved the YOLO network and made it fully convolutional, consisting of 27 convolution layers, which provided an end-to-end solution for surface defect detection. Reference [27] modified the framework of Faster R-CNN by introducing multi-scale feature extraction and multi-resolution candidate bound extraction into the network, which improved the detection effectively. Reference [28] improved the YOLO model by replacing the margin style with the proportion style; compared with the old loss function, the new one is more flexible and more reasonable in optimizing the network error. Reference [29] developed a hybrid model by integrating the YOLO and U-net models, which helped to make valid decisions at the right time. Due to the small size, the large number, and the complex background of the targets, reference [30] proposed a two-layer detection algorithm and selected different feature extraction networks for each layer. The test results showed that the detection results of the two-layer detection algorithm were significantly better than those of the single-layer detection algorithm.
In this paper, a modified YOLOv3 model based on machine vision is proposed to detect metal surface defects. Datasets of three kinds of defects are built, and the model weights are trained after the defective images are manually labeled. A new scale feature layer is generated by combining the shallow features of the 11th layer with the deep features of the YOLOv3 model. In addition, the improved detection model uses K-Means++ to analyze the size information of the anchor boxes on the datasets, so that more features of small defects on the metal surface can be extracted.
The rest of the paper is organized as follows. Section 2 presents a brief review of the YOLOv3 neural network model and its classification prediction. Section 3 describes the improvement of the proposed system, and Section 4 outlines the process of the defect detection experiments together with the detection results for the surface defects. Finally, the conclusions of the proposed method are drawn in Section 5.
2. Related Work
2.1. Conventional CNN Models
A basic convolutional neural network (CNN) consists of three structures: convolution, activation, and pooling. The output of a CNN is the specific feature space of each image. When processing image classification tasks, the output feature space is generally taken as the input of a fully connected neural network (FCN), and the fully connected layer is used to complete the mapping from the input image to the label set, namely classification. The most important work in the whole process is how to adjust the weights of the network iteratively through the training data, that is, the back propagation algorithm. At present, mainstream CNNs are obtained by adjusting and combining such simple CNN building blocks.
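As a minimal sketch (not the specific network used in this paper), such a stack of convolution, activation, and pooling layers followed by a fully connected classifier could look as follows in PyTorch; the channel counts and the three-class output are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Illustrative convolution + activation + pooling blocks followed by a fully
# connected classifier; layer sizes are assumptions, not the paper's network.
class SimpleCNN(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 104 * 104, num_classes)  # for 416 x 416 inputs

    def forward(self, x):
        x = self.features(x)        # extract the feature space of the image
        x = torch.flatten(x, 1)     # flatten for the fully connected layer
        return self.classifier(x)   # map the features to the label set

print(SimpleCNN()(torch.randn(1, 3, 416, 416)).shape)  # torch.Size([1, 3])
```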
The CNN is one of the most widely used deep learning models for image detection and classification [31], due to its high accuracy compared with other machine learning algorithms. The inference of CNNs is usually performed on centralized high-performance platforms, and the CNN model is known to be faster than other types of deep learning models without degrading effectiveness. In [32], a new efficient model was proposed for text detection. The model used MobileNetV2 as a backbone together with a balanced decoder, which is a stack of inverted residual blocks (IRB) and standard convolutional layers. It turns out that the proposed compact and accurate scene text detector (CAST) is efficient and effective.
[Figure: the YOLOv3 network structure. A 416 × 416 input is processed by Darknet-53, and detections are produced at three scales through upsampling and concatenation: small scale (13 × 13), medium scale (26 × 26), and large scale (52 × 52).]
The YOLOv3 model has 53 convolutional layers and extracts target features at three convolutional layers with different scales. Then, the features of these three scales are integrated to conduct target classification.
Substituting it into the sigmoid function to obtain the prediction function, the formula is as follows:
$h_\theta(x) = g(\theta^{T} x) = \dfrac{1}{1 + e^{-\theta^{T} x}}$   (3)
When the sigmoid value exceeds 0.5, the target is determined to belong to the corresponding category. Each logistic classifier directly judges whether the target belongs to one category, and using multiple logistic classifiers achieves multi-label classification.
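As a rough illustration (not the paper's implementation), the per-class decision based on Equation (3) can be sketched as follows; the logits are made-up values standing in for θ^T x of one predicted box.

```python
import numpy as np

# Hedged sketch of the sigmoid-based multi-label class prediction in Equation (3).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([1.2, -0.4, 0.7])   # hypothetical scores: scratches, deformations, wrinkles
class_probs = sigmoid(logits)         # independent per-class probabilities
predicted = class_probs > 0.5         # a class is assigned when its probability exceeds 0.5
print(class_probs, predicted)
```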
In the loss function of the YOLOv3 neural network, the binary cross-entropy loss is used for classification, and the formula is as follows:
$L = -\sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$   (4)
The advantage of cross-entropy as a loss function is that, when used with the sigmoid function, it avoids the slowdown of learning that occurs with the mean-squared-error loss during gradient descent, because the size of the update is controlled by the output error.
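A minimal numerical sketch of Equation (4), assuming y holds the ground-truth labels and ŷ the sigmoid outputs of one prediction; the values below are illustrative.

```python
import numpy as np

# Binary cross-entropy over N class predictions, as in Equation (4).
def binary_cross_entropy(y, y_hat, eps=1e-7):
    y_hat = np.clip(y_hat, eps, 1.0 - eps)   # clip to avoid log(0)
    return -np.sum(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

y = np.array([1.0, 0.0, 1.0])          # hypothetical ground-truth labels
y_hat = np.array([0.77, 0.40, 0.67])   # hypothetical sigmoid outputs
print(binary_cross_entropy(y, y_hat))
```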
[Figure: the improved network structure with four output feature layers (y1–y4) at the 13 × 13, 26 × 26, 52 × 52, and newly added 104 × 104 scales.]
As shown in the red box, the shallow output of the second residual block is merged with the deep output of the network after 2× upsampling in Darknet-53. A new feature layer is then formed through a convolutional layer with a kernel size of 1 × 1, which makes the network more capable of extracting features. The size of the newly added feature layer is 1/4 of the size of the input image, so the input image is divided into smaller 4 × 4 grid cells, that is, a 104 × 104 grid. A smaller grid makes the network more sensitive to small targets. The shallow features are merged with the deeper features output by Darknet-53 to generate a feature layer that is conducive to the detection of small targets. It not only inherits the deep features but also makes full use of the shallow features of the network, which enhances the model's ability to extract small-target features, reduces the probability of missed small defect targets, and improves detection accuracy. A sketch of this fusion step is given below.
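The following is an illustrative PyTorch fragment of the fusion step, not the paper's exact implementation; the channel counts and tensor shapes are assumptions, with `shallow` standing for the 104 × 104 output of the second residual block and `deep` for a 52 × 52 feature map from the deeper part of the network.

```python
import torch
import torch.nn as nn

# Sketch of the added detection scale: 2x upsample the deep feature map,
# concatenate it with the shallow 104 x 104 features, then apply a 1 x 1 conv.
class SmallTargetFusion(nn.Module):
    def __init__(self, shallow_ch=128, deep_ch=256, out_ch=128):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")          # 52 -> 104
        self.fuse = nn.Conv2d(shallow_ch + deep_ch, out_ch, kernel_size=1)   # 1 x 1 conv

    def forward(self, shallow, deep):
        deep_up = self.upsample(deep)                  # match the 104 x 104 resolution
        merged = torch.cat([shallow, deep_up], dim=1)  # concatenate along channels
        return self.fuse(merged)                       # new 104 x 104 feature layer

# Usage with dummy tensors (416 x 416 input assumed):
shallow = torch.randn(1, 128, 104, 104)
deep = torch.randn(1, 256, 52, 52)
print(SmallTargetFusion()(shallow, deep).shape)        # torch.Size([1, 128, 104, 104])
```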
The number of anchor boxes for each feature layer in the network is still 3; after adding a feature layer, the total number increases from 9 to 12, which strengthens the detection density. The additional feature layer allows the size levels of the defect targets to be divided more finely and enhances the network's comprehensive detection capabilities for targets of different sizes.
3.2. K-Means++
YOLOv3 uses K-Means cluster analysis to obtain the anchor boxes, but this has certain limitations. K-Means is sensitive to the selection of the initial clustering centers, and the clustering results for different initial centers differ greatly. Since the K value is not easy to determine during clustering, which leads to inaccurate positioning, it is extremely important to select appropriate cluster centers.
To address the problem of K-Means being sensitive to the initial clustering centers, K-Means++ is used to overcome this shortcoming. First, a sample is randomly selected as the first cluster center. Then the shortest distance between each sample and the existing cluster centers is calculated, and each sample is assigned to the category of the cluster center with the smallest distance. At the same time, the probability of each sample being chosen as the next cluster center is calculated, and the sample with the highest probability is selected as the next center. The formula for calculating the probability is:
$p = \dfrac{D(x)^2}{\sum_{i=1}^{n} D(x_i)^2}$   (5)
D(x) is the shortest distance from each sample point to the current centers. Each time a sample is assigned, the cluster centers are recalculated from the samples in the existing clusters, and this process is repeated until no samples are reassigned to other clusters. Finally, K cluster centers are obtained. Benefiting from the diversity of the 12 acquired anchor boxes, the effect of detecting the targets is significantly improved. The process of selecting cluster centers by K-Means++ greatly reduces the dependence of the clustering results on the K value and makes the distance between the initial cluster centers as large as possible, which effectively overcomes the defects of K-Means. A sketch of this anchor clustering step is given below.
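As an illustrative sketch (not the paper's exact implementation), the 12 anchor boxes can be obtained by clustering the labeled box sizes with K-Means++ seeding; plain Euclidean distance on normalized (width, height) pairs is assumed here, and the exact distance metric used in the paper is not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

# Cluster labeled box sizes into 12 anchors; the K-Means++ seeding inside KMeans
# follows the probability in Equation (5).
def anchors_from_labels(box_wh, k=12):
    # box_wh: array of shape (num_boxes, 2) with normalized widths and heights
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(box_wh)
    centers = km.cluster_centers_
    return centers[np.argsort(centers[:, 0] * centers[:, 1])]  # sort anchors by area

# Usage with hypothetical box sizes parsed from the label files:
box_wh = np.random.rand(500, 2) * 0.3
print(anchors_from_labels(box_wh))
```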
[Figure: the overall detection workflow: image acquisition, image preprocessing, defect detection and identification, and rejection of defective products.]
The choice of CCD camera should be based on the accuracy required for the object to be observed, which determines the resolution. When capturing a moving detection target, the field of view in one direction should be slightly larger than the size of the detection target, to avoid incomplete image information. For static target acquisition, a field of view closer to the detection target size is better, provided that the light source is adjusted accordingly. In this way, the acquired image has higher accuracy and requires less post-processing.
The real object to be inspected is 80 mm long with a maximum width of 50 mm, which imposes requirements on the selection of the industrial camera. To achieve a better detection effect, the required detection accuracy is taken to be 0.5 mm. The MV-EM200C CCD industrial camera adopts frame exposure and belongs to the area-array cameras. Its frame rate, pixel size, optical size, and other characteristics can meet the requirements of target detection on a typical Industry 4.0 assembly line. Therefore, this paper selects the MV-EM200C camera, as shown in Figure 4. A rough sketch of the implied resolution check is given below.
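The following back-of-the-envelope check is not taken from the paper; it only illustrates the lower bound on pixel resolution implied by the stated 0.5 mm accuracy over the 80 mm × 50 mm target.

```python
# Lower bound on the resolution across the target region implied by the accuracy;
# an illustrative calculation, not a figure from the paper.
length_mm, width_mm, accuracy_mm = 80, 50, 0.5
min_pixels = (length_mm / accuracy_mm, width_mm / accuracy_mm)
print(min_pixels)   # (160.0, 100.0) pixels along the length and width, respectively
```

A requirement of roughly 160 × 100 pixels over the target region is comfortably within the range of common industrial area-array cameras.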
[Figure 4. The MV-EM200C camera and the lights used for image acquisition.]
Light Source Selection. The choice of the light source is also very important in defect
detection based on machine vision. Table 2 shows the characteristics of various common
light sources.
In addition to the advantages described in Table 2, LED lights can also be combined into various shapes. Comparison tests with actual light sources show that the image acquisition effect is better when an LED ring light illuminates the front side of the object. The LED ring light source is shown in Figure 4.
Based on the overall design of the system described above and the selection and analysis of the equipment used in the image acquisition module, the whole system is deployed and experiments are carried out on it.
4.2. Dataset
In order to better compare with previous work, the experiments use the Tianchi metal surface defect dataset collected by Alibaba Cloud [33]. There are three types of defects: scratches, deformations, and wrinkles. There are 30 images of each type of defect, totaling 90 images. In order to strengthen the model training effect, this study performs data augmentation on the dataset: all images are flipped horizontally and vertically, the saturation and contrast of the images are adjusted, and some defect-free metal images are added to improve robustness. Finally, a new dataset of 300 images is formed. The classification diagram is shown in Figure 5, and a sketch of the augmentation step is given below.
[Figure 5. Example images of the three defect classes: scratches, deformations, and wrinkles.]
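A minimal sketch of the augmentation described above (horizontal and vertical flips plus saturation and contrast adjustment); the file names and enhancement factors are illustrative, and the corresponding label coordinates would also need to be flipped, which is omitted here.

```python
from PIL import Image, ImageEnhance

# Produce flipped and color-adjusted variants of one image.
def augment(path):
    img = Image.open(path).convert("RGB")
    return {
        "hflip": img.transpose(Image.Transpose.FLIP_LEFT_RIGHT),
        "vflip": img.transpose(Image.Transpose.FLIP_TOP_BOTTOM),
        "saturation": ImageEnhance.Color(img).enhance(1.3),
        "contrast": ImageEnhance.Contrast(img).enhance(1.3),
    }

# Usage: write the augmented copies next to the original (hypothetical file name).
for name, out in augment("defect_001.jpg").items():
    out.save(f"defect_001_{name}.jpg")
```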
This study uses Yolo_Mark to locate and classify the defects. The dataset is randomly divided into a training set and a test set at a ratio of 4:1, that is, 240 images in the training set and 60 images in the test set. The Yolo_Mark software is used to mark the defects on the images, namely, to record the coordinates of the defect locations and the defect categories. The contents of the yaml file are shown in Figure 6, and the effect of image labeling is shown in Figure 7. A sketch of the split step is given below.
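A minimal sketch of the 4:1 random split described above; the file names and the seed are illustrative.

```python
import random

# Shuffle the image list and split it 80/20 into training and test sets.
def split_dataset(all_images, train_ratio=0.8, seed=0):
    random.seed(seed)
    images = list(all_images)
    random.shuffle(images)
    cut = int(len(images) * train_ratio)
    return images[:cut], images[cut:]

all_images = [f"metal_{i:03d}.jpg" for i in range(300)]   # hypothetical file names
train, test = split_dataset(all_images)
print(len(train), len(test))   # 240 60
```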
The detection performance is evaluated with the precision, the recall, and the mean average precision (mAP), defined as follows:
$\text{precision} = \dfrac{TP}{TP + FP}$   (6)
$\text{recall} = \dfrac{TP}{TP + FN}$   (7)
$\text{mAP} = \dfrac{1}{N} \sum_{i=1}^{N} AP_i$   (8)
Among them, true positive (TP) is a positive example that is correctly predicted, false positive (FP) is a negative example that is incorrectly predicted as positive, false negative (FN) is a positive example that is incorrectly predicted as negative, N is the number of detection categories, and AP is the detection accuracy of each class, whose calculation formula is:
$AP = \int_{0}^{1} \text{precision}(\text{recall}) \, d(\text{recall})$   (9)
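A minimal numerical sketch of Equations (6)–(9); the counts and the precision–recall points are illustrative, and the AP integral is approximated by the trapezoidal rule.

```python
import numpy as np

# Precision and recall from confusion counts, and AP as the area under the
# precision-recall curve; mAP is the mean of the per-class APs (Equation (8)).
def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recalls, precisions):
    # recalls must be sorted in increasing order; integrate precision over recall
    return float(np.trapz(precisions, recalls))

print(precision_recall(tp=45, fp=8, fn=12))                 # hypothetical counts
recalls = np.array([0.0, 0.2, 0.5, 0.8, 1.0])
precisions = np.array([1.0, 0.9, 0.8, 0.7, 0.6])
print(average_precision(recalls, precisions))               # AP for one class
```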
Figure 8. The curves of the loss function. The training loss is measured during each epoch, while the validation loss is measured after each epoch.
Figure 9 shows the mAP graph. The mAP is a comprehensive measurement index commonly used in the field of target detection. It measures the overall detection accuracy of the detection boxes under different IOU thresholds; the higher the value, the higher the accuracy of the model.
Figure 9. The curves of the mAP. The abscissa is the number of iterations, and the y-axis shows the precision, recall, mAP@0.5, and mAP@0.5:0.95.
Table 4 shows a comparison of the detection accuracy for the various types of defects on the dataset before and after the improvement of the YOLOv3 network structure. The K-Means++ clustering algorithm is used to cluster the generated anchor boxes. According to the analysis of Table 3, the mAP after the improvement of the network structure is 75.1%, which is 1.03 times that before the improvement. The modified network structure significantly improves the accuracy of the various detections, especially the detection of small defect targets. For example, the detection accuracy for deformations is 62.8%, which is 11.8% higher than that of YOLOv3 before the improvement. In Table 4, the classification accuracy for scratches detected by the proposed model is lower than that of the original model. The reason is that, although the improved model can detect small targets more accurately, the metal surface texture is very clear after image preprocessing, so some normal metal surfaces are wrongly detected as scratches, resulting in a decrease in accuracy.
The precision, also called the accuracy rate, shows how many of the predicted positive samples are correct and reflects the generalization ability of a model. The recall refers to how many of the positive samples are correctly detected; when recall = 1, there are no missed detections. The AP is calculated as the area enclosed under the precision–recall curve. As shown in Table 4, the precision increases by 6.2% while the recall increases by 5.6%. This indicates that the improved model reduces the false negative rate, so the additional costs in the next stage of the production line can also be reduced.
The YOLOv3 model and the modified model are used to detect metal surface defects, respectively, as shown in Figure 10. Comparing YOLOv3 with the improved model, it can be clearly seen that the improved model detects all the small defect targets. For all three types of defects, the detection effect of the improved model is better than that of YOLOv3, indicating that the improved model can effectively reduce the probability of missed detections.
[Figure 10. Detection results of YOLOv3 and the improved model for deformations, scratches, and wrinkles.]
5. Conclusions
In this paper, a surface defect detection system based on the improved YOLOv3 model is designed and deployed. To ensure the collection of high-quality images in an actual production environment, this study compares and analyzes in detail several kinds of mainstream image acquisition equipment, taking into account the cost and performance of the equipment. The modified model proposed in this paper reaches 75.1% mAP by using K-Means++, and the inference speed reaches 83 FPS. The improved model can achieve real-time detection while ensuring high accuracy, providing a feasible scheme for eliminating products with surface defects on the assembly line.
Author Contributions: The authors of this article all had a significant contribution to the work
performed, including the following: Conceptualization, L.W. and Y.X.; Data curation, K.Z.; Formal
analysis, L.W. and K.Z.; Methodology, L.W. and K.Z.; Supervision, Y.X.; Validation, L.W. and K.Z.;
Writing—original draft, L.W. and K.Z.; Writing—review and editing, K.Z. and L.W. All authors have
read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China General Project: 61973178, the National Natural Science Foundation-Smart Grid Joint Fund Key Project: U2066203, the National Natural Science Foundation of China project number 6210020040, and the Nantong University talent introduction project: Research on high precision and strong robust machine vision detection technology.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
References
1. Wheeler, B.J.; Karimi, H.A. Deep Learning-Enabled Semantic Inference of Individual Building Damage Magnitude from Satellite
Images. Algorithms 2020, 13, 195. [CrossRef]
2. Zhang, J.; Yang, X.; Li, W.; Zhang, S.; Jia, Y. Automatic detection of moisture damages in asphalt pavements from GPR data with
deep CNN and IRS method. Autom. Constr. 2020, 113, 103119. [CrossRef]
3. Song, E.P.; Eem, S.H.; Jeon, H. Concrete crack detection and quantification using deep learning and structured light. Constr. Build.
Mater. 2020, 252, 119096.
4. Yu, L.; Wang, Z.; Duan, Z. Detecting Gear Surface Defects Using Background-Weakening Method and Convolutional Neural
Network. J. Sens. 2019, 2019, 3140980. [CrossRef]
5. Cao, C.; Ouyang, Q.; Hou, J.; Zhao, L. Visual Locating of Reactor in an Industrial Environment Using the Composite Method.
Sensors 2020, 20, 504. [CrossRef] [PubMed]
6. Ünver, H.M.; Ayan, E. Skin Lesion Segmentation in Dermoscopic Images with Combination of YOLO and GrabCut Algorithm. Diagnostics 2019, 9, 72.
7. Tao, T.; Dong, D.; Huang, S.; Chen, W. Gap Detection of Switch Machines in Complex Environment Based on Object Detection
and Image Processing. J. Transp. Eng. Part A Syst. 2020, 146, 04020083. [CrossRef]
8. Zhang, H.W.; Zhang, L.J.; Li, P.F.; Gu, D. Yarn-dyed Fabric Defect Detection with YOLOV2 Based on Deep Convolution Neural
Networks. In Proceedings of the 2018 IEEE 7th Data Driven Control and Learning Systems Conference (DDCLS), Enshi, China,
25–27 May 2018.
9. Roy, S.S.; Haque, A.U.; Neubert, J. Automatic diagnosis of melanoma from dermoscopic image using real-time object detection.
In Proceedings of the 2018 52nd Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, 21–23
March 2018.
10. He, D.; Xu, K.; Zhou, P. Defect detection of hot rolled steels with a new object detection framework called classification priority
network. Comput. Ind. Eng. 2019, 128, 290–297. [CrossRef]
11. Liu, J.; Wang, X. Tomato Diseases and Pests Detection Based on Improved Yolo V3 Convolutional Neural Network. Front. Plant
Sci. 2020, 11, 898. [CrossRef]
12. Huang, Z.; Sui, B.; Wen, J.; Jiang, G. An Intelligent Ship Image/Video Detection and Classification Method with Improved
Regressive Deep Convolutional Neural Network. Complexity 2020, 2020, 1520872. [CrossRef]
13. Mandal, V.; Uong, L.; Adu-Gyamfi, Y. Automated Road Crack Detection Using Deep Convolutional Neural Networks. In
Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018.
14. Dai, W.; Mujeeb, A.; Erdt, M.; Sourin, A. Soldering defect detection in automatic optical inspection. Adv. Eng. Inform. 2020, 43, 101004.
[CrossRef]
15. Adou, M.W.; Xu, H.; Chen, G. Insulator Faults Detection Based on Deep Learning. In Proceedings of the 2019 IEEE 13th
International Conference on Anti-counterfeiting, Security, and Identification (ASID), Xiamen, China, 25–27 October 2019.
16. Du, Y.; Pan, N.; Xu, Z.; Deng, F.; Shen, Y.; Kang, H. Pavement distress detection and classification based on YOLO network. Int. J.
Pavement Eng. 2020, 2020, 1714047. [CrossRef]
17. Huang, Z.; Li, F.; Luan, X.; Cai, Z. A Weakly Supervised Method for Mud Detection in Ores Based on Deep Active Learning.
Math. Probl. Eng. 2020, 2020, 1714047. [CrossRef]
18. Qiao, R.; Ghodsi, A.; Wu, H.; Chang, Y.; Wang, C. Simple weakly supervised deep learning pipeline for detecting individual
red-attacked trees in VHR remote sensing images. Remote Sens. Lett. 2020, 11, 650–658. [CrossRef]
19. Majidifard, H.; Jin, P.; Adu-Gyamfi, Y.; Buttlar, W.G. Pavement Image Datasets: A New Benchmark Dataset to Classify and
Densify Pavement Distresses. Transp. Res. Rec. J. Transp. Res. Board 2020, 2674, 328–339. [CrossRef]
20. Jing, J.; Zhuo, D.; Zhang, H.; Liang, Y.; Zheng, M. Fabric defect detection using the improved YOLOv3 model. J. Eng. Fibers Fabr.
2020, 15, 155892502090826. [CrossRef]
21. Yao, S.; Chen, Y.; Tian, X.; Jiang, R.; Ma, S. An Improved Algorithm for Detecting Pneumonia Based on YOLOv3. Appl. Sci. 2020, 10, 1818.
[CrossRef]
22. Han, J.; Yang, Z.; Xu, H.; Hu, G.; Zhang, C.; Li, H.; Zeng, H. Search Like an Eagle: A Cascaded Model for Insulator Missing Faults
Detection in Aerial Images. Energies 2020, 13, 713. [CrossRef]
23. Kumar, S.S.; Wang, M.; Abraham, D.M.; Jahanshahi, M.R.; Iseley, T.; Cheng, J.C. Deep Learning–Based Automated Detection of
Sewer Defects in CCTV Videos. J. Comput. Civ. Eng. 2020, 34, 04019047. [CrossRef]
24. Pang, L.; Liu, H.; Chen, Y.; Miao, J. Real-time Concealed Object Detection from Passive Millimeter Wave Images Based on the
YOLOv3 Algorithm. Sensors 2020, 20, 1678. [CrossRef]
25. Yang, H.; Jo, E.; Kim, H.J.; Cha, I.H.; Jung, Y.S.; Nam, W.; Kim, D. Deep Learning for Automated Detection of Cyst and Tumors of
the Jaw in Panoramic Radiographs. J. Clin. Med. 2020, 9, 1839. [CrossRef]
26. Li, J.; Su, Z.; Geng, J.; Yin, Y. Real-time Detection of Steel Strip Surface Defects Based on Improved YOLO Detection Network-
ScienceDirect. IFAC-PapersOnLine 2018, 51, 76–81. [CrossRef]
27. Zhang, Z.; Zhang, X.; Lin, X.; Dong, L.; Zhang, S.; Zhang, X.; Yuan, K. Ultrasonic Diagnosis of Breast Nodules Using Modified
Faster R-CNN. Ultrason. Imaging 2019, 41, 353–367. [CrossRef]
28. Ahmad, T.; Ma, Y.; Yahya, M.; Ahmad, B.; Nazir, S. Object Detection through Modified YOLO Neural Network. Sci. Program.
2020, 2020, 8403262. [CrossRef]
29. Majidifard, H.; Adu-Gyamfi, Y.; Buttlar, W.G. Deep machine learning approach to develop a new asphalt pavement condition
index. Constr. Build. Mater. 2020, 247, 118513. [CrossRef]
30. He, Y.; Zhou, Z.; Tian, L.; Liu, Y.; Luo, X. Brown rice planthopper (Nilaparvata lugens Stal) detection based on deep learning.
Precis. Agric. 2020, 21, 1385–1402. [CrossRef]
31. Véstias, M.P. A Survey of Convolutional Neural Networks on Edge with Reconfigurable Computing. Algorithms 2019, 12, 154.
[CrossRef]
32. Jeon, M.; Jeong, Y.-S. Compact and Accurate Scene Text Detector. Appl. Sci. 2020, 10, 2096. [CrossRef]
33. Tianchi Data Sets. Alibaba Cloud. Available online: https://tianchi.aliyun.com/dataset (accessed on 20 June 2021).