An End-to-End Steel Surface Defect Detection Approach Via Fusing Multiple Hierarchical Features
fine-tune all the models on NEU-DET. The MFN can fuse the selected features into a multilevel feature whose characteristics cover all the stages of the ResNet. Next, a region proposal network (RPN) is adopted to generate proposals based on the multilevel feature, and the DDN then outputs the class scores and the coordinates of the bounding boxes. Finally, we evaluate the proposed method on NEU-DET, and the results demonstrate a clear superiority over other ADI methods.
To summarize, the main contributions of this paper are as follows.
1) The introduction of the end-to-end defect detection pipeline DDN, which integrates the ResNet and the RPN for precise defect classification and localization.
2) The proposed MFN for fusing multilevel features. Compared with other fusing methods, MFN can combine lower level and higher level features, which gives the multilevel features more comprehensive characteristics.
3) A defect detection data set, NEU-DET, for fine-tuning networks, and a demonstration that the proposed DDN achieves very competitive performance on this data set.

II. RELATED WORK

A. Defect Inspection

Generally, a defect classification method includes two parts: a feature extractor and a classifier. The classic feature extractor obtains hand-crafted features such as HOG and LBP, which are usually followed by a classifier, e.g., an SVM. Therefore, the combination of different feature extractors and classifiers produces a variety of defect classification methods. For instance, Song and Yan [3] improve the LBP to resist noise and adopt NNC and SVM to classify defects. Ghorai et al. [9] rely on a small set of wavelet features and use an SVM to perform defect classification. Different from the above two methods, Chu et al. [8] employ a general feature extractor and enhance the SVM. From the perspective of computer vision, the defect classification task is essentially defect image classification, which struggles on complicated defect images. The simple and direct way to solve this is to perform defect localization before defect classification, so that the inspection task classifies regions of defects instead of a whole defect image; this is the defect detection task. For example, the defect detectors in [11] and [12] first perform a 0–1 classification to judge whether features belong to a defect class or a nondefect class, then find defect regions based on the boundary of defect-class features, and finally apply different classification methods to determine the specific class of a defect. In addition, there is another simplified detector for applications requiring quick detection, which only locates regions of defects regardless of their categories [10].
However, the DL-based methods differ radically from the above methods. A hand-crafted feature extractor locally analyses a single image and extracts features, whereas a CNN constructs the representation of all the input data through a large amount of learning. CNNs have fine generalization and transferability, so there are some defect inspection methods based on CNNs. For example, Chen and Ho [21] demonstrate that an object detector like OverFeat [24] can be transferred to be a defect detector by some means. Similarly, [18] and [19] demonstrate that using a sequential CNN to extract features can improve classification accuracy in defect inspection. Also based on a sequential CNN, Ren et al. [17] perform an extra defect segmentation task on classification results to define the boundary of a defect. Moreover, Natarajan et al. [20] employ a deeper neural network, VGG19, for defect classification. With the depth of CNNs, defect classification accuracy has been further improved.

B. Baseline Networks

There are three popular CNN architectures at present, which are used as baseline networks for pretraining. The early successful networks are based on the sequential pipeline architecture [25], which established the basic structure of CNNs and proved the importance of network depth. Subsequently, the Inception networks employed modular units, which increase both the depth and width of a network without increasing the computational cost [26]. The third type is ResNet, which uses residual blocks to make networks deeper without overfitting [23]. ResNet is widely applied in various vision tasks, achieving competitive results with few parameters.
Choosing a proper baseline network is the key to gaining good results with DL methods. A large network has strong representation ability for the input data, and hence extracts features at a highly abstract level, but it also has a great demand for training data.

C. CNN Detectors

CNN detectors aim to classify and locate each target with a bounding box. They are mainly divided into two kinds of methods: region-based methods and direct regression methods. The most famous region-based detectors are the "R-CNN family" [27], [28], [14]. In this framework, thousands of class-independent region proposals are employed for detection. Region-based methods are superior in precision but require slightly more computation. The representative direct regression methods are YOLO [29] and SSD [30]. They directly divide an image into small grids and predict bounding boxes for each grid, which are then regressed to the groundtruth boxes. Direct regression methods are fast at detection but struggle on small instances.

1496 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 69, NO. 4, APRIL 2020

Fig. 4. DDN. In a single pass, we extract features from each stage of the baseline ConvNet, which are then fused into a multilevel feature by MFN. RPN is adopted to generate ROIs based on the multilevel feature. For each ROI, the corresponding multilevel feature is transformed into a fixed-length feature through the ROI pooling and the GAP layers. Two fc layers process each fixed-length feature and feed into output layers producing two results: a one-of-(C + 1) defect class prediction (cls) and a refined bounding box coordinate (loc).

III. DEFECT DETECTION NETWORK

In this section, the DDN is described in detail (see Fig. 4). A single-scale image of an arbitrary size is processed by a CNN, and the convolutional feature maps at each stage of the ConvNet are produced (ConvNet represents the convolutional part of a CNN). We extract multiple feature maps and then aggregate them in the same dimension by using a lightweight MFN. In this way, MFN features have the characteristics of several hierarchical levels of the ConvNet. Next, RPN [14] is employed to generate region proposals
[regions of interest (ROIs)] over the MFN feature. Finally, the MFN feature corresponding to each ROI is transformed into a fixed-length feature through the ROI pooling [28] and global average pooling (GAP) layers. The feature is fed into two fully connected (fc) layers: a one-of-(C + 1) defect classification layer ("cls") and a bounding-box regression layer ("loc").
The rest of this section introduces the details of DDN and motivates why we need to design MFN into the network for the defect detection task.

A. Baseline ConvNet Architecture

As is well known, pretraining on the ImageNet data set is important for achieving competitive performance, and the pretrained model can then be fine-tuned on a relatively small defect data set. In this paper, we select the recent successful baseline network ResNet as the backbone. ResNet presents several attractive advantages as follows.
1) ResNet can achieve state-of-the-art precision with extremely few parameters, in comparison with CNNs of sequential pipeline architecture of the same magnitude (ResNet50 vs. VGG16, 0.85 M vs. 138 M parameters). This implies that ResNet has lower computational cost and less probability of overfitting.
2) ResNet uses GAP to process the final convolutional feature map instead of two stacked fc layers, which preserves more comprehensive location information of defects in the image.
3) ResNet has a modularized ConvNet, which is easy to integrate.
In this paper, we select ResNet34 and ResNet50 as baseline networks. The detailed structures of both networks are shown in Table I, and residual blocks are denoted as {R2, R3, R4, R5}.

HE et al.: END-TO-END STEEL SURFACE DEFECT DETECTION APPROACH 1497

TABLE I
ARCHITECTURE OF BASELINE NETWORKS

B. Produce Multilevel Features

Previous excellent approaches only utilize high-level features to extract region proposals (e.g., the faster R-CNN extracts proposals from the last convolutional feature maps). In order to obtain quality region proposals, single-level features should
be extended to multilevel features. Obviously, the simplest way is to assemble feature maps from multiple layers [31]. Then comes the question: which layers should be combined? There are two essential conditions: nonadjacency, because adjacent layers have highly local correlation [32], and coverage, including features from low level to high level. For a ResNet, the most intuitive way is to combine the last layers in each residual block.
To fuse features at different levels, the proposed network MFN is appended to the pretrained model. MFN has four branches, denoted as {B2, B3, B4, B5}, and each branch is a small network. B2, B3, B4, and B5 are sequentially connected to the last layers of R2, R3, R4, and R5. When an image flows through the baseline ConvNet, the Ri features are produced in order. The Ri feature means the feature map output from the last layer of the residual block Ri, i = 2, ..., 5. Similarly, the Bi feature is the feature map produced from the last layer of the MFN branch Bi, i = 2, ..., 5. Then, each of the Ri features is led to the corresponding branch in MFN, producing the Bi features. Finally, multilevel features are obtained by concatenating the B2, B3, B4, and B5 features, which come from different stages of the CNN.
As a final note, MFN is efficient in computation and strong in generalization. MFN can reduce the required parameters by modifying the number of filters of the 1 × 1 conv. This operation may hurt accuracy but prevents overfitting in the case of insufficient training data.

C. Extract Region Proposals

The RPN is employed to extract region proposals by sliding on the multilevel feature maps. RPN takes an image of arbitrary size as input and outputs anchor boxes (candidate boxes), each with a score representing whether it is a defect or not. The originality of RPN is the "anchor" scheme that generates anchor boxes in multiple scales and aspect ratios. Anchor boxes are then hierarchically mapped to the input image so that region proposals of multiple scales and aspect ratios are produced. As a result of the resolution size of the MFN feature, the RPN can be considered as sliding on the R4 feature. Following [14], we set three aspect ratios {1:1, 1:2, 2:1}. Considering the multiple sizes of defects, we set four scales {64², 128², 256², 512²}. Therefore, RPN produces 12 anchor boxes at each sliding location.
The region proposal extractor always ends with an ROI pooling layer. This layer performs a max-pooling operation over the feature map inside each ROI to convert it into a small feature vector (512-d for ResNet34 and 2048-d for ResNet50) with a fixed size of W × H (in this paper, 7 × 7). At last, based on these small cubes, the offset of each region proposal with respect to an adjacent groundtruth box is calculated, together with the probability of whether a defect exists.
For a single image, RPN may extract thousands of region proposals. To deal with the redundant information, greedy nonmaximum suppression (NMS) is often applied to eliminate high-overlap region proposals. We set the intersection over union (IOU) threshold for NMS at 0.7, which discards a majority of region proposals. After NMS, the top-K ranked region proposals are selected from the rest. In the following, we fine-tune DDN using the top-300 region proposals owing to their quality, but reduce this number at test time to accelerate detection without harming accuracy.

IV. TRAINING

A. Multitask Loss Function

The defect detection task can be divided into two subtasks, hence DDN has two output layers. The cls layer outputs a discrete probability distribution, k = (k_0, ..., k_C), for each ROI over C + 1 categories (C defect categories plus one background category). As usual, k is computed by a softmax function. The cls loss L_cls is a log loss over two classes (defect or not defect): L_cls(k, k*) = -log k_{k*}, where k* is the groundtruth class. The loc layer outputs bounding box regression offsets, t = (t_x, t_y, t_w, t_h), for each of the C defect categories. As in [28], the loc loss L_loc is a smooth L1 loss: L_loc = SmoothL1(t - t*), where t* is the groundtruth box associated with a positive sample. For bounding box regression, we adopt the parameterizations of t and t* given in [27]

    t_x = (x - x_a)/w_a,    t_y = (y - y_a)/h_a
    t_w = log(w/w_a),       t_h = log(h/h_a)
    t*_x = (x* - x_a)/w_a,  t*_y = (y* - y_a)/h_a
    t*_w = log(w*/w_a),     t*_h = log(h*/h_a)                (1)

where the subscripts x, y, w, and h denote each box's center coordinates and its width and height. The variables x, x_a, and x* represent the predicted box, anchor box, and groundtruth box, respectively (the same rules apply for y, w, and h).
With these definitions, we minimize a multitask loss function, which is defined as

    L(k, k*, t, t*) = L_cls(k, k*) + λ p* L_loc(t, t*)        (2)
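The box parameterization of (1) and the loss of (2) can be checked numerically. The sketch below is a minimal, illustrative implementation; the function names, λ = 1 default, and the indicator `p_star` are my own choices, not notation from the paper.

```python
import math

def encode(box, anchor):
    # Parameterization (1): boxes given as (cx, cy, w, h).
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return [(x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha)]

def smooth_l1(diffs):
    # Smooth L1 applied elementwise to t - t*, then summed.
    total = 0.0
    for d in diffs:
        d = abs(d)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def multitask_loss(k, k_star, t, t_star, lam=1.0, p_star=1):
    # Eq. (2): log loss on the groundtruth class plus, for positive
    # samples (p* = 1), a smooth L1 loss on the regression offsets.
    l_cls = -math.log(k[k_star])
    l_loc = smooth_l1(ti - si for ti, si in zip(t, t_star))
    return l_cls + lam * p_star * l_loc
```

Encoding the groundtruth box against an anchor gives t*, and encoding the predicted box against the same anchor gives t; when prediction and groundtruth coincide, the loss reduces to the classification term alone.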
obtaining model M_R. Fine-tune the detector network using the proposals P, obtaining model M_D. Combine M_R and M_D as the final model.
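Only the tail of the training procedure survives in this excerpt, but the steps above (an RPN model M_R, its proposals P, a detector model M_D, then a combined model) suggest an alternating scheme. A skeletal sketch of that orchestration, where all four callables are hypothetical stubs standing in for the real training routines:

```python
def alternating_training(train_rpn, extract_proposals, train_detector, combine):
    """Orchestrate the alternating scheme; the four callables are
    placeholders, not the paper's actual training code."""
    m_r = train_rpn()                   # obtain RPN model M_R
    proposals = extract_proposals(m_r)  # proposals P produced by M_R
    m_d = train_detector(proposals)     # fine-tune detector on P, obtaining M_D
    return combine(m_r, m_d)            # merge M_R and M_D into the final model
```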
TABLE II
DETECTION RESULTS ON NEU-DET

TABLE III
COMBINING LAYERS IN DIFFERENT MANNERS
Fig. 7. Examples of detection results on NEU-DET. For each defect, the yellow box is the bounding box indicating its location and the green label is the class score. The subset to which the image belongs: (a) crazing, (b) inclusion, (c) patches, (d) pitted surface, (e) rolled-in scale, and (f) scratches.

layers, leading to a decline in proposal quality. With an increasing number of proposals, the naive RPN drops more sharply when IOU > 0.7. This is because the naive RPN extracts too many low-quality proposals, and this becomes more obvious as the number of proposals increases. The naive RPN works badly under a strict IOU threshold (e.g., IOU > 0.7). MFN can help the RPN obtain location information from low-level and mid-level features, which gives the RPN a higher tolerance for strict IOU thresholds.

VI. DISCUSSION

In this section, to demonstrate that our design is logical and advanced, we discuss several implicit factors that can influence defect detection.

A. Combine Which Layers for MFN?

MFN combines features from various levels into a multilevel feature, which is effective for improving detection. In Section III-B, it was briefly discussed what kind of layers should be combined. In DDN, we select four layers, namely the last layers of R2, R3, R4, and R5. It is then natural to ask whether other combinations of these four layers may result in better performance. Therefore, we train DDN + ResNet34 with five different combination manners on the NEU-DET data set. As shown in Table III, combining all four layers outperforms the other manners. This indicates that the multilevel feature is effective for improving detection accuracy.
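The IOU comparisons above use the standard box-overlap measure, and the proposal filtering described in Section III-C applies greedy NMS at a 0.7 threshold. A minimal sketch of both, assuming corner-format boxes; the function names are my own:

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); intersection over union.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, thresh=0.7):
    # Repeatedly keep the highest-scoring box and discard any
    # remaining box overlapping it by more than `thresh` IOU.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best, order = order[0], order[1:]
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep
```

Lowering `thresh` suppresses more aggressively, which is why a stricter IOU setting exposes the quality gap between naive RPN proposals and MFN-based ones.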
Fig. 8. Recall versus IOU threshold on the NEU-DET at different numbers of region proposals. (a) 50 region proposals. (b) 100 region proposals. (c) 300
region proposals.
Fig. 9. Recall versus number of proposals on the NEU-DET at different IOU thresholds. (a) IOU threshold is 0.5. (b) IOU threshold is 0.6. (c) IOU threshold
is 0.7.
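The recall curves in Figs. 8 and 9 presumably measure the fraction of groundtruth boxes matched by at least one proposal above the IOU threshold; a sketch under that assumption, with illustrative helper names:

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); intersection over union.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def proposal_recall(gt_boxes, proposals, thresh=0.5):
    # Fraction of groundtruth boxes covered by some proposal
    # with IOU above the threshold.
    hit = sum(1 for g in gt_boxes
              if any(iou(g, p) > thresh for p in proposals))
    return hit / len(gt_boxes)
```

Sweeping `thresh` from 0.5 to 0.7 for a fixed proposal set reproduces the shape of a recall-versus-IOU curve.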
Furthermore, the low-level features (e.g., the R2 feature) should receive more attention than the high-level features (e.g., the R5 feature) for defect detection, because the R2 feature has richer location information than the R5 feature.

B. Is the Simple Design More Effective for MFN?

The major role of MFN is to unify the features from different levels in resolution and dimensionality. To keep the dimension consistent, a straightforward approach is to use a 1 × 1 conv to reduce/increase the dimensionality. There are two placement patterns for the 1 × 1 conv: front-mounted and back-mounted. The front-mounted pattern, which is what we use in this paper, means that a 1 × 1 conv is placed before concatenating the multilevel feature, that is, at the end of each branch of MFN. The back-mounted pattern means that a 1 × 1 conv is placed after concatenating the multilevel feature. This pattern seems simple but in fact needs more parameters. Similar to [34], we also consider using multiple 5 × 5 convs to unify the resolution and dimensionality simultaneously. However, the 5 × 5 conv is an expensive operation: it has the same effect as two stacked 3 × 3 convs but requires additional parameters. Table IV shows the comparison among the three patterns in detail. The front-mounted style uses three times fewer parameters than the back-mounted style, and five times fewer than the hyper style. Therefore, MFN in the front-mounted style is less likely to overfit. Moreover, for the same resolution size, MFN features preserve more complete information owing to their larger dimensionality than the Hyper feature's (512 vs. 126).

C. Do We Need More Defect Data?

As is well known, an object detector can improve its performance with more training data [39]. Is this rule also effective for industrial defect data? To clarify this question, we train the DDN not only on the complete NEU-DET data set but also on each subset separately. As shown
1502 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 69, NO. 4, APRIL 2020
TABLE IV
UNIFORM DIMENSIONALITY IN DIFFERENT STYLES

Fig. 10. Detection time versus number of proposals on the NEU-DET. The detection time refers to the GPU runtime per image. Sliding window, Edge Boxes, and Selective Search are CPU-based methods that are far slower than GPU-based methods in detection speed.

Fig. 11. AP of each defect class on separate training versus complete training.
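The parameter argument behind the front-mounted placement in Section VI-B can be checked with a quick weight count (biases ignored). The channel widths below are illustrative choices consistent with the ResNet34 stage outputs and the 512-d fused feature mentioned earlier, not the values of Table IV:

```python
def conv1x1_params(c_in, c_out):
    # A 1 x 1 convolution has c_in * c_out weights (biases ignored).
    return c_in * c_out

# Hypothetical branch channel widths (ResNet34 stage outputs R2..R5),
# each reduced to 128 channels before concatenation (front-mounted).
branch_channels = [64, 128, 256, 512]
front = sum(conv1x1_params(c, 128) for c in branch_channels)

# Back-mounted: concatenate the raw branch features, then one
# 1 x 1 conv down to the same 512-d fused feature.
back = conv1x1_params(sum(branch_channels), 512)
```

With these widths the back-mounted placement needs four times the weights of the front-mounted one, which is consistent in spirit with the savings reported in Table IV.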
the ability to handle overlapped defects, and a success case is shown in Fig. 7(f). We surmise that the reason is that the "inclusion" and the "patches" in the figure are similar, and they influence each other when they are very close. For the "rolled-in scale," the bounding box may ignore some edge defects, as shown in Fig. 12(d), because such defects are too scattered to define their scope. A more ideal defect detector is still wanted, because there is room for improvement.

VII. CONCLUSION

In this paper, the DDN, a defect inspection system for steel plates, is proposed. This system is a DL network that can obtain the specific category and detailed location of a defect by fusing multilevel features. For defect detection tasks, our system can provide detailed and valuable indicators for a quality assessment system, such as the quantity, category, complexity, and area of a defect. Furthermore, we set up a valuable defect detection data set, NEU-DET. Experiments show that DDN can achieve 99.67% accuracy for the defect classification task and 82.3 mAP for the defect detection task. In addition, the system can run at a detection speed of 20 frames/s while keeping the mAP at 70. In the future, we will focus on two directions: one is data augmentation technology, owing to the expensive manual annotations in detection data sets; the other is to perform the defect segmentation task with DL technologies, which can obtain a more precise defect boundary.

REFERENCES

[1] D. Marr, Vision: A Computational Investigation Into the Human Representation and Processing of Visual Information. Cambridge, MA, USA: MIT, 2010, pp. 3–4.
[2] D. A. Forsyth, Computer Vision: A Modern Approach. Upper Saddle River, NJ, USA: Prentice-Hall, 2002, pp. 482–539.
[3] K. Song and Y. Yan, "A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects," Appl. Surf. Sci., vol. 285, pp. 858–864, Nov. 2013.
[4] P. Caleb-Solly and J. E. Smith, "Adaptive surface inspection via interactive evolution," Image Vis. Comput., vol. 25, no. 7, pp. 1058–1072, Jul. 2007.
[5] Y. Dong, D. Tao, X. Li, J. Ma, and J. Pu, "Texture classification and retrieval using shearlets and linear regression," IEEE Trans. Cybern., vol. 45, no. 3, pp. 358–369, Mar. 2015.
[6] M. Xiao, M. Jiang, G. Li, L. Xie, and L. Yi, "An evolutionary classifier for steel surface defects with small sample set," EURASIP J. Image Vid. Process., vol. 2017, no. 48, pp. 1–13, Dec. 2017.
[7] Y. Park and I. S. Kweon, "Ambiguous surface defect image classification of AMOLED displays in smartphones," IEEE Trans. Ind. Inform., vol. 12, no. 2, pp. 597–607, Apr. 2016.
[8] M. Chu, J. Zhao, X. Liu, and R. Gong, "Multi-class classification for steel surface defects based on machine learning with quantile hyper-spheres," Chemom. Intell. Lab. Syst., vol. 168, pp. 15–27, Sep. 2017.
[9] S. Ghorai, A. Mukherjee, M. Gangadaran, and P. K. Dutta, "Automatic defect detection on hot-rolled flat steel products," IEEE Trans. Instrum. Meas., vol. 62, no. 3, pp. 612–621, Mar. 2013.
[10] Q. Luo and Y. He, "A cost-effective and automatic surface defect inspection system for hot-rolled flat steel," Robot. Comput.-Integr. Manuf., vol. 38, pp. 16–30, Apr. 2016.
[11] K. Liu, H. Wang, H. Chen, E. Qu, Y. Tian, and H. Sun, "Steel surface defect detection using a new Haar–Weibull-variance model in unsupervised manner," IEEE Trans. Instrum. Meas., vol. 66, no. 10, pp. 2585–2596, Oct. 2017.
[12] M. Chu, R. Gong, S. Gao, and J. Zhao, "Steel surface defects recognition based on multi-type statistical features and enhanced twin support vector machine," Chemom. Intell. Lab. Syst., vol. 171, pp. 140–150, Sep. 2017.
[13] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 2980–2988.
[14] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Proc. Neural Inf. Process. Syst. (NIPS), Montreal, QC, Canada, Dec. 2015, pp. 91–99.
[15] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Proc. Neural Inf. Process. Syst. (NIPS), Montreal, QC, Canada, Dec. 2014, pp. 3320–3328.
[16] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, May 2015.
[17] R. Ren, T. Hung, and K. C. Tan, "A generic deep-learning-based approach for automated surface inspection," IEEE Trans. Cybern., vol. 48, no. 3, pp. 929–940, Mar. 2018.
[18] Y. Li, G. Li, and M. Jiang, "An end-to-end steel strip surface defects recognition system based on convolutional neural networks," Steel Res. Int., vol. 88, no. 2, Feb. 2017, Art. no. 1600068.
[19] S. Zhou, Y. Chen, and D. Zhang, "Classification of surface defects on steel sheet using convolutional neural networks," Mater. Technol., vol. 51, no. 1, pp. 123–131, Feb. 2017.
[20] V. Natarajan, T.-Y. Hung, S. Vaikundam, and L.-T. Chia, "Convolutional networks for voting-based anomaly classification in metal surface inspection," in Proc. IEEE Int. Conf. Ind. Technol. (ICIT), Toronto, ON, Canada, Mar. 2017, pp. 986–991.
[21] P.-H. Chen and S.-S. Ho, "Is overfeat useful for image-based surface defect classification tasks?" in Proc. IEEE Int. Conf. Image Process. (ICIP), Phoenix, AZ, USA, Sep. 2016, pp. 749–753.
[22] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Comput. Vis. Pattern Recognit. (CVPR), Anchorage, AK, Jun. 2009, pp. 248–255.
[23] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Comput. Vis. Pattern Recognit. (CVPR), Boston, MA, USA, Jun. 2015, pp. 770–778.
[24] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, "OverFeat: Integrated recognition, localization and detection using convolutional networks," in Proc. Int. Conf. Learn. Represent. (ICLR), Banff, AB, Canada, Apr. 2014, pp. 1–16.
[25] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Neural Inf. Process. Syst. (NIPS), Las Vegas, NV, USA, Dec. 2012, vol. 60, no. 2, pp. 1097–1105.
[26] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception architecture for computer vision," in Proc. IEEE Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 2818–2826.
[27] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE Comput. Vis. Pattern Recognit. (CVPR), Columbus, OH, USA, Jun. 2015, pp. 3431–3440.
[28] R. Girshick, "Fast R-CNN," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Santiago, Chile, Dec. 2015, pp. 1440–1448.
[29] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 779–788.
[30] W. Liu et al., "SSD: Single shot multibox detector," in Proc. Springer Euro. Conf. Comput. Vis. (ECCV), Amsterdam, Netherlands, Oct. 2016, pp. 21–37.
[31] L. Zhang, Y. Gao, C. Hong, Y. Feng, J. Zhu, and D. Cai, "Feature correlation hypergraph: Exploiting high-order potentials for multimodal recognition," IEEE Trans. Cybern., vol. 44, no. 8, pp. 1408–1419, Aug. 2014.
[32] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[33] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. 13th Int. Conf. Artif. Intell. Statist., vol. 9, May 2010, pp. 249–256.
[34] T. Kong, A. Yao, Y. Chen, and F. Sun, "HyperNet: Towards accurate region proposal generation and joint object detection," in Proc. IEEE Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 845–853.
[35] C. L. Zitnick and P. Dollár, "Edge boxes: Locating object proposals from edges," in Proc. Springer Euro. Conf. Comput. Vis. (ECCV), Zurich, Switzerland, Oct. 2014, pp. 391–405.
[36] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, "Selective search for object recognition," Int. J. Comput. Vis., vol. 104, no. 2, pp. 154–171, Sep. 2013.
[37] Y. Wei et al., "Cross-modal retrieval with CNN visual features: A new baseline," IEEE Trans. Cybern., vol. 47, no. 2, pp. 449–460, Feb. 2017.
[38] J. Hosang, R. Benenson, P. Dollár, and B. Schiele, "What makes for effective detection proposals?" IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 4, pp. 814–830, Apr. 2016.
[39] X. Zhu, C. Vondrick, C. C. Fowlkes, and D. Ramanan, "Do we need more training data?" Int. J. Comput. Vis., vol. 119, no. 1, pp. 76–92, Aug. 2016.
[40] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, May 2015, pp. 1–16.

Yu He received the B.S. degree from the School of Mechanical Engineering and Automation, Liaoning Technical University, Fuxin, China, in 2014, and the M.S. degree from the School of Mechanical Engineering and Automation, Northeastern University, Shenyang, China, in 2016, where he is currently pursuing the Ph.D. degree. His current research interests include deep learning, pattern recognition, and intelligent inspection.

Kechen Song received the B.S., M.S., and Ph.D. degrees from the School of Mechanical Engineering and Automation, Northeastern University, Shenyang, China, in 2009, 2011, and 2014, respectively. Since 2014, he has been a Teacher with Northeastern University. His current research interests include vision-based inspection systems for steel surface defects, surface topography, image processing, and pattern recognition.

Qinggang Meng received the B.S. and M.S. degrees from the School of Electronic Information Engineering, Tianjin University, Tianjin, China, and the Ph.D. degree in computer science from Aberystwyth University, Aberystwyth, U.K. He is currently a Professor with the Department of Computer Science, Loughborough University, Loughborough, U.K. His current research interests include biologically and psychologically inspired learning algorithms and developmental robotics, service robotics, robot learning and adaptation, multi-unmanned aerial vehicle cooperation, driver distraction detection, human motion analysis and activity recognition, activity pattern detection, pattern recognition, artificial intelligence, and computer vision. Dr. Meng is a fellow of the Higher Education Academy, U.K.

Yunhui Yan received the B.S., M.S., and Ph.D. degrees from the School of Mechanical Engineering and Automation, Northeastern University, Shenyang, China, in 1981, 1985, and 1997, respectively. Since 1982, he has been a Teacher with Northeastern University, and became a Professor in 1997. From 1993 to 1994, he was a visiting scholar at the Tohoku National Industrial Research Institute, Sendai, Japan. His current research interests include intelligent inspection, image processing, and pattern recognition.