
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2019.2955753, IEEE Access.

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.Doi Number

Contact Wire Support Defect Detection Using Deep Bayesian Segmentation Neural Networks and Prior Geometric Knowledge
GAOQIANG KANG 1,2,3, SHIBIN GAO 1,2, LONG YU 1,2, DONGKAI ZHANG 1,2, XIAOGUANG WEI 1,2, DONG ZHAN 1,2
1 School of Electrical Engineering, Southwest Jiaotong University, Chengdu 610031, China
2 National Rail Transit Electrification and Automation Engineering Technology Research Center, Southwest Jiaotong University, Chengdu 610031, China
3 School of Electrical and Automation Engineering, East China Jiaotong University, Nanchang 330013, China
Corresponding author: LONG YU (e-mail: [email protected]).
This work was partially funded by grants from the Key Projects of National Natural Science Foundation of China (No. U1734202), the National Key
Research and Development Plan of China (No. 2017YFB1200802-12 and No. VP99QQT1104Y18012), and the China Railway Corporation Science and
Technology Research and Development Plan (No. N2018G023).

ABSTRACT The contact wire support (CWS) is an important catenary component that maintains the contact wire height and stagger. The direct impact of the pantographs makes the CWS a vulnerable part of the catenary. Recently, automatic catenary inspection using computer vision and pattern recognition has been introduced to improve railway operation safety. However, the automated detection of CWS defects remains to be further studied. This paper proposes a novel CWS defect detection system that consists of three stages. First, the Faster R-CNN network is adopted to localize the key catenary components, yielding the image areas that contain the CWS components. Then, the CWS components are segmented using a Bayesian fully convolutional catenary components segmentation network (CCSN) that fuses different-level features of the backbone network. The CCSN not only performs accurate CWS components segmentation, but is also capable of evaluating the model uncertainty via Monte Carlo dropout. Finally, the defect status is determined using the proposed criteria, which are defined according to the geometries of the components. Experiments on the Hefei-Fuzhou high-speed railway line indicate that this approach can be applied to CWS defect detection.

INDEX TERMS High-speed railway, catenary, defect detection, feature fusion, image segmentation

I. INTRODUCTION

With the rapid development of high-speed railways, the requirement for the reliability of the traction power supply system is constantly increasing. As a device that maintains the contact wire height and stagger, the contact wire support (CWS) inevitably suffers from the mechanical and electrical impact of the pantograph, which makes it a weak part of the traction power supply system [1]. The CWS mainly comprises the registration arm and the steady arm, as shown in Fig. 1. The contact wire clamp (CWC) and the split pin (SP) are the most important components of the steady arm and the registration arm, respectively. The contact wire height and stagger directly affect the interaction between the contact wire and the pantograph, which determines the reliability and quality of energy transmission to the train [2]. However, during railway operation, the pantograph runs along the contact wire and directly exerts force on the CWS, which makes the CWS, especially the CWC and SP, the most vulnerable part of the catenary system. Thus, to improve the reliability of the catenary and ensure the safety of the traction power supply, it is essential to monitor the condition of the high-speed railway CWS [3]. However, the automated detection of CWS defects remains to be further studied.

FIGURE 1. Structure of the catenary support device (contact wire, registration arm, steady arm, contact wire clamp, split pin).


At present, automatic inspection systems based on computer vision technology have been widely applied in railways [4] and other fields [5-7]. As a distributed system, the catenary is typically inspected using visual inspection devices installed on an inspection vehicle. To detect defects of catenary components, the target component must first be detected against the complex background. The Faster R-CNN is used in this paper to detect the CWC and SP [8].

In the past decades, researchers have devoted considerable effort to developing defect detection methods for railway infrastructure equipment. In [9], the sparse histograms of oriented gradients feature was used to train an SVM model to detect bogie block key defects. Deep convolutional classifiers were trained in [10, 11] to detect rail defects, and the same strategy was adopted in [12] to detect catenary fastener defects. Such classifiers are not only unable to quantitatively measure the severity of defects, but also require defective samples for training. However, the number of defective CWS components in practice is very limited, which is not enough to train a robust classifier. Fortunately, all CWS components have well-defined shapes, and we propose to detect their defects using their geometric characteristics. To analyze these characteristics, image segmentation is a key step.

Accurate and reliable segmentation of the CWS components is critical for defect detection. In the ideal case, if a component has good image quality and high contrast, traditional image segmentation methods could segment it with high accuracy [13]. In practice, however, the target components are similar in color and material to the background components, resulting in low contrast between them. In addition, the imaging process constantly suffers from the motion of the inspection vehicle, which inevitably degrades the imaging quality of the components. These unfavorable conditions commonly lead to the unsatisfactory performance of traditional image segmentation methods based on handcrafted features [14].

In recent years, tremendous progress has been made in image segmentation as a result of the development of deep convolutional neural networks. Badrinarayanan et al. [15] adapted classification networks into fully convolutional networks (FCNs) for the image segmentation task, achieving good performance on large, well-known datasets. However, the application of FCNs to image segmentation still faces the problem of inconsistency between the segmentation results and the object boundaries. To overcome this problem, feature fusion is a widely used method [16]. In a deep CNN, the feature hierarchy is extracted layer by layer; as the feature level increases, its spatial resolution decreases while its representational capacity increases. In [17, 18], segmentation networks that used multiple levels of features achieved better segmentation accuracy than those that used only one level. To improve segmentation accuracy, [19] proposed the encoder-decoder network U-Net, which uses skip connections to allow the decoder to fuse the high-resolution features from the encoder. In [20], it was shown that combining in-network pyramid features of different resolutions can improve the performance of FCNs in image segmentation.

In addition to accuracy, uncertainty is another issue that must be considered in CWS components segmentation, since a model can be uncertain about its predictions even with a high softmax output [21]. Knowing the confidence with which we can trust the segmentation output is important for defect detection. Neural networks that model uncertainty are known as Bayesian neural networks [22, 23]. They are often computationally very expensive, as performing inference in Bayesian neural networks is a very difficult task. Fortunately, Gal and Ghahramani [24] have cast dropout as approximate Bayesian inference over the network's weights. This means that we can obtain the uncertainties of segmentation models by performing dropout during the test phase [25, 26]. However, this method needs to run the model several times during the test phase to obtain the model's predictive variance, which is time consuming.

Inspired by these observations, we construct a catenary components segmentation network (CCSN) as an extension of Mask R-CNN, composed of a backbone network, a branch for classification and bounding-box regression, and two branches for segmentation mask prediction. In comparison to Mask R-CNN, which has only one mask branch, our architecture has several differences. Firstly, the backbone network is not the FPN but the lighter ResNet50 [27]. Secondly, the two mask branches are Bayesian neural networks that involve Monte Carlo dropout layers to evaluate model uncertainty. Finally, the two branches fuse high-resolution features from different lower layers of the backbone, and their segmentation results are integrated as the final results. In addition, considering that all CWS components have well-defined shapes, defect criteria are defined using their geometric characteristics, which can be obtained from the segmentation results.

In this paper, a complete solution for CWS defect detection is provided. The first contribution is a Bayesian CWS components segmentation network, the CCSN, which can provide information about model uncertainty. The second contribution is a deep architecture that fuses multi-resolution features to improve segmentation accuracy, achieving accurate CWS components segmentation. The third contribution is the definition of criteria for CWS components defect detection.

This paper is organized as follows. Section II overviews the catenary inspection system. The CCSN is theoretically described in Section III. The defect detection criteria are defined in Section IV. The experimental results and analysis are summarized in Section V. Section VI presents the conclusion and future work.


II. SYSTEM OVERVIEW
The catenary inspection system mainly consists of two parts: the roof equipment and the in-vehicle equipment. The roof equipment contains two groups of cameras and auxiliary light devices that are installed on the roof of the inspection vehicle, as shown in Fig. 2. The in-vehicle equipment is composed of the database and the industrial computers.

FIGURE 2. Sketch map of the catenary inspection device, with the contact wire supports above the vehicle.

There are 9 cameras with a resolution of 4920×3280 pixels in each camera group. Each camera is responsible for a portion of the catenary imaging, and different cameras inspect different catenary components. During inspection, the captured images, mast numbers and other information are stored in the vehicle database for defect detection. The proposed CWS defect detection method contains three main stages: key components localization, components segmentation, and defect detection. Fig. 3 shows the diagram of the defect detection method.

A. KEY COMPONENTS LOCALIZATION
The purpose of key components localization is to localize and extract the key components, including the CWS components, from the catenary images. To achieve defect detection, it is essential to localize the components from the complex backgrounds. The Faster R-CNN is used in this paper to detect the CWC and SP [8].

B. CWS COMPONENTS SEGMENTATION
Once the CWS components are localized, the impact of the background is greatly reduced. However, the components are not only very small, but also similar to other components in terms of gray-scale values. To overcome these problems, a Bayesian fully convolutional network, the CCSN, which explicitly fuses multi-resolution features, is proposed to segment the catenary components. The feature fusion method we propose for the CCSN is a generic framework that can be applied to other segmentation networks. Furthermore, the CCSN involves Monte Carlo dropout to evaluate model uncertainty, which is essential for defect detection.

C. CWS COMPONENTS DEFECT DETECTION
Components defect detection faces the problem that the number of defective samples is not sufficient to train a robust classifier. To overcome this problem, we determine the defect status of the components using defect criteria defined by the components' geometries, which can be obtained from the segmentation masks.

FIGURE 3. Diagram of the proposed three-stage defect detection approach: Stage 1, key components localization (Faster R-CNN on the inspection image, followed by cropping); Stage 2, components segmentation and uncertainty evaluation (CCSN); Stage 3, defect detection by geometric analysis ($d_m \le T_s$ for the SP and $|a_1c_1 - a_2c_2| > T_c$ for the CWC).
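To make the data flow of Fig. 3 concrete, the following minimal Python sketch shows how the three stages could be chained. The localizer and CCSN wrappers, the criterion helpers, and the threshold parameters are all hypothetical stand-ins for the trained models described in this paper, not released code.

    # Illustrative glue code for the three-stage pipeline of Fig. 3. All
    # class and function names are hypothetical, not the authors' code.
    def detect_cws_defects(image, localizer, ccsn, Tc, Ts, blur_threshold):
        """Run localization, Bayesian segmentation and geometric criteria."""
        decisions = []
        # Stage 1: localize key components and crop their image regions.
        for comp_type, (x0, y0, x1, y1) in localizer.detect(image):
            crop = image[y0:y1, x0:x1]

            # Stage 2: Bayesian segmentation with an uncertainty estimate.
            masks, uncertainty = ccsn.segment(crop)
            if uncertainty > blur_threshold:
                continue  # unreliable (e.g. blurred) crop: skip, do not judge

            # Stage 3: defect criteria from component geometry (Section IV).
            if comp_type == "CWC":
                defective = (count_nuts(masks) < 4 or
                             cwc_length_difference(masks) > Tc)
            else:  # comp_type == "SP"
                defective = sp_max_hull_distance(masks) <= Ts
            decisions.append((comp_type, defective))
        return decisions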

III. CWS COMPONENTS SEGMENTATION
Once the CWS components are localized, the impact of the background is greatly reduced. However, due to the uneven illumination conditions and the similarity in gray-scale values between the CWS components and other catenary equipment, accurate segmentation of the CWS components is still a very difficult task. In this study, the CCSN, which is built on Mask R-CNN and Bayesian neural networks, is proposed to segment the CWS components.

The CCSN is a two-stage segmentation pipeline. In the first stage, a region proposal network (RPN) takes the image as input and directly generates region proposals that may contain the objects to be segmented. In the second stage, three parallel branches are applied to each proposal for classification, bounding-box regression, and segmentation mask prediction. In addition, the RPN and the parallel branches share the backbone convolutional network ResNet50.

As shown in Fig. 4, feature fusion is conducted to improve segmentation performance. Each proposal is mapped to four different feature levels, as denoted by the blue and red regions in Fig. 4(a), which are sent to the RoIAlign layer to extract feature maps of different sizes (high-level features are pooled to 14×14 resolution and low-level features to 28×28 resolution) and concatenated together for mask prediction.
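As a minimal sketch of this pooling-and-fusion step, the Keras snippet below upsamples a 14×14×256 high-level map with a transposed convolution and concatenates it with a 28×28×256 low-level map. Only the tensor shapes follow the text; the layer stack, filter sizes and the 1×1 mask head are our assumptions, not the exact branch definition used in the paper.

    # Keras sketch of the multi-resolution fusion inside one mask branch:
    # RoIAligned high-level features (14x14x256) are deconvolved to 28x28
    # and concatenated with RoIAligned low-level features (28x28x256).
    from tensorflow.keras import Input, Model, layers

    low = Input(shape=(28, 28, 256))   # pooled low-level features (e.g. C1/C2)
    high = Input(shape=(14, 14, 256))  # pooled high-level features (e.g. C3/C4)

    up = layers.Conv2DTranspose(256, 2, strides=2, activation="relu")(high)
    fused = layers.Concatenate()([low, up])               # 28x28x512
    x = layers.Conv2D(256, 3, padding="same", activation="relu")(fused)
    x = layers.Dropout(0.7)(x)                            # Monte Carlo dropout
    mask = layers.Conv2D(1, 1, activation="sigmoid")(x)   # per-pixel mask

    mask_branch = Model([low, high], mask)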


The segmentation mask prediction branches are Bayesian neural networks that involve Monte Carlo dropout layers to evaluate model uncertainty, as shown in Fig. 4(b) and Fig. 4(c). They fuse multi-resolution features from different layers of the backbone network. The mean and variance of their prediction masks are used as the final prediction result and the uncertainty map, respectively. Combining multi-resolution feature fusion and Bayesian neural networks, the CCSN can not only evaluate the uncertainty of the model, but also exhibit high segmentation accuracy under uneven illumination conditions.

Figure 4. The architecture of the CCSN, which fuses multi-resolution features: (a) each proposal is mapped to the feature levels C1-C4 and pooled by RoIAlign into 28×28×256 and 14×14×256 maps; (b), (c) the two Bayesian mask branches (convolution, deconvolution, concatenation, dropout and softmax layers), whose mean output is the segmentation mask and whose variance is the uncertainty map; (d) the class and bounding-box head.

A. FEATURE FUSION
The goal of the feature fusion is to improve the accuracy of components segmentation by fusing multi-level features from the backbone network.

Mask R-CNN assigns each proposal to one feature level of the backbone FPN according to the proposal's size, which increases its ability to segment multi-scale targets. However, once the CWS components are localized, the variation of their scales is limited. Therefore, we use the lighter backbone network ResNet50 instead of the FPN, which is computationally expensive.

In the backbone network, high-level features have large receptive fields and capture richer context information. In contrast, low-level features carry fine detail and have higher localization accuracy. It has been proved that the fusion of different-level features is an effective way to improve segmentation accuracy [17-19]. With these thoughts, we propose to pool features from different levels for each proposal and fuse them for segmentation mask prediction. As shown in Fig. 4(a), each proposal is mapped to four different feature levels, i.e., {C1, C2, C3, C4}. Low-level features are extracted from C1 and C2, which allows higher-resolution features to be fused into the network. Low-level and high-level features are resized by the RoIAlign layer into feature maps of size 28×28×256 and 14×14×256, respectively. After a deconvolution operation, the high-level features have the same size as the low-level features. Then, the features from the different levels are concatenated together for segmentation mask prediction. Furthermore, with different low-level feature paths and dropout rates, the two segmentation branches have different architectures. To further improve the segmentation accuracy, an ensemble approach is adopted to combine the predictions of both branches. In the proposed framework, multi-level features are thus fused in the CCSN to improve its segmentation accuracy.

B. UNCERTAINTY EVALUATION
Given a training dataset $X = \{x_1, \dots, x_N\}$ and the corresponding labels $Y = \{y_1, \dots, y_N\}$, in Bayesian neural networks we look for the posterior distribution over the space of parameters:

$$p(\omega \mid X, Y) = \frac{p(Y \mid X, \omega)\, p(\omega)}{p(Y \mid X)} \quad (1)$$

According to it, we can predict an output $y^*$ for a new input $x^*$ by the following equation:

$$p(y^* \mid x^*, X, Y) = \int p(y^* \mid x^*, \omega)\, p(\omega \mid X, Y)\, \mathrm{d}\omega \quad (2)$$

However, the model evidence in the posterior distribution,

$$p(Y \mid X) = \int p(Y \mid X, \omega)\, p(\omega)\, \mathrm{d}\omega \quad (3)$$

cannot be computed analytically for Bayesian neural networks. Therefore, an approximation needs to be made using variational inference, which defines an approximating variational distribution $q_\theta(\omega)$, parameterized by $\theta$, and minimizes the KL divergence between the approximating distribution and the true posterior:

$$\mathrm{KL}\big(q_\theta(\omega) \,\|\, p(\omega \mid X, Y)\big) \quad (4)$$

which can be solved using Monte Carlo estimation.

For deep neural networks, Gal and Ghahramani [24] have proved that the Monte Carlo estimation process is equivalent to performing Monte Carlo dropout training. Therefore, a deep convolutional neural network can be cast into a Bayesian neural network using dropout layers, without requiring any additional model changes.

During the test phase, replacing the posterior $p(\omega \mid X, Y)$ with the optimum approximate posterior $q_\theta^*(\omega)$, we can approximate the integral in equation (2) with Monte Carlo integration:

$$p(y^* \mid x^*, X, Y) \approx \int p(y^* \mid x^*, \omega)\, q_\theta^*(\omega)\, \mathrm{d}\omega \approx \frac{1}{T} \sum_{t=1}^{T} p(y^* \mid x^*, \hat{\omega}_t) \quad (5)$$

with $\hat{\omega}_t \sim q_\theta^*(\omega)$, which is obtained from Monte Carlo dropout training. This can be considered as sampling the network with randomly dropped-out units to get the posterior distribution of the predicted label probabilities. This means that, after performing $T$ forward passes through the trained model with dropout layers, the sample mean and variance can be used as the prediction and the model uncertainty, respectively. But this has the shortcoming that the testing time is scaled by $T$, since it needs to perform $T$ forward passes.
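A minimal sketch of this test-time sampling, assuming a tf.keras model that contains dropout layers (calling it with training=True keeps them active at inference); the model and T are placeholders rather than the paper's actual configuration:

    # Monte Carlo dropout inference: T stochastic forward passes with
    # dropout active; the per-pixel sample mean serves as the prediction
    # and the sample variance as the uncertainty map. Assumes a tf.keras
    # model with Dropout layers and eager execution (TensorFlow 2).
    import numpy as np

    def mc_dropout_predict(model, x, T=10):
        samples = np.stack([model(x, training=True).numpy() for _ in range(T)])
        return samples.mean(axis=0), samples.var(axis=0)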


In our framework, we perform forward passes in two parallel Bayesian segmentation branches at the same time, as shown in Fig. 4(b) and (c). The two branches have the same prediction expectation, and both of their variances can be used to evaluate the model uncertainty. This means that we can accelerate the sampling process by a factor of two.

To quantitatively measure the uncertainty of the segmentation results, we define the uncertainty value

$$U_s = \frac{N_u}{N_m} \quad (6)$$

where $N_u$ is the number of pixels whose prediction variance is greater than a predefined threshold $T_u$, and $N_m$ is the number of pixels of the segmentation mask.
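Equation (6) translates directly into numpy; the sketch below follows one reading of the definition, counting the high-variance pixels inside the predicted mask:

    # Uncertainty value U_s = N_u / N_m of equation (6): the fraction of
    # mask pixels whose predictive variance exceeds the threshold T_u.
    # Counting N_u inside the mask is our reading of the definition.
    import numpy as np

    def uncertainty_value(var_map, mask, Tu=0.3):
        mask = mask.astype(bool)
        n_u = np.count_nonzero(var_map[mask] > Tu)  # N_u: high-variance pixels
        n_m = np.count_nonzero(mask)                # N_m: mask pixels
        return n_u / n_m if n_m else 0.0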
C. TRAINING CCSN
The CCSN is a two-stage segmentation network that contains a backbone network and three network heads. In the region proposal stage, the RPN generates region proposals for the target objects. The RPN has two outputs for each proposal: a class probability and a bounding-box offset. In the segmentation stage, the CCSN has three branches that work in parallel, which output a class label, a bounding-box offset and two segmentation masks for each proposal. If there are $N$ kinds of objects to be segmented, the dimensions of these outputs are $N \times 1$, $N \times 4$, $N \times m^2$, and $N \times m^2$, where $m$ is the resolution of the object mask.

During the training of the RPN, two kinds of loss functions are involved: the classification loss and the bounding-box regression loss. The classification loss function is defined as follows:

$$L_{rcls} = -\sum_i \big( (1 - p_i) \log(1 - \hat{p}_i) + p_i \log \hat{p}_i \big) \quad (7)$$

where $p_i$ and $\hat{p}_i$ are respectively the ground-truth and predicted probabilities of the region proposal being an object or background. The bounding-box regression loss is defined as follows:

$$L_{rreg} = \sum_i \mathrm{smooth}_{L_1}(t_i - \hat{t}_i) \quad (8)$$

$$\mathrm{smooth}_{L_1}(t_i - \hat{t}_i) = \begin{cases} 0.5\,(t_i - \hat{t}_i)^2, & \text{if } |t_i - \hat{t}_i| < 1 \\ |t_i - \hat{t}_i| - 0.5, & \text{otherwise} \end{cases} \quad (9)$$

where $t_i$ and $\hat{t}_i$ are respectively the ground-truth and predicted coordinate vectors of a region proposal that is not background. The multi-task loss for a region proposal can be defined as follows:

$$L_{RPN} = L_{rcls} + L_{rreg} \quad (10)$$

During the training of the other three branches, three kinds of loss functions are involved: the classification loss $L_{mcls}$, the bounding-box regression loss $L_{mreg}$, and the object mask losses $L_{mask1}$ and $L_{mask2}$. The $L_{mcls}$ and $L_{mreg}$ are similar to those defined in equations (7) and (8). Each mask branch outputs $N$ binary masks of resolution $m \times m$ when there are $N$ kinds of objects to be segmented; in other words, there is one mask for each object class in a mask branch. We apply a sigmoid operation to every pixel of the mask branch output and define the mask losses $L_{mask1}$ and $L_{mask2}$ as the average binary cross-entropy loss. The multi-task loss for an object proposal can be defined as follows:

$$L_{CCSN} = L_{mcls} + L_{mreg} + L_{mask1} + L_{mask2} \quad (11)$$

We adopt a stepped schedule to train the CCSN. The training procedure is composed of three phases: the first phase consists of loading weights from the pre-trained backbone model and updating the network heads with learning rate lr1; the second phase consists of updating the whole network with learning rate lr2; the final phase consists of updating the whole network with learning rate lr3. The details of the training algorithm are shown in Algorithm 1.

Algorithm 1: Pseudocode for training the CCSN.
1: Input:
   X: training set for the CCSN, including the segmentation labels.
   Ka, Kb, Kc: the number of iterations of the three phases, respectively.
2: Load weights from the pre-trained ResNet50.
3: for k = 1 to Ka do
     Sample a mini-batch of examples from the training set X.
     Update the network heads by minimizing the loss LRPN + LCCSN with learning rate lr1.
   end for
4: for k = 1 to Kb do
     Sample a mini-batch of examples from the training set X.
     Update the whole network by minimizing the loss LRPN + LCCSN with learning rate lr2.
   end for
5: for k = 1 to Kc do
     Sample a mini-batch of examples from the training set X.
     Update the whole network by minimizing the loss LRPN + LCCSN with learning rate lr3.
   end for

After training, the RPN and the other branches share the backbone network and form a unified deep neural network, the CCSN, to segment the CWS components.

IV. DEFECT DETECTION
The CWC and the SP are the most vulnerable parts of the CWS. Our goal is to detect their defects. During the dynamic detection process, changes in shooting distance and angle are inevitable, which is bound to cause changes in object pose and size. In addition, the evolution of the CWC defect state is continuous and defective samples are rare, which makes classification methods unreliable. In this study, the defect criteria are therefore defined using the components' geometric characteristics, which can be obtained from the segmentation masks.

A. CWC DEFECT DETECTION
The CWC is vulnerable to loosening and missing faults. When there is a missing defect in the CWC, the number of nuts will be less than four, and this kind of defect can easily be detected from the segmentation masks. In contrast, it is difficult to detect loosening defects of the CWC. During the process of dynamic detection, the pose and size of the CWC in the image vary with the shooting angle and distance. However, in the inspection image, the lengths of the two bolts of one CWC are approximately the same if the CWC is normal.


The bottom surfaces of the CWC nuts lie almost in the same plane, and the CWC occupies only a small portion of the inspection image, as shown in stage 1 of Fig. 3. $A_1C_1$ and $A_2C_2$ are the physical distances between the two pairs of nuts, and $a_1c_1$ and $a_2c_2$ are the corresponding pixel distances. For a normal CWC, $A_1C_1$ is equal to $A_2C_2$. According to the perspective projection model of the pinhole camera, the mapping relationship between the CWC bottom surface plane and the image plane can be described as in Fig. 5.

Figure 5. The mapping relationship between the CWC bottom surface plane ($A_1,C_1$ and $A_2,C_2$) and the image plane ($a_1,c_1$ and $a_2,c_2$) through the optical center O.

There is a rigid transformation between the bottom surface plane and the image plane, which varies during the process of dynamic detection. However, as the CWC occupies only a small portion of the inspection image, $A_1C_1$ experiences almost the same transformation as $A_2C_2$. Therefore, if the CWC is normal, the length difference satisfies

$$|a_1c_1 - a_2c_2| \approx 0 \quad (12)$$

For a defective CWC, the length difference increases with the degree of looseness. Therefore, the length difference can be used to detect CWC defects. If its length difference satisfies

$$|a_1c_1 - a_2c_2| > T_c \quad (13)$$

a CWC is regarded as defective, where $T_c$ is a predefined threshold. However, this criterion cannot detect defects when the degrees of looseness of the two bolts are almost the same; such defects are very rare in practice.

B. SP DEFECT DETECTION
As an important part of the hinge that connects the registration arm and the cantilever tube, the SP is prone to the split pin missing defect. When the SP is normal, its boundary is a concave curve, as shown in Fig. 6(b). In contrast, when the split pin is missing, the boundary of the SP is a convex curve, as shown in Fig. 6(c).

Figure 6. The schematic of SP defect detection: $d_m$ is the maximum distance between the boundary of the SP's segmentation mask and its convex hull (green curves); (b) enlarged figure; (c) a representative example of a defective SP.

In this paper, the defect state of the SP is determined according to the following criterion:

$$d_m \le T_s \quad (14)$$

where $T_s$ is a predefined threshold and $d_m$ is the maximum distance between the boundary of the SP's segmentation mask and its convex hull.
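Both criteria can be evaluated directly on binary masks, for example with OpenCV; the sketch below is one possible implementation under our own assumptions (nut-mask centroids as the measurement points, and cv2.pointPolygonTest for the point-to-hull distance), not the authors' code.

    # Sketch of the geometric criteria of equations (13) and (14) on binary
    # masks. The nut pairing convention and the helper structure are our
    # assumptions; the paper defines only the criteria themselves.
    import cv2
    import numpy as np

    def cwc_length_difference(nut_masks):
        """|a1c1 - a2c2| from nut-mask centroids, assuming nut_masks is
        ordered so that (0, 1) and (2, 3) form the two bolt pairs."""
        c = []
        for m in nut_masks:
            ys, xs = np.nonzero(m)
            c.append(np.array([xs.mean(), ys.mean()]))
        a1c1 = np.linalg.norm(c[0] - c[1])
        a2c2 = np.linalg.norm(c[2] - c[3])
        return abs(a1c1 - a2c2)            # defective if greater than Tc

    def sp_max_hull_distance(sp_mask):
        """d_m: maximum distance from the SP mask boundary to its convex
        hull; a near-zero value indicates a convex (missing pin) boundary."""
        contours, _ = cv2.findContours(sp_mask.astype(np.uint8),
                                       cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
        boundary = max(contours, key=cv2.contourArea)
        hull = cv2.convexHull(boundary)
        # pointPolygonTest returns the (positive) inward distance to the hull.
        dists = [cv2.pointPolygonTest(hull, (float(p[0][0]), float(p[0][1])), True)
                 for p in boundary]
        return max(max(dists), 0.0)        # defective if this is <= Ts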
V. EXPERIMENTAL RESULTS AND ANALYSIS
We quantify the performance of the proposed CWS defect detection approach on a catenary image dataset from the Hefei-Fuzhou high-speed railway line. The dataset contains catenary images with a resolution of 4920×3280 pixels and was collected by the XLN4C-02 catenary inspection vehicle, as shown in Fig. 7. The experimental environment is as follows: Ubuntu 16.04, Python 3.6, the Keras deep learning framework, an Intel Xeon E5-2630 v4 CPU, and a GTX 1080 Ti GPU with 11 GB of memory.

Figure 7. XLN4C-02 catenary inspection vehicle.

A. CWS COMPONENTS SEGMENTATION
In this section, we evaluate the performance of the proposed image segmentation approach, the CCSN. In the experiment, 1,000 swivel clevis images and 1,000 CWC images were selected from the localization results of the Faster R-CNN. Among them, 1,400 samples are used for training, and the other 600 samples are used for testing. We trained the CCSN using Algorithm 1 with a batch size of 2. The learning rate was initially set to 0.001 and decays by a factor of 0.5 every 50 epochs. The performance of the proposed method and three other state-of-the-art methods was evaluated on the CWS image dataset:

Deeplab: uses atrous convolution to control the feature resolution and atrous spatial pyramid pooling to fuse features of different resolutions [28].

UNet: performs skip connections between multiple layers of the encoder and decoder to fuse high-resolution features from the low-level layers of the encoder [18].


Mask R-CNN: uses the RPN to generate RoIs and extracts features from different layers of the FPN according to the size of the RoIs [20].

CCSN-br1: the first CCSN segmentation branch with dropout rate 0.7, which uses the features of C2 and C4 from the backbone network.

CCSN-br2: the second CCSN segmentation branch with dropout rate 0.7, which uses the features of C1 and C3 from the backbone network.

CCSN: the proposed architecture, which acts as an ensemble of the two segmentation branches.

Figure 8. Comparison of the segmentation masks extracted by the different methods (original images; Deeplab; UNet; Mask R-CNN; CCSN-br1; CCSN-br2; CCSN) on representative CWS components, with true positives, false positives and false negatives marked. For display purposes, the segmentation masks are cropped and resized to the same size.

Fig. 8 shows the masks extracted by the different methods for several representative CWS components. It can be observed that accurate CWS components segmentation is a hard task due to the uneven illumination, the randomness of the shooting angle, and the similarity between the target components and the background components. Deeplab typically cannot capture the accurate boundary of the CWS components. Using skip connections to fuse high-resolution features from the low-level layers, UNet performs better than Deeplab. Instead of generating a large mask for all target components of the same type, Mask R-CNN generates a small mask for each RoI, which allows it to focus on the RoI features and achieve obvious performance improvements. The CCSN adopts the same two-stage strategy. Without using the complex backbone network FPN, both CCSN branches achieved segmentation performance comparable to that of Mask R-CNN, because the scale of the components does not change much. Taking the average of the two branches' segmentation results as its result, the CCSN achieves the best results compared with the other methods.

To further evaluate the performance of the CCSN, we calculate the mean intersection over union (mIoU) of the different methods on the testing dataset. In Table I, we report the numerical results of the experiment.

TABLE I
SEGMENTATION ACCURACY COMPARISON OF DIFFERENT SEGMENTATION APPROACHES

Segmentation methods    mIoU (%) CWC    mIoU (%) SP
Deeplab                 75.4            73.5
U-Net                   79.6            78.8
Mask R-CNN              88.4            87.8
CCSN-br1                87.3            86.5
CCSN-br2                87.5            86.7
CCSN                    90.1            89.6

The CCSN achieves 90.1% and 89.6% mIoU in the CWC and SP segmentation, improvements of 1.7% and 1.8% over Mask R-CNN, respectively. The experimental results indicate that the CCSN benefits from the feature fusion that explicitly combines different-level features of the backbone network for CWS components segmentation. Even in complex environments, the CCSN achieves good performance.
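The mIoU metric used in Table I is standard; for completeness, a minimal numpy version (our own utility, not code from the paper):

    # Mean intersection-over-union across pairs of binary masks, as used
    # for the comparison in Table I.
    import numpy as np

    def mean_iou(pred_masks, gt_masks):
        ious = []
        for p, g in zip(pred_masks, gt_masks):
            p, g = p.astype(bool), g.astype(bool)
            union = np.count_nonzero(p | g)
            if union:                      # skip pairs with no foreground
                ious.append(np.count_nonzero(p & g) / union)
        return float(np.mean(ious))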


B. UNCERTAINTY EVALUATION
In this experiment, the testing set is composed of 585 clear images and 15 blurred images. In practice, blurred images are rare; they are mainly caused by lens defocus. Image blur makes segmentation very difficult and affects defect detection.

As a Bayesian neural network, the CCSN allows us to evaluate the model uncertainty of its predictions through Monte Carlo dropout. The dropout rate has an important impact on the performance of the CCSN. We therefore explored a number of variants with different dropout rates and evaluated their performance in terms of segmentation accuracy and uncertainty evaluation.

As illustrated in Fig. 9, the segmentation accuracy of the CCSN's branches decreases as the dropout rate increases. When the dropout rate is less than 0.7, the segmentation accuracy decreases slowly with increasing dropout rate, but when the dropout rate exceeds 0.7, the accuracy of the model drops sharply. This is because an excessive dropout rate leads to too strong a regularization, so that the CCSN is no longer able to accurately segment the CWS components.

Figure 9. The segmentation accuracy of the CCSN's branches with different dropout rates.

Uncertainty maps were generated by the CCSN with different dropout rates to study the effect of the dropout rate on the uncertainty evaluation. As described in Section III-B, the uncertainty is defined as the variance of the prediction masks. In this experiment, we performed 3 forward passes through the trained CCSN model to obtain 6 masks, whose variance was used as the model uncertainty.

Figure 10. Representative uncertainty maps and segmentation masks generated by the CCSN with dropout rates 0.5 and 0.7: (a) blurred images, (b) clear images. For display purposes, the segmentation masks and uncertainty maps are cropped and resized to the same size.

The uncertainty maps and segmentation masks of representative CWS components are shown in Fig. 10. Four blurred components are shown in Fig. 10(a) and four clear components in Fig. 10(b). It can be observed that the CCSN with a 0.5 dropout rate and that with a 0.7 dropout rate output similar segmentation masks. They also produce similar-looking model uncertainty outputs, with high uncertainty values near the borders of the segmentation masks. However, the CCSN model is more uncertain about blurred components than about normal components. In the uncertainty maps of the blurred components, the number of pixels with high uncertainty is significantly higher than for the clear components.

To further evaluate the effect of the dropout rate on the performance of the CCSN, we calculate the uncertainty on the testing dataset according to equation (6) with the threshold Tu = 0.3, as shown in Fig. 11.

Figure 11. The uncertainty values output by the CCSN with different dropout rates.

It can be observed that the uncertainty value increases as the dropout rate increases, both for blurred images and for clear images. Under a given dropout rate, we calculate the mean value Mu and standard deviation σu of the uncertainty for the clear images. An image is classified as a blurred image if its uncertainty value exceeds Mu + 3σu. Then, we calculate the F1-score under different dropout rates:

$$F_1\text{-score} = \frac{2tp}{2tp + fn + fp} \quad (15)$$

where tp is the number of correctly detected blurred images, fp is the number of clear images misclassified as blurred, and fn is the number of blurred images misclassified as clear.

TABLE II
EFFECTS OF THE DROPOUT RATE ON UNCERTAINTY EVALUATION

Dropout rate    Threshold    tp    fp    fn    F1-score
0.5             0.12         12    0     3     88.9%
0.7             0.18         15    0     0     100%
0.8             0.36         15    11    0     73.2%

It can be observed that the dropout rate has a significant impact on the uncertainty evaluation performance of the CCSN. The CCSN with a dropout rate of 0.7 achieved the best performance. This is due to the differences in the regularization intensity of different dropout rates. When the dropout rate is too small, the CCSN tends to output low uncertainty values for all images, and when the dropout rate is too large, the CCSN becomes unstable even for clear images. Neither condition is good for uncertainty evaluation.

In this experiment, the optimum dropout rate of 0.7 was determined using the dataset from the Hefei-Fuzhou railway line. When the CCSN is applied to a new railway line, its data distribution will differ from that of the Hefei-Fuzhou line, and the optimum dropout rate will need to be determined again through trials.
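The screening rule described above and the F1-score of equation (15) can be sketched as follows; the calibration on a held-out set of clear images is our reading of how Mu and σu are obtained:

    # Blur screening with the Mu + 3*sigma_u rule, plus the F1-score of
    # equation (15). The calibration-set interface is our assumption.
    import numpy as np

    def fit_blur_threshold(clear_uncertainties):
        u = np.asarray(clear_uncertainties, dtype=float)
        return u.mean() + 3.0 * u.std()        # Mu + 3*sigma_u

    def is_blurred(uncertainty, threshold):
        return uncertainty > threshold         # flagged images are excluded

    def f1_score(tp, fp, fn):
        return 2 * tp / (2 * tp + fn + fp)     # equation (15)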


C. CWS COMPONENTS DEFECT DETECTION
In the experiment, we first checked the segmentation masks of the blurred images generated by the three compared methods. These segmentation masks are similar to those generated by the CCSN, shown in Fig. 10, which are very inaccurate and unreliable for defect detection. However, the three compared methods cannot evaluate the uncertainty of their segmentation predictions. Therefore, we distinguished the blurred images in the test set according to the uncertainty values output by the CCSN, and used the clear images for defect detection. The testing set was composed of 292 CWC images and 293 SP images, including 26 nut-missing CWCs, 21 nut-loosening CWCs, and 28 split-pin-missing SPs.

Figure 12. Representative defective CWCs' segmentation masks generated by the CCSN.

To detect the defects of the CWCs, it is critical to determine the accurate boundaries of the nuts. We detect the defects based on the segmentation results and the criteria defined in Section IV. Fig. 12 shows several representative examples of the segmentation results of defective CWCs. It can be observed that the length of defective nuts is larger than that of normal nuts, and that when there is a nut missing defect, the number of nuts is less than four.

To further evaluate the performance of our method, we compared our approach with the alternative methods. We used the segmentation masks of the different methods and the label masks to calculate the length differences defined in equation (13), as shown in Fig. 13.

Figure 13. Comparison of the length differences calculated by different segmentation methods.

It can be observed that the label length differences of normal CWCs are less than 3.5 pixels. In contrast, the label length differences of defective CWCs are larger than 8.5 pixels. As the segmentation accuracy decreases, the inconsistency between the calculated results and their labels increases.

Since defective samples are rare in practice, we use normal samples to determine the threshold of defect detection. For each detection method, we calculate the mean value Mi and standard deviation σi of the length difference for the clear images. A CWC is classified as defective if its length difference exceeds Mi + 3σi. Then, we calculate the F1-score defined in equation (15) as the metric to evaluate the performance of the different methods.

TABLE III
COMPARISON OF DIFFERENT CWC DEFECT DETECTION APPROACHES

Detection approaches    Threshold    tp    fp    fn    F1-score
Deeplab                 10.98        11    3     10    62.9%
Unet                    9.16         15    3     6     76.9%
Mask RCNN               6.65         19    2     2     90.5%
CCSN                    5.09         21    2     0     95.5%

It is observed that the proposed approach falsely reported only 2 defects and achieves a 95.5% F1-score. In addition, all these methods can successfully detect the nut missing defects.

We also employed the proposed approach on the testing set to detect the defects of the SPs. Fig. 14 shows several representative examples of the segmentation results of the defective SPs.

Figure 14. Representative defective SPs' segmentation masks generated by the CCSN. For display purposes, the segmentation masks are cropped and resized to the same size.

It can be observed that when there is a split pin missing defect, the boundary of the segmentation mask is approximately a convex curve.

To further evaluate the performance of our method, we compared our approach with the alternative methods. We used the segmentation masks of the different methods and the label masks to calculate the maximum distances between the mask boundaries and their convex hulls defined in equation (14), as shown in Fig. 15.

Figure 15. Comparison of the maximum distances calculated by different segmentation methods.


It can be observed that the label maximum distances of normal SPs are larger than 7.5 pixels. In contrast, the label maximum distances of defective SPs are 0 pixels. As the segmentation accuracy decreases, the inconsistency between the calculated results and the labels increases.

Since defective samples are rare in practice, we use normal samples to determine the threshold of defect detection. For each detection method, we calculate the mean value Mj and standard deviation σj of the maximum distances for the normal images. An SP is classified as defective if its maximum distance is less than Mj - 3σj. Then, we use the F1-score defined in equation (15) as the metric to evaluate the performance of the different methods.

TABLE IV
COMPARISON OF DIFFERENT SP DEFECT DETECTION APPROACHES

Detection approaches    Threshold    tp    fp    fn    F1-score
Deeplab                 2.49         27    4     1     91.5%
Unet                    3.88         28    5     0     91.8%
Mask RCNN               4.79         28    1     1     96.6%
CCSN                    4.91         28    1     0     98.2%

It can be found in Table IV that the CCSN falsely reported only one defect. Compared with the other approaches listed, the CCSN has achieved the best performance. In addition, although Mask RCNN has a similar architecture to the CCSN, its performance is still worse than that of the CCSN, which indicates that the segmentation network benefits from feature fusion and model ensembling.

In general, it can be observed in Table III and Table IV that the defect detection performance of the segmentation networks improves with their segmentation accuracy. On the other hand, compared with split pin missing detection, CWC looseness detection is more sensitive to the segmentation accuracy: for example, the performances of Deeplab and Unet in CWC looseness detection are much worse than their performances in split pin missing detection.

VI. CONCLUSIONS
This paper presented a method to detect CWS defects. A Bayesian fully convolutional network, the CCSN, which fuses different-level features of the backbone network, was proposed to segment the CWS components. As a Bayesian neural network, the CCSN allows us to evaluate the model uncertainty of its predictions. Using the geometries of the components, criteria were defined to determine the defect state of the components. We thoroughly evaluated the performance of the CCSN and compared it with the baseline methods in terms of segmentation accuracy; the CCSN was more accurate than the compared methods in CWS components segmentation. The experimental results show that the proposed criteria can effectively detect the components' defects. Thus, the proposed approach can be implemented in the catenary inspection system to detect CWS defects. Further research will focus on further lightening the proposed framework and on the design of proper criteria for other key catenary components.

REFERENCES
[1] Fittings for Overhead Contact System in Electrification Railway, China Railway Industrial Standard, TB/T 2075.1, 2010.
[2] S. Bruni, G. Bucca, M. Carnevale, et al., "Pantograph-catenary interaction: recent achievements and future research challenges," International Journal of Rail Transportation, vol. 6, no. 2, pp. 57-82, Nov. 2017.
[3] S. Gao, Z. Liu, and L. Yu, "Detection and monitoring system of the pantograph-catenary in high-speed railway (6C)," in 7th International Conference on PESA, 2017, pp. 779-788, Hong Kong, China.
[4] E. Karakose, M. T. Gencoglu, M. Karakose, et al., "A new experimental approach using image processing-based tracking for an efficient fault diagnosis in pantograph-catenary systems," IEEE Transactions on Industrial Informatics, vol. 13, no. 2, pp. 635-643, Apr. 2017.
[5] G. Cao, S. Ruan, Y. Peng, et al., "Large-complex-surface defect detection by hybrid gradient threshold segmentation and image registration," IEEE Access, vol. 6, pp. 36235-36246, Jul. 2018.
[6] D. Carrera, F. Manganini, G. Boracchi, et al., "Defect detection in SEM images of nanofibrous materials," IEEE Transactions on Industrial Informatics, vol. 13, no. 2, pp. 551-561, Apr. 2017.
[7] C. Li, G. Gao, Z. Liu, et al., "Defect detection for patterned fabric images based on GHOG and low-rank decomposition," IEEE Access, vol. 6, pp. 83962-83973, Jul. 2019.
[8] R. Girshick, "Fast R-CNN," in IEEE International Conference on Computer Vision, 2015, pp. 1440-1448, Boston, USA.
[9] X. Wu, P. Yuan, Q. Peng, et al., "Detection of bird nests in overhead catenary system images for high-speed rail," Pattern Recognition, vol. 51, pp. 242-254, Sep. 2015.
[10] X. Gibert, V. M. Patel, and R. Chellappa, "Deep multitask learning for railway track inspection," IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 1, pp. 153-164, Jan. 2017.
[11] S. Faghih-Roohi, S. Hajizadeh, A. Núñez, et al., "Deep convolutional neural networks for detection of rail surface defects," in International Joint Conference on Neural Networks, 2016, pp. 2584-2589, Budapest, Hungary.
[12] J. Chen, Z. Liu, H. Wang, A. Núñez, and Z. Han, "Automatic defect detection of fasteners on the catenary support device using deep convolutional neural network," IEEE Transactions on Instrumentation and Measurement, vol. 67, no. 2, pp. 257-269, Feb. 2018.
[13] J. Fan, D. K. Y. Yau, A. K. Elmagarmid, et al., "Image segmentation by integrating color edge detection and seeded region growing," IEEE Transactions on Image Processing, vol. 10, no. 10, pp. 1454-1466, Oct. 2001.
[14] D. Ciresan, A. Giusti, L. M. Gambardella, et al., "Deep neural networks segment neuronal membranes in electron microscopy images," in Conference on Neural Information Processing Systems, 2012, pp. 2843-2851, Nevada, USA.
[15] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," arXiv preprint arXiv:1511.00561, 2016.
[16] T. Y. Lin, P. Dollár, R. Girshick, et al., "Feature pyramid networks for object detection," arXiv preprint arXiv:1612.03144, 2016.
[17] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440, Boston, USA.
[18] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 234-241, Munich, Germany.
[19] Q. Zou, Z. Zhang, Q. Li, et al., "DeepCrack: Learning hierarchical convolutional features for crack detection," IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1498-1512, Mar. 2019.
[20] K. He, G. Gkioxari, P. Dollár, et al., "Mask R-CNN," in IEEE International Conference on Computer Vision, 2017, pp. 2980-2988, Venice, Italy.
[21] D. J. C. MacKay, "A practical Bayesian framework for backpropagation networks," Neural Computation, vol. 4, no. 3, pp. 448-472, May 1992.


[22] A. Graves, "Practical variational inference for neural networks," in Conference on Neural Information Processing Systems, 2011, pp. 2348-2356, Granada, Spain.
[23] Y. Gal and Z. Ghahramani, "Bayesian convolutional neural networks with Bernoulli approximate variational inference," arXiv preprint arXiv:1506.02158, 2015.
[24] Y. Gal and Z. Ghahramani, "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning," in International Conference on Machine Learning, 2016, pp. 1050-1059, New York, USA.
[25] A. Kendall, V. Badrinarayanan, and R. Cipolla, "Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding," arXiv preprint arXiv:1511.02680, 2015.
[26] G. Zhao, F. Liu, J. A. Oler, et al., "Bayesian convolutional neural network based MRI brain extraction on nonhuman primates," NeuroImage, pp. 32-44, 2018.
[27] K. He, X. Zhang, S. Ren, et al., "Deep residual learning for image recognition," in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778, Las Vegas, USA.
[28] L. C. Chen, G. Papandreou, I. Kokkinos, et al., "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834-848, Apr. 2018.
