
Automation in Construction 110 (2020) 103018

Contents lists available at ScienceDirect

Automation in Construction
journal homepage: www.elsevier.com/locate/autcon

Densely connected deep neural network considering connectivity of pixels for automatic crack detection

Qipei Mei, Mustafa Gül, Md Riasat Azim
Department of Civil and Environmental Engineering, University of Alberta, Edmonton, Alberta T6G 2W2, Canada

ARTICLE INFO

Keywords:
Crack detection
Deep learning
Transposed convolution layer
Densely connected layers
Connectivity of pixels

ABSTRACT

In order to develop smart cities, the demand for assessing the condition of existing infrastructure systems in an automated manner is burgeoning rapidly. Among all the early signs of potential damage in infrastructure systems, the formation of cracks is a critical one because it is directly related to the structural capacity and could significantly affect the serviceability of the infrastructure. This paper proposes a novel deep learning-based method considering the connectivity of pixels for automatic pavement crack detection, which has the potential to complement the current practice involving visual inspection, which is costly, inefficient and time-consuming. In the proposed method, the convolutional layers are densely connected in a feed-forward fashion to reuse features from multiple layers, and transposed convolution layers are used for multiple-level feature fusion. A novel loss function considering the connectivity of pixels is introduced to overcome the issues related to the output of transposed convolution layers. The proposed method is tested on two datasets, where the first one is collected with a handheld smartphone and the second one is collected with a high-speed camera mounted on the rear of a moving car. On both datasets, the proposed method shows superior performance to other available methods.

1. Introduction

The development of smart infrastructure systems to provide residents with high levels of comfort is imperative for the modern age. One crucial component of smart infrastructure systems is timely sensing and monitoring to ensure the safety of the infrastructures as well as the people who use them. Existing infrastructure systems are experiencing a variety of risks due to overloading and environmental effects in their life cycles [1]. It is imperative to identify potential damage in its early stage, by which not only severe accidents can be prevented but also costs can be reduced. A number of researchers in recent years have shown interest in developing crack detection mechanisms to provide early signs of potential damage in building materials [2–5]. The current common practice for crack detection is primarily based on visual inspection, which is costly and inefficient.

To resolve the issues mentioned above and to automate the crack detection process, researchers have developed methodologies based on different techniques, such as vibration signals, ultrasound, laser or images [6–11]. For the cracks on the surface of building materials, image-based crack detection methods are among the most promising ones because cameras are more accessible than other tools these days, especially after the widespread adoption of digital cameras and smartphones [12].

Generally, there are two branches in the field of image-based crack detection studies. The first one is based on traditional image processing techniques. The researchers handcrafted a series of filters to distinguish the cracks from background or noise [13–16]. Since cracks are sometimes very similar to the texture of the material surfaces, it is challenging to distinguish the cracks from the background. Abdel-Qader et al. [17] applied and compared four basic edge detection methods on images and determined that the Haar transform outperformed the other methods in terms of crack detection. Sinha et al. [15] developed a method to detect cracks in buried pipe images which first extracted local features from the pixels and then found cracks among segment candidates through cleaning and linking. In the method proposed by Fujita and Hamamoto [16], they subtracted the median-filtered image from the original one to expose the location of cracks on the concrete surface and then applied a line filter based on the Hessian matrix and a probabilistic relaxation algorithm to further highlight and link the cracks. According to Iyer and Sinha [18], cracks in buried pipes are darkest in the images, locally linear and have tree-like geometry. Based on this observation, they presented a 3-step method based on mathematical morphology and curvature evaluation for crack detection in a noisy environment. Many more relevant studies can be found in [18–21].

Despite significant progress made in image processing based crack


Corresponding author.
E-mail address: [email protected] (M. Gül).

https://doi.org/10.1016/j.autcon.2019.103018
Received 20 June 2019; Received in revised form 20 October 2019; Accepted 16 November 2019
Available online 26 November 2019
0926-5805/ © 2019 Elsevier B.V. All rights reserved.

detection methods, they have their limitations. One of the biggest challenges is that the features are usually manually designed for specific crack detection tasks, and they may not work properly if some conditions are changed. In real-life applications, it is almost impossible for one method to work for all cases due to the irregularity of cracks, the complex illumination conditions and the varying texture of material surfaces [22].

Therefore, researchers tend to seek solutions in the other branch, i.e. machine learning-based methods. Among all machine learning techniques, the methods using deep learning have shown superior performance on crack detection tasks. Deep learning-based methods have significant advantages compared to traditional image processing techniques or other machine learning techniques [22–25]. First, there is no need to manually design features to distinguish cracks from the background. The features are learned automatically during the training process. Second, the deep neural networks can implicitly consider the complex illumination conditions and the irregularity of cracks from the training datasets. Third, the multiple nonlinear layers of the neural networks can represent the data better, which usually yields better performance.

Recently, significant progress has been made in crack detection methods using deep neural networks. Protopapadakis et al. [13] proposed an intelligent platform for tunnel assessment. On this platform, they developed a 3-layer convolutional neural network (CNN) for crack detection in tunnels. Cha et al. [22] trained an 8-layer CNN on 40,000 image patches with a resolution of 256 × 256 pixels. Each patch was classified as crack or non-crack. With the help of the CNN, they could test the trained neural network on images with different resolutions. Zhang et al. [26] proposed a 5-layer deep neural network with more than 1 million parameters. With the help of convolutional layers and fully connected layers, the input and output of the neural network have the same size so that pixel-level prediction could be achieved. Ni et al. [27] developed a method integrating a feature extraction network and a crack delineation network for concrete crack detection. In their method, the features were extracted from different levels of the first neural network and then were fed to the second neural network for detection. Dung and Anh [28] presented a fully convolutional neural network (FCN) for crack detection. Without using pooling layers and fully connected layers, the input and output could stay the same size so that pixel-level prediction can be achieved. They applied their method to a concrete specimen during a cyclic loading test. Bang et al. [29] designed a deep neural network in an encoder-decoder style. The encoder of their neural network was based on a residual neural network, and the decoder was implemented using deconvolutional layers. The method was applied to road images collected from a black box camera on moving cars. Zhang et al. [30] introduced a method combining semantic segmentation and neighborhood fusion for crack detection. In their method, Sobel edge detectors were first applied to find localized patches. Then, SegNet was applied to the patches for crack detection. Yang et al. [31] introduced a feature pyramid and hierarchical boosting network (FPHBN) into a holistically-nested edge detection method for pixel-level pavement crack detection. The pyramid module can preserve the information from low-level features, and the hierarchical boosting architecture can help the deep neural network pay more attention to hard examples. In recent years, there has been an increasing number of studies about crack detection using deep learning-based methods [4,32–37].

In this paper, a novel method based on a densely connected deep neural network considering the connectivity of pixels in the loss function is presented for pixel-level crack detection on road pavements. Feature fusion is conducted at multiple levels in the densely connected deep neural network. Densely connected neurons were first discussed in [38,39] in a random neural network. The corresponding learning strategy for this neural network was described in [40]. In these studies, each neuron was connected to at most 8 neighbors, which helps preserve the detailed information of the image. Then a recurrent random neural network was applied to extract precise morphometric information from magnetic resonance imaging of human brains [41]. Recently, the dense random neural network was extended to a multiple-layer architecture (RNN-MLA) in the deep learning field for classification tasks [42]. Although the current work presented in this paper also includes densely connected layers, the concept used in this paper is different from the previous work [42]. This study includes massive “skip connections” that connect layers that are farther away in the chain-like architecture, which leads to better reuse of low-level features, while RNN-MLA includes numerous local interconnections called soma-to-soma interactions.

A new loss function considering the connectivity of pixels in cracks is developed to resolve the discontinuity issue existing in deconvolutional layers. It is well believed that the loss function affects the performance of a deep learning algorithm significantly [43]. In most crack detection algorithms [22,28,29,44], the influence of the loss function was not studied. A regular cross-entropy function was usually used to calculate the loss of crack and non-crack pixels equally. However, the training sets of crack detection images are usually highly biased, which means there are many more non-crack pixels than crack pixels. Some studies used focal loss instead of regular cross-entropy loss to resolve this issue and obtained very good performance [45]. Another issue is that cracks in an image are highly local information, which means that if one pixel belongs to a crack, the adjacent pixels are more likely to belong to the same crack. In the traditional cross-entropy loss function, all the pixels were measured independently, and no information about their connectivity was taken into consideration during the training. This would lead to sparsely distributed output. The new loss function proposed in this paper resolves this issue. In the last step of the proposed method, a depth-first search (DFS) method is used to threshold crack pixels from the background.

The contribution of this paper includes the following: 1) a novel deep neural network architecture that densely connects multiple layers in a feed-forward fashion is used for crack detection; 2) a novel loss function is proposed to take the connectivity of crack pixels into consideration; 3) this study designed and implemented a real-life experimental study to collect video-extracted images using a camera mounted at the rear of a vehicle operating at traffic speed to mimic the behavior of a backup camera.

This paper is organized as follows. Section 2 describes the methodology and the proposed method to overcome existing issues. Section 3 presents two datasets and the corresponding analysis results. Conclusions are drawn in Section 4.

2. Methodology

2.1. Overview

Fig. 1 presents the overall procedure of the proposed method. In step 1, the original image is divided into 256 × 256 patches for data augmentation. In step 2, the patches are fed into the deep neural network built mainly from the densely connected neural network and transposed convolution layers. The densely connected neural network has 201 layers, which is usually called DenseNet201 [46]. The feature fusion is conducted at multiple levels to generate a better prediction map. In step 3, the patches are integrated and binarized to reconstruct the crack binary mask. In step 4, a DFS algorithm is applied to find connected components in the binary mask. Based on the logic that cracks usually have a large number of connected pixels while noise often has much fewer pixels, we threshold out the connected components with a number of pixels lower than a certain value to separate the cracks from the background.

2.2. Densely connected neural network with multiple level feature fusion

As presented in Fig. 1, the deep neural network consists of convolutional layers, max pooling layers, densely connected layers,


Fig. 1. Overview of the proposed method.
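The connected-component filtering of step 4 in the procedure above can be sketched in Python as follows. This is a minimal illustration of the DFS-based idea described in Section 2.1, not the authors' implementation, and the minimum-size threshold `min_pixels` is an assumed placeholder, not a value from the paper:

```python
def filter_small_components(mask, min_pixels=50):
    """Keep only connected components of 1-pixels with at least
    `min_pixels` pixels (8-connectivity), using an iterative DFS.
    `min_pixels` is an assumed placeholder threshold."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for si in range(h):
        for sj in range(w):
            if mask[si][sj] == 0 or seen[si][sj]:
                continue
            # Depth-first search over one 8-connected component.
            stack, comp = [(si, sj)], []
            seen[si][sj] = True
            while stack:
                i, j = stack.pop()
                comp.append((i, j))
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < h and 0 <= nj < w
                                and mask[ni][nj] == 1 and not seen[ni][nj]):
                            seen[ni][nj] = True
                            stack.append((ni, nj))
            if len(comp) >= min_pixels:  # keep large components (cracks)
                for i, j in comp:
                    out[i][j] = 1
    return out
```

A library routine such as `scipy.ndimage.label` would serve the same purpose; the explicit DFS is shown here only because the paper names that algorithm.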

transition layers, and transposed convolution layers. The network follows an encoder-decoder schema where the width and height of the features become smaller as the mainstream of the neural network goes deeper. The feature fusion is done at multiple levels to decode the features, and the original size of the image is recovered. First, the output of the dense 4 layers (8 × 8 × 1920) is upsampled by a factor of 2 to 16 × 16 × 1792. Then, the upsampled features are added to the output of the dense 3 layers. After that, the added features are upsampled again to the dimension of 32 × 32 × 512. The features are further added with the output of the dense 2 layers. Eventually, the fused features are upsampled one last time to 256 × 256 × 2, which is the original size of the image, to achieve pixel-level crack detection. The functions of the different layers are summarized in the following sections.

2.2.1. Convolutional layer

The most important layer in the proposed deep neural network is called the convolutional layer, which was first proposed by LeCun et al. [47]. In this layer, the data is processed in the form of an operation called convolution [48]. The convolution operation is widely used in image processing [49], where the localized features are convolved with the kernel to generate new outputs. Unlike in traditional image processing techniques, where the parameters in the kernel are predefined, the parameters in convolutional layers are determined during the training process. The equation for a convolutional layer is presented in Eq. (1).

O(i, j, k) = Σ_m Σ_n Σ_l I(i − m, j − n, l) w_k(m, n, l) + b_k    (1)

where I and O stand for the input and output features of the convolutional layer. The symbols l and k represent the lth and kth features in the input and output.

2.2.2. Max pooling layer

A max pooling layer modifies the input of the layer in a way that the value of the features at a certain location is replaced with the maximum value of the nearby features [48]. The max pooling layer makes the features invariant to small translations of the input and can also improve the computational efficiency of the network. The equation of the max pooling layer is

O(x, y, l) = max_{p = −t..t} max_{q = −t..t} I(i − p, j − q, l)    (2)

where i and j are indices of the input features, x and y are indices of the output, and 2t + 1 is the kernel size.

2.2.3. Dense block

It is believed from previous studies that deeper neural networks can represent complex features of data better with more nonlinearity [50]. However, a traditional CNN is a chain-like architecture, and this configuration makes the neural networks harder to train when they become deeper due to the increasing number of parameters and gradient vanishing [51]. In order to resolve the training issues and gain a performance boost, Huang et al. [46] introduced a standard block called the dense block. Within such a block, every layer is connected to every following layer in a feed-forward fashion in addition to the mainstream chain-like structure. In this way, the gradient vanishing problem can be alleviated, and the features can also be reused so that the number of parameters can be significantly reduced.

Fig. 2 presents the details of dense block 1 in Fig. 1. The basic component of a dense block is a 1 × 1 × 128 convolutional layer and a 3 × 3 × 32 convolutional layer. The only difference among dense blocks 1 to 4 is the number of basic components. Dense block 1 has 6 such components, and dense blocks 2 to 4 have 12, 48 and 32 basic components, respectively. Taking dense block 1 as an example, the input has a dimension of 64 × 64 × 64, and it is passed through the basic component to generate a feature of 64 × 64 × 32. Then, feature 1 is concatenated with the input to generate a feature of 64 × 64 × (64 + 32). Similarly, feature 2 is concatenated with feature 1 and the input to generate a feature of 64 × 64 × (64 + 32 + 32). It is seen that the depth of the features is increased by 32 every time a basic component is passed through. The output of dense block 1 has a dimension of 64 × 64 × (64 + 32 × 6) = 64 × 64 × 256. Similarly, the output of dense block 2 has a dimension of 32 × 32 × (128 + 32 × 12) = 32 × 32 × 512, dense block 3 has 16 × 16 × (256 + 32 × 48) = 16 × 16 × 1792, and dense block 4 has 8 × 8 × (896 + 32 × 32) = 8 × 8 × 1920.

2.2.4. Transition block

The dense block itself only changes the depth without touching the height and width of the features. The height and width of the features are changed using transition blocks. A transition block consists of a 1 × 1 convolutional layer with half the depth of the input and an average pooling layer with a size of 2 × 2 and a stride of 2. As presented in Fig. 3, height, width and depth are all divided by half by


Fig. 2. Details of dense block 1.

passing through this block. The transition blocks are applied following every dense block to reduce the size of the features.

Fig. 3. Details of transition block 1.

2.2.5. Transposed convolutional layer and feature fusion

In an encoder-decoder schema, the size of the features reduces first and then increases. All the layers described above either decrease or keep the size of the features. The increase of the size is achieved by applying transposed convolution layers. The transposed convolution layer was used for image segmentation for the first time by Long et al. [52]. Similar to a convolutional layer but working in a reverse way, the transposed layers conduct the upsampling in a sliding window form. The parameters in transposed convolution layers are trainable. As can be seen in Fig. 1, transposed layers are applied three times following the output of dense block 4. Every time the output is upsampled, it is added with the intermediate features from previous layers. In this way, the more detailed information from the early layers can be directly used for crack detection.

2.3. A new loss function

2.3.1. Problems with transposed convolution layers and traditional cross entropy loss function

As presented in the previous section, the transposed convolution layers conduct the upsampling to generate binary masks for cracks with the same width and height as the original patch. The parameters of the transposed convolution layers are determined during the training process. The pixels in the output correspond to a local area in the input image called the receptive field. However, each pixel is predicted independently even though the spatial relationship is preserved. In other words, there is no explicit restriction that forces one pixel to be crack if all its neighboring pixels are cracks. The prediction is likely to be sparsely distributed. This issue was also observed in previous studies [29,44]. Morphological operations, i.e. a combination of dilation and erosion, are a possible option for this problem. However, the performance of this method is highly dependent on the chosen size of the operation. As can be seen in Fig. 4(c), the gaps among discontinued pixels cannot be completely filled if the size of the morphological operations is too small. In Fig. 4(d), we can see some unnecessary areas are filled when a large size is selected.

In addition, as shown in Eq. (3), the regular cross entropy loss function treats the pixels at different locations indifferently.

Loss = Σ_{i, j ∈ image} [−y(i, j) log ŷ(i, j) − (1 − y(i, j)) log(1 − ŷ(i, j))]    (3)

where y(i, j) is the true label of a pixel at location i and j in the image, in which 1 represents crack and 0 represents non-crack, and ŷ(i, j) is the label predicted by the deep neural network.

Fig. 5 presents a sample crack annotation with 1 as crack and 0 as non-crack. It is seen that the pixel in the red box is surrounded by 8 crack pixels and the one in the green box is next to only 1 crack pixel. Obviously, the pixel that is surrounded by the crack pixels is more likely to be a crack, and predicting it as non-crack would lead to output as shown in Fig. 4(b), i.e., pixels that belong to one crack segment are not connected. However, the regular cross-entropy loss function treats the pixels in the red and green boxes indifferently, and a wrong prediction at either pixel will result in the same loss. The problem with the above loss function is that it does not consider the relationship among the annotations of neighboring pixels.

2.3.2. Loss function considering connectivity of pixels

In previous studies [28,29,31], the pixel-level crack annotations are treated as a binary mask, where crack is 1 and non-crack is 0, and the cross-entropy loss function was used to calculate the correctness of the prediction of each pixel. The issues regarding this setting have been discussed in the last section. Although more advanced post-processing algorithms such as multiscale open-closing by reconstruction [53] can be applied to improve the performance, this paper considers the connectivity of pixels by designing a new loss function.

To resolve the problem mentioned earlier, we treat the pixel-level crack detection as a connectivity problem. A new loss function is presented to account for the connectivity of pixels while doing crack detection. First, we convert the binary mask annotation to connectivity maps. In Fig. 6, we can see that each pixel P has 8 neighboring pixels. Therefore, 8 connectivity maps can be generated solely based on the binary mask information. As can be seen in Fig. 7, if a pixel is crack and is connected to its top-left neighbor (A1), the corresponding location of the A1 map is set to 1; otherwise, it is set to 0. For instance, in the binary mask of Fig. 7, the pixel at the second row and the second column is crack (1) and its top-left neighbor is also crack, so the element at the second row and second column of the A1 map is 1. In contrast, the top-left neighbor of the pixel at the second row and the fourth column is non-crack, so the corresponding element in the A1 map is 0.

The loss function is designed to optimize the neural network parameters so that all 8 connectivity maps are closer to the correct labels. The new loss function is a sum of the cross entropy function over all 8 connectivity maps.
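The conversion from a binary mask to the 8 connectivity maps can be sketched as follows. This is a minimal Python sketch, not the authors' code: only the top-left map A1 is named explicitly in the text, so the remaining offsets follow our reading of Fig. 6, and the function name is our own:

```python
# Offsets (row, col) of the 8 neighbors, following the layout of Fig. 6;
# only A1 (top-left) is named explicitly in the text, the rest are our
# reading of the figure.
NEIGHBOR_OFFSETS = {
    "A1": (-1, -1), "A2": (0, -1), "A3": (1, -1), "A4": (1, 0),
    "A5": (1, 1), "A6": (0, 1), "A7": (-1, 1), "A8": (-1, 0),
}

def connectivity_maps(mask):
    """Convert a binary crack mask (list of lists, 1 = crack) into the
    8 connectivity maps: a map entry is 1 only where the pixel itself is
    crack AND its neighbor in that direction is also crack."""
    h, w = len(mask), len(mask[0])
    maps = {}
    for name, (dr, dc) in NEIGHBOR_OFFSETS.items():
        cmap = [[0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                ni, nj = i + dr, j + dc
                if (mask[i][j] == 1 and 0 <= ni < h and 0 <= nj < w
                        and mask[ni][nj] == 1):
                    cmap[i][j] = 1
        maps[name] = cmap
    return maps
```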


Fig. 4. Sample Result from the deep neural network with regular cross entropy loss function.
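The new loss of Eq. (4) and the two-threshold inference rule of Section 2.3.2 can be sketched as follows. This is a hedged illustration, not the authors' code; the function names, nested-list data layout, probability clipping, and default threshold values are all our assumptions:

```python
import math

def connectivity_loss(y_true, y_pred, eps=1e-7):
    """Eq. (4): sum of pixel-wise cross-entropy over the 8 connectivity
    maps. y_true and y_pred are nested lists of shape (8, H, W); y_pred
    holds predicted probabilities."""
    total = 0.0
    for t_map, p_map in zip(y_true, y_pred):
        for t_row, p_row in zip(t_map, p_map):
            for t, p in zip(t_row, p_row):
                p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
                total += -t * math.log(p) - (1 - t) * math.log(1 - p)
    return total

def decode_prediction(pred_maps, theta_prob=0.5, theta_conn=4):
    """Inference rule: binarize each of the 8 predicted maps at
    theta_prob, sum them per pixel (0..8), and predict crack where the
    sum exceeds theta_conn. Threshold values here are placeholders."""
    h, w = len(pred_maps[0]), len(pred_maps[0][0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            votes = sum(1 for m in pred_maps if m[i][j] > theta_prob)
            if votes > theta_conn:
                out[i][j] = 1
    return out
```

A pixel whose 8 neighbor relations are all mispredicted accumulates roughly 8 times the penalty of a pixel with a single crack neighbor, which is the behavior the new loss is designed to encourage.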

New Loss = Σ_{k = 1..8} Σ_{i, j ∈ image} [−y_Ak(i, j) log ŷ_Ak(i, j) − (1 − y_Ak(i, j)) log(1 − ŷ_Ak(i, j))]    (4)

where y_Ak(i, j) is the true label of a pixel at location i and j in the connectivity map Ak, and ŷ_Ak(i, j) is the label predicted by the deep neural network.

In this way, wrongly predicting a pixel surrounded by 8 crack pixels results in roughly 8 times the penalty of wrongly predicting a pixel which is only connected to one crack pixel. Fig. 8 compares the regular loss function and our proposed loss function. The corresponding areas of the results from the two loss functions are marked with the same color. We can see that the results from the regular loss function are more scattered, while the results using our proposed function are more solid and more accurate.

For the inference stage, we first set a threshold, θprob, for the connectivity maps, i.e., any prediction larger than θprob yields a 1 in the corresponding feature map. Then, we sum up the values of all 8 feature maps. The largest and smallest values in the summation would be 8 and 0, which mean that all the neighboring pixels are crack pixels and that none of the neighboring pixels is crack, respectively. Then, we specify another threshold, θconn. A pixel is predicted as crack if the summation of all 8 feature maps is larger than θconn.

Fig. 5. Sample crack annotation.

Fig. 6. Connectivity of pixel P (the 8 neighbors are labeled A1, A8, A7 across the top row; A2 and A6 at left and right; A3, A4, A5 across the bottom row).

Fig. 7. Generating 8 connectivity maps.

Fig. 8. Comparison between regular loss function and our proposed loss function.

2.4. Post-processing

The raw prediction from the proposed neural network may still include some noise (see Fig. 1) even though the proposed loss function has improved this aspect significantly. A post-processing algorithm is proposed based on the fact that cracks often have a large number of connected pixels while noise has much fewer. In this algorithm, the raw binary mask generated by the proposed neural network is first converted to a graph where every pixel predicted as crack is a node and there is an edge between every pair of neighboring nodes. Therefore, the problem is simplified to looking for connected components (CC) in the graph and thresholding out the CCs with a small number of pixels. A standard DFS algorithm is applied to find all the CCs due to its computational efficiency [54].

3. Experiments and analysis

Two datasets are employed to verify the proposed method, where the first one is a publicly available dataset called CFD [11] and the second one is a dataset created by our team, named EdmCrack1000, which includes 1000 images collected from a commercial-grade sports camera mounted on the rear of a moving car. In this paper, both datasets are trained on a desktop with an Intel 8700K CPU, 32 GB memory and an Nvidia Titan V GPU with 5120 CUDA cores. The details of the datasets and the analysis results are given in the following sections.

3.1. CFD dataset

The CFD dataset was first proposed by Shi et al. [11] to verify their

crack forest method. In total, the dataset consists of 118 road surface images taken with a handheld iPhone 5 in Beijing. The exposure time is 1/134 s, the focal length is 4 mm and the aperture is f/2.4. The distance between the smartphone and the road surface is about 1.5 m. The images have a resolution of 480 by 320 pixels. Most of the images intentionally focus on a small area which includes pavement texture and cracks. There are stains and shadows in the images, but the images are manually selected to exclude irrelevant objects such as garbage, potholes or cars on the roads.

According to [11], the dataset is randomly split into 70 and 48 images for training and testing, following the 60%/40% rule. Since the performance of the deep neural network would not be good if the amount of data is not enough, data augmentation techniques are applied in this paper. First, all the training images are flipped horizontally and vertically. Then, they are cut into 128 × 128 patches with a stride of 16. In this way, each image can generate 299 square patches. Eventually, there are 299 × 70 × 3 = 62,790 patches for the training process. Due to the limitation of the GPU memory, the batch size is chosen as 16 in this paper. We choose this as the first dataset to verify our proposed method because it is well annotated and many other studies used this dataset, so a direct comparison can be made with their methods.

3.2. Analysis and results for CFD dataset

The training results for the CFD dataset are presented in Fig. 9. Each epoch represents going through all the training data on a batch basis once, which is 62,790/16 ≈ 3925 iterations for the CFD dataset. Three metrics, precision, recall, and F1 score, are reported to show the performance of the proposed method. The formulae to calculate these three metrics are presented in Eqs. (5), (6) and (7).

precision = TP / (TP + FP)    (5)

recall = TP / (TP + FN)    (6)

F1 score = 2 × precision × recall / (precision + recall)    (7)

where TP, FP, and FN are the numbers of true positive, false positive and false negative pixels for each image. Following the definitions of [11], if a pixel is identified as crack by the proposed method, and it is within a 5-pixel distance of a pixel annotated as crack, this pixel is considered a true positive. In contrast, if a pixel is identified as crack but there is no true crack pixel within a 5-pixel distance, it is considered a false positive. If a pixel is identified as non-crack, but is actually a crack pixel according to the annotation, then this pixel is a false negative.

In Fig. 9, the new loss function over training epochs is shown in the left plot, and the three metrics described above are shown in the right plot. In the right plot, it is shown that as the number of training epochs increases, the recall first drops gradually, while the precision goes up dramatically. This is because the initial weights identify all the pixels as crack, and thus there are no FN pixels. As the training goes on, precision, recall, and F1 score converge to 91.00%, 93.22%, and 91.99%, respectively. In Table 1, the proposed method is compared with some other methods, where Canny, CrackTree, FFA, CrackForest, MFCD,

Fig. 9. Training results for CFD dataset: (left) new loss over training epochs; (right) precision, recall and F1 score over training epochs.
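The three metrics, including the 5-pixel matching tolerance defined above, can be computed as in the following sketch. This is our own hedged implementation of Eqs. (5)–(7) and the tolerance rule, not the authors' code; Euclidean distance is an assumption, as the paper only states a 5-pixel distance:

```python
import math

def crack_metrics(pred, truth, tol=5):
    """Precision, recall and F1 (Eqs. (5)-(7)) with a distance tolerance:
    a predicted crack pixel within `tol` pixels of an annotated crack
    pixel counts as a true positive. Euclidean distance is an assumption."""
    h, w = len(truth), len(truth[0])
    crack = [(i, j) for i in range(h) for j in range(w) if truth[i][j] == 1]

    def near_crack(i, j):
        return any(math.hypot(i - ci, j - cj) <= tol for ci, cj in crack)

    tp = fp = fn = 0
    for i in range(h):
        for j in range(w):
            if pred[i][j] == 1:
                if near_crack(i, j):
                    tp += 1
                else:
                    fp += 1
            elif truth[i][j] == 1:   # missed an annotated crack pixel
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```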


Table 1
Comparison of performance for different methods on CFD dataset.

Method                                       Precision   Recall    F1 score
Canny [11]                                   12.23%      22.15%    15.76%
CrackTree [11]                               73.22%      76.45%    70.80%
FFA [55]                                     78.56%      68.43%    73.15%
CrackForest [11]                             82.28%      89.44%    85.71%
ResNet152-FCN [29]                           87.83%      88.19%    88.01%
MFCD [55]                                    89.90%      89.47%    88.04%
VGG19-FCN [44]                               92.80%      85.49%    88.53%
CrackNet-V [56]                              92.58%      86.03%    89.18%
UNet [45]                                    86.80%      77.97%    81.83%
FPHBN [31]                                   95.88%      87.97%    91.53%
Proposed method with regular loss function   92.02%      91.13%    91.58%
Proposed method with new loss function       91.00%      93.22%    91.99%

CrackNet-V are implemented by other researchers as cited in the table, and VGG19-FCN [28], ResNet152-FCN [29], UNet [45] and FPHBN [31] are implemented by our team. CrackNet-V, VGG19-FCN, ResNet152-FCN, UNet, FPHBN and our proposed method are based on deep neural networks, while the others are based on traditional image processing techniques. In all deep learning-based methods discussed in this section, transposed convolution layers are applied as a way of feature fusion for pixel-level crack detection. It is seen that the proposed method outperforms the other methods in terms of the F1 score. The new loss function boosts the F1 score from 91.58% to 91.99%. It should be noted that UNet performs poorly on the CFD dataset mainly because it only supports 512 × 512 images as input, and the required dimensions are larger than the dimensions of the images in the CFD dataset. Therefore, the images have to be scaled to be fed into UNet, which might have caused the drop in performance.

Some sample images and results from the proposed method are presented in Fig. 10. We can see that the proposed neural network with the novel loss function identifies the cracks very accurately and produces smooth boundaries. Texture changes do not affect the identification process. The post-processing algorithm successfully removes the noise from the real cracks. Sample results from some other methods can be found in [11,55,56].

3.3. EdmCrack1000 dataset

To further verify the proposed method, a more challenging dataset collected by our team using a commercial-grade sports camera mounted on the rear of a moving vehicle is employed (see Fig. 11). Since current vehicles do not offer access to backup cameras, this experiment is set up to mimic the behavior of such cameras. There are mainly three reasons we chose to mount the camera at the rear of the vehicle: 1) the camera is installed outside of the car so that the influence of the windows or windshield is avoided; 2) the camera is close to the ground and is not blocked by the hood, so the spatial resolution of the image is better; 3) the feasibility of using the backup camera in the vehicle for crack detection is studied, so no extra device would be required if access to the backup camera becomes more practical in the future. The camera used in this study is a GoPro Hero 7 Black and the car is a 2017 Honda Pilot. Videos are taken continuously while the car is moving. The resolution of the videos is 1920 × 1080 pixels, and the frame rate is 240 fps. The images are extracted from more than 20 h of videos taken on the roads in Edmonton, Alberta, Canada. In this dataset, there are 1000 images annotated at the pixel level by the first and third authors (QM and MRA) using the software Sketchbook. The GoPro Hero 7 Black is 1 m

Fig. 10. Results from the proposed method for CFD dataset.
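The post-processing step mentioned in the discussion of Fig. 10 removes spurious responses from the binary output. The paper's exact algorithm is not reproduced here; as an illustrative sketch only, a common cleanup of this kind drops small connected components:

```python
import numpy as np
from scipy.ndimage import label

def remove_small_components(mask, min_size=50):
    """Drop connected components smaller than min_size pixels.

    Illustrative sketch, not the authors' post-processing algorithm;
    `min_size` is a hypothetical threshold.
    """
    labeled, n = label(mask)              # 4-connected components by default
    sizes = np.bincount(labeled.ravel())  # sizes[0] is the background
    keep = np.zeros(sizes.shape, dtype=bool)
    keep[1:] = sizes[1:] >= min_size      # keep only sufficiently large components
    return keep[labeled].astype(mask.dtype)
```

Because real cracks form long connected chains of pixels while noise tends to appear as isolated blobs, a size threshold of this kind preserves crack segments while discarding scattered false responses.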


Fig. 11. The setup of the image collection system.

Fig. 12. Sample images from EdmCrack1000 along with the annotated ground truths.
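The ground truths shown in Fig. 12 are binary masks that are converted to connectivity maps for the proposed method. As a sketch of the general idea only (the authors' exact encoding may differ): one channel per 8-neighbourhood direction, set wherever a crack pixel has a crack neighbour in that direction:

```python
import numpy as np

# Neighbour offsets (dy, dx) for the 8-neighbourhood, one channel each.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def connectivity_maps(mask):
    """Binary crack mask (H, W) -> (8, H, W) connectivity maps.

    Sketch of the concept described in the text, not the paper's
    exact implementation.
    """
    h, w = mask.shape
    maps = np.zeros((8, h, w), dtype=mask.dtype)
    padded = np.pad(mask, 1)  # zero border: no neighbour outside the image
    for c, (dy, dx) in enumerate(OFFSETS):
        # shifted[y, x] == mask[y + dy, x + dx]
        shifted = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        maps[c] = mask & shifted  # both the pixel and its neighbour are crack
    return maps
```

Supervising against such maps, rather than against the raw mask, penalizes predictions that break the chain of neighbouring crack pixels, which is how connectivity enters the loss.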

Table 2
Comparison of performance for different methods on EdmCrack1000 dataset.

Methods                                      Precision   Recall    F1 score   Computational efficiency (sec/image)
Canny                                        1.56%       3.48%     2.92%      0.08
Sobel                                        2.42%       13.85%    3.89%      0.06
CrackIT                                      14.55%      7.36%     4.47%      5.57
VGG19-FCN                                    77.54%      61.88%    66.80%     3.05
ResNet152-FCN                                78.37%      58.60%    64.05%     3.37
UNet                                         77.44%      67.83%    70.68%     4.83
FPHBN                                        61.63%      87.85%    71.41%     8.18
Proposed method with regular loss function   74.05%      72.31%    70.98%     3.46
Proposed method with new loss function       84.85%      70.59%    75.35%     3.96

Fig. 13. New loss function during training.


from the ground and is facing downwards at an angle of 45°. There is no restriction about which objects should be included in the images, to make the situation as real as possible. Because of the distance between the camera and the ground, the spatial resolution of the configuration can reach about 3 mm. Any cracks with a width larger than 3 mm will be distinguishable.

Fig. 12 shows two sample images from the EdmCrack1000 dataset. The cracks are annotated by binary masks, which will be converted to connectivity maps for our method. In addition to cracks, all other


Fig. 14. Sample images and corresponding results.

objects that could appear on the roads are also included in the dataset. Different weather conditions and illumination conditions are also covered in the dataset. The goal of this dataset is to reflect the real road condition as much as possible.

For verification purposes, the dataset is split into training, validation and test sets, where the training set includes 700 images, and the validation and test sets consist of 100 and 200 images, respectively. Similar to the CFD dataset, the images are split into 256 × 256 patches with a stride of 128 for training and testing. Therefore, we generate 112 patches for each image and there are in total 78,400 patches for training, 11,200 for validation and 22,400 for testing.
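The patch counts above are consistent with an edge-aligned sliding window: on a 1920 × 1080 image, 256 × 256 patches with a stride of 128 give 14 columns and 8 rows, i.e., 112 patches per image. The cropping scheme below is our assumption, not necessarily the authors' exact implementation:

```python
import numpy as np

def split_into_patches(img, patch=256, stride=128):
    """Sliding-window patch extraction with the last window snapped to the edge.

    Assumed scheme: regular stride, plus one extra row/column of patches
    aligned to the bottom/right edge when the stride does not land exactly.
    """
    h, w = img.shape[:2]
    ys = list(range(0, h - patch + 1, stride))
    xs = list(range(0, w - patch + 1, stride))
    if ys[-1] + patch < h:       # snap a final row of patches to the bottom edge
        ys.append(h - patch)
    if xs[-1] + patch < w:       # snap a final column to the right edge
        xs.append(w - patch)
    return [img[y:y + patch, x:x + patch] for y in ys for x in xs]
```

With these numbers, 700 training images × 112 patches = 78,400, matching the training count given in the text.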


Fig. 14. (continued)

3.4. Analysis results for EdmCrack1000 dataset

The results for the EdmCrack1000 dataset are shown in this section. Fig. 13 shows the new loss function values during the training process. The new losses on both the training and validation sets are reported. They both reduce from the 1 level to the 0.01 level and gradually converge as the training continues. The test set is applied at the end of 20 epochs.

The dataset is analyzed using our proposed method as well as 7 other methods, among which Canny and Sobel are standard edge detection algorithms [49], CrackIT was proposed by [57,58], VGG19-FCN was proposed by [28], ResNet152-FCN was proposed by [29], UNet was proposed by [45] and FPHBN was proposed by [31]. The Canny, Sobel and CrackIT methods are based on traditional image processing, while VGG19-FCN, ResNet152-FCN, UNet, and FPHBN are deep learning-based methods. Our proposed method has three advantages compared to the other four deep learning-based methods: first, a densely connected neural network is used to better reuse intermediate features; second, a novel loss function considering the connectivity of pixels is introduced; and third, a post-processing technique is introduced to improve the performance.

Table 2 presents the precision, recall and F1 score as defined in Section 3.2. They are 84.85%, 70.59%, and 75.35%, respectively, for our proposed method. It shows that our proposed method outperforms the other methods in terms of all three metrics by a large margin. Canny, Sobel and CrackIT have very poor performance on this dataset, which makes sense because they were not designed to deal with such complex situations. VGG19-FCN [44], ResNet152-FCN [29], UNet [45] and FPHBN [31] give better performance than traditional image processing techniques but are still not comparable to our proposed method. In the above comparison, VGG19-FCN and ResNet152-FCN are replicated by ourselves and the parameters proposed in the original papers were used. UNet and FPHBN are run from the open-source codes provided by the authors of the papers. The computational efficiency of each method is also presented in Table 2. It is seen that the proposed method takes


Fig. 15. Some wrongly identified images.

around 20–30% more time than VGG19-FCN and ResNet152-FCN but less time than UNet and FPHBN to process one image. The proposed new loss function increases the F1 score from 70.98% to 75.35% for our method. It should be noted that previous studies have shown that random neural networks with a multiple layer architecture (RNN-MLA) can outperform CNN methods in classification tasks in terms of accuracy and time complexity [40]. However, the pixel-level crack detection task presented in this paper is different from the classification problem, and therefore RNN-MLA cannot be directly applied without significant changes. Although the proposed method is not quantitatively compared with a random neural network with a multiple layer architecture (RNN-MLA), it is expected that integrating ideas from random neural networks, such as soma-to-soma interactions, into our proposed method could further improve the accuracy and computational complexity. This will be studied in the future and is beyond the scope of this paper.

Some sample images from EdmCrack1000 and the results for all 7 other methods, along with our proposed method with the traditional loss function, are shown in Fig. 14. It is clear that the traditional methods, i.e., Canny, Sobel and CrackIT, cannot distinguish the cracks from the background very well in such a complex situation. VGG19-FCN, ResNet152-FCN, UNet, and FPHBN perform better on the given dataset, but some other objects like shadows from trees or edges of the roads are also wrongly identified as cracks. The results from our proposed method with the new loss function are the best because it clearly identifies the cracks and is robust when shadow and illumination changes exist. It should be acknowledged that there are still some crack segments that are not properly identified by our proposed method.

Although the proposed method has achieved state-of-the-art performance, there are still some limitations to the current method. Fig. 15 presents some images that are not identified correctly. In the left image of Fig. 15, as can be seen in the red box, sealed cracks are wrongly identified as real cracks by the proposed method. The reason could be that the training data do not include many images with sealed cracks, and the method cannot recognize them properly because real and sealed cracks are indeed very similar in terms of aspect ratio and color intensities. In the right image, some parts of a long crack (in the green box) are not identified by the proposed method due to the disturbance of the stains. One possible solution for these issues is to collect a larger dataset with more critical cases.

4. Conclusions

The paper shows the feasibility of using a cost-effective device and a deep learning-based algorithm for pavement crack detection. In this paper, a novel deep neural network architecture with a new loss function considering the connectivity of pixels for automatic crack detection is proposed. In the proposed method, features are fused at multiple levels in a densely connected neural network to output pixel-level identification of cracks. The new loss function considering the connectivity of pixels is introduced to avoid scattered results and to make the boundaries smoother. The following conclusions are drawn from this study:

1. The proposed loss function tackled the issues regarding deep neural networks with transposed convolution layers, i.e., sparsely distributed identification, for pixel-level crack detection.
2. The proposed method outperforms other methods in terms of precision, recall and F1 score for two datasets.
3. It is feasible to use a deep learning-based method to detect cracks on the pavement in a complex environment.

Even though the proposed method achieves excellent performance on the given datasets, there are still challenges that are not yet resolved. The proposed method was tested on pavements only. Also, this method does not aim to distinguish between different types of cracks or road defects. Further studies should be conducted to address these issues. In addition, the performance of the algorithm, as well as many other


algorithms, is not good if a regular low-speed camera is used during driving because of the blurriness of the images. Also, the low contrast due to low-light conditions could degrade the performance. In the future, the authors will keep working on improving the performance of the proposed crack detection method for complex situations as well as reducing its computational complexity. Also, other practical issues encountered during the application of the proposed method will be studied.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] G. Félio, Informing the future: the Canadian infrastructure report card, http://canadianinfrastructure.ca/en/index.html, (October 18, 2019).
[2] R. Montero, J. Victores, E. Menendez, C. Balaguer, The Robot-spect Eu Project: Autonomous Robotic Tunnel Inspection, UC3M, (2015).
[3] K. Loupos, A.D. Doulamis, C. Stentoumis, E. Protopapadakis, K. Makantasis, N.D. Doulamis, A. Amditis, P. Chrobocinski, J. Victores, R. Montero, Autonomous robotic system for tunnel structural inspection and assessment, International Journal of Intelligent Robotics and Applications 2 (1) (2018) 43–66, https://doi.org/10.1007/s41315-017-0031-9.
[4] K. Makantasis, E. Protopapadakis, A. Doulamis, N. Doulamis, C. Loupos, Deep convolutional neural networks for efficient vision based tunnel inspection, 2015 IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), IEEE, 2015, pp. 335–342, https://doi.org/10.1109/ICCP.2015.7312681.
[5] A. Mohan, S. Poobal, Crack detection using image processing: a critical review and analysis, Alexandria Engineering Journal 57 (2) (2018) 787–798, https://doi.org/10.1016/j.aej.2017.01.020.
[6] J.-T. Kim, N. Stubbs, Crack detection in beam-type structures using frequency data, J. Sound Vib. 259 (1) (2003) 145–160, https://doi.org/10.1006/jsvi.2002.5132.
[7] S.-T. Quek, Q. Wang, L. Zhang, K.-K. Ang, Sensitivity analysis of crack detection in beams by wavelet technique, Int. J. Mech. Sci. 43 (12) (2001) 2899–2910, https://doi.org/10.1016/S0020-7403(01)00064-9.
[8] Q. Shan, R. Dewhurst, Surface-breaking fatigue crack detection using laser ultrasound, Appl. Phys. Lett. 62 (21) (1993) 2649–2651, https://doi.org/10.1063/1.109274.
[9] E. Glushkov, N. Glushkova, A. Ekhlakov, E. Shapar, An analytically based computer model for surface measurements in ultrasonic crack detection, Wave Motion 43 (6) (2006) 458–473, https://doi.org/10.1016/j.wavemoti.2006.03.002.
[10] G. Owolabi, A. Swamidas, R. Seshadri, Crack detection in beams using changes in frequencies and amplitudes of frequency response functions, J. Sound Vib. 265 (1) (2003) 1–22, https://doi.org/10.1016/S0022-460X(02)01264-6.
[11] Y. Shi, L. Cui, Z. Qi, F. Meng, Z. Chen, Automatic road crack detection using random structured forests, IEEE Trans. Intell. Transp. Syst. 17 (12) (2016) 3434–3445, https://doi.org/10.1109/TITS.2016.2552248.
[12] M.R. Jahanshahi, J.S. Kelly, S.F. Masri, G.S. Sukhatme, A survey and evaluation of promising approaches for automatic image-based defect detection of bridge structures, Struct. Infrastruct. Eng. 5 (6) (2009) 455–486, https://doi.org/10.1080/15732470801945930.
[13] E. Protopapadakis, C. Stentoumis, N. Doulamis, A. Doulamis, K. Loupos, K. Makantasis, G. Kopsiaftis, A. Amditis, Autonomous robotic inspection in tunnels, ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences 3 (5) (2016), https://doi.org/10.5194/isprs-annals-III-5-167-2016.
[14] Y. Turkan, J. Hong, S. Laflamme, N. Puri, Adaptive wavelet neural network for terrestrial laser scanner-based crack detection, Autom. Constr. 94 (2018) 191–202, https://doi.org/10.1016/j.autcon.2018.06.017.
[15] S.K. Sinha, P.W. Fieguth, Automated detection of cracks in buried concrete pipe images, Autom. Constr. 15 (1) (2006) 58–72, https://doi.org/10.1016/j.autcon.2005.02.006.
[16] Y. Fujita, Y. Hamamoto, A robust automatic crack detection method from noisy concrete surfaces, Mach. Vis. Appl. 22 (2) (2011) 245–254, https://doi.org/10.1007/s00138-009-0244-5.
[17] I. Abdel-Qader, O. Abudayyeh, M.E. Kelly, Analysis of edge-detection techniques for crack identification in bridges, J. Comput. Civ. Eng. 17 (4) (2003) 255–263, https://doi.org/10.1061/(ASCE)0887-3801(2003)17:4(255).
[18] S. Iyer, S.K. Sinha, A robust approach for automatic detection and segmentation of cracks in underground pipeline images, Image Vis. Comput. 23 (10) (2005) 921–933, https://doi.org/10.1016/j.imavis.2005.05.017.
[19] Y.-J. Cha, K. You, W. Choi, Vision-based detection of loosened bolts using the Hough transform and support vector machines, Autom. Constr. 71 (2016) 181–188, https://doi.org/10.1016/j.autcon.2016.06.008.
[20] D. Lecompte, J. Vantomme, H. Sol, Crack detection in a concrete beam using two different camera techniques, Struct. Health Monit. 5 (1) (2006) 59–68, https://doi.org/10.1177/1475921706057982.
[21] H.K. Jung, G. Park, Rapid and non-invasive surface crack detection for pressed-panel products based on online image processing, Struct. Health Monit. (2019), https://doi.org/10.1177/1475921718811157.
[22] Y.J. Cha, W. Choi, O. Büyüköztürk, Deep learning-based crack damage detection using convolutional neural networks, J. Comput. Aided Civ. Infrastruct. Eng. 32 (5) (2017) 361–378, https://doi.org/10.1111/mice.12263.
[23] K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN, Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961–2969, https://doi.org/10.1109/ICCV.2017.322.
[24] Y.J. Cha, W. Choi, G. Suh, S. Mahmoudkhani, O. Büyüköztürk, Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types, J. Comput. Aided Civ. Infrastruct. Eng. 33 (9) (2018) 731–747, https://doi.org/10.1111/mice.12334.
[25] F.-C. Chen, M.R. Jahanshahi, NB-CNN: deep learning-based crack detection using convolutional neural network and naive Bayes data fusion, IEEE Trans. Ind. Electron. 65 (5) (2018) 4392–4400, https://doi.org/10.1109/TIE.2017.2764844.
[26] A. Zhang, K.C. Wang, B. Li, E. Yang, X. Dai, Y. Peng, Y. Fei, Y. Liu, J.Q. Li, C. Chen, Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network, J. Comput. Aided Civ. Infrastruct. Eng. 32 (10) (2017) 805–819, https://doi.org/10.1111/mice.12297.
[27] F. Ni, J. Zhang, Z. Chen, Pixel-level crack delineation in images with convolutional feature fusion, Struct. Control. Health Monit. (2019) e2286, https://doi.org/10.1002/stc.2286.
[28] C.V. Dung, L.D. Anh, Autonomous concrete crack detection using deep fully convolutional neural network, Autom. Constr. 99 (2019) 52–58, https://doi.org/10.1016/j.autcon.2018.11.028.
[29] S. Bang, S. Park, H. Kim, H. Kim, Encoder–decoder network for pixel-level road crack detection in black-box images, J. Comput. Aided Civ. Infrastruct. Eng. (2019), https://doi.org/10.1111/mice.12440.
[30] X. Zhang, D. Rajan, B. Story, Concrete crack detection using context-aware deep semantic segmentation network, J. Comput. Aided Civ. Infrastruct. Eng. (2019), https://doi.org/10.1111/mice.12477.
[31] F. Yang, L. Zhang, S. Yu, D. Prokhorov, X. Mei, H. Ling, Feature pyramid and hierarchical boosting network for pavement crack detection, IEEE Trans. Intell. Transp. Syst. (2019), https://doi.org/10.1109/TITS.2019.2910595.
[32] C. Feng, M.-Y. Liu, C.-C. Kao, T.-Y. Lee, Deep active learning for civil infrastructure defect detection and classification, Computing in Civil Engineering 2017 (2017) 298–306, https://doi.org/10.1061/9780784480823.036.
[33] L. Zhang, F. Yang, Y.D. Zhang, Y.J. Zhu, Road crack detection using deep convolutional neural network, IEEE International Conference on Image Processing (ICIP), IEEE, 2016, pp. 3708–3712.
[34] A. Doulamis, N. Doulamis, E. Protopapadakis, A. Voulodimos, Combined convolutional neural networks and fuzzy spectral clustering for real time crack detection in tunnels, 2018 25th IEEE International Conference on Image Processing (ICIP), IEEE, 2018, pp. 4153–4157, https://doi.org/10.1109/ICIP.2018.8451758.
[35] E. Protopapadakis, N. Doulamis, Image based approaches for tunnels' defects recognition via robotic inspectors, International Symposium on Visual Computing, Springer, 2015, pp. 706–716, https://doi.org/10.1007/978-3-319-27857-5_63.
[36] H. Kim, E. Ahn, M. Shin, S.-H. Sim, Crack and noncrack classification from concrete surface images using machine learning, Struct. Health Monit. 18 (3) (2019) 725–738, https://doi.org/10.1177/1475921718768747.
[37] K. Jang, N. Kim, Y.-K. An, Deep learning–based autonomous concrete crack evaluation through hybrid image scanning, Struct. Health Monit. 18 (5-6) (2018) 1722–1737.
[38] V. Atalay, E. Gelenbe, N. Yalabik, The random neural network model for texture generation, Int. J. Pattern Recognit. Artif. Intell. 6 (1) (1992) 131–141, https://doi.org/10.1142/S0218001492000072.
[39] V. Atalay, E. Gelenbe, Parallel algorithm for colour texture generation using the random neural network model, Int. J. Pattern Recognit. Artif. Intell. 6 (02n03) (1992) 437–446, https://doi.org/10.1142/S0218001492000266.
[40] E. Gelenbe, Learning in the recurrent random neural network, Neural Comput. 5 (1) (1993) 154–164, https://doi.org/10.1162/neco.1993.5.1.154.
[41] E. Gelenbe, Y. Feng, K.R.R. Krishnan, Neural network methods for volumetric magnetic resonance imaging of the human brain, Proc. IEEE 84 (10) (1996) 1488–1496, https://doi.org/10.1109/5.537113.
[42] E. Gelenbe, Y. Yin, Deep learning with dense random neural networks, International Conference on Man–Machine Interactions, Springer, 2017, pp. 3–18, https://doi.org/10.1016/j.procs.2018.07.183.
[43] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988, https://doi.org/10.1109/ICCV.2017.324.
[44] X. Yang, H. Li, Y. Yu, X. Luo, T. Huang, X. Yang, Automatic pixel-level crack detection and measurement using fully convolutional network, J. Comput. Aided Civ. Infrastruct. Eng. 33 (12) (2018) 1090–1109, https://doi.org/10.1111/mice.12412.
[45] Z. Liu, Y. Cao, Y. Wang, W. Wang, Computer vision-based concrete crack detection using U-net fully convolutional networks, Autom. Constr. 104 (2019) 129–139, https://doi.org/10.1016/j.autcon.2019.04.005.
[46] G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708, https://doi.org/10.1109/CVPR.2017.243.
[47] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86 (11) (1998) 2278–2324, https://doi.org/10.1109/5.726791.
[48] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[49] R.C. Gonzalez, P. Wintz, Digital Image Processing, Addison Wesley, 1987.


[50] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9, https://doi.org/10.1109/CVPR.2015.7298594.
[51] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778, https://doi.org/10.1109/CVPR.2016.90.
[52] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440, https://doi.org/10.1109/CVPR.2015.7298965.
[53] A. Doulamis, N. Doulamis, P. Maragos, Generalized multiscale connected operators with applications to granulometric image analysis, Proceedings 2001 International Conference on Image Processing (Cat. No. 01CH37205), 3, IEEE, 2001, pp. 684–687, https://doi.org/10.1109/ICIP.2001.958211.
[54] T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, MIT Press, 2009.
[55] H. Li, D. Song, Y. Liu, B. Li, Automatic pavement crack detection by multi-scale image fusion, IEEE Trans. Intell. Transp. Syst. (99) (2018) 1–12, https://doi.org/10.1109/TITS.2018.2856928.
[56] Y. Fei, K.C. Wang, A. Zhang, C. Chen, J.Q. Li, Y. Liu, G. Yang, B. Li, Pixel-level cracking detection on 3D asphalt pavement images through deep-learning-based CrackNet-V, IEEE Trans. Intell. Transp. Syst. (2019), https://doi.org/10.1109/TITS.2019.2891167.
[57] H. Oliveira, P.L. Correia, CrackIT — an image processing toolbox for crack detection and characterization, 2014 IEEE International Conference on Image Processing (ICIP), 2014, pp. 798–802, https://doi.org/10.1109/ICIP.2014.7025160.
[58] H. Oliveira, P.L. Correia, Automatic road crack detection and characterization, IEEE Trans. Intell. Transp. Syst. 14 (1) (2012) 155–168, https://doi.org/10.1109/TITS.2012.2208630.
