Automatic Pavement Crack Detection Based on Hierarchical Feature Augmentation

Wenke Cheng (corresponding author)
College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
[email protected]

Yinghua Zhou
College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
[email protected]

ABSTRACT
As one of the most important types of infrastructure, roads are the foundation of transportation, and the detection of pavement cracks has become an important task. However, due to the complexity and diversity of crack morphology, detecting cracks in roads is challenging. Most current crack detection methods suffer from poor detection performance and weak generalizability. Inspired by deep learning techniques, a crack detection network based on feature augmentation is proposed to extract crack areas in pavement images. The network uses parallel dilated convolution branches to capture image information at different scales. A hierarchical feature augmentation module is proposed to combine key information from feature maps of different levels, and side networks are introduced to perform prediction individually at each level. To demonstrate the performance of the proposed crack segmentation network, experiments are carried out on public datasets. Compared with other advanced crack detection methods, the proposed method predicts the crack pixels in pavement images more effectively, improving both detection accuracy and generalization.

CCS CONCEPTS
• Computing methodologies → Modeling and simulation; Model development and analysis.

KEYWORDS
Deep learning, crack detection, feature augmentation

ACM Reference Format:
Wenke Cheng and Yinghua Zhou. 2021. Automatic Pavement Crack Detection Based on Hierarchical Feature Augmentation. In 2021 2nd International Conference on Artificial Intelligence and Information Systems (ICAIIS ’21), May 28–30, 2021, Chongqing, China. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3469213.3470392

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ICAIIS ’21, May 28–30, 2021, Chongqing, China
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-9020-0/21/05. . . $15.00
https://doi.org/10.1145/3469213.3470392

1 INTRODUCTION
Among all kinds of road damage, the most common is the pavement crack, which forms easily on the pavement surface and has a serious impact on the health of the road. If road maintenance is not carried out in time, cracks may cause other serious secondary damage, which poses enormous challenges to traffic safety and maintenance work. Therefore, it is necessary to detect cracks as early as possible and carry out maintenance in time to prevent further harm.

Automatic road crack detection aims to use computer vision to accurately detect the location and area of cracks in real pavement images. A pavement crack is an irregular linear target with a certain width. In a real road environment it usually appears as a single slender line or as multiple complex interlaced lines. Early studies [1], [2] used differences in gray values to detect cracks. An edge detection method [3] was used to identify crack pixels with obvious contrast changes in crack images. Hand-designed filters such as [4] and [5] added prior knowledge and used the local features of cracks to detect crack areas. However, because of the diverse topologies and widths of cracks, and other factors such as water stains and blots on the road surface, detecting cracks in roads remains challenging. In addition, the gray value of a crack pixel is usually close to that of its surrounding background, which also makes crack detection difficult. For these reasons, the performance of these methods remains limited. In recent years, convolutional neural networks (CNNs) have played a very important role in advancing image recognition. Compared with traditional image processing algorithms, CNN-based crack detection methods have achieved outstanding results by training a CNN to learn crack features automatically. Some of these methods [6], [7] classify image blocks to detect cracks at the block or patch level, but such methods do not detect cracks at the pixel level. More recently, researchers have applied semantic segmentation to crack detection: [8] and [9] used FCN [10] and Segnet [11] to detect cracks at the pixel level with high accuracy. However, those methods only use convolutional filters with a fixed receptive field and cannot robustly detect cracks because of the diversity of crack morphology.

In this paper, a segmentation network is proposed to detect cracks automatically without any pre-processing or post-processing. The proposed method uses augmented hierarchical features to improve crack detection performance. Firstly, hierarchical crack features are extracted by the feature extraction network. The feature maps extracted at each convolution stage are retained in order to capture more crack information, and a multi-dilated convolution (MDC) module is used to extract multi-scale image information with different context sizes in parallel. Secondly, a hierarchical feature augmentation (HFA) module, which aggregates context information of high-level features and low-level features,
is proposed to enhance feature representations. Finally, the augmented hierarchical features are fed into the side networks, each of which performs crack prediction at its own level. All side outputs from the side networks are concatenated to produce the final output.

The rest of the paper is organized as follows. Related work is reviewed in Section 2. The proposed method is described in Section 3. The experiments and results are described and analyzed in Section 4. Section 5 concludes the paper.

2 RELATED WORK
With the development of related technologies, new techniques and algorithms have gradually been applied to crack detection. There are roughly three mainstream families of crack detection methods [12]: traditional methods, machine learning-based methods and deep learning-based methods. This section introduces representative works from each family.

2.1 Traditional Methods
Some studies, such as [1] and [2], assume that the gray value of a crack area is lower than that of a non-crack area, so a grayscale threshold can be used to separate the two. However, it is difficult to select appropriate thresholds. In addition, thresholding does not describe the global information of cracks and is not suitable for crack detection in complex scenes. In [3], an edge detection method obtains a clear binary crack image using morphological operations. Morphological filters are introduced to extract crack features, and a modified median filter is used to remove noise from the image. However, these methods are susceptible to complex pavement texture and noise. Hand-designed feature filters such as the Gabor filter [4] and the wavelet transform [5] perform well on simple cracks but cannot detect complex and diverse cracks well.

2.2 Machine Learning Methods
Many researchers have applied machine learning algorithms to crack detection, for example [13] and [14]. A crack image is divided into many sub-images, manually designed features are extracted based on the pavement texture, and machine learning classifiers then predict whether each sub-image contains cracks. In [15], many nonlinear filters are used to capture pavement texture information. To obtain the crack distribution area, the method uses supervised learning based on AdaBoost to predict whether there are cracks in the road images. The work in [16] introduced random structured forests to mitigate the influence of noise on detection results, which overcomes the interference of road noise to a certain extent. However, the manually designed features need to be redesigned for other pavement backgrounds, so this kind of method cannot effectively extract crack features in a complex pavement environment.

2.3 Deep Learning Methods
The work in [6] is the first study to apply deep learning to road crack classification. The studies [6] and [7] divided the pavement images into smaller patches and used a CNN to predict whether a patch is a crack patch. This type of method does not classify at the pixel level and does not take the crack as a whole into account. In [10], FCN is proposed for end-to-end semantic segmentation at the pixel level. Many subsequent studies on crack detection are based on the FCN framework. The study in [8] used FCN to classify all pixels in the image as crack or background and achieved high accuracy. However, this method cannot restore the edge details of the crack well during upsampling. In [9], an encoder-decoder structure based on the Segnet network [11] is proposed. Skip connection fusion is also introduced to improve the precision of crack segmentation. However, some of this research has made networks deeper, more complex and more time-consuming, and there is still a lack of study on how to use existing hierarchical features to improve detection results.

3 METHODOLOGY
Given a pavement image with cracks, the task is to classify each pixel as either a crack pixel or a background pixel. This is a binary pixel classification problem. In this paper, a segmentation network for crack pixel classification is designed. The structure of the network is shown in Figure 1. It consists of four major components: (1) the hierarchical feature extraction network; (2) the multi-dilated convolution (MDC) module; (3) the hierarchical feature augmentation (HFA) module; (4) the side networks.

3.1 Hierarchical Feature Extraction Network
Given a crack image, the hierarchical feature extraction network extracts crack features at different scales. The first 13 convolution layers of the VGG16 classification network are used as the feature extraction network. The fifth pooling layer and the fully connected layers of VGG16 are discarded for the following reasons: (1) the last pooling layer would halve the height and width of the feature maps, and a low-resolution feature map may lose key information of the crack image; (2) the fully connected layers are time-consuming to compute and are not suitable for pixel-level prediction. In the feature extraction network, each convolution layer is composed of a convolution unit, a BN layer and a ReLU function.

In the feature extraction network, the numbers of 3×3 convolutional filters from Conv1 to Conv5 are set to 64, 128, 256, 512 and 512, respectively. After four max-pooling operations with 2×2 windows, five hierarchical feature maps with different resolutions are obtained. We use {F1, F2, F3, F4, F5} to denote the hierarchical features generated by the feature extraction network. These feature maps can be regarded as a five-level feature pyramid ordered by resolution.

3.2 Multi-dilated Convolution
Each convolution layer of the feature extraction network only uses a convolution kernel with a fixed receptive field to extract crack context information, so it can only represent crack information at one scale. In complex road environments, the shape of a crack changes along the crack curve. A crack is a linear target with complex topology, and the abstract context information
required for crack detection is different. Inspired by [17], we introduce atrous spatial pyramid pooling into the detection model and use a multi-scale strategy to extract hierarchical features. The multi-dilated convolution module extracts features at different scales through parallel dilated convolution [18] branches, which enlarge the receptive field of the convolution without adding filter parameters. The dilated rate r is a key hyperparameter of dilated convolution: changing r changes the receptive field. For example, a 3×3 kernel with r = 2 covers a 5×5 window.

As shown in Figure 2, we use four dilated convolution branches with dilated rates {2, 4, 6, 8} to obtain feature maps with different contexts. A global pooling branch is used to obtain the average of the global crack features. The original feature map of Conv5 is also added to the module. The six feature maps are then concatenated into a fused feature map, and a 1×1 convolutional layer with 512 filters reduces the channel number of the concatenated feature map from 512×6 to 512, which also reduces the subsequent computation. The output feature maps are named multi-scale features; they contain rich contextual information that helps detect cracks with various topologies.

Figure 1: The whole network architecture, which consists of four parts: feature extraction network (Conv 1 ∼ Conv 5), MDC module, HFA module and side networks (Side output 1 ∼ Side output 5). (F1 ∼ F5) and (N1 ∼ N5) represent the output feature maps of the feature extraction network and the HFA module, respectively. M5 is obtained from the MDC module. H and W indicate the size of the input image, and {64, 128, 256, 512, 512} indicate the channel numbers of the feature maps.

Figure 2: The MDC module, which concatenates the Conv5 features, F5, the result of global pooling and the results of the four dilated convolutions.

3.3 Hierarchical Feature Augmentation
In pavement datasets, crack pixels usually account for only a small fraction of all pixels, and crack information is easily lost during downsampling. A CNN learns image features hierarchically. Low-level feature maps, which have undergone less downsampling, preserve more of the original image information but also more useless information. High-level features, which have undergone more downsampling, contain less detail but more semantic information, which benefits the classification task. Inspired by [19], we therefore propose a hierarchical feature augmentation module composed of a feature pyramid network and a bottom-up path augmentation module.

As shown in Figure 3, the multi-scale feature map M5 obtained from the multi-dilated convolution module is upsampled by a factor of two and concatenated with the Conv4 feature map, F4, to obtain the feature map M4. A 1×1 convolutional layer with 512 filters ensures that the channel number of M4 matches that of F4. M4 is then upsampled by a factor of two and concatenated with F3. These operations are repeated until the bottom level is reached. The numbers of 1×1 convolutional filters for M3 to M1 are set to 256, 128 and 64, respectively. The concatenation operation is detailed in Figure 4(a). We use {M1, M2, M3, M4, M5} to denote the new hierarchical features generated by the feature pyramid network, and {N1, N2, N3, N4, N5} to denote the augmented feature maps corresponding to {M1, M2, M3, M4, M5}. The augmentation path runs from N1 to N5. Feature map N1 is simply M1 without any processing. Feature map Ni is combined with Mi+1 through a lateral connection to generate the new feature map Ni+1. The connecting process is similar to the one above, except that feature map Ni is downsampled by a factor of two using a 3×3 convolutional layer
with stride 2. The lateral connection is illustrated in Figure 4(b). In this way, the proposed network aggregates context information of high-level features and low-level features. The context information is used by the side network at each level.

Figure 3: The proposed HFA module, which consists of a feature pyramid network and a bottom-up path augmentation module.

Figure 4: (a) The feature concatenating operation in the feature pyramid network, (b) The feature concatenating operation in the bottom-up path augmentation module.

3.4 Side Networks
The main task of the side network is to evaluate the quality of the features extracted at each side layer. By adding a loss function to each side layer, the network optimizes the hidden-layer features through side supervision. The side network at each level processes the feature map Ni to obtain a side output feature map, from which a prediction map can be generated. Figure 5 shows the details of the side networks. Except for side output 1, all the feature maps are upsampled to the same size as the input image. All feature maps of hidden layers and the output of the first side output layer are concatenated to form the final fused features, which are processed by a 1×1 convolutional layer and a sigmoid function. A fixed threshold on the sigmoid output then gives a predicted label for each pixel. This strategy has proved effective for edge detection in HED [20]. When the whole network is trained, the side network of each level performs crack prediction individually, so that a side loss can be calculated.

Figure 5: The illustration of side networks.

3.5 Training of the Whole Network
When the whole network is trained, the loss function consists of a side loss and a fuse loss [20]. For crack segmentation, we define the training set as S = {(Xn, Yn), n = 1, ..., N}, where Xn and Yn denote the raw input image and the corresponding ground truth, respectively. Let W denote the parameter set of all standard network layers, and assume the network has M levels. The side network of each level, with parameter set w^(m), m = 1, ..., M, works as a pixel-wise crack image classifier. The total side loss is defined as:

$$L_{side}(X, Y, W, w) = \sum_{m=1}^{M} \alpha_m \, l_{side}^{(m)}\left(X, Y, W, w^{(m)}\right) \tag{1}$$

where α_m is a hyperparameter that weights each level-wise side loss and is usually set to 1/M. During training, the loss function is computed over all pixels in a training image X = (x_i, i = 1, ..., |X|) and ground truth Y = (y_i, i = 1, ..., |X|), y_i ∈ {0, 1}. The level-wise side loss l_side^(m) is defined as:

$$l_{side}^{(m)}\left(X, Y, W, w^{(m)}\right) = -\sum_{i \in Y_+} w_0 \log \Pr\left(P_i^{(m)} = 1 \mid X; W, w^{(m)}\right) - \sum_{i \in Y_-} w_1 \log \Pr\left(P_i^{(m)} = 0 \mid X; W, w^{(m)}\right) \tag{2}$$

where Y+ and Y− are the sets of positive and negative pixels, and |Y+|, |Y−| and |Y| denote the number of positive pixels, negative pixels and all pixels of an input image, respectively. P_i^(m), which equals 0 or 1, represents the predicted result (crack or not) for the i-th pixel at the m-th level. Since the pixels of the image belong to different classes, crack or not, the network should weight their losses differently. We set w0 = 1 and w1 = |Y−|/|Y+| as the class-balance weights for background pixels and crack pixels, respectively. Pr(·) is the probability that a pixel in the prediction map is crack or not, as each side output layer generates an independent prediction map. All side feature maps are concatenated to form the final fused feature map, which is then processed to output the final prediction map, P, as illustrated in Figure 5. The fused loss, L_fuse, is defined as:

$$L_{fuse}(X, Y, W) = -\sum_{i \in Y_+} w_0 \log \Pr\left(P_i = 1 \mid X; W\right) - \sum_{i \in Y_-} w_1 \log \Pr\left(P_i = 0 \mid X; W\right) \tag{3}$$

where w0 and w1 have the same meanings as in formula (2). The overall loss function for network training is defined as:

$$L = L_{side}(X, Y, W, w) + L_{fuse}(X, Y, W) \tag{4}$$
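The loss in Equations (1) to (4) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `balanced_bce` and `total_loss` are hypothetical, each prediction map is a flattened list of crack probabilities, α_m is fixed to 1/M, and the weights follow the textual description (background weight 1, crack weight |Y−|/|Y+|), which is an assumption about how the two weights map onto the two sums.

```python
import math


def balanced_bce(probs, labels, eps=1e-12):
    """Class-balanced cross-entropy over one flattened prediction map.

    probs  -- predicted crack probabilities Pr(P_i = 1 | X), one per pixel
    labels -- ground-truth labels y_i in {0, 1}
    Assumption: background pixels get weight 1 and crack pixels get
    |Y-| / |Y+|, following the textual description of the weights.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    w_crack = n_neg / n_pos if n_pos else 1.0
    w_background = 1.0
    loss = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            loss -= w_crack * math.log(p)
        else:
            loss -= w_background * math.log(1.0 - p)
    return loss


def total_loss(side_probs, fused_probs, labels):
    """Side loss with alpha_m = 1/M plus the fused loss, as in Eq. (4)."""
    m = len(side_probs)
    side = sum(balanced_bce(p, labels) for p in side_probs) / m
    fuse = balanced_bce(fused_probs, labels)
    return side + fuse


# Toy example: 4 pixels, 1 crack pixel, every map predicts 0.5 everywhere.
labels = [1, 0, 0, 0]
side_maps = [[0.5] * 4, [0.5] * 4]
fused_map = [0.5] * 4
print(total_loss(side_maps, fused_map, labels))
```

In the toy example the single crack pixel receives weight 3, so it contributes as much to the loss as the three background pixels together, which is the intended effect of class balancing when cracks cover few pixels.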
4 EXPERIMENTS
Firstly, the implementation details of the proposed method are described. Secondly, the datasets, the methods compared with ours, and the evaluation criteria are introduced. Finally, the experimental results are analyzed.

4.1 Implementation Details
The proposed method is implemented in PyTorch, building on an open implementation of Deepcrack [21]. The SGD method is used to optimize the proposed network. The hyperparameters are as follows: the number of training iterations is 2e5, the batch size is 1, and the initial learning rate is 0.0001; the learning rate is adjusted with the StepLR method (reduced to 1/10 after 5e4 iterations); the momentum is 0.9 and the weight decay is 2e-4. These hyperparameters are used in all subsequent experiments. The experimental environment is a GTX1080 GPU with 11 GB of memory.

4.2 Dataset
Three public datasets were used for the study. The first dataset is Deepcrack [21], which contains 537 crack images with various backgrounds; 300 images were assigned to the training set and 237 images to the test set. The second dataset is CFD [22], which consists of 118 images of 320×480 pixels with pavement cracks. The images contain noisy backgrounds such as blots, water spots and shadows. 82 images were randomly selected for training and the other 36 images were used for testing. The third dataset is CRACK500 [23], which contains 3,368 pavement crack images of 640×360 pixels. CRACK500 is the largest public dataset with binary labels. 2,500 images were used for training and 868 images were used for testing. Because training data is scarce, data augmentation was applied to the Deepcrack and CFD datasets during training.

4.3 Evaluation Criteria
To evaluate the performance of the proposed network, Precision, Recall, F1-score and mean intersection over union (mIoU) are used. These metrics are computed as:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{5}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{6}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{7}$$

$$\text{mIoU} = \frac{1}{K + 1} \sum_{i=0}^{K} \frac{p_{ii}}{\sum_{j=0}^{K} p_{ij} + \sum_{j=0}^{K} p_{ji} - p_{ii}} \tag{8}$$

where TP, FP and FN are the numbers of true positive, false positive and false negative pixels. mIoU measures the ratio of the intersection to the union of the ground truth and the predicted results, averaged over the K+1 classes (K = 1 in our task); p_ii is the number of true positives, p_ij is the number of false positives, and p_ji is the number of false negatives.

4.4 Results and Analysis
To verify the effectiveness of the proposed method, experiments were performed on three public datasets. The compared methods are HED [20], Segnet [11], RCF [24] and Deepcrack [21]. For the four compared methods, we used the authors' parameters and trained each method on the three datasets separately.

Tables 1 to 3 present the experimental results of each method on each dataset. On the Deepcrack test set, the F1-score improves by about 2.1 percentage points over the Deepcrack method (the second best), and mIoU improves by about 1.8 percentage points. On the CFD dataset, the proposed method improves F1-score and mIoU by about 2.6 and 2.3 percentage points, respectively, compared with the Deepcrack method. On the Crack500 dataset, the proposed method improves F1-score and mIoU by about 2.6 and 2.2 percentage points over the second best. The best F1-score and mIoU on all three datasets are obtained by the proposed method. The experiments show that our crack detection network detects cracks better on three public datasets and is more capable of recognizing complex cracks in real road environments. Multi-dilated convolution increases the ability of the network to detect cracks with different topologies and widths, and hierarchical feature augmentation aggregates useful information from high-level and low-level features, so the proposed method achieves better experimental results.

The dilated rate in the MDC module is an important hyperparameter: different settings change the context information obtained by the MDC module. Experiments on the Deepcrack dataset were conducted to find an appropriate setting. The dilated rate settings {1, 2, 3, 4}, {2, 4, 6, 8} and {2, 4, 8, 16} were tested, and the results are shown in Table 4. On the Deepcrack dataset, the best F1-score and mIoU are achieved with the {2, 4, 6, 8} group. This is because cracks are usually elongated: a dilated convolution with a smaller dilated rate cannot extract enough contextual information, while a dilated convolution with a larger dilated rate extracts some useless background information, which affects accuracy.

5 CONCLUSION
In this paper, an end-to-end crack segmentation network is proposed for crack detection. The multi-dilated convolution module is introduced to extract multi-context information at different scales. Moreover, a hierarchical feature augmentation module is proposed to aggregate the key information from feature maps of different levels. The augmented features learned by the HFA module effectively combine high-level and low-level features. Each side network performs crack prediction individually at its level, and all side outputs are combined to form the final fused output. We evaluate the proposed method on three public datasets and achieve the best F1-score and mIoU. The detection accuracy is improved compared with the other methods.

ACKNOWLEDGMENTS
This study was supported by Chongqing Key Laboratory of Big Data Intelligent Computing at Chongqing University of Posts and Telecommunications. The authors would like to thank all those who offered their help to improve this paper.
Table 1: Detection Results on Deepcrack Dataset.

Methods Precision Recall F1-score mIoU

HED[20] 0.6397 0.7449 0.6883 0.7373
RCF[24] 0.6596 0.7887 0.7184 0.7507
Segnet[11] 0.7545 0.7811 0.7676 0.7908
Deepcrack[21] 0.8233 0.8466 0.8348 0.8493
Ours 0.8497 0.8651 0.8559 0.8675
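The four columns reported in these tables follow the evaluation criteria of Section 4.3. As a rough sketch of how they can be computed for one image, the hypothetical helper `crack_metrics` below takes flattened binary masks (1 = crack, 0 = background) and uses the standard definition of recall, TP / (TP + FN). How the paper aggregates scores over a whole test set is not stated, so this per-image version is an assumption.

```python
def crack_metrics(pred, truth):
    """Precision, recall, F1 and two-class mIoU for flattened binary masks."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
        else:
            tn += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # mIoU with K + 1 = 2 classes: average the crack and background IoU.
    iou_crack = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    iou_background = tn / (tn + fp + fn) if tn + fp + fn else 0.0
    miou = (iou_crack + iou_background) / 2

    return {"precision": precision, "recall": recall, "f1": f1, "miou": miou}


# Toy masks: one hit, one false alarm, one miss, three correct background pixels.
metrics = crack_metrics([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])
print(metrics)
```

Because the background class is usually easy to segment, mIoU tends to sit above the crack-only IoU, which is one reason F1-score and mIoU are reported side by side.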

Table 2: Detection Results on CFD Dataset.

Methods Precision Recall F1-score mIoU

HED[20] 0.5040 0.7141 0.5910 0.6729
RCF[24] 0.5331 0.7476 0.6224 0.6942
Segnet[11] 0.5559 0.7773 0.6482 0.7089
Deepcrack[21] 0.6761 0.7202 0.6975 0.7345
Ours 0.7037 0.7442 0.7234 0.7578

Table 3: Detection Results on Crack500 Dataset.

Methods Precision Recall F1-score mIoU

HED[20] 0.6259 0.6672 0.6459 0.7034
RCF[24] 0.6130 0.7589 0.6782 0.7213
Segnet[11] 0.6615 0.7573 0.7061 0.7416
Deepcrack[21] 0.7658 0.7892 0.7773 0.8018
Ours 0.7753 0.8330 0.8031 0.8242
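The gains over the second-best method quoted in Section 4.4 can be recomputed directly from Tables 1 to 3. The short sketch below copies the F1-score and mIoU values of the "Deepcrack[21]" and "Ours" rows and reports the differences in percentage points; the names `RESULTS` and `gains_in_points` are illustrative only.

```python
# (F1-score, mIoU) pairs copied from Tables 1-3.
RESULTS = {
    "Deepcrack": {"second_best": (0.8348, 0.8493), "ours": (0.8559, 0.8675)},
    "CFD": {"second_best": (0.6975, 0.7345), "ours": (0.7234, 0.7578)},
    "Crack500": {"second_best": (0.7773, 0.8018), "ours": (0.8031, 0.8242)},
}


def gains_in_points(results):
    """Return {dataset: (F1 gain, mIoU gain)} in percentage points."""
    gains = {}
    for dataset, rows in results.items():
        f1_base, miou_base = rows["second_best"]
        f1_ours, miou_ours = rows["ours"]
        gains[dataset] = (
            round((f1_ours - f1_base) * 100, 2),
            round((miou_ours - miou_base) * 100, 2),
        )
    return gains


GAINS = gains_in_points(RESULTS)
for dataset, (f1_gain, miou_gain) in GAINS.items():
    print(f"{dataset}: F1 +{f1_gain:.2f}, mIoU +{miou_gain:.2f} points")
```

These are absolute differences between table entries, not relative improvements, so they are best read as percentage points.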

Table 4: Detection Results for Different Dilated Rate Settings on Deepcrack Dataset.

Dilated rates Precision Recall F1-score mIoU

{1, 2, 3, 4} 0.8481 0.8509 0.8495 0.8602
{2, 4, 6, 8} 0.8497 0.8651 0.8559 0.8675
{2, 4, 8, 16} 0.8503 0.8575 0.8539 0.8647

REFERENCES
[1] H. Oliveira, P.L. Correia, “Automatic road crack segmentation using entropy and image dynamic thresholding”, In Proceedings of 17th European Signal Processing Conference (2009), pp. 622-626.
[2] J. Tang, Y. Gu, “Automatic crack detection and segmentation using a hybrid algorithm for road distress analysis”, In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (2013), pp. 3026-3030.
[3] Y. Maode, B. Shaobo, X. Kun, H. Yuyao, “Pavement crack detection and analysis for high-grade highway”, In Proceedings of 8th International Conference on Electronic Measurement and Instruments (2007), pp. 4-548-4-552.
[4] R. Medina, J. Llamas, E. Zalama, J. Gómez-García-Bermejo, “Enhanced automatic detection of road surface cracks by combining 2D/3D image processing techniques”, IEEE International Conference on Image Processing (2014), pp. 778-782.
[5] P. Subirats, J. Dumoulin, V. Legeay, D. Barba, “Automation of pavement surface crack detection using the continuous wavelet transform”, International Conference on Image Processing (2006), pp. 3037-3040.
[6] L. Zhang, F. Yang, Y.D. Zhang, Y.J. Zhu, “Road crack detection using deep convolutional neural network”, IEEE international conference on image processing (2016), pp. 3708-3712.
[7] L. Pauly, D. Hogg, R. Fuentes, H. Peel, “Deeper networks for pavement crack detection”, Proceedings of the 34th ISARC (2017), pp. 479-485.
[8] H. Huang, Q. Li, D. Zhang, “Deep learning based image recognition for crack and leakage defects of metro shield tunnel”, Tunnelling and Underground Space Technology, vol. 77, pp. 166-176.
[9] Q. Zou, Z. Zhang, Q. Li, X. Qi, Q. Wang, “Deepcrack: Learning hierarchical convolutional features for crack detection”, IEEE Transactions on Image Processing, vol. 28, pp. 1498-1512.
[10] J. Long, E. Shelhamer, T. Darrell, “Fully convolutional networks for semantic segmentation”, Proceedings of the IEEE conference on computer vision and pattern recognition (2015), pp. 3431-3440.
[11] V. Badrinarayanan, A. Kendall, R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation”, IEEE transactions on pattern analysis and machine intelligence, vol. 39, pp. 2481-2495.
[12] W. Liu, Y. Huang, Y. Li, Q. Chen, “FPCNet: Fast pavement crack detection network based on encoder-decoder architecture”, arXiv preprint arXiv:1907.02248 (2019).
[13] H.N. Nguyen, T.Y. Kam, P.Y. Cheng, “An automatic approach for accurate edge detection of concrete crack utilizing 2D geometric features of crack”, Journal of Signal Processing Systems, vol. 77, pp. 221-240.
[14] H. Oliveira, P.L. Correia, “Automatic road crack detection and characterization”, IEEE Transactions on Intelligent Transportation Systems, vol. 14, pp. 155-168.
[15] A. Cord, S. Chambon, “Automatic road defect detection by textural pattern recognition based on AdaBoost”, Computer-Aided Civil and Infrastructure Engineering, vol. 27, pp. 244-259.
[16] Y. Shi, L. Cui, Z. Qi, F. Meng, Z. Chen, “Automatic road crack detection using random structured forests”, IEEE Transactions on Intelligent Transportation Systems, vol. 17, pp. 3434-3445.
[17] L.C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation”, Proceedings of the European conference on computer vision (2018), pp. 801-818.
[18] F. Yu, V. Koltun, “Multi-scale context aggregation by dilated convolutions”, arXiv preprint arXiv:1511.07122 (2015).
[19] S. Liu, L. Qi, H. Qin, J. Shi, J. Jia, “Path aggregation network for instance segmentation”, Proceedings of the IEEE conference on computer vision and pattern recognition (2018), pp. 8759-8768.
[20] S. Xie, Z. Tu, “Holistically-nested edge detection”, Proceedings of the IEEE international conference on computer vision (2015), pp. 1395-1403.
[21] Y. Li, J. Yao, X. Lu, R. Xie, L. Li, “DeepCrack: A deep hierarchical feature learning architecture for crack segmentation”, Neurocomputing, vol. 338, pp. 139-153.
[22] Y. Shi, L. Cui, Z. Qi, F. Meng, Z. Chen, “Automatic road crack detection using random structured forests”, IEEE Transactions on Intelligent Transportation Systems, vol. 17, pp. 3434-3445.
[23] F. Yang, L. Zhang, S. Yu, et al, “Feature pyramid and hierarchical boosting network for pavement crack detection”, IEEE Transactions on Intelligent Transportation Systems, vol. 21, pp. 1525-153.
[24] Y. Liu, M.M. Cheng, X. Hu, K. Wang, X. Bai, “Richer convolutional features for edge detection”, Proceedings of the IEEE conference on computer vision and pattern recognition (2017), pp. 3000-3009.