An Image-Based System For Pavement Crack Evaluation Using Transfer Learning and Wavelet Transform
International Journal of
Pavement Research and Technology
Journal homepage: www.springer.com/42947
Received 10 April 2020; received in revised form 13 September 2020; accepted 15 September 2020
Abstract
Automatic systems for pavement inspection can significantly enhance the performance of Pavement Management Systems (PMSs). Cracking is the
most common distress in any type of pavement. Advances in various technologies have prompted considerable effort to develop automatic systems for pavement
cracking inspection. In early image-based systems, feature extraction for crack classification had to be performed with various image processing
techniques in an expert-based system. In recent years, new machine learning techniques such as the deep convolutional neural network (DCNN) have provided
more efficient models that extract features automatically, but these models need a large amount of labeled data for training. Transfer learning is a technique
that addresses this problem by using pre-trained models. In this research, several pre-trained models (AlexNet, GoogleNet, SqueezeNet, ResNet-18, ResNet-50,
ResNet-101, DenseNet-201, and Inception-v3) were retrained on pavement images using transfer learning. This study aims to evaluate the
efficiency of retrained DCNNs in the detection and classification of pavement cracking. It also presents a more effective crack segmentation algorithm based on a developed
wavelet transform module with more regularizer parameters. The results indicated that the retrained classifier models provide reliable
outputs, with confusion matrix-based performance in the range of 0.94 to 0.99, but some models are significantly faster than others. The
results also showed that the developed wavelet module can segment crack pixels with a high level of clarity.
Keywords: Pavement inspection; Classification; Image processing; Transfer learning; Wavelet transform
1. Introduction

Roads are one of the most important components of transportation infrastructure: they make it possible to move people and goods and have a direct impact on people's daily lives. Transportation agencies spend a lot of time and money on the development and maintenance of road infrastructure. Road maintenance and management increase service life, improve the driving experience, and enhance road safety [1,2].
The Pavement Management System (PMS) comprises several primary phases (see Fig. 1). It plays a very important role in the road-infrastructure management system and has a direct influence on the quality and safety of roads. An efficient pavement management system leads to maintenance that is planned at the optimum time, carried out with a proper method, and delivered at optimized cost. These aims become achievable when pavement inspection is performed correctly [3,4]. Pavement distress evaluation is one of the most important parts of the pavement inspection procedure. Generally, pavement distress data can be collected and assessed in three different ways: visual inspection, semi-automatic systems, and automatic systems (Fig. 1) [5].
* Corresponding author
E-mail addresses: [email protected] (S. Ranjbara);
[email protected] (F. M. Nejad,); [email protected] (H. Zakeri).
Peer review under responsibility of Chinese Society of Pavement
Engineering.
Fig. 1. Pavement inspection step in PMS.
Because visual inspection suffers from high labor costs, long inspection times, unreliable results, and unsafe working conditions for laborers [5-7], most transportation departments are trying to create automatic data collection and evaluation systems. Accordingly, a lot of research has been conducted on applying different technologies (such as image processing [5,8], laser systems [9,10], ground penetrating radar [11,12], fiber optic sensors [13], accelerometer sensors [14,15], ultrasonic sensors [16,17], and hybrid systems [18-20]) to build an automatic system for the evaluation of pavement distresses.
As can be seen in Fig. 2, pavement distresses can be divided into cracking and non-cracking distress. Cracking is the most prevalent distress on the pavement, and cracks have a significant impact on reducing the design life [6,21,22]. Based on their patterns, cracks can be divided into two general categories: surface cracks and linear cracks [3].
In recent years, transportation agencies have collected a huge volume of data (such as pavement images). At the same time, significant growth in computing and processing power has led to the development of more efficient Machine Learning (ML) approaches. Therefore, much attention has been paid to using new ML techniques to build more intelligent and efficient systems for pavement condition assessment. In the majority of ML techniques, the features are manually extracted from data. Deep learning (DL) is an ML technique that extracts features automatically [23,24].
The DCNN is one of the DL-based models that have been widely used for object detection in various applications, such as pavement distress detection. Because of the high complexity of these models, they generally require large amounts of labeled data for the training process. In recent years, the transfer learning (TL) technique has been presented to solve this problem.
According to the literature, image processing is a robust tool in pavement distress evaluation because most pavement distresses, such as cracks, bleeding, potholes, and patches, are visible on the surface. Accordingly, the use of DCNNs and image processing can help transportation agencies to develop a more efficient pavement inspection system.
In this research, transfer learning-based models were developed to detect cracking and classify cracks into two classes, called surface cracking and linear cracking. Also, an image processing-based approach was designed for crack segmentation using a developed wavelet transform module. We use pre-trained deep convolutional neural networks (PDCNNs) and retrain them on our datasets with the transfer learning technique. Furthermore, we use image processing techniques to segment crack pixels in the pavement image. In this image processing model, the wavelet transform is the main part of the process, and other techniques such as histogram equalization (HE), image smoothing, thresholding, and morphological operations are also used.

2. Related work

Generally, an early image-based pavement crack evaluation system consists of three main parts: 1) segmentation, 2) feature extraction from segmented regions, and 3) detection and classification based on extracted features.
Segmentation is a method to extract the region of interest from the image. This process is vital for creating an efficient crack detection and classification system. There are various techniques for implementing image segmentation. Threshold-based [25,26], matching-based [27], and region-based techniques [28], in that order, are the most prevalent in classical systems [5].
In the feature extraction process, significant features are extracted from the image or from segmented regions. These extracted features can be used for creating detection and classification models. The main problem in this step is to determine which features are more important for creating a more efficient classification model.
Spatial features [29], transform features [30], edge and boundary features [31], shape representation features [26], and texture features [8,32] are some of the features that can be used in the feature extraction step of classic crack evaluation systems. Among these, edge and boundary, spatial, and transform features, in that order, are the most prevalent [5].
The last part of the classic crack assessment system is detection and classification. Several types of ML techniques, both supervised and unsupervised, can be used to create models for automatic crack detection and classification. In this part, supervised learning algorithms (such as neural networks [33,34], support vector machines [30,35], etc.) are more prevalent than the others.
In recent years, new systems based on DL have been widely considered, and DCNNs are widely used for image-based pavement distress detection and classification. In these systems, feature extraction from the image and creation of the classifier model have been done
automatically by the DCNN [36-40]. Generally, they do not need segmentation before the feature extraction process. In new systems, segmentation is used after detection and classification to determine the important geometric parameters of distress, such as length, width, area, etc.
As an important problem, DCNN-based systems need a huge dataset for the training process due to their high complexity, while in the majority of cases data collection and labeling are very time-consuming and costly. The TL technique solves this problem and makes it possible to create DCNNs from smaller datasets [7,41,42].
Because of this ability, TL-based systems have received more attention for creating efficient image-based systems for pavement crack detection and classification. On the other hand, choosing an appropriate pre-trained model is very important for creating a more efficient system. Accordingly, it is necessary to evaluate the performance of the various pre-trained models.

3. Methodology

The procedure of this research consists of two main stages: 1) crack detection and classification, and 2) crack segmentation. As can be seen in Fig. 3, the input image is first analyzed by a DCNN model to detect and classify the cracks. This model has been created by retraining the PDCNN using TL. In the next stage, the map of cracks is extracted using wavelet transforms and other image processing techniques such as HE, image smoothing, thresholding, and morphological operations.

As can be seen in Fig. 4, a new feature map is created by sliding a local receptive field over the input. Unlike in a traditional neural network, each neuron in a layer is not connected to all the nodes (neurons) in the previous layer, but only to the nodes in the local receptive field. Also, in a feature map, each unit is generated with the same local receptive field (weights), and this property is called weight sharing [41,43,45,49].
Pooling layers are commonly used after convolutional layers. These layers are generated by sliding a local receptive field over the feature maps of the previous layer. This operation reduces the dimensionality of the feature maps and simplifies the network. These layers are necessary to reduce the computational time and overfitting issues in CNNs. The pooling operation can be performed in various ways, such as geometric average, harmonic average, maximum pooling, etc. [41,43,45,49,50]. Max-pooling, one of the most prevalent pooling processes, is presented in Fig. 5.
Fully connected layers, of which there can be one or more, are placed after a sequence of convolution and pooling layers. This part of the CNN is used to create a classifier based on the features extracted from the data in the previous stage [41,43,45,49].
An example of the DCNN structure based on the objective of this research (detecting and classifying pavement cracks) is shown in Fig. 6.
There are two methods for applying DCNN models: training from scratch and performing transfer learning using pre-trained models. If the first method (training from scratch) was applied for training a DCNN model, it would be necessary to use
massive amounts of data, whose collection is time-consuming and costly in a real application. Also, designing the DCNN structure to achieve proper results is a big challenge because there are many hyperparameters that affect the efficiency of DCNNs. These hyperparameters include the depth of the DCNN (the number of convolutional, pooling, and fully-connected layers), the number of filters, the stride (the step size by which the local receptive field is moved), pooling locations and sizes, and the number of nodes in fully-connected layers [43,51].
In the other method (TL), one of the PDCNN models is retrained on a new dataset. In the TL process, the ability of the pre-trained model to extract features and learn predictive rules is used to create an efficient classifier model on the new dataset [7,41,52]. The details of the transfer learning applied in this research are presented in Fig. 7.
In recent years, several PDCNNs have been generated for various applications. Information on the more prevalent PDCNNs is presented in Table 1. As shown in Table 1, the PDCNNs have different depths, numbers of parameters, and input sizes, which result from the different network architectures and hyper-parameters. It should be noted that greater depth and more parameters increase the complexity of the model, which increases the probability of overfitting when the training set is not very big. Accordingly, SqueezeNet and GoogleNet have lower complexity than the others. In this research, we use the mentioned pre-trained models for detection and classification of pavement cracks, and compare their performance to determine the more effective model.

3.2. Wavelet transform

a wide application for image compression [62], denoising [63], and feature extraction [21]. Accordingly, extensive research based on the wavelet transform has been performed on damage and crack detection [64-67] and texture analysis [8,68-70] in civil engineering applications.
In the wavelet transform process, an image is decomposed into a range of frequencies, and two sets of frequencies are extracted with a high-pass and a low-pass filter. In this process, an image can be decomposed in the spaces V_k, W_k^H, W_k^V, and W_k^D as follows [30,61,68]:

\[
\begin{aligned}
f_{k-1}(x,y) &\in V_k \oplus W_k^{H} \oplus W_k^{V} \oplus W_k^{D}, \\
f_{k-1}(x,y) &= 2^{-k} \Big[ \sum_{m,n} LL_k(m,n)\, \varphi(2^{-k}x - m)\, \varphi(2^{-k}y - n) \\
&\quad + \sum_{m,n} HL_k(m,n)\, \psi(2^{-k}x - m)\, \varphi(2^{-k}y - n) \\
&\quad + \sum_{m,n} LH_k(m,n)\, \varphi(2^{-k}x - m)\, \psi(2^{-k}y - n) \\
&\quad + \sum_{m,n} HH_k(m,n)\, \psi(2^{-k}x - m)\, \psi(2^{-k}y - n) \Big]
\end{aligned}
\tag{1}
\]

where k is the decomposition level, m and n are image dimensions, φ(.) and ψ(.) are orthonormal bases, and LL_k, HL_k, LH_k, and HH_k are the orthogonal projections in the orthogonal spaces, which are calculated as follows:

\[
\begin{aligned}
LL_k(i,j) &= \sum_{m,n} l(m-2i)\, l(n-2j)\, LL_{k-1}(m,n), \\
HL_k(i,j) &= \sum_{m,n} h(m-2i)\, l(n-2j)\, LL_{k-1}(m,n), \\
LH_k(i,j) &= \sum_{m,n} l(m-2i)\, h(n-2j)\, LL_{k-1}(m,n), \\
HH_k(i,j) &= \sum_{m,n} h(m-2i)\, h(n-2j)\, LL_{k-1}(m,n).
\end{aligned}
\tag{2}
\]
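The filter-bank recursion of Eq. (2) can be illustrated with a minimal plain-Python sketch. It assumes that l(.) and h(.) are the low-pass and high-pass analysis filters and, purely for brevity, uses two-tap Haar filters; the function name `dwt2_level` and the Haar choice are assumptions made for illustration and are not taken from the paper.

```python
import math

# Two-tap Haar analysis filters (an assumption chosen for brevity).
LOW = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # l(.), low-pass
HIGH = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # h(.), high-pass


def dwt2_level(ll_prev):
    """Compute LL_k, HL_k, LH_k, HH_k from LL_{k-1}, following the form of Eq. (2).

    ll_prev is a list of rows with even height and width.
    """
    out_rows, out_cols = len(ll_prev) // 2, len(ll_prev[0]) // 2

    def project(filter_m, filter_n):
        # sub(i, j) = sum over m, n of filter_m(m - 2i) * filter_n(n - 2j) * LL_{k-1}(m, n)
        sub = [[0.0] * out_cols for _ in range(out_rows)]
        for i in range(out_rows):
            for j in range(out_cols):
                total = 0.0
                for a, weight_m in enumerate(filter_m):      # a = m - 2i
                    for b, weight_n in enumerate(filter_n):  # b = n - 2j
                        total += weight_m * weight_n * ll_prev[2 * i + a][2 * j + b]
                sub[i][j] = total
        return sub

    return {
        "LL": project(LOW, LOW),
        "HL": project(HIGH, LOW),
        "LH": project(LOW, HIGH),
        "HH": project(HIGH, HIGH),
    }


# Toy image whose rows alternate between 0 and 4 (variation along m only).
toy = [[0, 0, 0, 0], [4, 4, 4, 4], [0, 0, 0, 0], [4, 4, 4, 4]]
bands = dwt2_level(toy)
```

For this toy image only the LL and HL sub-bands are non-zero, because the intensity changes along the first index only. Calling `dwt2_level` again on `bands["LL"]` would give the next decomposition level.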
Table 1
PDCNN characteristics.
PDCNN | Input size | Depth (number of layers) | Number of parameters (millions) | Size of the model (MB) | Source domain | Target domain (number of classes)
AlexNet [53,54] | 227×227×3 | 8 | 61 | 227 | ImageNet | 1000
SqueezeNet [55] | 227×227×3 | 18 | 1.24 | 4.6 | ImageNet | 1000
GoogleNet [56,57] | 224×224×3 | 22 | 7 | 27 | ImageNet | 1000
ResNet-18 [58] | 224×224×3 | 18 | 11.7 | 44 | ImageNet | 1000
ResNet-50 [58] | 224×224×3 | 50 | 25.6 | 96 | ImageNet | 1000
ResNet-101 [58] | 224×224×3 | 101 | 44.6 | 167 | ImageNet | 1000
Inception-V3 [59] | 299×299×3 | 48 | 23.9 | 89 | ImageNet | 1000
DenseNet-201 [60] | 224×224×3 | 201 | 20 | 77 | ImageNet | 1000
S. Ranjbara et al. / International Journal of Pavement Research and Technology xx (2020) xxx-xxx 5
4. Experimental study
4.1.1. Dataset
In this experiment, 1500 pavement images with a size of 900×1000 pixels were considered for the training process: 80 percent of these images (1200 images) were used for training, and 20 percent (300 images) for validation. Also, 750 pavement images (entirely separate from the training and validation sets) were selected for testing. Thus, 66.7 percent of the data (1500 images) were assigned to the training process and 33.3 percent (750 images) to testing. Detailed information about the datasets is presented in Table 2.
All of the images were cropped from the main images, which have a size of 3850×2764 pixels. As can be seen in Fig. 8, we crop crack regions to collect proper images for the training, test, and validation data.
Also, an image enhancement process is applied to the data to provide a more comprehensible image for display and a better input for the training process. In this step, histogram equalization was performed to enhance image contrast. As can be seen in Fig. 9, this process usually increases the local contrast of images and provides a better image for further image analysis.
Fig. 9. (a) The original images and (b) output images after histogram equalization.
Fig. 10. Samples of the prepared dataset.
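The histogram equalization step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the image is 8-bit grayscale stored as a list of rows, the equalization is global (the text does not specify the exact variant used), and the function name `equalize_histogram` is hypothetical.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization for a grayscale image (list of rows of ints)."""
    pixels = [p for row in image for p in row]

    # Histogram of gray levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to stretch, return a copy.
        return [row[:] for row in image]

    # Map each level so that the output CDF is approximately linear.
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[p] for p in row] for row in image]


# A low-contrast patch: gray levels 100..102 are spread over the full range.
patch = [[100, 100], [101, 102]]
enhanced = equalize_histogram(patch)
```

In this toy patch the darkest level is mapped to 0 and the brightest to 255, which is the contrast stretching effect that the text attributes to this step.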
In this research, block and fatigue cracking images were put in the surface cracking class, and cracks with a linear shape, such as transverse and longitudinal cracks, were put in the linear cracking class. Since we need to detect whether cracking occurs, we collected images from parts of the pavement without cracking and put them in the non-cracking class. Based on the data in this class, the model can determine whether or not an image shows cracking. Some of the prepared images are presented in Fig. 10.

Table 2
Number of images in prepared datasets.
Class | Train | Validation | Test
Surface cracking | 400 | 100 | 250
Linear cracking | 400 | 100 | 250
Non-cracking | 400 | 100 | 250
Total | 1200 | 300 | 750

Fig. 8. A sample of a captured image and the cropped cracking regions.

The performance of the trained models depends significantly on the quality of the training datasets. For this reason, the majority of the experimental work focused on collecting and preparing the most appropriate images for the training process. Also, an attempt has been made to create datasets with a wide variety to prevent overfitting problems.

4.1.2. Retraining of pre-trained models

Before starting the training process by transfer learning, it is necessary to determine several predefined parameters: maximum number of epochs, batch size, learning rate, and momentum. These parameters have a great influence on training results and training time.
The number of epochs is defined as the number of times that the training algorithm covers all of the training data. If this number is too large, it can lead to an overfitted model and a long training time. On the other hand, too few epochs leave the training process incomplete and lead to underfitted models.
In this experiment, according to the volume of the datasets and the processing power of the system, 15 epochs were considered for the training process. It should be noted that evaluation of the training progress for different PDCNNs shows that training usually reached a stable state and converged after 6 to 10 epochs (Fig. 11). Accordingly, 15 is an appropriate number of epochs to produce well-fitted models.
Another important parameter is the batch size, defined as the number of images (data) used in each iteration of the learning process. The batch size can be selected from 1 to the dataset size. A large batch size needs high processing power. According to the processing power of the system used, a value of 15 was chosen for the batch size. This means that 80 batch iterations are needed to perform one epoch of the learning algorithm, because 1200 images were used as training data (Table 2).
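The relation between dataset size, batch size, and iterations per epoch stated above can be checked with a short sketch. The function name `iteration_counts` is hypothetical, and rounding up for a final incomplete batch is an assumption that does not matter here because 1200 is divisible by 15.

```python
import math


def iteration_counts(num_train_images, batch_size, epochs):
    """Return (iterations per epoch, total iterations) for mini-batch training."""
    # Assumption: a final, smaller batch still counts as one iteration.
    per_epoch = math.ceil(num_train_images / batch_size)
    return per_epoch, per_epoch * epochs


# Values from the text: 1200 training images, batch size 15, 15 epochs.
per_epoch, total = iteration_counts(1200, 15, 15)
print(per_epoch, total)
```

With the values from the text, this gives 80 iterations per epoch and 1200 iterations over 15 epochs.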
Fig. 11. A sample of training progress.

The number of iterations depends on the batch size and the number of epochs, and a very large number of iterations leads to an overfitted model. In this experiment, 15 epochs were performed with 80 batch iterations each, so 1200 iterations were done in total. The other training parameters, the base learning rate and momentum, were taken as 0.001 and 0.9, respectively, values that are often used.
All of the predefined parameters were set identically for all PDCNNs except ResNet-101, for which the batch size was set to 12: this model has more layers and parameters than the others, and the limited processing power left insufficient memory to retrain it with 15 samples in each batch.
All computations were performed on a personal computer with a 64-bit operating system, 8.0 GB of memory, and an Intel(R) Core i7-4710HQ @ 2.50 GHz processor, with a GeForce GTX 850M graphics processing unit (GPU). The image processing and deep learning methods were programmed and run in MATLAB 2018b.

4.2. Crack segmentation

As shown in Fig. 3, the segmentation process consists of several steps; an example of this process is shown in Fig. 12.
In the first step, histogram equalization was done, and then image smoothing was performed with a Gaussian filter. In image smoothing, the intensity of smoothing is related to the standard deviation (σ) of the Gaussian function in both directions (X and Y). If σ is very high, smoothing reduces the image's detail and the edges in the image are not preserved. If it is very low, the noise reduction that smoothing aims for is insignificant. After evaluating the results of different values, 2 was found to be the most appropriate value for both directions to smooth the images in this experiment.
In the next step, the wavelet transform was used to enhance defects in images. In this research, a new wavelet module with three regularizer parameters is presented as follows:
1
M s,k p, q HLSk p, q LHSk p, q HH Sk p, q N R
(3)
where HL_k(p,q), LH_k(p,q), and HH_k(p,q) are the high-frequency sub-bands in the horizontal, vertical, and diagonal directions, respectively, at position (p, q) at the kth level. Also, S, N, and R are the regularizer parameters controlling the amount of detail in the output image.
In this experiment, the efficiency of different orthogonal wavelet bases (such as Daubechies, Coiflets, Symlets, Biorthogonal, etc.) was compared, and the Coiflets wavelet family performed better than other families at highlighting local anomalies in a homogeneous surface. Also, according to the literature, Coiflets wavelet filters are efficient in wavelet-based applications [71-74].
The main part of this step is determining the values of k, S, N, and R. The influence of each parameter on the results after thresholding is presented in Fig. 13.
According to Fig. 13, several key points about the regularizer parameters can be stated as follows:
1. k can be specified as a positive integer. If the number of multi-resolution levels (k) is small, in some cases the distress cannot be sufficiently separated from the background of the images. However, a large number of multi-resolution levels fuses the anomalies together and may result in false detection.
2. The parameter S can be specified as a positive integer; a high value of this parameter can lead to a noisy and unclear result.
3. R can be specified as a positive number. The value of R has a direct relation with the amount of detail in the output image.
If the value of R is too small, the amount of separated detail is reduced, which may remove cracks. However, a very large value of R separates too much detail and creates a noisy image.
4. The value of N has an inverse relation with the amount of separated detail in the output image, so different values of N act against the effects of R. N can also be specified as a positive number.
After a trial-and-error process, values of 1 to 3 for the number of multi-resolution levels, 3 to 5 for R, 1 or 2 for S, and 2 to 4 for N were found to be the most appropriate for segmenting cracks in this experiment.
At this point, the geometric shape of defects has been identified, but the output images are noisy. In the last step, a morphological filter was applied to make a clearer output by performing several tasks, such as removing noise in the crack area and background.
Mathematical morphology operations are useful techniques for creating more applicable output in the image processing procedure. These operations treat images as a collection of geometric structures and process them based on shapes. In this process, a smaller geometrical set, known as a structuring element, is first defined; then each pixel in the image is adjusted based on its neighborhood and the defined structuring element.
Dilation and erosion are the most fundamental operations, and they have different effects on images. Erosion removes pixels from the border of objects based on the structuring element and shrinks the objects. With A and B as sets in Z^2, this operation is denoted by A ⊖ B, in which A is eroded by the structuring element B as follows [75-77]:

\[ A \ominus B = \{\, z \mid (B)_z \subseteq A \,\} \tag{4} \]

In words, this equation indicates that the erosion of A by B is the set of all points z such that B, translated by z, is contained in A.
Dilation adds pixels to the borders of objects based on the structuring element and leads to expanded components. This operation is denoted by A ⊕ B, in which A is dilated by the structuring element B as follows [76,77]:

\[ A \oplus B = \{\, z \mid (\hat{B})_z \cap A \neq \emptyset \,\} \tag{5} \]

where (B̂)_z is the reflection of B about its origin, shifted by z. The result of this operation is the set of all displacements z such that B̂ and A overlap by at least one element. Other morphological operations can be built from combinations of dilation and erosion for specific applications such as shape detection, boundary extraction, hole filling, extraction of connected components, removing small objects, etc. [76,77]. In this research, the hole filling operation has been applied to remove holes (noise) in crack areas. This operation is defined using dilation as follows:

\[ X_k = (X_{k-1} \oplus B) \cap A^{c}, \qquad k = 1, 2, 3, \ldots \tag{6} \]

In this research, the opening operation has been applied to remove objects containing fewer than 30 pixels. In most cases, these objects appear as noise in the background. Thus, the outputs are clearer after applying this filter.

4.3. Results and discussion

In this experiment, eight PDCNNs were used to create classifier models by transfer learning on the collected dataset. The speed of the models in the training and testing procedures is an important parameter for choosing the appropriate pre-trained model for pavement crack detection and classification. Information on the models' speed is presented in Table 3. A clearer comparison of training and testing time is presented in Fig. 14 and Fig. 15, based on the average time for each image in seconds.
As can be seen in Fig. 14, AlexNet, SqueezeNet, GoogleNet, and ResNet-18 are significantly faster than the others. These models are faster because they have fewer training parameters than the other models (Table 1); in other words, their complexity is lower. Accordingly, in this experiment, SqueezeNet is the fastest model, and DenseNet-201 is the slowest.
As can be seen in Fig. 15, simpler models (with regard to architecture, number of layers, number of parameters, etc.) are faster than other models. In the testing process, as in training, SqueezeNet is the fastest model, at 0.017 seconds per image.
In addition to training and testing speed, the performance of the models in the detection and classification of pavement cracks is very important. The performance of the models has been evaluated using the confusion matrix. This matrix determines the performance indices for each class using four components that

Table 3
Time spent in the training and testing processes (times in seconds).
Pre-trained CNN | Training time | Training time per image (average) | Testing time | Testing time per image (average)
AlexNet | 578.333 | 0.482 | 14.051 | 0.019
SqueezeNet | 460 | 0.383 | 12.898 | 0.017
GoogleNet | 764.667 | 0.637 | 18.138 | 0.024
ResNet-18 | 773 | 0.644 | 15.851 | 0.021
ResNet-50 | 2333 | 1.944 | 30.984 | 0.041
ResNet-101 | 4587 | 3.823 | 40.462 | 0.054
DenseNet-201 | 15489 | 12.908 | 66.964 | 0.089
Inception-V3 | 3541 | 2.951 | 39.512 | 0.053
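The per-image averages in Table 3 appear to follow from dividing each total time by the number of images involved (1200 training images and 750 test images, per Table 2). The sketch below reproduces that arithmetic; the dictionary layout and function names are hypothetical, and the totals are copied from Table 3.

```python
TRAIN_IMAGES = 1200  # training images (Table 2)
TEST_IMAGES = 750    # test images (Table 2)

# (total training time, total testing time) in seconds, copied from Table 3.
TOTAL_TIMES = {
    "AlexNet": (578.333, 14.051),
    "SqueezeNet": (460.0, 12.898),
    "GoogleNet": (764.667, 18.138),
    "ResNet-18": (773.0, 15.851),
    "ResNet-50": (2333.0, 30.984),
    "ResNet-101": (4587.0, 40.462),
    "DenseNet-201": (15489.0, 66.964),
    "Inception-V3": (3541.0, 39.512),
}


def per_image_times(total_times):
    """Average training and testing time per image for each model."""
    return {
        name: (train / TRAIN_IMAGES, test / TEST_IMAGES)
        for name, (train, test) in total_times.items()
    }


def fastest(per_image, column):
    """Model with the smallest per-image time (column 0: training, 1: testing)."""
    return min(per_image, key=lambda name: per_image[name][column])


averages = per_image_times(TOTAL_TIMES)
for name, (train_avg, test_avg) in averages.items():
    print(f"{name:<13} {train_avg:7.3f} {test_avg:6.3f}")
```

Under this reading, the model with the smallest average is the same in both columns, which matches the statement that SqueezeNet is fastest in both training and testing.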
According to Fig. 17, the sensitivity of the classes in the trained models is in the range of 0.932 to 1, and the non-cracking class has the highest sensitivity in all of the models, with a range of 0.988 to 1. The evaluation of the other two classes (linear and surface cracking) shows varied results.
The general sensitivity of the models is in the range of 0.957 to 0.986. SqueezeNet and GoogleNet have the highest sensitivity, with 0.986 and 0.984, respectively. ResNet-101 has the lowest sensitivity, with 0.957.
Specificity, or true negative rate, is a similar concept to sensitivity, but for the negative state of model answers. This parameter gives the probability of a correct answer when the image is actually not in the specified class, and is calculated as follows:

\[ \text{Specificity} = \frac{TN}{FP + TN} \tag{10} \]

As shown in Fig. 18, the specificity is in the range of 0.962 to 1. In all of the models, the non-cracking class has the highest specificity. Also, in most of the models, the specificity of the surface cracking class is higher than that of the linear class.
Precision, or positive predictive value, is one of the important parameters for evaluating the performance of classifier models. This parameter indicates the probability that the result is correct when the model assigns an image to a specific class. In fact, this parameter indicates the reliability of the model results, and can be calculated as follows:

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{11} \]

As can be seen in Fig. 19, the precision of the non-cracking class is higher than that of the other classes in most models (except DenseNet-201). This means that when the classifier output is non-cracking, the result can be accepted with more confidence. Also, in most models except ResNet-50, the precision of the surface cracking class is higher than that of the linear cracking class. According to precision, SqueezeNet has the best performance for detecting surface and linear cracking, with 0.991 and 0.97, respectively.
The F-score (also F1-score or F-measure) is defined as the weighted harmonic mean of precision and recall (sensitivity). It provides a more realistic criterion for model performance evaluation. This parameter is calculated as follows:

\[ \text{F-score} = \frac{2}{\dfrac{1}{\text{Precision}} + \dfrac{1}{\text{Sensitivity}}} \tag{12} \]

Fig. 20 illustrates that, as with the previous performance metrics, the non-cracking class has the highest F-score in all of the models, with a range of 0.989 to 1. Also, in most models, surface crack detection performs better than or the same as linear crack detection; the exception is Inception-V3. The best performance on detection of linear and surface cracking images is observed in SqueezeNet, with an F-score of 0.979, and the worst performance is that of ResNet-18.
According to all of the performance metrics, models with lower complexity, such as SqueezeNet and GoogleNet, perform better than the other models because a simpler structure and fewer training parameters lead to better training when the datasets are limited.
By considering all of the important parameters that indicate the efficiency of the models, namely training time, testing time, and the F-score as a realistic criterion for crack detection performance, a comprehensive evaluation is provided in Fig. 21. In this figure, the size of the disks represents the testing time, and the X and Y axes show the training time per image and the general F-score of the models, respectively. Accordingly, a better model has a smaller disk and sits toward the upper left corner. As can be seen in Fig. 21, GoogleNet and SqueezeNet perform better than the other models, and SqueezeNet has the best performance, with values of 0.076, 0.442, and 0.986 for testing time per image, training time per image, and F-score, respectively. It should be noted that, because of its weak performance in training time, DenseNet-201 is not included in Fig. 21.
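The per-class indices discussed above can be computed from a multi-class confusion matrix in a one-vs-rest fashion, as in the sketch below. Specificity, precision, and F-score follow Eqs. (10) to (12); sensitivity uses the standard definition TP / (TP + FN). The confusion matrix is invented purely for illustration and is not a result from this study, and the function name `class_metrics` is hypothetical.

```python
def class_metrics(confusion, index):
    """One-vs-rest metrics for one class; confusion[actual][predicted] holds counts."""
    n = len(confusion)
    tp = confusion[index][index]
    fn = sum(confusion[index]) - tp                           # actual class, predicted other
    fp = sum(confusion[r][index] for r in range(n)) - tp      # other class, predicted this
    tn = sum(sum(row) for row in confusion) - tp - fn - fp

    sensitivity = tp / (tp + fn)   # true positive rate (standard definition)
    specificity = tn / (fp + tn)   # Eq. (10)
    precision = tp / (tp + fp)     # Eq. (11)
    f_score = 2 / (1 / precision + 1 / sensitivity)  # Eq. (12)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }


# Hypothetical counts (rows: actual, columns: predicted), 250 test images per class.
CLASSES = ["surface", "linear", "non-cracking"]
EXAMPLE = [
    [245, 5, 0],
    [8, 240, 2],
    [0, 1, 249],
]

for i, name in enumerate(CLASSES):
    m = class_metrics(EXAMPLE, i)
    print(name, {key: round(value, 3) for key, value in m.items()})
```

The sketch assumes every class has at least one true positive; a class with none would make the precision or the harmonic mean undefined and would need separate handling.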
Fig. 22. Results of the crack segmentation process on the images of (a) linear cracking class and (b) surface cracking class.

5. Conclusion and future works

The pavement management system is one of the most important parts of the road management system, and pavement inspection tasks (such as distress detection) provide the main information about pavement conditions that is used in the pavement management procedure.

In recent years, transport agencies and researchers have focused on developing automatic systems to detect pavement distresses. At the same time, data mining has attracted considerable attention because computation power has grown significantly and a huge volume of data has been collected in recent years.
ML techniques are one family of methods for performing data mining tasks, and deep neural networks are a popular branch of them. Convolutional neural networks are a specific form of deep neural network that has been applied to various tasks such as classification, prediction, and feature extraction.
In this research, eight PDCNNs (AlexNet, GoogleNet, SqueezeNet, ResNet-18, ResNet-50, ResNet-101, DenseNet-201, and Inception-v3) have been retrained on pavement images with the transfer learning technique to classify two general types of pavement cracks: surface cracks and linear cracks.
The main objectives of this research can be defined as follows:
• Creating a TL-based model by retraining different pre-trained models for pavement crack detection and classification.
• Comparing and assessing the performance of the different pre-trained models in crack detection and classification.
• Presenting a more efficient crack segmentation process based on a new wavelet transform module.
To create a crack detection and classification model, a dataset was prepared that contains three classes: linear cracking, surface cracking, and non-cracking. Transfer learning was then used to retrain the eight PDCNNs on the collected dataset, and the performance of the models was evaluated on the testing dataset.
The performance of the models was evaluated according to training time, testing time, and crack detection performance. Based on training and testing time, AlexNet, SqueezeNet, GoogleNet, and ResNet-18 perform better than the other models. In this research, five performance metrics were used to assess and compare the efficiency of the crack detection models: accuracy, sensitivity, specificity, precision, and F-score. According to these metrics, SqueezeNet and GoogleNet generally perform better than the others. The results also indicate that retraining PDCNNs with transfer learning is an efficient method for pavement crack detection and classification, with general model performance in the range of 0.95 to 0.99.
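The five metrics named above can be computed per class from a confusion matrix by treating each class in turn as the positive class (one-vs-rest). The sketch below assumes the standard textbook definitions, which agree with Eq. (11) for precision. The function names are illustrative, the confusion-matrix counts are hypothetical and are not results from this study, and the code assumes no denominator is zero.

```python
def one_vs_rest_counts(cm, k):
    """Return (TP, FP, FN, TN) for class k.

    cm[i][j] is the number of images of true class i predicted as class j.
    """
    n = len(cm)
    tp = cm[k][k]
    fn = sum(cm[k][j] for j in range(n) if j != k)  # class k predicted as another class
    fp = sum(cm[i][k] for i in range(n) if i != k)  # another class predicted as class k
    total = sum(sum(row) for row in cm)
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Standard binary metrics; assumes every denominator is non-zero."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
    }


classes = ["linear cracking", "surface cracking", "non-cracking"]

# Hypothetical counts for illustration only (rows: true class, columns: predicted class).
cm = [
    [45, 4, 1],
    [3, 46, 1],
    [1, 0, 49],
]

per_class = {name: metrics(*one_vs_rest_counts(cm, k))
             for k, name in enumerate(classes)}

for name, values in per_class.items():
    print(name, {key: round(val, 3) for key, val in values.items()})
```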
In the second part of the research, a wavelet transform-based method was presented to segment cracking regions in pavement images. In addition to the wavelet transform, the segmentation process uses various image processing algorithms such as histogram equalization (HE), image smoothing, thresholding, and morphological operations. The presented model could segment crack pixels with a high level of clarity.
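To make the order of operations concrete, the sketch below chains three of the named steps on a tiny synthetic image: a wavelet low-pass stage, thresholding, and a morphological operation. It is a simplified stand-in and not the wavelet module developed in this work. It assumes a single-level Haar approximation band (scaled to the 2x2 block mean), a fixed threshold of 120 chosen for this toy image, and a 3x3 morphological closing. HE and the separate smoothing step are omitted, and all function names are illustrative.

```python
def haar_approximation(img):
    """Single-level 2D Haar approximation band, scaled to the 2x2 block mean."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
              + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(cols)]
            for i in range(rows)]


def threshold_dark(img, t):
    """Mark pixels darker than t as crack candidates (1), everything else as 0."""
    return [[1 if v < t else 0 for v in row] for row in img]


def _window(mask, r, c):
    """Values of the in-bounds pixels in the 3x3 window centred on (r, c)."""
    h, w = len(mask), len(mask[0])
    return [mask[i][j]
            for i in range(max(0, r - 1), min(h, r + 2))
            for j in range(max(0, c - 1), min(w, c + 2))]


def dilate(mask):
    return [[1 if any(_window(mask, r, c)) else 0 for c in range(len(mask[0]))]
            for r in range(len(mask))]


def erode(mask):
    return [[1 if all(_window(mask, r, c)) else 0 for c in range(len(mask[0]))]
            for r in range(len(mask))]


def close_mask(mask):
    """Morphological closing (dilation then erosion) bridges small gaps."""
    return erode(dilate(mask))


# Synthetic 12x12 image: bright background (200) with a dark vertical crack (40)
# in columns 4-5, interrupted at rows 6-7 to imitate a broken crack trace.
image = [[200] * 12 for _ in range(12)]
for r in range(12):
    if r not in (6, 7):
        image[r][4] = image[r][5] = 40

approx = haar_approximation(image)     # 6x6 low-pass image
raw_mask = threshold_dark(approx, 120)  # gap still visible in row 3
crack_mask = close_mask(raw_mask)       # gap bridged by the closing

for row in crack_mask:
    print(row)
```

Closing is used here rather than opening because a 3x3 opening would erase a one-pixel-wide crack trace, whereas closing reconnects short interruptions along it.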
The results of this work indicate that the developed image-based system using DCNN and wavelet transformation can be used as an efficient system for detecting and classifying pavement cracks and for segmenting crack pixels in pavement images.

Conflict of interest

The authors declare that they have no conflict of interest.
3417-3430.
References [16] S. Nakashima, S. Aramaki, Y. Kitazono, S. Mu, K. Tanaka,
S. Serikawa, Application of ultrasonic sensors in road
[1] C. Y. Chan, B. Huang, X. Yan, S. Richards, Investigating surface condition distinction methods, Sensors 16 (10) (2016)
effects of asphalt pavement conditions on traffic accidents in 1678.
Tennessee based on the pavement management system [17] R. Madli, S. Hebbar, P. Pattar, V. Golla, Automatic detection
(PMS), J. Adv. Transp. 44 (3) (2010) 150-161. and notification of potholes and humps on roads to aid
drivers, IEEE Sensors J. 15 (8) (2015) 4313-4318.
12 S. Ranjbara et al. / International Journal of Pavement Research and Technology xx (2020) xxx-xxx
[18] J. Mehta, V. Mathur, D. Agarwal, A. Sharma, K. Prakasha, [34] H. Ceylan, M. B. Bayrak, K. Gopalakrishnan, Neural
Pothole Detection and Analysis System (Pol) AS) for Real networks applications in pavement engineering: A recent
Time Data Using Sensor Networks, J. Eng. Appl. Sci. 12 (12) survey, Int. J. Pavement Eng. 7 (6) (2014) 434-444.
(2017) 3090-3097. [35] N.-D. Hoang, Q.-L. Nguyen, D. Tien Bui, Image processing–
[19] M. Solla, S. Lagüela, H. González-Jorge, P. Arias, Approach based classification of asphalt pavement cracks using
to identify cracking in asphalt pavement using GPR and support vector machine optimized by artificial bee colony, J.
infrared thermographic methods: Preliminary findings, NDT Comput. Civ. Eng. 32 (5) (2018) 04018037.
& E Inter. 62 (1) (2014) 55-65. [36] T. Wang, K. Gopalakrishnan, O. Smadi, A. K. Somani,
[20] J. Huang, W. Liu, X. Sun, A pavement crack detection Automated shape-based pavement crack detection approach,
method combining 2D with 3D information based on Transp. 33 (3) (2018) 598-608.
Dempster‐Shafer theory, Computer‐Aided Civ. Infrast. Eng. [37] W. R. L. d. Silva and D. S. d. Lucena, Concrete Cracks
29 (4) (204) 299-313. Detection Based on Deep Learning Image Classification. In
[21] Y. O. Ouma and M. Hahn, Wavelet-morphology based Multidisciplinary Digital Publishing Institute Proceedings,
detection of incipient linear cracks in asphalt pavements 18th International Conference on Experimental Mechanics
from RGB camera imagery and classification using circular (ICEM18), Brussels, Belgium, 2018.
Radon transform, Adv. Eng. Informatics 30 (3) (2016) 481- [38] H. Maeda, Y. Sekimoto, T. Seto, T. Kashiyama, H. Omata,
499. Road Damage Detection Using Deep Neural Networks with
[22] S. Mathavan, K. Kamal, M. Rahman, A Review of Three- Images Captured Through a Smartphone, Comput. Aided
Dimensional Imaging Technologies for Pavement Distress Civ. Infras. Eng. 33 (12) (2018) 1127-1141.
Detection and Measurements, IEEE Transactions Intelligent [39] Y.-J. Cha, W. Choi, O. Büyüköztürk, Deep Learning-Based
Transp. Syst. 16 (5) (2015) 2353-2362. Crack Damage Detection Using Convolutional Neural
[23] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature, 521 Networks, Comput. Aided Civ. Infras. Eng. 32 (5) (2017)
(1) (2015) 436. 361-378.
[24] L. Deng, D. Yu, Deep learning: methods and applications, [40] Y. Liu, J. Yao, X. Lu, R. Xie, L. Li, DeepCrack: A Deep
Foundations Trends® in Signal Process. 7 (3–4) (2014) 197- Hierarchical Feature Learning Architecture for Crack
387. Segmentation, Neurocomput. 338 (1) (2019) 139-153.
[25] H. Lokeshwor, L. K. Das, S. Goel, Robust method for https://doi.org/10.1016/j.neucom.2019.01.036
automated segmentation of frames with/without distress [41] K. Gopalakrishnan, S. K. Khaitan, A. Choudhary, A.
from road surface video clips, J. Transp. Eng. 140 (1) (2013) Agrawal, Deep Convolutional Neural Networks with transfer
31-41. learning for computer vision-based data-driven pavement
[26] Y. ZHANG and H. ZHOU, "Automatic pavement cracks distress detection, Constr. Build. Mater. 157 (2017) 322-330.
detection and classification using radon transform, J. Infor. [42] C. V. Dung, Autonomous concrete crack detection using
Comput. Sci. 9 (17) (2012) 5241-5247. deep fully convolutional neural network, Automation Constr.
[27] Y. J. Tsai, V. Kaul, A. Yezzi, Automating the crack map 99 (2019) 52-58.
detection process for machine operated crack sealer, Autom. [43] S. Albelwi and A. Mahmood, A framework for designing the
Constr. 31 (1) (2013) 10-18. architectures of deep convolutional neural networks, Entropy
[28] S. Varadharajan, S. Jose, K. Sharma, L. Wander, C. Mertz, 19 (6) (2017) 242.
Vision for road inspection. In IEEE Winter Conference on [44] Z. Tong, J. Gao, Z. Han, Z. Wang, Recognition of asphalt
Applications of Computer Vision, Steamboat Springs, USA, pavement crack length using deep convolutional neural
2014, pp. 115 - 122. networks, Road Mater. Pavement Des. 19 (6) (2018) 1334-
[29] W. Xu, Z. Tang, J. Zhou, J. Ding, Pavement crack detection 1349.
based on saliency and statistical features. In IEEE [45] A. Bhandare, M. Bhide, P. Gokhale, R. Chandavarkar,
International Conference on Image Processing, Melbourne, Applications of Convolutional Neural Networks, Inter. J.
Australia, 2013, pp. 4093-4097. Computer Sci. Infor. Technol. 7 (5) (2016) 2206-2215.
[30] H. Zakeri, F. M. Nejad, A. Fahimifar, Rahbin: A quadcopter [46] C. Kyriakou, S. E. Christodoulou, L. Dimitriou, Detecting
unmanned aerial vehicle based on a systematic image and Classifying Roadway Pavement Cracks, Rutting,
processing approach toward an automated asphalt pavement Raveling, Patching, and Potholes Utilizing Smartphones, In
inspection, Autom. Constr. 72 (2) (2016) 211-235. Transportation Research Board 97th Annual Meeting,
[31] S. Hongxun, W. Weixing, W. Fengping, W. Linchun, W. Washington DC, USA, 2018.
Zhiwei, Pavement crack detection by ridge detection on [47] S. Gao, Z. Jie, Z. Pan, F. Qin, R. Li, Automatic Recognition
fractional calculus and dual-thresholds, Inter. J. Multimedia of Pavement Crack via Convolutional Neural Network, In
Ubiquitous Eng. 10 (4) (2015) 19-30. Transactions on Edutainment XIV, ed. By Z. Pan, A. D.
[32] C. A. Lettsome, Y.-C. J. Tsai, V. Kaul, Enhanced adaptive Cheok, W. Müller, Springer, Berlin, 2018, p. 82-89.
filter-bank-based automated pavement crack detection and [48] B. Li, K. C. Wang, A. Zhang, E. Yang, G. Wang, Automatic
segmentation system, J. Electronic Imaging 21 (4) (2012) classification of pavement crack using deep convolutional
043008. neural network, Inter. J. Pavement Eng. 21 (4) (2018) 1-7,
[33] F. M. Nejad and H. Zakeri, An optimum feature extraction https://doi.org/10.1080/10298436.2018.1485917.
method based on Wavelet–Radon Transform and Dynamic [49] M. A. Nielsen, Neural networks and deep learning.
Neural Network for pavement distress classification, Expert Determination press, USA, 2015.
Syst. Appl. 38 (8) (2011) 9442-9460. [50] S. Dorafshan, R. J. Thomas, M. Maguire, Comparison of
deep convolutional neural networks and edge detectors for
S. Ranjbara et al. / International Journal of Pavement Research and Technology xx (2020) xxx-xxx 13
image-based crack detection in concrete, Constr. Build. [64] X. Wang and X. Feng, Pavement distress detection and
Mater. 186 (2018) 1031–1045. classification with automated image processing, 2011
[51] D. C. Ciresan, U. Meier, J. Masci, L. Maria Gambardella, J. International Conference on Transportation, Mechanical,
Schmidhuber, Flexible, high performance convolutional Electrical Engineering (TMEE), Changchun, China, 2011,
neural networks for image classification, In Proceedings- pp. 1345-1350: IEEE.
International Joint Conference on Artificial Intelligence [65] B. Sun and Y. Qiu, Automatic Pavement Surface Cracking
(IJCAI), Barcelona, Spain, 2011. Recognition Using Wavelet Transforms Technology, Second
[52] S. J. Pan and Q. Yang, A survey on transfer learning, IEEE International Conference on Transportation Engineering,
Transactions Knowledge Data Eng. 22 (10) (2010) 1345- Chengdu, China, 2009, pp. 2201-2206.
1359. [66] C. Ma, W. Wang, C. Zhao, F. Di, Z. Zhu, Pavement cracks
[53] O. Russakovsky et al., Imagenet large scale visual detection based on FDWT, International Conference on
recognition challenge, Inter. J. Comput. Vision 115 (3) (2015) Computational Intelligence and Software Engineering
211-252. (CiSE), Wuhan, China, 2009, pp. 1-4: IEEE.
[54] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet [67] J. Zhou, P. S. Huang, F.-P. Chiang, Wavelet-based pavement
classification with deep convolutional neural networks, In distress detection and evaluation, Optical Eng. 45 (2) (2006)
Advances in neural information processing systems, Harrah's 027007.
Lake Tahoe, NV, USA, 2012. [68] F. M. Nejad, N. Karimi, H. Zakeri, Automatic image
[55] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. acquisition with knowledge-based approach for multi-
Dally, K. Keutzer, Squeezenet: Alexnet-level accuracy with directional determination of skid resistance of pavements,
50x fewer parameters and< 0.5 mb model size, Computer Autom. Constr. 71 (2) (2016) 414-429.
Vision and Pattern Recognition, Cornell University, USA, [69] G. Yang, Q. J. Li, Y. J. Zhan, K. C. Wang, C. Wang, Wavelet
2016. based macrotexture analysis for pavement friction prediction,
[56] C. Szegedy et al., Going deeper with convolutions, IEEE KSCE J. Civ. Eng. 22 (1) (2018) 117-124.
conference on computer vision and pattern recognition, [70] R. Abbasnia and A. Farsaei, Corrosion detection of
Boston, USA, 2015, pp. 1-9. reinforced concrete beams with wavelet analysis, Inter. J. Civ.
[57] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: Eng., Transaction A: Civ. Eng. 11 (3) (2013) 160-169.
Surpassing human-level performance on imagenet [71] A. Dixit and S. Majumdar, Comparative analysis of coiflet
classification, IEEE international conference on computer and daubechies wavelets using global threshold for image
vision, Santiago, Chile, 2015, pp. 1026-1034. denoising, Inter. J. Adv. Eng. Technol. 6 (5) (2013) 2247-
[58] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for 2252.
image recognition, IEEE conference on computer vision and [72] D. Wei and A. C. Bovik, Generalized coiflets with nonzero-
pattern recognition, Las Vegas, USA, 2016, pp. 770-778. centered vanishing moments, IEEE Transactions on Circuits
[59] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Systems II: Analog Digital Signal Process. 45 (8) (1998)
Densely connected convolutional networks, IEEE 988-1001.
Conference on Computer Vision and Pattern Recognition [73] D. Wei and H. Cheng, Representations of stochastic
(CVPR), Honolulu, USA, 2017. processes using coiflet-type wavelets, in Proceedings of the
[60] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Tenth IEEE Workshop on Statistical Signal and Array
Rethinking the inception architecture for computer vision, in Processing, Pocono Manor, USA, 2000, pp. 549-553.
IEEE conference on computer vision and pattern recognition, [74] R. Nigam and S. K. Singh, Crack detection in a beam using
Las Vegas, USA, 2016, pp. 2818-2826. wavelet transform and photographic measurements, Struct.
[61] P. S. Addison, The illustrated wavelet transform handbook: 25 (2020) 436-447.
introductory theory and applications in science, engineering, [75] V. L. Fox, M. Milanova, S. Al-Ali, Scene Analysis Using
medicine and finance. CRC press, 2017. Morphological Mathematics and Fuzzy Logic, in Computer
[62] P. Prasad and G. Umamadhuri, Biorthogonal Wavelet-based Vision in Control Systems-1, ed. By M.N. Favorskaya,
Image Compression, in Artificial Intelligence and L.C. Jain, Springer, Switzerland , 2015, p. 239-259.
Evolutionary Computations in Engineering Systems, ed. By [76] P. Soille, Morphological image analysis: principles and
S. Dash, P.Chandra, B.Naidu, R.Bayindir, S.Das, Springer, applications, Springer Science & Business Media,
Singapore, 2018, pp. 391-404. Switzerland, 2013.
[63] P. Luo, X. Qu, X. Qing, J. Gu, CT Image Denoising Using [77] R. C. Gonzalez and R. E. Woods, Digital image processing,
Double Density Dual Tree Complex Wavelet with Modified 2nd edn. Pearson Education International, London, UK,
Thresholding, 2nd International Conference on Data Science 2007.
and Business Analytics (ICDSBA), Changsha, China, 2018,
pp. 287-290: IEEE.