Comparison of Deep Convolutional Neural Networks and Edge Detectors for Image-Based Crack Detection in Concrete
Highlights
Investigating the performance of six edge detectors for concrete crack detection.
Studying the performance of a DCNN trained in three modes to detect the same cracks.
Comprehensive comparison between the edge detectors and the DCNNs.
Proposing a new hybrid crack detector by combining the DCNN and the edge detector.
The hybrid method had 24 times less noise than the least noisy edge detector.
Article info

Article history:
Received 22 March 2018
Received in revised form 30 July 2018
Accepted 3 August 2018

Keywords:
Concrete
Crack detection
Deep learning
Neural network
Edge detection
Image processing
Vision-based
Structural health monitoring

Abstract

This paper compares the performance of common edge detectors and deep convolutional neural networks (DCNN) for image-based crack detection in concrete structures. A dataset of 19 high definition images (3420 sub-images, 319 with cracks and 3101 without) of concrete is analyzed using six common edge detection schemes (Roberts, Prewitt, Sobel, Laplacian of Gaussian, Butterworth, and Gaussian) and using the AlexNet DCNN architecture in fully trained, transfer learning, and classifier modes. The relative performance of each crack detection method is compared here for the first time on a single dataset. Edge detection methods accurately detected 53–79% of cracked pixels, but they produced residual noise in the final binary images. The best of these methods was useful in detecting cracks wider than 0.1 mm. DCNNs were used to label images, and accurately labeled them with 99% accuracy. In transfer learning mode, the network accurately detected about 86% of cracked images. DCNNs also detected much finer cracks than edge detection methods. In fully trained and classifier modes, the network detected cracks wider than 0.08 mm; in transfer learning mode, the network was able to detect cracks wider than 0.04 mm. Computational times for DCNN are shorter than the most efficient edge detection algorithms, not considering the training process. These results show significant promise for future adoption of DCNN methods for image-based damage detection in concrete. To reduce the residual noise, a hybrid method was proposed by combining the DCNN and edge detectors, which reduced the noise by a factor of 24.

https://doi.org/10.1016/j.conbuildmat.2018.08.011
0950-0618/© 2018 Elsevier Ltd. All rights reserved.
1032 S. Dorafshan et al. / Construction and Building Materials 186 (2018) 1031–1045
without any additional processing [5,18]. The number of images collected depends on a number of factors, but is commonly in the hundreds of thousands [5,18]. Manual identification of flaws in such large image sets is time consuming and prone to inaccuracy due to inspector fatigue or human error. Enhanced image inspection refers to the use of an image processing algorithm to make it easier to identify flaws in inspection images. This is typically performed using one of several edge detection algorithms, which greatly magnify the visibility of cracks within images. In doing so, the aforementioned problems with inspector fatigue can be mitigated to some degree. Finally, autonomous image processing refers to the use of an algorithm that detects cracks within images. This is typically accomplished using machine learning algorithms or other artificial intelligence schemes.

This paper discusses the latter two approaches and compares their performance. Image enhancement methods include the application of a variety of image processing techniques on visual images to detect cracks, including but not limited to morphological operations [19], digital image correlation [20,21], image binarization [22,23], percolation models [24], wavelet transforms [25], fractal analysis [28], and edge detectors [12,27,29,31–35]. The autonomous approach for crack detection, on the other hand, requires a set of training images to learn the features of cracks. Similarly, several researchers have shown the feasibility of autonomous crack detection in visual images using combined image processing techniques and artificial neural networks [30,37]. Deep convolutional neural networks (DCNNs) have recently been used for concrete crack detection [38–40].

Despite the abundance of image-based crack detection studies, direct comparisons between these methods remain a gap. Save for two noteworthy exceptions, most research focuses on developing new methods for crack detection rather than comparing the performance of existing methods. Abdel-Qader et al. [27] compared the performance of the fast Haar transform, Fourier transform, Sobel filter, and Canny filter for crack detection in 25 images of defected concrete and 25 images of sound concrete. The fast Haar transform was the most accurate method, with an overall accuracy of 86%, followed by the Canny filter (76%), Sobel filter (68%), and the Fourier transform (64%). The processing time was not considered in the analysis, and the criteria for recording true or false positives in the binary images were not clear. This lack of definition for metrics such as true positive has been common in past studies. Mohan and Poobal [41] reviewed a number of edge detection techniques for visual, thermal, and ultrasonic images, but the information presented was from several studies that considered vastly different data sets, and so the results are not directly comparable. A comparison between two edge detectors, Canny and Sobel, and a convolutional neural network is done in [39]. However, the comparison was
Table 1. Number of cracked and sound sub-images in training, validation, and testing datasets.

Table 2. Number of Cp and Up pixels in the testing dataset.

2. Dataset

The dataset used in this study consisted of 100 images of concrete panels that simulated reinforced concrete bridge decks for the purpose of verifying various non-destructive testing techniques. These panels were constructed previously in the Systems, Materials, and Structural Health laboratory (SMASH Lab) at Utah State University. Images were collected with a 16 MP digital single lens reflex camera with a 35 mm focal length and no zoom. The target was normal to the axis of the lens at a distance of approximately 0.5 m. The background illumination was in the range 400–1000 lx, as measured by a NIST-traceable digital light meter purchased new just prior to measurement. The finest crack width was approximately 0.04 mm and the widest was 1.42 mm. The original image size was 2592 × 4608 px and the field of view was approximately 0.3 × 0.55 m. Images were stored as JPEG with an average file size near 5 MB. In order to comply with the architecture of the DCNN, each original image was divided into 180 sub-images of size 256 × 256 px. The sub-images were labeled in two categories: 1574 sub-images with cracks and 16,426 sub-images without cracks. Fig. 1 illustrates the studied dataset with one example of a high-resolution image, a sub-image labeled as C from the original image if it had a crack, and a sub-image labeled as U from the original image if it did not. For DCNN applications, this dataset was divided into training, validation, and testing datasets as shown in Table 1. The testing dataset was selected randomly from the 100 original images. The images in this dataset are a portion of the bridge deck images of the structural defect dataset (SDNET2017 [42]). The sub-images in the testing dataset have also been segmented at the pixel level as Cp and Up for semantic comparison, where Cp stands for pixels with cracks and Up stands for sound pixels. The results of the pixel-level segmentation on the testing dataset are presented in Table 2. In this table, the Cp ratio stands for the ratio of the number of pixels in each image labeled as crack to the total number of pixels in that image.

3. Edge detection

In this paper, edge detection refers to the use of filters (edge detectors) in an image processing algorithm for the purpose of detecting or enhancing the cracks in an image such that they can be more easily and efficiently located within a large image dataset. Cracks in a two-dimensional (2D) image are classified as edges, and thus existing edge detection algorithms are likely candidates for crack identification. 2D images are represented mathematically by matrices (one matrix in the case of greyscale images, or three matrices in the case of red/green/blue color images). An ideal edge is defined as a discontinuity in the greyscale intensity field. Crack detection algorithms can emphasize edges by applying filters in either the spatial or frequency domain. Edge detection algorithms purport to make manual crack detection more reliable. In general, such image processing algorithms follow three steps: (1) edge detection, (2) edge image enhancement, and (3) segmentation (sometimes called binarization or thresholding). Edge detection involves the application of various filters in either the spatial or frequency domain to a grayscale image in order to emphasize discontinuities. Edge image enhancement scales the image and adjusts contrast to improve edge clarity. Segmentation transforms the enhanced edge image into a binary image of cracked and sound pixels.

In the spatial domain, the convolved image E is the sum of the element-by-element products of the image intensity I and the kernel K in every position in which K fits fully in I. For an M × N image I and an m × n kernel K:

E(i, j) = \sum_{k=1}^{m} \sum_{l=1}^{n} I(i + k - 1, j + l - 1) \, K(k, l)   (1)

E is of size (M − m + 1) × (N − n + 1). Filter kernels may include x and y components (corresponding to the horizontal and vertical image dimensions), K_x and K_y, in which case the edge image E is the hypotenuse of E_x and E_y.

Four edge detector filters in the spatial domain were employed in this study: Roberts in the x and y directions, denoted as K_Rx and K_Ry in Eq. (2); Prewitt in the x and y directions, denoted as K_Px and K_Py in Eq. (3); Sobel in the x and y directions, denoted as K_Sx and K_Sy in Eq. (4); and Laplacian-of-Gaussian (LoG), denoted as K_LoG in Eq. (5). A 10 × 10 LoG filter was employed here with a standard deviation of σ = 2.

K_{Rx} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \quad K_{Ry} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}   (2)

K_{Px} = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}, \quad K_{Py} = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}   (3)
Fig. 2. The effect of edge enhancement on the final image of the edge detectors (Sobel): (a) original image, (b) final binary image superimposed on the original image without edge enhancement, and (c) with edge enhancement.
Fig. 3. Closing operation illustration: (a) first-level binary image, (b) dilation, and (c) erosion using a disk structuring element with a diameter of 4 px (LoG edge detector was used).
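The spatial-domain filtering in Eq. (1) and the directional kernels of Eqs. (2)–(4) are straightforward to re-implement. The sketch below is our own illustrative NumPy version (the paper's implementation was in MATLAB, and all function names here are ours); note that Eq. (1) as written applies the kernel without flipping, i.e. as a cross-correlation:

```python
import numpy as np

def convolve_valid(image, kernel):
    """Eq. (1): sum of element-by-element products of I and K at every
    position where K fits fully inside I; output is (M-m+1) x (N-n+1)."""
    M, N = image.shape
    m, n = kernel.shape
    out = np.zeros((M - m + 1, N - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + m, j:j + n] * kernel)
    return out

# Standard kernels from Eqs. (2) and (4).
K_Rx = np.array([[1, 0], [0, -1]])
K_Ry = np.array([[0, 1], [-1, 0]])
K_Sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
K_Sy = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

def edge_image(image, kx, ky):
    """E as the hypotenuse of the x and y responses E_x and E_y."""
    ex = convolve_valid(image, kx)
    ey = convolve_valid(image, ky)
    return np.hypot(ex, ey)
```

Applied to a vertical step edge, for instance, the Sobel pair responds only through its x component, and the resulting E is constant along the edge.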
K_{Sx} = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \quad K_{Sy} = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}   (4)

K_{LoG} = \nabla^2 (G(x, y)) = \frac{x^2 + y^2 - 2\sigma^2}{4\sigma^4} \exp\left( -\frac{x^2 + y^2}{2\sigma^2} \right)   (5)

Edge detection in the frequency domain requires transformation of the spatial domain image I into the frequency domain image F by the fast Fourier transform (FFT). The edge image E is the element-wise product of the filter kernel K and the frequency domain image F:

E(u, v) = K(u, v) \, F(u, v)   (6)

where u and v are the dimensions of the transformed image in the frequency domain. Two edge detector filters in the frequency domain were employed in this study: Butterworth, denoted as K_B in Eq. (7), and Gaussian, denoted as K_G in Eq. (8).

K_B(u, v) = 1 - \frac{1}{1 + \left[ D(u, v) / D_0 \right]^{2n}}   (7)

K_G(u, v) = 1 - e^{-D^2(u, v) / 2\sigma^2}   (8)

where D(u, v) is the distance between the pixel (u, v) and the origin of the frequency domain (the center of the M × N image) as defined by Eq. (9); D_0 and n are the user-defined parameters that define the cut-off frequency and order of the Butterworth filter; σ is the user-defined parameter that defines the standard deviation of the Gaussian filter; and K_B and K_G are the Butterworth and Gaussian filters.

D(u, v) = \sqrt{ \left( u - \left( \tfrac{M}{2} + 1 \right) \right)^2 + \left( v - \left( \tfrac{N}{2} + 1 \right) \right)^2 }   (9)

The scaled edge image E_sc is E scaled such that 0 ≤ E_sc ≤ 1. The enhanced edge image is then:

E_e(x, y) = \frac{2\sigma_{E_{sc}}}{\max(E_{sc}) - \min(E_{sc})} \left[ E_{sc}(x, y) - \min(E_{sc}) \right] + \mu_{E_{sc}}   (10)

where min(E_sc), max(E_sc), σ_{E_sc}, and μ_{E_sc} are the minimum, maximum, standard deviation, and mean of the scaled edge image, respectively. Edge enhancement is a crucial part of the proposed method because it improves the segmentation of pixels with cracks from the background pixels. Fig. 2 shows an example of the effect of the edge enhancement in the proposed crack detection algorithm when the Sobel edge detector was used.

The final binary image B is constructed by segmentation, which assigns a value of one to all pixels in which the intensity exceeds some threshold T and a value of zero to all other pixels. In this study, a two-level binarization is introduced: the first is based on a pixel intensity threshold T_1 in the enhanced edge image, and the second is based on an area connectivity threshold T_2 on the binary image from the first level. The first threshold operation filters the weak edges from the enhanced edge image (Eq. (11)). By applying T_1, the strong edges in the enhanced edge image (80% or stronger than the maximum intensity, 0.8 max(E_e)) are preserved as cracks.
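The frequency-domain filtering of Eqs. (6)–(9) can be sketched in a few lines of NumPy. This is our own illustrative code, not the authors' MATLAB implementation; note that the paper's 1-based center index M/2 + 1 in Eq. (9) corresponds to the 0-based index M // 2 used by `np.fft.fftshift`:

```python
import numpy as np

def distance_grid(M, N):
    """Eq. (9): distance of each frequency-domain pixel (u, v) from the
    center of the M x N spectrum (0-based index M // 2 after fftshift)."""
    u = np.arange(M)[:, None] - M // 2
    v = np.arange(N)[None, :] - N // 2
    return np.sqrt(u ** 2 + v ** 2)

def butterworth_highpass(M, N, D0, n):
    """Eq. (7): K_B = 1 - 1 / (1 + (D / D0)^(2n))."""
    D = distance_grid(M, N)
    return 1.0 - 1.0 / (1.0 + (D / D0) ** (2 * n))

def gaussian_highpass(M, N, sigma):
    """Eq. (8): K_G = 1 - exp(-D^2 / (2 sigma^2))."""
    D = distance_grid(M, N)
    return 1.0 - np.exp(-D ** 2 / (2 * sigma ** 2))

def filter_frequency(image, kernel):
    """Eq. (6): element-wise product of the centered spectrum F and the
    kernel K, then inverse FFT back to the spatial domain."""
    F = np.fft.fftshift(np.fft.fft2(image))
    return np.real(np.fft.ifft2(np.fft.ifftshift(kernel * F)))
```

Both kernels are zero at the spectrum center and approach one away from it, so a constant (featureless) image is filtered to zero while discontinuities such as cracks are preserved.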
Fig. 4. Crack in the (a) ground truth, 1391 px; (b) without the closing operation, 391 px correct detection; (c) with the closing operation, 1215 px correct detection (LoG edge detector).
Fig. 5. Crack in the (a) ground truth, 2325 px; (b) without the second-level threshold operation, 3672 px false detection; (c) with the second-level threshold operation, 214 px false detection (results of the Gaussian edge detector in the frequency domain).
At this point, the strong edges have been identified in the first binary image; however, the surface roughness of the concrete can cause residual noise.

T_1 = 0.8 \max(E_e)   (11)

In order to obtain more effective segmentations, the morphological closing operation was carried out on the first-level binary image. Closing consists of a dilation followed by an erosion using an identical structuring element for both operations (see Fig. 3). The purpose of the closing operation is to unify the possibly discrete parts of the crack in the first binary image. Structuring elements define the spatial domain on the binary image in which the morphological operation will be carried out. Circle-shaped structuring elements with generic dimensions were used to perform the closing operation. The radius of the structuring element was defined as the minimum Euclidean distance between the centroids of connected components in each binary image. The closing operation improved the results of each individual edge detector in terms of true positives. Fig. 4 shows an example where not applying the closing operation caused the LoG edge detector to miss more than 70% of the cracked pixels after applying the second threshold operation.

The second binarization operation was designed to segment the cracks from the residual noise in the first binary image based on the area of the connected components in the first-level binary image (Eq. (12)). The connected area A_c(x, y) is the number of contiguous pixels in a connected component, considering eight-neighbor connectivity. max(A_c) is the area of the largest connected component in the first-level binary image. The purpose of the area threshold is to control the noise in the final binary image, as shown in Fig. 5 for the results of the Gaussian high-pass filter.

T_2 = \max(A_c)   (12)

4. DCNN

Using direct image-processing techniques for concrete crack detection has several drawbacks. First, the algorithms are tailored for certain images in the studied datasets, which affects their performance on new datasets. These algorithms may not be as accurate when tested on new datasets taken in more challenging situations such as low lighting conditions, presence of shadows, low quality cameras, etc. Second, the image processing algorithms are often designed to aid the inspector in crack detection and still rely on human judgement for final results [29]. One solution is using machine learning algorithms to analyze the inspection images [43,44]. Deep convolutional neural networks (DCNNs) are a type of feedforward artificial neural network which has revolutionized autonomous image classification and object detection in the past 5 years [45]. A DCNN uses a set of annotated, i.e. labeled, images for training and calculates the learning parameters in the learning layers between the input and output layers through thousands to millions of iterations.

A number of architectures have been employed to create neural networks providing excellent accuracy on open-source labeled datasets, such as ImageNet and MNIST, in the past 4 years [46–48]. Each architecture includes a number of main layers. The main layers are composed of sub-layers.
The total number of layers defined in a software program, like MATLAB, to build an architecture is referred to as "Programmable Layers" in this study. Krizhevsky [46] proposed one of the first DCNN architectures, AlexNet. This architecture has 8 main layers (25 programmable layers) and was the winner of the 2012 image classification competition (ImageNet [49]). Szegedy et al. proposed another architecture called GoogleNet, with 22 main layers (144 programmable layers), and improved the accuracy by introducing the inception module in the learning layers, which won the 2014 competition [50]. The deep residual learning neural network, ResNet, was introduced in 2016 [51]. ResNet has 50 and 101 main layers (177 and 347 programmable layers) and was the winner of the 2016 competition.

DCNNs have been used in vision-based structural health monitoring in recent years for crack detection [39], road pavement cracks [52,53], corrosion detection [54,55], multi-damage detection [38,56], structural health monitoring [58], and localizing acoustic emission sources in plates [57]. Due to the popularity of Unmanned Aerial Systems (UASs) for structural health monitoring and bridge inspection [5,6,18], applications of DCNNs in UAS-assisted inspections have begun to attract researchers seeking more robust non-contact damage detection [40,59,60].

In general, a DCNN architecture includes an input layer, learning layers, and an output layer [61]. The input layer reads the image and transfers it to the learning layers. The learning layers perform convolution operations, applying filters to extract image features. The output layer classifies the image according to target categories using the features extracted in the learning layers. The neural network can be trained by assigning target categories to images in a training dataset and modifying filter values iteratively through back propagation until the desired accuracy is achieved.

A DCNN can be used for crack detection in three ways: image classification [39], object localization [38], or pixel segmentation. The goal of classification is to label each image as cracked or sound. The training and validation datasets comprise pre-classified cracked and sound images. The goal of localization is to determine bounding coordinates that identify the location of an object, e.g. a crack, within an image. As before, the training and validation datasets include both cracked and sound images, but the cracked images have bounding boxes drawn around the location of the crack. The goal of segmentation is to classify each pixel as cracked or sound, and the training and validation datasets comprise a very large number of pre-classified pixels. The computational intensity of DCNNs normally necessitates subdivision of images to reduce computational requirements.

The AlexNet DCNN architecture, illustrated in Fig. 6, comprises five convolution layers (C1–C5), three max pooling layers (MP1–MP3), seven nonlinearity layers using the rectified linear unit (ReLU) function (ReLU1–ReLU7), two normalization layers (Norm1–Norm2), three fully connected layers (FC1–FC3), two dropout layers (DP1–DP2), one softmax layer (SM), and one classification layer (CL). Each layer is applied to the image using the convolution operation (Eq. (1)). Fig. 6 shows the architecture of AlexNet along with its corresponding filter numbers and sizes. The kernel values are determined iteratively through training, but the size, number, and stride of the kernels are predetermined. The nonlinearity layers operate on the result of each convolution layer through element-wise comparison. The ReLU function used for nonlinearity is defined as the maximum of zero and the input:

f(x) = \begin{cases} 0, & x < 0 \\ x, & x \geq 0 \end{cases}   (13)

Following the nonlinearity layer, a max pooling layer introduces a representative for a set of neighboring pixels by taking their maximum value. The max pooling layers are essential to reduce the computational time and overfitting issues in the DCNN. After the max pooling layers, one or several fully connected layers are used at the end of the architecture. The fully connected layer is a traditional multi-layer perceptron followed by a softmax layer to classify the image. The mission of the fully connected layers is to connect the information from the past layers together in a way that the softmax layer can predict the results correctly during the training process. The optimum combination is achieved by a process called the backpropagation algorithm (partial derivatives of the softmax layer output with respect to the weights). The purpose of the softmax layer is to ensure the sum of probabilities for all labels is equal to 1. In addition to these basic layers, a DCNN also includes normalization, dropout, and classification layers. The normalization layer normalizes the response around a local neighborhood to compensate for the possibly unbounded activations from the ReLU layer. The dropout layer is a probability-based threshold layer that filters responses smaller than a threshold probability (50% is common). The classification layer is similar to the fully connected layers. For detailed explanations of the function of each layer and their interactions, readers can refer to Ref. [62].

Three modes are used for applying the network on the training dataset. The first mode is to fully train the network from scratch (FT mode) on the training dataset. In this mode all the weights are assigned random numbers and then computed through iterations based on the training dataset. Obtaining an annotated dataset for concrete cracks as big as ImageNet is not currently feasible. Even if a large concrete crack dataset were available, the training process from scratch could take days to complete on hardware with several graphics processing units (GPUs), and would therefore be prohibitively time consuming. However, it is possible to apply a previously trained network (pre-trained network) on a small dataset and obtain reasonable accuracy [63]. Pre-trained networks can be applied on a new dataset in different ways [64]. These methods are usually referred to as "domain adaptation" in the deep learning literature. One can use a DCNN already trained on the ImageNet dataset as a classifier for new images. This type of domain adaptation is referred to as classifier (CL) mode. In CL mode, only the last fully connected layer needs to be altered in order to match the target labels in the concrete dataset. The network then uses the pre-trained weights and forms a classifier based on the training dataset. Note that no actual training happens when CL mode is used. Another studied domain adaptation method is to partially retrain a pre-trained network and modify the layers according to a new dataset. This approach is called fine-tuning or transfer learning (TL mode). In TL mode, the network has to be re-trained since both the classifier and the weights have to be updated based on the new dataset. In TL mode, the weights of the lower-level layers (closer to the input image layer) are preserved. These weights are computed from training on millions of images and consist of generic feature extractors such as edge detectors. Therefore, the determined lower-level weights can be applied on any dataset for feature extraction. On the other hand, the classifier layers (closer to the end of the network) are more sensitive to the training dataset and its labels. To adjust the network to the new dataset, the weights in the high-level layers are updated through training on the new dataset.

5. Experimental program

5.1. Computational resources

All computations were performed on a desktop computer with a 64-bit operating system, 32 GB memory, and a 3.40 GHz processor running a GeForce GTX 750 Ti graphics processing unit (GPU). Image processing and deep learning methods were programmed and performed in MATLAB 2018a.

5.2. Edge detection

The testing dataset of 319 C and 3101 U sub-images was iteratively processed using each of the six edge detection schemes discussed in Section 3. Unlike past studies [30,26,58], the metrics to evaluate the performance of each edge detector were defined very clearly at the pixel level. The final binary images were compared to the ground truth. A true positive (TP) is when the edge detector identified a pixel on the crack pixels (Cp). A false negative (FN) is when the edge detector did not identify a pixel on the crack pixels (Cp). A true negative (TN) is when the edge detector did not identify a pixel on the sound pixels (Up), and a false positive (FP) is when the edge detector identified a pixel on the sound pixels (Up). Note all comparisons were performed on the final binary images produced by each edge detector.
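The elementary layer operations named above can be stated concretely. The following NumPy sketch (our own illustrative code, not the authors' MATLAB implementation) shows the ReLU nonlinearity of Eq. (13), a 2 × 2 max pooling step, and a softmax that normalizes class scores so the label probabilities sum to 1:

```python
import numpy as np

def relu(x):
    """Eq. (13): element-wise maximum of zero and the input."""
    return np.maximum(0, x)

def max_pool(x, k=2):
    """Take the maximum over non-overlapping k x k neighborhoods,
    discarding any trailing rows/columns that do not fill a block."""
    M, N = x.shape
    x = x[:M - M % k, :N - N % k]
    return x.reshape(M // k, k, N // k, k).max(axis=(1, 3))

def softmax(z):
    """Exponentiate (stably, by subtracting the max) and normalize so
    the output probabilities sum to exactly 1."""
    e = np.exp(z - np.max(z))
    return e / e.sum()
```

Max pooling shrinks each spatial dimension by a factor of k while keeping the strongest response in each neighborhood, which is what reduces computation and overfitting downstream.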
Fig. 7. Examples of metrics: (a) ground truth, Cp = 1582 px, Up = 63,954 px; (b) final binary image using Roberts edge detector, Cp = 2276 px, Up = 63,260 px; (c) TP = 1367 px; (d) FN = 215 px; (e) TN = 63,045 px; (f) FP = 909 px.
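The pixel-level counts illustrated in Fig. 7 follow directly from comparing a final binary image to the ground-truth mask. A minimal sketch (illustrative NumPy code; the function name is ours):

```python
import numpy as np

def confusion_counts(gt, pred):
    """Pixel-level comparison of a ground-truth crack mask (gt) and a
    final binary image (pred), both boolean arrays of the same shape:
    TP = crack pixels detected, FN = crack pixels missed,
    TN = sound pixels left blank, FP = sound pixels marked as crack."""
    tp = int(np.sum(gt & pred))
    fn = int(np.sum(gt & ~pred))
    tn = int(np.sum(~gt & ~pred))
    fp = int(np.sum(~gt & pred))
    return tp, fn, tn, fp
```

By construction TP + FN equals the number of Cp pixels and TN + FP equals the number of Up pixels, which is a useful sanity check (e.g. in Fig. 7, 63,045 + 909 = 63,954 = Up).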
Fig. 7 shows examples of how the metrics are calculated: (a) the original image is segmented into 1582 Cp pixels (highlighted) and 63,954 Up pixels; (b) the final binary image superimposed on the original image (Roberts edge detector) identified 2276 Cp pixels (highlighted) and 63,260 Up pixels; (c) 1367 pixels in the final binary image were TP; (d) 215 pixels in the final binary image were FN; (e) 63,045 pixels in the final binary image were TN; and (f) 909 pixels in the final binary image were FP. The metrics in Fig. 7c through f are shown in white. Note that for the U class sub-images, TP and FN are meaningless and only TN and FP were recorded.

The team then rated each edge detection scheme in terms of true positive rate (TPR), true negative rate (TNR), accuracy (ACC), positive predictive value (PPV), negative predictive value (NPV), and F1 score, defined as follows:

TPR = \frac{TP}{TP + FN}   (14)

TNR = \frac{TN}{TN + FP}   (15)

ACC = \frac{TP + TN}{TP + FN + TN + FP}   (16)

PPV = \frac{TP}{TP + FP}   (17)

NPV = \frac{TN}{TN + FN}   (18)

F1 = \frac{2\,TP}{2\,TP + FN + FP}   (19)

In addition, missed crack width (MCW) and computational time (T) are also compared between the different edge detectors. MCW is defined as the coarsest crack that went undetected by a particular edge detection scheme, as determined by crack width measurement using a crack width microscope with 0.02 mm resolution. Computational time is defined as the average processing time for ten runs of a particular edge detection scheme, normalized by the number of images (180 sub-images).

5.3. DCNN

Crack detection using the DCNN was performed by classification of sub-images in the fully trained, transfer learning, and classifier modes as described in Table 1.

Batch size and the validation criterion determine the number of iterations in the training process. Larger batch sizes result in faster convergence, but batch size is limited by the available GPU memory. The selected batch size was 10. The training dataset has 12,809 sub-images. The number of iterations needed to cover all sub-images was simply calculated by dividing the total sub-images by the batch size, i.e. 1281 iterations. This number of iterations is known as an epoch. A maximum of 30 epochs was considered for back propagation on the network, meaning that the network performs as many as 30 × 1281 = 38,430 iterations to finish the training. The network was set to stop iterating once the accuracy on the validation dataset stopped improving in three consecutive epochs. If the validation criterion is not met by the end of the 30th epoch, more iteration cycles should be considered for the training.

The network in each mode is used to classify the sub-images in the testing dataset and the results are compared to the ground truth. A TP is when the network correctly labeled a sub-image as C, and an FN is when the network failed to do so. A TN is when the network correctly labeled a sound sub-image as U, and an FP is when the network labeled a sound sub-image as C. TPR, TNR, ACC, PPV, NPV, and F1 are calculated according to Eq. (14) through Eq. (19). T and MCW are evaluated in the same manner as in the edge detector approach, except that the training time is not considered when calculating T for the DCNNs.

6. Results and discussion

6.1. Edge detection

A summary of results for the six edge detectors applied on the C class and U class sub-images is shown in Tables 3 and 4, respectively. The metrics for comparison are shown in Fig. 8a in terms of TPR and PPV, and in Fig. 8b in terms of TNR, ACC, and NPV. The latter metrics were significantly affected by the data imbalance between Cp and Up pixels. Nevertheless, the evaluated metrics in this paper are at the pixel level, which makes the comparison unique compared to previous crack detection studies. LoG produced the highest TPR with 79%, followed by Sobel and Prewitt with 76% and 69%. In the spatial domain, the Roberts edge detector produced the lowest TPR, 53%, which was still higher than the TPRs produced by the frequency domain edge detectors, where Butterworth detected 41% and Gaussian detected only 31% of the crack pixels. The LoG edge detector also produced the highest PPV, 60%, followed by Sobel and Prewitt with 56% and 54%. The Gaussian high-pass filter had only 18% PPV, which was the lowest among the studied methods. F1 scores ranged from 23% in sub-images segmented by the Gaussian high-pass filter to 68% in sub-images segmented by LoG. Roberts and the Gaussian high-pass filter produced the lowest TNR values, 96% and 97%, respectively, and the lowest ACC, both 95%. As for NPV, the lowest values were 95% and 96% when the Gaussian and Butterworth edge detectors were used, respectively. Again LoG was the most accurate, 98%, and produced the highest TNR = 99% and NPV = 99.5%.

Table 4. Summary of edge detector performance on sub-images in the U class.

Domain      Edge Detector   TNR    T (s)
Spatial     Roberts         0.93   5.46
            Prewitt         0.95   4.71
            Sobel           0.95   4.83
            LoG             0.95   4.05
Frequency   Butterworth     0.95   5.98
            Gaussian        0.93   5.86
Table 3. Summary of edge detector performance on sub-images in the C class.

Domain      Edge Detector   TPR    TNR    ACC    PPV    NPV    F1     MCW (mm)   T (s)
Spatial     Roberts         0.53   0.96   0.95   0.23   0.99   0.32   0.40       5.15
            Prewitt         0.69   0.98   0.97   0.42   0.99   0.52   0.20       4.13
            Sobel           0.76   0.98   0.97   0.44   0.99   0.56   0.20       4.64
            LoG             0.79   0.99   0.98   0.60   1.00   0.68   0.10       3.79
Frequency   Butterworth     0.41   0.97   0.96   0.25   0.99   0.31   0.20       5.76
            Gaussian        0.32   0.97   0.95   0.18   0.98   0.23   0.20       5.70
S. Dorafshan et al. / Construction and Building Materials 186 (2018) 1031–1045 1039
but note that these metrics are affected by the gigantic class imbal- Factoring Roberts, overall the spatial domain edge detectors
ance between Cp and Up pixels (less than of 1% of the pixels were produced better binary images for crack detection compared to
Cp). To see this difference better, percentage of reported FP pixels the frequency domain ones. The same trend can be seen for values
per sub-image, noise ratio (NR), for each edge detector is shown in of T in Tables 3 and 4 where the fastest method was LoG. Finally,
Fig. 8c. To calculate the noise ratio, first the average FN for each LoG detected finer cracks than the rest of studied method with
method was calculated by dividing total number of FNs to the MCW of 0.1 mm. Fig. 9 shows an example of crack detection using
number of sub-images in each class, 319 in C class, and 3101 in different edge detectors along with the original image and ground
U class. The NR is then calculated as the average FNs divided by truth. LoG edge detector performed better than all the other stud-
total number of pixels in each sub-image, i.e. 256 256. ied detectors in all considered metrics.
As seen for sub-images in the C class, the NR values were 2.4%
on average and were almost half of the ones in the U class, 5.3% 6.2. DCNN
on average. This is due to the fact that the proposed methodology
for crack detection is based on the assumption that there is a crack 6.2.1. Training and validation
in the inspected image and it is the largest connected component Fig. 10 shows the achieved accuracy of the DCNN under fully
in the first level binary image. Therefore, noise and irrelevant trained and transfer learning during training and validation. In fully
objects are preserved in the final binary image in the U class as trained mode, the validation criterion was met after 14 epochs
FN. In addition, the LoG edge detector produced the lowest NR val- (17934 iterations), which required 6200 s processing time. The
ues, 1.1% in the C class and 4.5% in the U class while Roberts and resulting validation accuracy was 97.5%. In transfer learning mode,
frequency domain detectors were the worst ones in both classes. the validation criteria were met after 7 epochs (8967 iterations),
Fig. 9. An example of edge detector performance on a 0.02 mm crack (a) original image, (b) GT = 1145 px, (c) Roberts, TPR = 39% (d) Prewitt, TPR = 60%, (e) Sobel, TPR = 55%,
(f) LoG, TPR = 71%, (g) Butterworth, TPR = 38%, (h) Gaussian, TPR = 17%.
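The noise ratio defined above, the average number of FP pixels per sub-image expressed as a fraction of the 256 × 256 sub-image area, can be sketched in a few lines. The pixel total in the example call is hypothetical, back-calculated so that the result matches the LoG C class value of about 1.1% reported above:

```python
def noise_ratio(total_fp_pixels: int, n_subimages: int,
                subimage_px: int = 256 * 256) -> float:
    """NR: average false-positive (noise) pixels per sub-image,
    as a fraction of the 256 x 256 sub-image area."""
    avg_fp_per_subimage = total_fp_pixels / n_subimages
    return avg_fp_per_subimage / subimage_px

# Hypothetical count: ~230,000 FP pixels over the 319 C class
# sub-images gives an NR of about 1.1%.
nr_c = noise_ratio(total_fp_pixels=230_000, n_subimages=319)
```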
Fig. 10. Training and validation accuracy (%) versus iteration for the DCNN (curves: FT mode-Training, TL mode-Training, FT mode-Validation, TL mode-Validation).

Fig. 11. Metrics for the DCNN in FT, TL, and CL modes.
Table 5
Summary of DCNN results.
Mode TP FN TN FP TPR TNR ACC PPV NPV F1 MCW (mm) Time (s)
FT 212 107 3099 2 0.66 1.00 0.97 0.99 0.97 0.80 0.08 2.65
TL 275 44 3077 24 0.86 0.99 0.98 0.92 0.99 0.89 0.04 2.81
CL 267 52 3034 67 0.84 0.98 0.97 0.80 0.98 0.82 0.08 2.75
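The metrics in Table 5 follow directly from the confusion-matrix counts in each row; as a check, a short sketch reproducing the FT-mode row:

```python
def metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard classification metrics from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),                   # sensitivity / recall
        "TNR": tn / (tn + fp),                   # specificity
        "ACC": (tp + tn) / (tp + fn + tn + fp),  # overall accuracy
        "PPV": tp / (tp + fp),                   # precision
        "NPV": tn / (tn + fn),
        "F1":  2 * tp / (2 * tp + fp + fn),
    }

# FT mode counts from Table 5: TP = 212, FN = 107, TN = 3099, FP = 2
m = metrics(212, 107, 3099, 2)
```

Rounded to two decimals, these values match the FT row (TPR 0.66, TNR 1.00, ACC 0.97, PPV 0.99, NPV 0.97, F1 0.80).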
Fig. 12. DCNN results for a crack of width 0.08 mm: (a) FT mode, (b) TL mode, and (c) CL mode.
its inefficiency. Several noteworthy results become apparent. First, the true positive rate for transfer learning was 20% higher than for fully trained. At the same time, the true negative rate for transfer learning was only one percent lower than for fully trained. This, combined with a smaller missed crack width and similar computation time requirements, makes transfer learning a clear winner among the DCNN modes. F1 scores and PPV values for the DCNN in all modes were significantly greater than those of the edge detector techniques.
This analysis also shows that the DCNN methods performed better at image based concrete crack detection than any of the edge detection methods (except for FT mode). The LoG edge detector exhibited the highest true positive rate of all six edge detectors, accurately identifying nearly 79% of cracked pixels. LoG also detected the finest cracks of any edge detector, with an MCW of 0.1 mm. The TPR among DCNN methods was about 86% and 84% in TL and CL modes, respectively, a significant improvement over LoG. In addition, the TNR of the DCNN approach was superior to that of the edge detectors, which suffered from high NR ratios (refer to Fig. 8c). Furthermore, the DCNN methods were able to detect finer cracks than the edge detectors. In fully trained and classifier modes, the MCW was 0.08 mm, a marginal improvement over LoG. In transfer learning mode, the MCW was an impressive 0.04 mm.
Computational times also show the superiority of the DCNN over the edge detectors; computational time was almost 50% less for the DCNNs than for the edge detectors. However, crack detection using a DCNN requires time for training (in FT and TL modes) and classifier construction (in CL mode), which is not taken into account when reporting the computational time. The assumption is that, in the future, pre-trained DCNNs will be available for this purpose, so it is not necessarily appropriate to include training time in this comparison. In fact, a DCNN can be trained using a very large dataset with images of varying quality (e.g., resolution, lighting condition, focus), making it more robust and applicable to most situations. Edge detectors, by contrast, are typically manually tuned to maximize performance for a particular dataset or subset, diminishing their robustness.
These results highlight the significant promise of DCNN methods for image based crack detection in concrete. The evidence presented here shows that edge detection methods, which represent the current state of practice, perform reasonably well. DCNN methods provide autonomous crack detection and significant performance enhancements over edge detection schemes. The results presented here for the DCNN are only a preliminary step in the development of DCNN methods for concrete crack detection. Future work will demonstrate the use of more advanced DCNNs for the same problem, in the hope that more advanced networks will provide even better crack detection performance.
The reader should note that the results presented here are for high quality images taken in good lighting and free of vibration. The extension of these results to noncontact image-based inspection and damage detection will require application of the same methods to images with imperfections resulting from poor lighting, vibration, or other issues [40]. This work is ongoing, but the results presented here show promise for autonomous crack detection in concrete structures using noncontact image-based methods.
Despite being only recently introduced to structural health monitoring and inspection, DCNNs have improved vision-based structural defect detection. This study shows the superiority of an AlexNet DCNN over traditional edge detectors for concrete crack detection. The performance of the network can be further enhanced if more powerful architectures such as GoogLeNet or ResNet are implemented for crack detection. Unlike edge detectors, DCNNs can be used for any type of defect in structures, provided enough annotated images are available for training. Forming an annotated image dataset for structural defects, analogous to ImageNet, is vital for further applications of DCNNs in structural engineering. With this dataset available, new architectures can be developed that focus on finding a handful of structural defects instead of 1000 different objects, which will reduce the computational time associated with the training process. In addition, domain adaptation methods, such as transfer learning, will be more effective if the network is previously trained on the structural defects dataset. Improving the performance of domain adaptation techniques makes real-time defect detection in robotic vision-based inspections feasible. In other words, a DCNN pre-trained on the structural defect dataset can be directly used to accurately classify new images taken by an unmanned aerial system into different structural defect classes as the inspection is taking place.

7. Hybrid crack detector

Unless semantic networks are used for crack detection, edge detectors are the better choice to provide segmentation at the pixel level. This capability favors the edge detector over the DCNN for fine monitoring and measurement of cracks, but creating a DCNN training dataset with classified pixels can be very time-consuming and challenging. On the other hand, the sole use of edge detectors has the disadvantage of residual noise, i.e. non-crack objects misidentified as cracks. Even with the most effective edge detector, LoG, there was more than 4% FP (combining the FPs of the images in both the C class and the U class), which amounts to 9,457,066 sound pixels identified as cracks in the testing dataset. Fig. 14 shows examples of FPs (highlighted in red) in the three C class
Fig. 13. Results of (a) fully trained DCNN crack detection, (b) transfer learning DCNN, and (c) classifier DCNN for crack detection on the original full scale images in the testing
dataset.
sub-images after the final binary image from the LoG edge detector was superimposed on the original images.
Since the DCNN in FT mode provided such accurate classification for the U class sub-images, with only two cases of FP, the network was first used to label all the sub-images in the U and C classes. No edge detector was applied to the sub-images identified as U class by the network. The LoG edge detector was applied to the rest of the images in the testing dataset. Combining the two approaches reduced the number of FPs to 70% of those reported by the LoG edge detector alone (Fig. 15). This leads to an average reduction of the NR values from 2.45% to 0.11%.
Using this technique also improved the overall performance of the edge detectors. As mentioned before, the edge detectors performed better on the sub-images with cracks due to the effect of
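The two-stage logic described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `classify` and `segment` are injected placeholder functions standing in for the FT-mode DCNN and the LoG edge detector, respectively.

```python
from typing import Callable, Iterable

def hybrid_detect(subimages: Iterable,
                  classify: Callable[[object], str],
                  segment: Callable[[object], object]) -> list:
    """Two-stage hybrid detector: label each sub-image with the
    (low-noise) DCNN classifier first, then apply edge-based
    segmentation only to sub-images labeled cracked ('C').
    'U' sub-images receive no edge detection, which removes
    their residual noise entirely."""
    results = []
    for img in subimages:
        label = classify(img)
        mask = segment(img) if label == "C" else None
        results.append((label, mask))
    return results

# Toy demo: integers stand in for sub-images; even = cracked.
out = hybrid_detect(range(4),
                    classify=lambda x: "C" if x % 2 == 0 else "U",
                    segment=lambda x: f"mask{x}")
# out == [('C', 'mask0'), ('U', None), ('C', 'mask2'), ('U', None)]
```

Because segmentation is skipped for 'U'-labeled sub-images, the expensive and noise-prone edge detection step runs only where a crack is expected, mirroring the reduction in noise ratio reported above.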
Table 6
Comparison of DCNN and edge detection performance considering sub-images.
Method TPR TNR ACC PPV NPV F1 MCW (mm) Time (s)
DCNN FT mode 0.66 1.00 0.97 0.99 0.97 0.80 0.08 2.65
TL mode 0.86 0.99 0.98 0.92 0.99 0.89 0.04 2.81
CL mode 0.84 0.98 0.97 0.80 0.98 0.82 0.08 2.75
Edge Detector Roberts 0.53 0.96 0.95 0.23 0.99 0.32 0.40 5.30
Prewitt 0.69 0.98 0.97 0.42 0.99 0.52 0.20 4.42
Sobel 0.76 0.98 0.97 0.44 0.99 0.56 0.20 4.74
LoG 0.79 0.99 0.98 0.60 1.00 0.68 0.10 3.92
Butterworth 0.41 0.97 0.96 0.25 0.99 0.31 0.20 5.87
Gaussian 0.32 0.97 0.95 0.18 0.98 0.23 0.20 5.78
Fig. 14. Examples of FPs in the U class images: (a) non-crack edge, (b) different surface finish, (c) noise due to the coarse concrete surface (results of the LoG in the spatial domain).
Fig. 15. Combination of DCNN and edge detectors: (a) superimposed image with a crack, using LoG on all sub-images; (b) superimposed image with a crack, without using LoG on U class sub-images; (c) superimposed image without a crack, using LoG on all sub-images; (d) superimposed image without a crack, without using LoG on U class sub-images.
second level threshold, which was the reason for evaluating their performance on the C class and U class sub-images separately in Tables 3 and 4. However, the PPV and F1 score metrics would be considerably lower if both classes were considered in calculating them. For the best edge detector, i.e. LoG, PPV = 6% and F1 = 11% were achieved when both classes were used. Using the hybrid technique, however, resulted in almost the same PPV and F1 score as provided in Table 3 for the LoG, since only C class images were analyzed (with the exception of two sub-images in the U class).

8. Conclusions

This paper presents a comparison of edge detection and DCNN algorithms for image based concrete crack detection. The dataset consisted of 3420 sub-images of concrete cracks. Several common edge detection algorithms were employed in the spatial (Roberts, Prewitt, Sobel, and LoG) and frequency (Butterworth and Gaussian) domains. The AlexNet DCNN architecture was employed in its fully trained, classifier, and fine-tuned modes. Edge detection schemes
performed reasonably well. The best method, LoG, accurately detected about 79% of cracked pixels and was useful in detecting cracks wider than 0.1 mm. In comparison, the best DCNN method, the network in transfer learning mode, accurately detected 86% of cracked images and could detect cracks wider than 0.04 mm. This represents a significant performance enhancement over edge detection schemes and shows promise for future applications of DCNNs for image based crack detection in concrete. In addition, a methodology was proposed to reduce the reported FPs by 70% by applying the edge detectors only on sub-images not labeled as uncracked. This hybrid crack detector combines the advantages of both approaches. In the hybrid detector, the sub-images were first labeled by the network in the fully trained mode. Since the network produced the highest TNR, the edge detector is not applied on the sub-images labeled as U (uncracked) by the network. This technique reduced the noise ratio of the LoG edge detector from 2.4% to 0.11% and had a similar effect on the other edge detectors as well.
This study shows the superiority of an AlexNet DCNN over traditional edge detectors for concrete crack detection. This superiority can be further improved when architectures such as GoogLeNet or ResNet are implemented for crack detection. DCNNs are able to classify multiple defect types if enough annotated images are available for training. Forming an annotated image dataset for structural defects, analogous to ImageNet, is vital for further applications of DCNNs in structural engineering. With this dataset available, new architectures can be proposed that focus on finding structural defects instead of random objects, which will reduce the computational time associated with the training process. In addition, domain adaptation methods, such as transfer learning, will be more effective if the network is previously trained on the structural defects dataset. Improving the performance of domain adaptation techniques makes real-time defect detection in robotic vision-based inspections feasible. In other words, a DCNN pre-trained on the structural defect dataset can be directly used to accurately classify new images taken by an unmanned aerial system into different structural defect classes as the inspection is taking place.

9. Conflict of interest

There is no conflict of interest.

References

[1] Federal Highway Administration, National Bridge Inventory, FHWA, McLean, VA, 2017.
[2] Federal Highway Administration, National Bridge Inspection Standards (FHWA–FAPG 23 CFR 650C), FHWA, McLean, VA, 2017.
[3] B. Chan, H. Guan, J. Jo, M. Blumenstein, Towards UAV-based bridge inspection systems: a review and an application perspective, Struct. Monit. Maint. 2 (3) (2015) 283–300.
[4] C.H. Yang, M.C. Wen, Y.C. Chen, S.C. Kang, An optimized unmanned aerial system for bridge inspection, in: Proceedings of the International Symposium on Automation and Robotics in Construction, Vilnius, Lithuania, 2015.
[5] S. Dorafshan, M. Maguire, N. Hoffer, C. Coopmans, Fatigue Crack Detection Using Unmanned Aerial Systems in Under-Bridge Inspection, Idaho Transportation Department, Boise, ID, 2017.
[6] S. Dorafshan, M. Maguire, N. Hoffer, C. Coopmans, Challenges in bridge inspection using small unmanned aerial systems: results and lessons learned, in: Proceedings of the 2017 International Conference on Unmanned Aircraft Systems, Miami, FL, 2017.
[7] N. Gucunski, S.H. Kee, H.M. La, B. Basily, A. Maher, Delamination and concrete quality assessment of concrete bridge decks using a fully autonomous RABIT platform, Int. J. Struct. Monit. Maint. 2 (1) (2015) 19–34.
[8] R.S. Lim, H.M. La, W. Sheng, A robotic crack inspection and mapping system for bridge deck maintenance, IEEE Trans. Autom. Sci. Eng. 11 (2) (2014) 367–378.
[9] N. Gucunski, S.H. Kee, H. La, B. Basily, A. Maher, H. Ghasemi, Implementation of a fully autonomous platform for assessment of concrete bridge decks RABIT, in: Structures Congress 2015, Portland, OR, 2015.
[10] S. Dorafshan, M. Maguire, Bridge inspection: human performance, unmanned aerial vehicles and automation, J. Civil Struct. Health Monit. 8 (3) (2018) 443–476.
[11] N. Metni, T. Hamel, A UAV for bridge inspection: visual servoing control law with orientation limits, Autom. Constr. 17 (1) (2007) 3–10.
[12] S. Dorafshan, M. Maguire, X. Qi, Automatic Surface Crack Detection in Concrete Structures using OTSU Thresholding and Morphological Operations (UTC 01-2016), Utah Transportation Center, Logan, UT, 2016.
[13] S. German, I. Brilakis, R. DesRoches, Rapid entropy-based detection and properties measurement of concrete spalling with machine vision for post-earthquake safety assessments, Adv. Eng. Inf. 26 (4) (2012) 846–858.
[14] K. Vaghefi, T.T.M. Ahlborn, D.K. Harris, C.N. Brooks, Combined imaging technologies for concrete bridge deck condition assessment, J. Perform. Constr. Facil. 29 (4) (2013).
[15] H. Sohn, D. Dutta, J.Y. Yang, M. DeSimio, S. Olson, E. Swenson, Automated detection of delamination and disbond from wavefield images obtained using a scanning laser vibrometer, Smart Mater. Struct. 20 (2011) 4.
[16] T. Omar, M.L. Nehdi, Remote sensing of concrete bridge decks using unmanned aerial vehicle infrared thermography, Autom. Constr. 83 (2017) 360–371.
[17] A. Ellenberg, A. Kontsos, F. Moon, I. Bartoli, Bridge related damage quantification using unmanned aerial vehicle imagery, Struct. Control Health Monit. 23 (9) (2016) 1168–1179.
[18] S. Dorafshan, R. Thomas, M. Maguire, Fatigue crack detection using unmanned aerial systems in fracture critical, J. Bridge Eng. 23 (10) (2018).
[19] M.R. Jahanshahi, S.F. Masri, Adaptive vision-based crack detection using 3D scene reconstruction for condition assessment of structures, Autom. Constr. 22 (2012) 567–576.
[20] M. Hamrat, B. Boulekbache, M. Chemrouk, S. Amziane, Flexural cracking behavior of normal strength, high strength and high strength fiber concrete beams, using digital image correlation technique, Constr. Build. Mater. 106 (2016) 678–692.
[21] A. Rimkus, A. Podviezko, V. Gribniak, Processing digital images for crack localization in reinforced concrete members, Procedia Eng. 122 (2015) 239–243.
[22] L. Li, Q. Wang, G. Zhang, L. Shi, J. Dong, P. Jia, A method of detecting the cracks of concrete undergo high-temperature, Constr. Build. Mater. 162 (2018) 345–358.
[23] H. Kim, E. Ahn, S. Cho, M. Shin, S.H. Sim, Comparative analysis of image binarization methods for crack identification in concrete structures, Cem. Concr. Res. 99 (2017) 53–61.
[24] T. Yamaguchi, S. Nakamura, R. Saegusa, S. Hashimoto, Image-based crack detection for real concrete surfaces, IEEJ Trans. Electr. Electr. Eng. 3 (1) (2008) 128–135.
[25] M.R. Taha, A. Noureldin, J.L. Lucero, T.J. Baca, Wavelet transform for structural health monitoring: a compendium of uses and features, Struct. Health Monit. 5 (3) (2006) 267–295.
[26] J. Kittler, R. Marik, M. Mirmehdi, M. Petrou, J. Song, Detection of defects in colour texture surfaces, in: IAPR Workshop on Machine Vision Applications, Kawasaki, 1994.
[27] I. Abdel-Qader, O. Abudayyeh, M.E. Kelly, Analysis of edge-detection techniques for crack identification in bridges, J. Comput. Civil Eng. 17 (4) (2003) 255–263.
[28] A. Ebrahimkhanlou, A. Farhidzadeh, S. Salamone, Multifractal analysis of crack patterns in reinforced concrete shear walls, Struct. Health Monit. 15 (1) (2016) 81–92.
[29] J.K. Oh, G. Jang, S. Oh, J.H. Lee, B.J. Yi, Y.S. Moon, J.S. Lee, Y. Choi, Bridge inspection robot system with machine vision, Autom. Constr. 18 (7) (2009) 929–941.
[30] H. Moon, J. Kim, Intelligent crack detecting algorithm on the concrete crack image using neural network, in: Proceedings of the 28th International Symposium on Automation and Robotics in Construction, Seoul, 2011.
[31] R.S. Lim, H.M. La, W. Sheng, A robotic crack inspection and mapping system for bridge deck maintenance, IEEE Trans. Autom. Sci. Eng. 11 (2) (2014) 367–378.
[32] J.W. Kim, S.B. Kim, J.C. Park, J.W. Nam, Development of crack detection system with unmanned aerial vehicles and digital image processing, in: Advances in Structural Engineering and Mechanics (ASEM15), Incheon, 2015.
[33] A.M.A. Talab, Z. Huang, F. Xi, L. HaiMing, Detection crack in image using Otsu method and multiple filtering in image processing techniques, Optik-Int. J. Light Electron Opt. 127 (3) (2016) 1030–1033.
[34] S. Dorafshan, M. Maguire, M. Chang, Comparing automated image-based crack detection techniques in spatial and frequency domains, in: Proceedings of the 26th American Society of Nondestructive Testing Research Symposium, Jacksonville, FL, 2017.
[35] S. Dorafshan, M. Maguire, Autonomous detection of concrete cracks on bridge decks and fatigue cracks on steel members, in: Digital Imaging 2017, Mashantucket, CT, 2017.
[36] Y. Noh, D. Koo, Y.M. Kang, D. Park, D. Lee, Automatic crack detection on concrete images using segmentation via fuzzy C-means clustering, in: Proceedings of the 2017 International Conference on Applied System Innovation, Sapporo, 2017.
[37] G.K. Choudhary, S. Dey, Crack detection in concrete surfaces using image processing, fuzzy logic, and neural networks, in: 2012 IEEE Fifth International Conference on Advanced Computational Intelligence (ICACI), Nanjing, China, 2012.
[38] Y.J. Cha, W. Choi, G. Suh, S. Mahmoudkhani, O. Büyüköztürk, Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types, Comput.-Aided Civil Infrastruct. Eng. (2017).
[39] Y.J. Cha, W. Choi, O. Büyüköztürk, Deep learning-based crack damage detection using convolutional neural networks, Comput.-Aided Civil Infrastruct. Eng. 32 (5) (2017) 361–378.
[40] S. Dorafshan, C. Coopmans, R.J. Thomas, M. Maguire, Deep learning neural networks for sUAS-assisted structural inspections: feasibility and application, in: ICUAS 2018, Dallas, TX, 2018.
[41] A. Mohan, S. Poobal, Crack detection using image processing: a critical review and analysis, Alexandria Eng. J. (2017), in press.
[42] M. Maguire, S. Dorafshan, R. Thomas, SDNET2018: A Concrete Crack Image Dataset for Machine Learning Applications, Utah State University, Logan, 2018.
[43] M. O'Byrne, F. Schoefs, B. Ghosh, V. Pakrashi, Texture analysis based damage detection of ageing infrastructural elements, Comput.-Aided Civil Infrastruct. Eng. 28 (3) (2013) 162–177.
[44] L. Wu, S. Mokhtari, A. Nazef, B. Nam, H.B. Yun, Improvement of crack-detection accuracy using a novel crack defragmentation technique in image-based road assessment, J. Comput. Civil Eng. 30 (1) (2014).
[45] J. Schmidhuber, Deep learning in neural networks: an overview, Neural Networks 61 (2015) 85–117.
[46] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[47] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: CVPR, 2015.
[48] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[49] J. Deng, W. Dong, R. Socher, L.J. Li, K. Li, L. Fei-Fei, ImageNet: a large-scale hierarchical image database, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, 2009.
[50] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, 2016.
[51] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, A. Rabinovich, Going deeper with convolutions, in: CVPR, 2015.
[52] L. Zhang, F. Yang, Y.D. Zhang, Y.J. Zhu, Road crack detection using deep convolutional neural network, in: 2016 IEEE International Conference on Image Processing (ICIP), 2016.
[53] A. Zhang, K.C. Wang, B. Li, E. Yang, X. Dai, Y. Peng, Y. Fei, Y. Liu, J.Q. Li, C. Chen, Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network, Comput.-Aided Civil Infrastruct. Eng. 32 (10) (2017) 805–819.
[54] F.C. Chen, M.R. Jahanshahi, NB-CNN: deep learning-based crack detection using convolutional neural network and Naïve Bayes data fusion, IEEE Trans. Indus. Electron. (2017).
[55] D.J. Atha, M.R. Jahanshahi, Evaluation of deep learning approaches based on convolutional neural networks for corrosion detection, Struct. Health Monit. (2017) 1475921717737051.
[56] S.S. Kumar, D.M. Abraham, M.R. Jahanshahi, T. Iseley, J. Starr, Automated defect classification in sewer closed circuit television inspections using deep convolutional neural networks, Autom. Constr. 91 (2018) 273–283.
[57] A. Ebrahimkhanlou, S. Salamone, Single-sensor acoustic emission source localization in plate-like structures using deep learning, Aerospace 5 (2) (2018) 50.
[58] Y. Bao, Z. Tang, H. Li, Y. Zhang, Computer vision and deep learning-based data anomaly detection method for structural health monitoring, Struct. Health Monit. (2018) 1475921718757405.
[59] D. Kang, Y.J. Cha, Autonomous UAVs for structural health monitoring using deep learning and an ultrasonic beacon system with geo-tagging, Comput.-Aided Civil Infrastruct. Eng. (2018).
[60] D. Kang, Y.J. Cha, Damage detection with an autonomous UAV using deep learning, in: Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace Systems 2018, International Society for Optics and Photonics, Denver, CO, 2018, vol. 10598, p. 1059804.
[61] V. Dumoulin, F. Visin, A guide to convolution arithmetic for deep learning, arXiv preprint arXiv:1603.07285.
[62] L. Fei-Fei, J. Johnson, S. Yeung, CS231n: Convolutional Neural Networks for Visual Recognition, Stanford University, 2017. [Online]. Available: http://cs231n.stanford.edu/. [Accessed 21 03 2018].
[63] H.C. Shin, H.R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, R.M. Summers, Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning, IEEE Trans. Med. Imag. 35 (5) (2016) 1285–1298.
[64] Z. Li, D. Hoiem, Learning without forgetting, IEEE Trans. Pattern Anal. Mach. Intell. (2017).