Article
A Binocular Vision-Based Crack Detection and Measurement
Method Incorporating Semantic Segmentation
Zhicheng Zhang 1 , Zhijing Shen 1 , Jintong Liu 1 , Jiangpeng Shu 1 and He Zhang 1,2, *
1 College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China;
[email protected] (Z.Z.); [email protected] (Z.S.); [email protected] (J.L.); [email protected] (J.S.)
2 Center for Balance Architecture, Zhejiang University, Hangzhou 310058, China
* Correspondence: [email protected]
Abstract: The morphological characteristics of a crack serve as crucial indicators for rating the con-
dition of the concrete bridge components. Previous studies have predominantly employed deep
learning techniques for pixel-level crack detection, while occasionally incorporating monocular de-
vices to quantify the crack dimensions. However, the practical implementation of such methods
with the assistance of robots or unmanned aerial vehicles (UAVs) is severely hindered due to their
restrictions in frontal image acquisition at known distances. To explore a non-contact inspection
approach with enhanced flexibility, efficiency and accuracy, a binocular stereo vision-based method
incorporating a fully convolutional network (FCN) is proposed for detecting and measuring cracks.
Firstly, our FCN leverages the benefits of the encoder–decoder architecture to enable precise crack
segmentation while simultaneously emphasizing edge details at a rate of approximately four pictures
per second in a database that is dominated by complex background cracks. The training results
demonstrate a precision of 83.85%, a recall of 85.74% and an F1 score of 84.14%. Secondly, the
utilization of binocular stereo vision improves the shooting flexibility and streamlines the image
acquisition process. Furthermore, the introduction of a central projection scheme achieves reliable
three-dimensional (3D) reconstruction of the crack morphology, effectively avoiding mismatches
between the two views and providing more comprehensive dimensional depiction for cracks. An
experimental test is also conducted on cracked concrete specimens, where the relative measurement error in crack width ranges from −3.9% to 36.0%, indicating the practical feasibility of our proposed method.
Keywords: non-contact measurement; crack width; deep learning; image processing; binocular vision
Researchers have therefore resorted to non-destructive testing (NDT) techniques to assist manual inspection. Huston et al. [11], for instance, were able
to successfully detect concrete cracks with a width as narrow as 1 mm using a ground
penetrating radar (GPR) equipped with a good impedance matching antenna (GIMA).
Chen et al. [12] deployed a three-dimensional laser radar, also referred to as 3D LiDAR, to
quantify the length of cracking on bridge components, while Valenca et al. [13] incorporated
terrestrial laser scanning (TLS) to characterize large-scale structural cracks. In recent years,
there has been a growing interest in the utilization of advanced nanomaterials to achieve
the self-monitoring of concrete cracks [14,15]. Roopa et al. [16] conducted a study where
they incorporated carbon fiber (CF) and multiwalled carbon nanotubes (MWCNT) as
nanofillers in the cementitious matrix, aiming to develop self-sensing sensors. These sensors
exhibit piezoelectric properties that correspond to the structural response, enabling them to
autonomously detect damage. At the microscale, the nanocomposite sensors demonstrate
exceptional sensitivity to small cracks, thereby facilitating real-time monitoring of crack
formation and propagation. However, it is important to note that this method is relatively
susceptible to environmental factors such as temperature and humidity, which can impact
its performance. Additionally, while the self-monitoring methods based on nanomaterials
can provide estimates of crack width and location, they cannot provide precise information
on crack morphology. In general, the exorbitant cost and limited applicability of these
abovementioned methods impede their promotion, rendering it arduous to satisfy the
demand for crack detection in huge-volume concrete bridges.
Over the past two decades, non-contact, high-precision and low-cost machine vision-
based NDT methods have emerged as a potentially viable alternative to manual visual
inspection. In this context, camera-mounted unmanned aerial vehicles (UAVs) or robots can
function as image sensing-based inspection platforms [17–20]. The automatic crack detec-
tion in large volumes of acquired image data thus poses a significant challenge. Previously,
researchers have utilized traditional image processing techniques (IPTs) for crack extraction,
proposing hybrid approaches that integrate thresholding, morphological operators or filter
concepts [21–27], as well as approaches based on mathematical transformations [28–32]. A
considerable proportion of crack measurements in these studies were conducted on binary
images, which can be broadly categorized into three distinct groups. The first group adopts
pixel count as a quantitative metric for representing cracks. Payab et al. [33] expressed the
crack area and length values in pixel numbers of crack region and skeleton, respectively,
and took the ratio of the two as the average crack width. The second type entails a scale
factor to convert the output of the first group into actual physical dimensions. After de-
tecting thermal cracks on fire-affected concrete via wavelet transform, Andrushia et al. [34]
adopted the unit pixel size, i.e., pixel resolution, to convert the morphological characteris-
tics from pixel units to physical units. The final category achieves measurement by means
of crack reconstruction. Liu et al. [35] employed the structure from motion (SFM) algorithm
to conduct 3D reconstruction, enabling not only the acquisition of crack width but also the
integration of cracks from multiple perspectives into a unified 3D scene.
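A minimal sketch of how the first two categories relate in practice — raw pixel counts first, then conversion through a calibrated scale factor — is given below; the function and the 0.05 mm/pixel resolution are illustrative assumptions, not taken from the cited studies.

```python
import numpy as np

def crack_metrics_mm(mask: np.ndarray, skeleton: np.ndarray, pixel_size_mm: float) -> dict:
    """First category: pixel counts; second category: scale-factor conversion."""
    area_px = int(mask.sum())                    # crack region size in pixels
    length_px = int(skeleton.sum())              # skeleton length in pixels
    avg_width_px = area_px / max(length_px, 1)   # area/length ratio, as in Payab et al. [33]
    return {
        "area_mm2": area_px * pixel_size_mm ** 2,
        "length_mm": length_px * pixel_size_mm,
        "avg_width_mm": avg_width_px * pixel_size_mm,
    }

# Example with an assumed calibrated resolution of 0.05 mm/pixel:
mask = np.zeros((64, 64), dtype=np.uint8); mask[30:33, 5:60] = 1
skeleton = np.zeros_like(mask); skeleton[31, 5:60] = 1
print(crack_metrics_mm(mask, skeleton, pixel_size_mm=0.05))
```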
IPT-based methods are suited to simple cracks (i.e., those with high contrast and good continuity), but the diverse noise present in actual inspection data makes it challenging for them to attain the anticipated outcomes, necessitating further enhancement of their robustness [36]. Therefore, modified solutions in combination with machine learning
(ML) have been proposed. Specifically, the image features extracted by IPTs pass through
the supervised learning-based classifier to determine whether they are indicative of a
crack. The study conducted by Prasanna et al. [37] focused on the detection of noise-
robust line segment features that accurately fit cracks. They employed support vector
machines, Adaboost and random forests as classifiers, utilizing spatially tuned multi-feature
appearance vectors. The performance of various feature combinations was evaluated,
demonstrating that integrating multiple design features into a single appearance vector
yields superior classification results. Peng et al. [38] developed a cascade classifier for
determining the positivity and negativity of crack detection windows by extending diverse
Haar-like features and employed a monocular vision technique, which belongs to the
second category of measurement methods, to calculate the actual crack width. While the
incorporation of ML into such methodologies strengthens their adaptability to real-world
scenarios, it is inevitable that the results will still be influenced by IPTs.
Deep learning (DL) is an emerging and powerful alternative to the above methods,
with the advantage of not depending on expert-dominated heuristic thresholds or hand-
designed feature descriptors, thereby greatly enhancing the accuracy and robustness of
feature extraction [39]. During recent years, a multitude of researchers have extensively
investigated the potential of DL-based models, particularly convolutional neural networks
(CNNs), for concrete crack detection. The aforementioned studies demonstrated successful
applications of CNNs in image classification [40] and object identification tasks, specifi-
cally pertaining to crack detection at both the image level/patch level [41–44] and object
level [45–47]. However, neither the grid-like detected results nor the bounding boxes with
class labels provide a precise description of the crack topology. In contrast, semantic seg-
mentation categorizes each pixel into a possible class (e.g., crack or background), offering
the highest level of detail in features. To detect cracks at the pixel level, Li et al. [48] trained
a CNN-based local pattern predictor for coarse analysis on crack pixels. Kim et al. [49]
adopted Mask R-CNN for instance segmentation of concrete cracks but not complete seman-
tic segmentation, hence having limited precision. Zhang et al. [50] developed CrackNet-R,
an effective semantic segmentation network for detecting cracks in asphalt pavement, though one prone to technical isolation in practice.
With the widespread adoption of the encoder–decoder architecture in semantic seg-
mentation, various CNNs have been proposed for pixel-level crack detection based on
different variations of this structure, including fully convolutional network (FCN) [51,52],
U-Net [53–56], SegNet [57–59], DeepLab series [60,61] and ResNets [62,63]. These architec-
tures consist of two components, namely the encoder module responsible for extracting
multi-scale features and the decoder module dedicated to restoring the feature informa-
tion. On the one hand, the decoders upscale the final output of the encoder network to
match the original input size, thereby facilitating the orientation of crack pixels. On the
other hand, the encoders supply the local information during the decoding process to
minimize loss of details from the input. Although the mentioned classical neural networks
demonstrate proficiency in executing fundamental segmentation operations, they remain
confronted with difficulties in achieving precise object edge segmentation and addressing
class imbalance. Consequently, researchers have started integrating various cutting-edge
methods to optimize the performance of segmentation models. In light of the requirement
for both semantic understanding and fine-grained detail in segmentation tasks, a suite of
attention-based methodologies [64,65] have been developed. These methods are designed
to assimilate multi-scale and global contextual information, thereby enhancing the accuracy
of defect identification. Chen et al. [66] have demonstrated impressive recognition accuracy
in identifying different types of cracks by incorporating the Convolutional Block Attention
Module (CBAM) into MobileNetV3 as the backbone network. Du et al. [67] have proposed
an Attention Feature Pyramid Network that enhances the precise segmentation of road
cracks within the YOLOv4 model. Similarly, Yang et al. [68] introduced a multi-scale,
tri-attention network, termed MST-NET. Other advanced computational modules, such as
separable convolution [69] and deformable convolution [70], have been introduced to fur-
ther enhance model performance. Recognizing that the training of semantic segmentation
models heavily relies on accurately annotated data, numerous researchers have also begun
exploring approaches to enhance the generalization and adaptability of segmentation meth-
ods from the perspective of dataset optimization and learning strategies. For instance, Que
et al. [71] have proposed a crack dataset expansion method based on generative adversarial
networks (GANs), resulting in higher recall rates and F1 scores for the same model. Nguyen
et al. [72] have introduced the Focal Tversky loss function to tackle class imbalance issues
in crack segmentation, shedding light on the role of loss functions during model training.
Furthermore, Weng et al. [73] have devised an unsupervised adaptive framework for crack
detection, effectively mitigating domain shift problems among various civil infrastructure
crack images.
On this basis, the first category of crack measurements was completed by Yang et al. [51],
Ji et al. [60] and Kang et al. [74]. Regrettably, such pixel-denominated results are of limited use for crack evaluation purposes. To give the measured values physical meaning, Li et al. [36] and Chen et al. [65]
employed a monocular vision technique to accurately quantify the crack indicators such as
area, max width and length. However, these methods rely on calibrated pixel resolution and
the similar triangle relationship for unit conversion, which necessitates frontal photography
of the target crack at known distances with a monocular device. As a result, restricted
shooting postures increase the difficulty of remotely manipulating inspection platforms,
leading to complications in image acquisition and unstable measurements.
The third category of binocular stereo vision-based measurement emerges as a promis-
ing solution to tackle the aforementioned challenges. In contrast to monocular vision,
which calculates physical dimensions mapped on pixels, binocular stereo vision recon-
structs the 3D coordinates of a crack in a datum coordinate system based on internal
imaging geometries and the external relative posture of two cameras, as well as matching
relations between two captured images. This enables a more comprehensive and reliable
quantification of morphological characteristics. Furthermore, binocular vision is not con-
strained by a fixed photogrammetric geometry and offers greater flexibility in capturing
cracks within its depth of field. Previously, Guan et al. [56] designed a vehicle-mounted
binocular photography system to generate 3D pavement models and precisely estimated
the volume of pavement potholes by integrating pixel-level predictions of a U-Net but
failed to further quantify the segmented cracks. Yuan et al. [75] and Kim et al. [76] up-
graded the automation of non-contact inspection through a robot and a UAV equipped
with binocular devices, respectively, despite their crack predictions not being derived from
semantic segmentation networks. Recently, Chen et al. [77] optimized DeeplabV3+ to
deliver a detailed crack morphology for measurement based on binocular stereo vision,
resulting in satisfactory outcomes.
In this paper, a novel non-contact crack detection and measurement method in combi-
nation with an encoder–decoder FCN and binocular stereo vision is proposed for efficient
and accurate evaluation of concrete cracks in bridge structures. The proposed method not
only enhances the flexibility of crack data acquisition but also enables rapid and precise
extraction of crack morphology, which facilitates 3D reconstruction in the form of spatial
discrete points, thereby obtaining a more comprehensive set of dimensional information
regarding cracks. The limitations on shooting attitude imposed by the monocular measure-
ment method are thus effectively addressed, along with the issues related to accuracy and
robustness in traditional crack detection methods. Moreover, in contrast to conventional
binocular vision-based 3D reconstruction methods that rely heavily on feature matching
prior to point cloud computation, the proposed method employs projective reconstruction,
which significantly alleviates computational expenses and eliminates potential mismatches
between the two views.
2. Methodology
2.1. Overview
The proposed method consists of three parts, as depicted in Figure 1, which illustrates
the overall workflow schematically. (I) Crack data acquisition: a tailored binocular system
is constructed for capturing visible cracks from multiple angles at flexible distances, ren-
dering it ideal for UAV-aided crack inspection. The captured image pairs subsequently
serve as primary data to detect cracks. (II) Crack pixel-level detection: to achieve precise
segmentation of cracks in the main images from primary data, a semantic segmentation
network (i.e., the encoder–decoder FCN) is constructed with a VGG19-based encoder net-
work and a decoder network featuring the deconvolution layer as its core. The resulting
binary image is further exploited to extract pixels that characterize the morphology of the
crack. (III) Crack quantitative assessment: at this stage, a binocular vision-based projection
reconstruction model is employed for spatial localization of the cracked concrete surface and subsequent 3D crack reconstruction by projecting pixels extracted in the previous stage onto it. Finally, the morphological characteristics of cracks are quantitatively calculated based on the discrete reconstructed points. A detailed description of each part is presented below.
Figure 1. The overall workflow of the method. (The # represents the specific numerical results for different cracks.)
2.2. Crack Data Acquisition
To facilitate the UAV assistance, a pair of identical industrial charge-coupled device (CCD) cameras from Microvision, a supplier specialized in visual products, are rigidly assembled into a lightweight and compact binocular photography system. The specifications for each component are comprehensively presented in Table 1, where the focal length f is 16 mm, with a pixel size ∆u·∆v of 3.75 × 3.75 µm². According to the pinhole model depicted in Figure 2a, the resolution of a single camera at an operating distance D of 200 ± 50 mm is approximately 0.047 ± 0.012 mm/pixel, which is adequate for capturing crack details. Moreover, to take into account the public field of view (Figure 2b), the relative pose of the two cameras is adjusted with a narrow baseline (denoted as B and set to 5 cm) and intersecting optical axes (realized by a left deviation of the right camera at an angle θ of roughly 20°), as shown in Figure 2c. For the subsequent description, the left camera is designated as the main camera along the shooting direction, while the right camera is designated as the positioning camera. These two cameras capture images of target cracks synchronously to form stereo image pairs, which are then transmitted in real time to the inspector's laptop.

Table 1. Detailed specifications of the binocular system.

Component                        Model            Specification
CCD grayscale camera ×2          MV-EM120M        Sensor resolution: 1280 × 960 pixels; pixel size: 3.75 × 3.75 (µm); size: 29 × 35 × 48.9 (mm); weight: 50 g
Industrial fixed-focus lens ×2   BT-118C1620MP5   Focal length: 16 mm; size: φ27.2 × 26.4 (mm); weight: 75 g
Figure 2. Considerations of the binocular system: (a) a pinhole model for resolution and distance trade-off; (b) public field of view of two specifically mounted cameras; and (c) overhead perspective of (b).
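The 0.047 ± 0.012 mm/pixel figure can be verified directly from the pinhole relation and the Table 1 parameters; a short sketch:

```python
# Pinhole model: a pixel of physical size p images a surface patch of p * D / f.
f_mm = 16.0        # focal length (Table 1)
p_mm = 3.75e-3     # pixel size of 3.75 um (Table 1)
for D_mm in (150.0, 200.0, 250.0):               # operating distance 200 +/- 50 mm
    print(f"D = {D_mm:.0f} mm -> {p_mm * D_mm / f_mm:.4f} mm/pixel")
# Prints 0.0352, 0.0469 and 0.0586 mm/pixel, i.e. roughly 0.047 +/- 0.012 mm/pixel.
```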
2.3. Crack Pixel-Level Detection
The accurate and efficient characterization of crack morphology is a prerequisite for real-time image measurement of concrete cracks. To accomplish this, a specialized encoder–decoder FCN is developed for detecting cracks at the pixel level. Subsequently, an integrated computer vision (CV) program is written to enable rapid extraction of the edges and skeletons that characterize the crack morphology from the FCN predictions.
Since employing transfer learning [78,79] based on pre-trained parameters of VGG can not only
significantly reduce the overall training time of the FCN model but also effectively enhance
its performance in scenarios with limited training data, the VGG19-based encoder network
is adopted to extract essential features for semantic segmentation. As shown in Figure 3a,
the encoder network is topologically identical to the first 16 layers of VGG19, consisting
of five convolutional blocks (also referred to as encoders in this paper) that include all
convolutional layers, nonlinear activation layers utilizing the ReLU function and pooling
layers. Since the encoder module does not involve neuron classification, the final softmax
layer of VGG19 is excluded, while the fully connected layers are replaced by convolutional
layers with two dropout layers added in between to prevent overfitting.
Figure 3. (a) Encoder network and (b) decoder network of FCN.
Encodernetwork
networkand
and(b)
(b)decoder
decodernetwork
networkofofFCN.
FCN.
Inheriting the strengths of VGG19, each encoder conducts convolution operations through the stacking of 3 × 3 filters (i.e., convolution kernels) with a fixed stride length of 1 pixel, which ensures an equivalent receptive field to that of larger-size filters while extracting higher-level features with fewer convolution kernel parameters. Moreover, ReLU activation is applied following each convolution to introduce nonlinearity, thereby enhancing the nonlinear fitting capability of the encoder network. To eliminate redundant information and to accelerate computational speed, the max pooling operation is subsequently performed on a 2 × 2 pixel window with a stride of 2, which results in downsampling of the output by a factor of 2. It is noteworthy that the outputs of the first four max pooling layers, numbered ④, ③, ② and ①, will also be recycled by the decoder network. Due to the three newly substituted convolution layers, namely Conv_layer 17, 18 and 19, the final output is transformed from the initial class probabilities into a low-resolution feature map that characterizes the crack, which is subsequently fed into the decoder module.
The decoder network employs deconvolutional upsampling to generate a dense output
and rescales the data to the original input size. To minimize the loss of details during the
decoding process, the skip connection structure proposed by Bang et al. [62] is adopted
to facilitate the flow of feature maps from the upstream encoders to their corresponding
downstream counterparts, which enables effective integration of multi-scale and multi-level
local information. Specifically, each decoder selectively fuses the local feature map with the
upstream feature map at the expense of increased memory consumption.
Referring to the decoder network depicted in Figure 3b, the max pooling outputs
labeled as ⃝, 1 ⃝,
2 ⃝ 3 and ⃝ 4 are initially individually convolved with a 1 × 1 kernel for
densification purposes. The subsequent outputs are considered to hold local information
originating from the upstream network (i.e., the encoder network) and are then arith-
metically added (represented by “⊕” in Figure 3b) to the upsampling results of identical
resolution obtained through deconvolution with a 4 × 4 kernel with a two-pixel stride.
The entire decoder network integrates the outputs from the final layer and the first four
max pooling layers of the encoder network, wherein each fused feature map undergoes
a doubling in resolution through upsampling with a stride of 2. After five upsamplings,
the output of conv_layer 19 is expanded to match the dimensions of the original input and
then proceeds through the softmax layer, where the softmax function value determines the
probability of a single pixel belonging to either the “crack” or “background” categories.
Ultimately, a binary image is exported as the final prediction, with “crack” pixels assigned
a value of 1 and the "background" pixels a value of 0.
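The layer configuration is summarized in Figure 3; a condensed PyTorch sketch of the described topology is given below. Channel counts and module boundaries are inferred from the text and figure rather than taken from the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class CrackFCN(nn.Module):
    """Condensed sketch of the described encoder-decoder FCN."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Encoder: the convolutional part of VGG19 (16 conv layers, 5 max pools).
        self.encoder = vgg19(weights=None).features
        # Conv_layer 17-19: convolutions replacing VGG19's fully connected
        # layers, with two dropout layers in between to prevent overfitting.
        self.head = nn.Sequential(
            nn.Conv2d(512, 4096, 1), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Conv2d(4096, 4096, 1), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Conv2d(4096, num_classes, 1),
        )
        # 1x1 convolutions densifying the pooling outputs ④-① for the skips.
        self.skips = nn.ModuleList(
            nn.Conv2d(c, num_classes, 1) for c in (512, 256, 128, 64))
        # 4x4 deconvolutions with a two-pixel stride, doubling the resolution.
        self.up = nn.ModuleList(
            nn.ConvTranspose2d(num_classes, num_classes, 4, stride=2, padding=1)
            for _ in range(5))

    def forward(self, x):
        pools = []
        for layer in self.encoder:
            x = layer(x)
            if isinstance(layer, nn.MaxPool2d):
                pools.append(x)                  # pool1 ... pool5
        x = self.head(pools[-1])                 # low-resolution crack feature map
        for up, skip, pool in zip(self.up[:4], self.skips, reversed(pools[:4])):
            x = up(x) + skip(pool)               # upsample, then fuse ("⊕")
        x = self.up[4](x)                        # fifth upsampling: input size
        return torch.softmax(x, dim=1)           # per-pixel class probabilities

# A binary prediction follows by taking the per-pixel argmax:
# pred = CrackFCN()(torch.rand(1, 3, 224, 224)).argmax(dim=1)
```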
Figure 4. Procedures for crack edge and skeleton extraction: (a) flow chart; (b) FCN prediction; (c) refined crack region; (d) crack edges; (e) original crack skeleton (The red lines represent the pruned excess crack branches and the yellow lines represent the crack skeletons.); and (f) outputs of crack morphology.
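The CV program itself is not listed in the paper; a plausible minimal version of the Figure 4 pipeline, assuming standard OpenCV and scikit-image operators with illustrative parameters, might look as follows:

```python
import cv2
import numpy as np
from skimage.morphology import remove_small_objects, skeletonize

def crack_morphology(pred: np.ndarray, min_size: int = 64):
    """pred: binary FCN prediction (1 = crack). Returns the refined region,
    the crack edges and a one-pixel-wide skeleton, mirroring Figure 4b-e."""
    # (c) Refine the crack region: drop small spurious blobs, close pinholes.
    refined = remove_small_objects(pred.astype(bool), min_size=min_size)
    refined = cv2.morphologyEx(refined.astype(np.uint8), cv2.MORPH_CLOSE,
                               np.ones((3, 3), np.uint8))
    # (d) Crack edges from the outer contours of the refined region.
    contours, _ = cv2.findContours(refined, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    edges = np.zeros_like(refined)
    cv2.drawContours(edges, contours, -1, color=1, thickness=1)
    # (e) Skeletonize; pruning of excess branches (red lines) is omitted here.
    skeleton = skeletonize(refined.astype(bool)).astype(np.uint8)
    return refined, edges, skeleton
```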
Figure 5. Crack plane location: (a) stereo image pair; (b) feature point extraction; (c) feature point matching with randomly selected three-point pair; and (d) binocular vision model to calculate the spatial location points.
Previously, the scale-invariant feature transform (SIFT) algorithm proposed by Lowe [81] was successfully applied to extract features from crack images [56,82], showcasing its robustness to rotation and translation, as well as its capability to handle variations in lighting conditions and viewpoints. Our approach employs the SIFT algorithm for scale space filtering of stereo image pairs, facilitating the detection of feature points across multiple scales. For the kth stereo image pair $I^{(k)} = \{ I_1^{(k)}, I_2^{(k)} \}$, with $I_1^{(k)}$ and $I_2^{(k)}$ representing the main image and the positioning image, respectively, the extracted feature point sets are denoted as $F_1^{(k)} = \{ (p_{1,i}^{(k)}, f_{1,i}^{(k)}) \mid i = 1 \dots P \}$ and $F_2^{(k)} = \{ (p_{2,j}^{(k)}, f_{2,j}^{(k)}) \mid j = 1 \dots Q \}$, where $f_{1,i}^{(k)}$ and $f_{2,j}^{(k)}$ are the local feature descriptors corresponding to feature point positions $p_{1,i}^{(k)}$ and $p_{2,j}^{(k)}$, respectively. On this basis, the first two nearest neighbors of $(p_1^{(k)}, f_1^{(k)}) \in F_1^{(k)}$ are searched with Euclidean distance in the query set $F_2^{(k)}$ by applying the nearest neighbor algorithm. The optimal matches are then obtained through a threshold of 0.5 on the ratio between the Euclidean distances of the nearest and second-nearest neighbors. The matching result is a set of feature point pairs, i.e., $\{ (p_1^{(k)}, p_2^{(k)}) \mid p_1^{(k)} \in I_1^{(k)}, p_2^{(k)} \in I_2^{(k)} \}$, from which three pairs of location points are randomly selected.

The binocular photography system is simplified into a binocular vision model, as illustrated in Figure 5d. Here, $O_C^l$-$X_C^l Y_C^l Z_C^l$ represents the main camera coordinate system (m-CCS), while $O_1^l$-$x^l y^l$ and $O_0^l$-$u^l v^l$ denote the physical and pixel coordinate systems on the main image, respectively; the positioning camera coordinate system (p-CCS), i.e., $O_C^r$-$X_C^r Y_C^r Z_C^r$, is situated on the right side with the two corresponding image coordinate systems $O_1^r$-$x^r y^r$ and $O_0^r$-$u^r v^r$; and $p_1(u_p^l, v_p^l)$ and $p_1'(u_p^r, v_p^r)$ represent the projected pixels of a specific point $P_1(X_P, Y_P, Z_P)$ on the crack plane in the world coordinate system $O_W$-$X_W Y_W Z_W$ (WCS), as captured by the two imaging planes, respectively.

Taking point P1 as an example for calculation, assuming the WCS coincides with the m-CCS, the projection relationship between $P_1(X_P, Y_P, Z_P)$ and $p_1(u_p^l, v_p^l)$ is given by the following:

$$Z_P \begin{bmatrix} u_p^l \\ v_p^l \\ 1 \end{bmatrix} = A_1 \begin{bmatrix} I_3 & O_{3 \times 1} \end{bmatrix} \begin{bmatrix} X_P \\ Y_P \\ Z_P \\ 1 \end{bmatrix} = \begin{bmatrix} f^l / k^l & \gamma_1 & u_0^l \\ 0 & f^l / l^l & v_0^l \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_P \\ Y_P \\ Z_P \end{bmatrix} \tag{1}$$

where $A_1$ is the intrinsic matrix of the main camera, with $f^l$ the focal length, $(u_0^l, v_0^l)$ the pixel coordinates of the principal point $O_1^l$, as well as $k^l$ and $l^l$ the physical lengths of the pixel unit along the $u^l$-axis and $v^l$-axis directions, respectively; $\gamma_1$ is the parameter characterizing the skew of the two image axes, which is typically zero; $I_3$ denotes the $3 \times 3$ unit matrix, while $O_{3 \times 1}$ represents the $3 \times 1$ zero vector.

The projection formula from $P_1(X_P, Y_P, Z_P)$ to $p_1'(u_p^r, v_p^r)$ is simultaneously established by utilizing the relative pose of the two cameras, as demonstrated below:

$$Z_P \begin{bmatrix} u_p^r \\ v_p^r \\ 1 \end{bmatrix} = A_2 \begin{bmatrix} R & t \end{bmatrix} \begin{bmatrix} X_P \\ Y_P \\ Z_P \\ 1 \end{bmatrix} = \bar{A}_2 \begin{bmatrix} f^r R_{11} & f^r R_{12} & f^r R_{13} & f^r t_x \\ f^r R_{21} & f^r R_{22} & f^r R_{23} & f^r t_y \\ R_{31} & R_{32} & R_{33} & t_z \end{bmatrix} \begin{bmatrix} X_P \\ Y_P \\ Z_P \\ 1 \end{bmatrix} \tag{2}$$

where $A_2$ represents the positioning camera intrinsic matrix, which is structurally and parametrically equivalent to $A_1$; $\bar{A}_2 = A_2 \times \mathrm{diag}(1/f^r, 1/f^r, 1)$, with diag symbolizing the diagonal matrix; and $R = [R_{ij}]_{3 \times 3}$ and $t = [t_x, t_y, t_z]^T$ are the rotation matrix and translation vector, respectively, of the main camera relative to the positioning camera in the binocular system, serving as its external parameters.

From Equations (1) and (2), the spatial coordinates of the point P1 can be obtained:

$$X_P = \frac{x_p^l}{f^l} Z_P \tag{3}$$

$$Y_P = \frac{y_p^l}{f^l} Z_P \tag{4}$$

$$Z_P = \frac{f^l (f^r t_x - x_p^r t_z)}{x_p^r (x_p^l R_{31} + y_p^l R_{32} + f^l R_{33}) - f^r (x_p^l R_{11} + y_p^l R_{12} + f^l R_{13})} = \frac{f^l (f^r t_y - y_p^r t_z)}{y_p^r (x_p^l R_{31} + y_p^l R_{32} + f^l R_{33}) - f^r (x_p^l R_{21} + y_p^l R_{22} + f^l R_{23})} \tag{5}$$

where $(x_p^l, y_p^l)$ and $(x_p^r, y_p^r)$ are the physical coordinates of the projected pixels $p_1(u_p^l, v_p^l)$ and $p_1'(u_p^r, v_p^r)$, respectively, which can be expressed as follows:

$$\begin{bmatrix} x_p^l \\ y_p^l \\ x_p^r \\ y_p^r \\ 1 \end{bmatrix} = \begin{bmatrix} k^l & 0 & 0 & 0 & -k^l u_0^l \\ 0 & l^l & 0 & 0 & -l^l v_0^l \\ 0 & 0 & k^r & 0 & -k^r u_0^r \\ 0 & 0 & 0 & l^r & -l^r v_0^r \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} u_p^l \\ v_p^l \\ u_p^r \\ v_p^r \\ 1 \end{bmatrix} \tag{6}$$
According to Equations (5) and (6), the mapping relationship between a pair of ho-
mologous pixels to its spatial source point is established. With the internal and external
parameters obtained from calibration, the location of the cracking plane can be determined
in m-CCS.
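A compact sketch of this location-point workflow — SIFT matching with the 0.5 ratio test, followed by the closed-form triangulation of Equations (3)–(5) — is shown below, assuming OpenCV's SIFT implementation and calibration results ($f^l$, $f^r$, $R$, $t$) obtained beforehand:

```python
import cv2
import numpy as np

def sift_ratio_matches(img_main, img_pos, ratio=0.5):
    """SIFT matching between a stereo pair with the 0.5 ratio test described
    above; returns matched pixel coordinate pairs."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img_main, None)
    kp2, des2 = sift.detectAndCompute(img_pos, None)
    knn = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    return [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt)
            for m, n in knn if m.distance < ratio * n.distance]

def triangulate(xl, yl, xr, yr, fl, fr, R, t):
    """Equations (3)-(5): spatial point in m-CCS from the physical image
    coordinates (xl, yl) and (xr, yr) of one homologous pair, obtained from
    pixel coordinates via Equation (6)."""
    tx, ty, tz = t
    d3 = xl * R[2, 0] + yl * R[2, 1] + fl * R[2, 2]   # third-row term of Eq. (5)
    ZP = fl * (fr * tx - xr * tz) / (
        xr * d3 - fr * (xl * R[0, 0] + yl * R[0, 1] + fl * R[0, 2]))
    return np.array([xl / fl * ZP, yl / fl * ZP, ZP])  # Eqs. (3) and (4)
```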
Figure 6. Central projection for crack reconstruction: (a) central projection model; (b) coordinate transformation on the main image; and (c) projection point calculation.
After establishing a unified reference system with Equation (8), the projection points on the easel plane are calculated. As shown in Figure 6c, $\mathbf{n} = (n_x, n_y, n_z)$ is the normal vector of the spatial cracking plane, determined by the vectors $\overrightarrow{P_1 P_2}$ and $\overrightarrow{P_1 P_3}$; the crack pixel $p_i(x_i, y_i, z_i)$ serves as a particular point on the projection line $l_i$, while $\mathbf{l}_i = (x_i, y_i, z_i)$ is the direction vector of $l_i$, pointing from the projection center $O_C^l$ to $p_i$; and $P_i(X_i, Y_i, Z_i)$ is the desired projection point. The equation for the intersection point is as follows:

$$\begin{cases} \overrightarrow{P_1 P_i} \perp \mathbf{n} \\ \overrightarrow{p_i P_i} \parallel \mathbf{l}_i \end{cases} \Rightarrow \begin{cases} (X_i - X_{P1}, Y_i - Y_{P1}, Z_i - Z_{P1}) \cdot (n_x, n_y, n_z) = 0 \\ \dfrac{X_i - x_i}{x_i} = \dfrac{Y_i - y_i}{y_i} = \dfrac{Z_i - z_i}{z_i} = \lambda \end{cases} \tag{9}$$
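Because the projection line through each crack pixel passes through the projection center, Equation (9) reduces to a one-line ray–plane intersection; a minimal sketch under that assumption:

```python
import numpy as np

def project_to_crack_plane(p_i, P1, n):
    """Equation (9): intersect the projection line through crack pixel p_i
    (in m-CCS, so the line passes through the projection center at the
    origin) with the crack plane through P1 with normal n."""
    p_i, P1, n = np.asarray(p_i), np.asarray(P1), np.asarray(n)
    mu = n.dot(P1) / n.dot(p_i)      # solve (mu * p_i - P1) . n = 0
    return mu * p_i                  # projection point P_i on the plane

# The normal follows from the three spatial location points:
# n = np.cross(P2 - P1, P3 - P1)
```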
3. Training FCN
3.1. Crack Segmentation Database
To train the FCN models, 50 photos of cracked concrete taken using a smartphone with a resolution of 4032 × 3024 × 3 and saved in JPG format are manually labeled at the pixel level using the MATLAB tool Image Labeler. Figure 7 depicts this labeling process, in which logical variables 0 and 1 are, respectively, assigned to background and crack pixels through pixel labels, with annotations saved in PNG-8 format. Subsequently, 110 images are cropped from these photos, each featuring either a crack or an intact background with 448 × 448 pixel resolution. These images, along with 334 web images of the same resolution, undergo data augmentation techniques including horizontal and vertical flips, resulting in a total of 1332 images. According to the fivefold cross-validation principle, the generated images are randomly divided into training, validation and test sets with 998, 110 and 224 images, respectively. Notably, a network trained on small-sized images can scan any image larger than the designed size [36]. Therefore, the randomly selected images and their annotations are resized to 224 × 224 pixels prior to being fed into the models.

3.2. Implementation Parameters
The learning rate plays a pivotal role in balancing convergence speed and stability in training a CNN. In order to choose an appropriate initial value for this key hyperparameter, three sets of models are meticulously trained, each with a distinct initial learning rate: 0.001, 0.0001 and 0.00001, respectively. Throughout these training sessions, exponential stepwise decay, a common technique for annealing learning rates, is employed after set epochs to reduce oscillations in the loss function around the global optimum. The decay function is as follows:

$$\eta_t = \eta_0 \times r_d^{\lfloor t / t_{max} \rfloor} \tag{13}$$
where the initial learning rate is denoted by η0 , rd is the decay rate with t as the current
count of iterations and tmax as the preset iterations for decay. ⌊·⌋ represents the floor
operation, returning the largest integer not greater than the input value.
To assess the discrepancy between the prediction and the ground truth, cross entropy
is utilized as the loss function on pixels. With exponential decay rates set to β 1 = 0.9 and
β 2 = 0.999, the Adam optimizer is then run for training loss optimization by iteratively
updating the model parameters. The FCN models are trained with 20 epochs, and the batch
size is set to 2 (taking into account the limitations of GPU memory). In addition, a dropout
rate of 0.5 is implemented to activate only half of the hidden nodes or feature detectors
during each iteration, thereby weakening their interactions and effectively preventing
overfitting [83,84].
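A hypothetical training configuration mirroring these settings (the decay rate $r_d$ and decay interval $t_{max}$ are assumed, as the paper does not state them here) could be sketched as:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 2, kernel_size=1)   # stand-in for the FCN of Section 2.3
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
loss_fn = nn.CrossEntropyLoss()          # cross entropy on pixels

def decayed_lr(eta0: float, rd: float, t: int, t_max: int) -> float:
    """Equation (13): eta_t = eta0 * rd ** floor(t / t_max)."""
    return eta0 * rd ** (t // t_max)

# After each iteration t, anneal the learning rate following Equation (13):
t, rd, t_max = 500, 0.9, 1000            # rd and t_max are assumed values
for group in optimizer.param_groups:
    group["lr"] = decayed_lr(1e-4, rd, t, t_max)
```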
Three pixel-level metrics, namely precision, recall and F1 score, are adopted to evaluate the segmentation performance:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{14}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{15}$$

$$F1\text{-}score = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{16}$$
where TP, FP and FN denote the number of pixels with True Positives, False Positives and
False Negatives in the predicted outcomes, respectively.
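For reference, a direct evaluation of Equations (14)–(16) on binary masks (a minimal sketch; zero-division guards omitted):

```python
import numpy as np

def pixel_metrics(pred: np.ndarray, truth: np.ndarray):
    """Equations (14)-(16) evaluated on binary masks (1 = crack pixel)."""
    tp = np.sum((pred == 1) & (truth == 1))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 2 * precision * recall / (precision + recall)
```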
The loss curves associated with the other two learning rates, i.e., 1 × 10−3 and 1 × 10−5, also
demonstrate satisfactory convergence results, remaining stable at around 0.021
and 0.018,
respectively, which are sufficient for attaining global optimization.
To test the effectiveness of the proposed FCN in detecting cracks of various morpho-
logical types and background complexities, the crack images in the test set are pre-divided
into four categories. (I) Hairline crack: the cracks are narrowly developed and susceptible
to changes in illumination, often resulting in fuzzy or discontinuous patterns. (II) Block
crack: the crack region exhibits a blocky pattern and occupies a significantly substantial
portion of the image. (III) Intersecting crack: the interconnected cracks show an intricate
morphology. (IV) Complex background crack: the cracks in backgrounds with complex tex-
tures, speckling, shadows caused by uneven lighting, or clutter are challenging to discern
through traditional methods.
Figure 9 depicts the FCN predictions of the above four crack types. Figure 9a–c
demonstrates the segmentation results for different types of crack morphologies. The
test results indicate that the proposed model exhibits good performance in accurately
segmenting hairline cracks, block cracks and intersecting cracks. The segmentation of
cracks under diverse and challenging conditions, including complex backgrounds and
varied lighting scenarios, is also tested and compared (Figure 9e–i). In addition, Figure 9j,k
display the prediction results for intact surfaces. The results demonstrate the robustness of the proposed model in handling various noise interference. Therein, the predictions of Figure 9a,c,d,g–j exhibit a significant level of agreement with ground truth. However, there are minor inaccuracies in Figure 9b (the left sample) and Figure 9f, which might be attributed to the insufficient variation in gradient of pixel values, leading to oversight of the microcracks located at the bottom. In Figure 9k, a few pixels of the backgrounds are falsely classified as cracks, possibly due to the combined interference of overexposure and overlapping black markings.
Figure 9. FCN predictions: (a) hairline crack; (b) block crack; (c) intersecting crack; (d) complex background crack (mottling); (e) complex background crack (interference); (f) complex background crack (clutter); (g) complex background crack (void); (h) different light condition (overexposure); (i) different light condition (uneven illumination); (j) intact surface (correct sample); and (k) intact surface (some pixels are False Positives).
Although the overall accuracy of FCN segmentation is somewhat compromised due to these omissions in detail, the extracted crack edges and skeletons still maintain an acceptable level of validity (Figure 10).
Figure 10. Extracted crack morphologies (The green lines represent the detected crack edges and the yellow lines represent the detected crack skeletons.): (a) hairline crack; (b) block crack; (c) intersecting crack; (d) complex background crack (mottling); and (e) complex background crack (clutter).
4. Experiment
In this section, an experiment is conducted to detect cracks in concrete specimens subjected to static load tests, with the aim of verifying the practical feasibility of the proposed method. The damaged concrete beams and slabs are neatly arranged on one side of the laboratory, and the binocular photography system is positioned approximately 0.2 m away from these cracked specimens. The aperture is adjusted accordingly to optimize exposure and capture cracks in natural indoor lighting, while the manually measured values from crack width gauges with a 0.01 mm accuracy and a crack ruler are simultaneously recorded as reference values for the actual crack width.
The experimental setup is illustrated in Figure 11, and a total of four cracks have been identified. Among them, three complex background cracks, designated as CrackI, CrackII and CrackIII, respectively, originating from the same beam specimen are artificially divided into multiple fragments before photographing, that is, the crack areas between the black dashed lines in Figure 11a, to enhance the quantity of control groups for comparison. Additionally, as shown in Figure 11b, the fourth block crack is denoted as CrackIV_01, which is observed on a slab specimen and shot from an overhead perspective at a certain angle between the optical axis plane and the structural surface normal. The measured results are summarized in Tables 3–5, where the maximum error is 0.144 mm, corresponding to a relative error of 36.0%. This is attributed to the non-negligible prediction bias of FCN for CrackI_01. Hence, it is imperative to further optimize the performance of FCN for detecting hairline cracks.
Figure 11. Concrete crack detection and measurement experiment: (a) divided crack fragments (the crack segment numbering corresponds to the numbering in the bottom left corner of the crack images in (c)); (b) binocular device overlooking a crack; and (c) visualization of the results for certain fragments.
Table 3. Results of maximum width measurement for CrackI, CrackIII_06 and CrackIV_01.
Figure 11c presents the visible outcomes of certain crack fragments, among which the
refined red region effectively demonstrates the generalization capability of our FCN, while
the low error level further substantiates the validity of the proposed measurement method.
Specifically, CrackII_03 has achieved the most accurate quantification, with an error of
only 0.006 mm. As anticipated, CrackIV_01, exhibiting a calculated error of −0.069 mm,
confirms the binocular vision-based approach’s capability to maintain high measurement
accuracy even under oblique shooting conditions, thereby highlighting its superiority over
the monocular vision method in terms of shooting posture. Although the morphology
of CrackIII_06 is successfully extracted despite the interference of the strain gauge wire
and the shadow caused by this wire in the lower left corner, the associated error exhibits
a substantial increase in comparison to CrackIII_01, reaching 0.093 mm. One possible
explanation for this is that the uneven concrete surface renders the proposed method
inapplicable. Apart from displaying maximum values of crack width, their specific locations
are also indicated through white bidirectional arrows, thereby offering a valuable reference
for re-inspection.
5. Conclusions and Discussion
In this paper, a non-contact method for detecting and measuring cracks is proposed by combining a semantic segmentation network, specifically the encoder–decoder FCN, with binocular stereo vision.
Author Contributions: Conceptualization, Z.Z. and H.Z.; Writing—original draft, J.L.; Writing—review
& editing, Z.Z., Z.S. and H.Z.; Supervision, Z.Z., J.S. and H.Z. All authors have read and agreed to
the published version of the manuscript.
Funding: The authors acknowledge the support from the National Key R&D Program of China (grant
No. 2020YFA0711700), the National Natural Science Foundation of China (grant Nos. U23A20659,
52122801, 11925206, 51978609 and U22A20254) and the Foundation for Distinguished Young Scientists
of Zhejiang Province (grant No. LR20E080003).
Data Availability Statement: Data are contained within the article.
Acknowledgments: The authors express their appreciation to Feilei Chen for assistance with
this work.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Kayondo, M.; Combrinck, R.; Boshoff, W.P. State-of-the-art review on plastic cracking of concrete. Constr. Build. Mater. 2019, 225,
886–899. [CrossRef]
2. Wang, H.L.; Dai, J.G.; Sun, X.Y.; Zhang, X.L. Characteristics of concrete cracks and their influence on chloride penetration. Constr.
Build. Mater. 2016, 107, 216–225. [CrossRef]
3. Zhang, H.; Zhou, Y.H.; Quan, L.W. Identification of a moving mass on a beam bridge using piezoelectric sensor arrays. J. Sound.
Vib. 2021, 491, 115754. [CrossRef]
4. Aboudi, J. Stiffness Reduction of Cracked Solids. Eng. Fract. Mech. 1987, 26, 637–650. [CrossRef]
5. Chupanit, P.; Roesler, J.R. Fracture energy approach to characterize concrete crack surface roughness and shear stiffness. J. Mater.
Civil. Eng. 2008, 20, 275–282. [CrossRef]
6. Güllü, H.; Canakci, H.; Alhashemy, A. Use of ranking measure for performance assessment of correlations for the compression
index. Eur. J. Environ. Civ. Eng. 2018, 22, 578–595. [CrossRef]
7. Güllü, H.; Canakci, H.; Alhashemy, A. A Ranking Distance Analysis for Performance Assessment of UCS Versus SPT-N
Correlations. Arab. J. Sci. Eng. 2019, 44, 4325–4337. [CrossRef]
8. Jahanshahi, M.R.; Kelly, J.S.; Masri, S.F.; Sukhatme, G.S. A survey and evaluation of promising approaches for automatic
image-based defect detection of bridge structures. Struct. Infrastruct. Eng. 2009, 5, 455–486. [CrossRef]
9. Jiang, W.B.; Liu, M.; Peng, Y.N.; Wu, L.H.; Wang, Y.N. HDCB-Net: A Neural Network With the Hybrid Dilated Convolution for
Pixel-Level Crack Detection on Concrete Bridges. IEEE Trans. Ind. Inform. 2021, 17, 5485–5494. [CrossRef]
10. Zhang, H.; Zhou, Y.H.; Huang, Z.Y.; Shen, R.H.; Wu, Y.D. Multiparameter Identification of Bridge Cables Using XGBoost
Algorithm. J. Bridge Eng. 2023, 28. [CrossRef]
11. Huston, D.; Hu, J.Q.; Maser, K.; Weedon, W.; Adam, C. GIMA ground penetrating radar system for monitoring concrete bridge
decks. J. Appl. Geophys. 2000, 43, 139–146. [CrossRef]
12. Chen, S.-E.; Liu, W.; Bian, H.; Smith, B. 3D LiDAR Scans for Bridge Damage Evaluations. In Forensic Engineering 2012; ASCE
Library: New York, NY, USA, 2013; pp. 487–495. [CrossRef]
13. Valenca, J.; Puente, I.; Julio, E.; Gonzalez-Jorge, H.; Arias-Sanchez, P. Assessment of cracks on concrete bridges using image
processing supported by laser scanning survey. Constr. Build. Mater. 2017, 146, 668–678. [CrossRef]
14. Zhang, B.N.; Zhou, Z.X.; Zhang, K.H.; Yan, G.; Xu, Z.Z. Sensitive skin and the relative sensing system for real-time surface
monitoring of crack in civil infrastructure. J. Intell. Mater. Syst. Struct. 2006, 17, 907–917. [CrossRef]
15. Hurlebaus, S.; Gaul, L. Smart layer for damage diagnostics. J. Intell. Mater. Syst. Struct. 2004, 15, 729–736. [CrossRef]
16. Roopa, A.K.; Hunashyal, A.M.; Mysore, R.R.M. Development and Implementation of Cement-Based Nanocomposite Sensors for
Structural Health Monitoring Applications: Laboratory Investigations and Way Forward. Sustainability 2022, 14, 12452. [CrossRef]
17. Dorafshan, S.; Thomas, R.J.; Maguire, M. Comparison of deep convolutional neural networks and edge detectors for image-based
crack detection in concrete. Constr. Build. Mater. 2018, 186, 1031–1045. [CrossRef]
18. Zhang, H.; Zhou, Y. AI-Based Modeling and Data-Driven Identification of Moving Load on Continuous Beams. Fundam. Res.
2022, 3, 796–803. [CrossRef]
19. Yeum, C.M.; Dyke, S.J. Vision-Based Automated Crack Detection for Bridge Inspection. Comput.-Aided Civ. Infrastruct. Eng. 2015,
30, 759–770. [CrossRef]
20. Oh, J.K.; Jang, G.; Oh, S.; Lee, J.H.; Yi, B.J.; Moon, Y.S.; Lee, J.S.; Choi, Y. Bridge inspection robot system with machine vision.
Autom. Constr. 2009, 18, 929–941. [CrossRef]
21. Iyer, S.; Sinha, S.K. A robust approach for automatic detection and segmentation of cracks in underground pipeline images. Image
Vis. Comput. 2005, 23, 921–933. [CrossRef]
22. Fujita, Y.; Hamamoto, Y. A robust automatic crack detection method from noisy concrete surfaces. Mach. Vis. Appl. 2011, 22,
245–254. [CrossRef]
23. Lee, B.Y.; Kim, Y.Y.; Yi, S.T.; Kim, J.K. Automated image processing technique for detecting and analysing concrete surface cracks.
Struct. Infrastruct. Eng. 2013, 9, 567–577. [CrossRef]
24. Zhang, H.; Shen, M.Z.; Zhang, Y.Y.; Chen, Y.S.; Lu, C.F. Identification of Static Loading Conditions Using Piezoelectric Sensor
Arrays. J. Appl. Mech. 2018, 85, 011008. [CrossRef]
25. Nguyen, H.N.; Kam, T.Y.; Cheng, P.Y. An Automatic Approach for Accurate Edge Detection of Concrete Crack Utilizing 2D
Geometric Features of Crack. J. Signal Process. Syst. 2014, 77, 221–240. [CrossRef]
26. Sohn, H.G.; Lim, Y.M.; Yun, K.H.; Kim, G.H. Monitoring crack changes in concrete structures. Comput.-Aided Civ. Infrastruct. Eng.
2005, 20, 52–61. [CrossRef]
27. Ni, T.Y.; Zhou, R.X.; Gu, C.P.; Yang, Y. Measurement of concrete crack feature with android smartphone APP based on digital
image processing techniques. Measurement 2020, 150, 107093. [CrossRef]
28. Abdel-Qader, I.; Abudayyeh, O.; Kelly, M.E. Analysis of edge-detection techniques for crack identification in bridges. J. Comput.
Civ. Eng. 2003, 17, 255–263. [CrossRef]
29. Wang, K.C.P.; Li, Q.; Gong, W.G. Wavelet-based pavement distress image edge detection with a trous algorithm. Transp. Res. Rec.
2007, 2024, 73–81. [CrossRef]
30. Xiang, T.; Huang, K.X.; Zhang, H.; Zhang, Y.Y.; Zhang, Y.N.; Zhou, Y.H. Detection of Moving Load on Pavement Using
Piezoelectric Sensors. Sensors 2020, 20, 2366. [CrossRef]
31. Yamaguchi, T.; Hashimoto, S. Fast crack detection method for large-size concrete surface images using percolation-based image
processing. Mach. Vis. Appl. 2010, 21, 797–809. [CrossRef]
32. Adhikari, R.S.; Moselhi, O.; Bagchi, A. Image-based retrieval of concrete crack properties for bridge inspection. Autom. Constr.
2014, 39, 180–194. [CrossRef]
33. Payab, M.; Abbasina, R.; Khanzadi, M. A Brief Review and a New Graph-Based Image Analysis for Concrete Crack Quantification.
Arch. Comput. Methods Eng. 2019, 26, 347–365. [CrossRef]
34. Andrushia, A.D.; Anand, N.; Arulraj, G.P. A novel approach for thermal crack detection and quantification in structural concrete
using ripplet transform. Struct. Control Health Monit. 2020, 27, e2621. [CrossRef]
35. Liu, Y.F.; Cho, S.; Spencer, B.F.; Fan, J.S. Concrete Crack Assessment Using Digital Image Processing and 3D Scene Reconstruction.
J. Comput. Civ. Eng. 2016, 30, 04014124. [CrossRef]
36. Li, S.Y.; Zhao, X.F.; Zhou, G.Y. Automatic pixel-level multiple damage detection of concrete structure using fully convolutional
network. Comput.-Aided Civ. Infrastruct. Eng. 2019, 34, 616–634. [CrossRef]
37. Prasanna, P.; Dana, K.J.; Gucunski, N.; Basily, B.B.; La, H.M.; Lim, R.S.; Parvardeh, H. Automated Crack Detection on Concrete
Bridges. IEEE Trans. Autom. Sci. Eng. 2016, 13, 591–599. [CrossRef]
38. Peng, X.; Zhong, X.G.; Zhao, C.; Chen, A.H.; Zhang, T.Y. A UAV-based machine vision method for bridge crack recognition and
width quantification through hybrid feature learning. Constr. Build. Mater. 2021, 299, 123896. [CrossRef]
39. Alipour, M.; Harris, D.K.; Miller, G.R. Robust Pixel-Level Crack Detection Using Deep Fully Convolutional Neural Networks.
J. Comput. Civ. Eng. 2019, 33, 04019040. [CrossRef]
40. Zhang, H.; Shen, Z.J.; Lin, Z.H.; Quan, L.W.; Sun, L.F. Deep learning-based automatic classification of three-level surface
information in bridge inspection. Comput.-Aided Civ. Infrastruct. Eng. 2023. [CrossRef]
41. Zhang, L.; Yang, F.; Zhang, Y.D.; Zhu, Y.J. Road Crack Detection Using Deep Convolutional Neural Network. In Proceedings of
the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3708–3712.
[CrossRef]
42. Cha, Y.J.; Choi, W.; Buyukozturk, O. Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks.
Comput.-Aided Civ. Infrastruct. Eng. 2017, 32, 361–378. [CrossRef]
43. Chen, F.C.; Jahanshahi, M.R. NB-CNN: Deep Learning-Based Crack Detection Using Convolutional Neural Network and Naive
Bayes Data Fusion. IEEE Trans. Ind. Electron. 2018, 65, 4392–4400. [CrossRef]
44. Kim, B.; Cho, S. Automated Vision-Based Detection of Cracks on Concrete Surfaces Using a Deep Learning Technique. Sensors
2018, 18, 3452. [CrossRef] [PubMed]
45. Cha, Y.J.; Choi, W.; Suh, G.; Mahmoudkhani, S.; Buyukozturk, O. Autonomous Structural Visual Inspection Using Region-Based
Deep Learning for Detecting Multiple Damage Types. Comput.-Aided Civ. Infrastruct. Eng. 2018, 33, 731–747. [CrossRef]
46. Deng, J.H.; Lu, Y.; Lee, V.C.S. Concrete crack detection with handwriting script interferences using faster region-based
convolutional neural network. Comput.-Aided Civ. Infrastruct. Eng. 2020, 35, 373–388. [CrossRef]
47. Zhang, C.B.; Chang, C.C.; Jamshidi, M. Concrete bridge surface damage detection using a single-stage detector. Comput.-Aided
Civ. Infrastruct. Eng. 2020, 35, 389–409. [CrossRef]
48. Li, Y.D.; Li, H.G.; Wang, H.R. Pixel-Wise Crack Detection Using Deep Local Pattern Predictor for Robot Application. Sensors 2018,
18, 3042. [CrossRef] [PubMed]
49. Kim, B.; Cho, S. Image-based concrete crack assessment using mask and region-based convolutional neural network. Struct.
Control Health Monit. 2019, 26, e2381. [CrossRef]
50. Zhang, A.; Wang, K.C.P.; Fei, Y.; Liu, Y.; Chen, C.; Yang, G.W.; Li, J.Q.; Yang, E.H.; Qiu, S. Automated Pixel-Level Pavement
Crack Detection on 3D Asphalt Surfaces with a Recurrent Neural Network. Comput.-Aided Civ. Infrastruct. Eng. 2019, 34, 213–229.
[CrossRef]
51. Yang, X.C.; Li, H.; Yu, Y.T.; Luo, X.C.; Huang, T.; Yang, X. Automatic Pixel-Level Crack Detection and Measurement Using Fully
Convolutional Network. Comput.-Aided Civ. Infrastruct. Eng. 2018, 33, 1090–1109. [CrossRef]
52. Dung, C.V.; Anh, L.D. Autonomous concrete crack detection using deep fully convolutional neural network. Autom. Constr. 2019,
99, 52–58. [CrossRef]
53. Liu, Z.Q.; Cao, Y.W.; Wang, Y.Z.; Wang, W. Computer vision-based concrete crack detection using U-net fully convolutional
networks. Autom. Constr. 2019, 104, 129–139. [CrossRef]
54. Liu, J.W.; Yang, X.; Lau, S.; Wang, X.; Luo, S.; Lee, V.C.S.; Ding, L. Automated pavement crack detection and segmentation based
on two-step convolutional neural network. Comput.-Aided Civ. Infrastruct. Eng. 2020, 35, 1291–1305. [CrossRef]
55. Miao, Z.H.; Ji, X.D.; Okazaki, T.; Takahashi, N. Pixel-level multicategory detection of visible seismic damage of reinforced concrete
components. Comput.-Aided Civ. Infrastruct. Eng. 2021, 36, 620–637. [CrossRef]
56. Guan, J.C.; Yang, X.; Ding, L.; Cheng, X.Y.; Lee, V.C.S.; Jin, C. Automated pixel-level pavement distress detection based on stereo
vision and deep learning. Autom. Constr. 2021, 129, 103788. [CrossRef]
57. Zhang, X.X.; Rajan, D.; Story, B. Concrete crack detection using context-aware deep semantic segmentation network. Comput.-Aided
Civ. Infrastruct. Eng. 2019, 34, 951–971. [CrossRef]
58. Chen, T.Y.; Cai, Z.H.; Zhao, X.; Chen, C.; Liang, X.F.; Zou, T.R.; Wang, P. Pavement crack detection and recognition using the
architecture of segNet. J. Ind. Inf. Integr. 2020, 18, 100144. [CrossRef]
59. Zheng, X.; Zhang, S.L.; Li, X.; Li, G.; Li, X.Y. Lightweight Bridge Crack Detection Method Based on SegNet and Bottleneck
Depth-Separable Convolution With Residuals. IEEE Access 2021, 9, 161649–161668. [CrossRef]
60. Ji, A.K.; Xue, X.L.; Wang, Y.N.; Luo, X.W.; Xue, W.R. An integrated approach to automatic pixel-level crack detection and
quantification of asphalt pavement. Autom. Constr. 2020, 114, 103176. [CrossRef]
61. Sun, Y.J.; Yang, Y.; Yao, G.; Wei, F.J.; Wong, M.P. Autonomous Crack and Bughole Detection for Concrete Surface Image Based on
Deep Learning. IEEE Access 2021, 9, 85709–85720. [CrossRef]
62. Bang, S.; Park, S.; Kim, H.; Kim, H. Encoder-decoder network for pixel-level road crack detection in black-box images. Comput.-
Aided Civ. Infrastruct. Eng. 2019, 34, 713–727. [CrossRef]
63. Li, G.; Li, X.Y.; Zhou, J.; Liu, D.Z.; Ren, W. Pixel-level bridge crack detection using a deep fusion about recurrent residual
convolution and context encoder network. Measurement 2021, 176, 109171. [CrossRef]
64. Zhang, L.; Jiang, F.L.; Yang, J.; Kong, B.; Hussain, A. A real-time lane detection network using two-directional separation attention.
Comput.-Aided Civ. Infrastruct. Eng. 2023. [CrossRef]
65. Chen, J.; He, Y. A novel U-shaped encoder–decoder network with attention mechanism for detection and evaluation of road
cracks at pixel level. Comput.-Aided Civ. Infrastruct. Eng. 2022, 37, 1721–1736. [CrossRef]
66. Chen, L.J.; Yao, H.D.; Fu, J.Y.; Ng, C.T. The classification and localization of crack using lightweight convolutional neural network
with CBAM. Eng. Struct. 2023, 275, 115291. [CrossRef]
67. Du, Y.C.; Zhong, S.; Fang, H.Y.; Wang, N.N.; Liu, C.L.; Wu, D.F.; Sun, Y.; Xiang, M. Modeling automatic pavement crack object
detection and pixel-level segmentation. Autom. Constr. 2023, 150, 104840. [CrossRef]
68. Yang, L.; Bai, S.L.; Liu, Y.H.; Yu, H.N. Multi-scale triple-attention network for pixelwise crack segmentation. Autom. Constr. 2023,
150, 104853. [CrossRef]
69. Zhu, G.J.; Liu, J.C.; Fan, Z.; Yuan, D.; Ma, P.L.; Wang, M.H.; Sheng, W.H.; Wang, K.C.P. A lightweight encoder-decoder network
for automatic pavement crack detection. Comput.-Aided Civ. Infrastruct. Eng. 2023. [CrossRef]
70. Lei, M.F.; Zhang, Y.B.; Deng, E.; Ni, Y.Q.; Xiao, Y.Z.; Zhang, Y.; Zhang, J.J. Intelligent recognition of joints and fissures in tunnel
faces using an improved mask region-based convolutional neural network algorithm. Comput.-Aided Civ. Infrastruct. Eng. 2023.
[CrossRef]
71. Que, Y.; Dai, Y.; Ji, X.; Leung, A.K.; Chen, Z.; Jiang, Z.L.; Tang, Y.C. Automatic classification of asphalt pavement cracks using a
novel integrated generative adversarial networks and improved VGG model. Eng. Struct. 2023, 277, 115406. [CrossRef]
72. Nguyen, Q.D.; Thai, H.T. Crack segmentation of imbalanced data: The role of loss functions. Eng. Struct. 2023, 297, 116988.
[CrossRef]
73. Weng, X.X.; Huang, Y.C.; Li, Y.A.; Yang, H.; Yu, S.H. Unsupervised domain adaptation for crack detection. Autom. Constr. 2023,
153, 104939. [CrossRef]
74. Kang, D.; Benipal, S.S.; Gopal, D.L.; Cha, Y.J. Hybrid pixel-level concrete crack segmentation and quantification across complex
backgrounds using deep learning. Autom. Constr. 2020, 118, 103291. [CrossRef]
75. Yuan, C.; Xiong, B.; Li, X.; Sang, X.; Kong, Q. A novel intelligent inspection robot with deep stereo vision for three-dimensional
concrete damage detection and quantification. Struct. Health Monit. 2022, 21, 788–802. [CrossRef]
76. Kim, H.; Sim, S.-H.; Spencer, B.F. Automated concrete crack evaluation using stereo vision with two different focal lengths. Autom.
Constr. 2022, 135, 104136. [CrossRef]
77. Chen, C.X.; Shen, P. Research on Crack Width Measurement Based on Binocular Vision and Improved DeeplabV3+. Appl. Sci.
2023, 13, 2752. [CrossRef]
78. Pan, S.J.; Yang, Q.A. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [CrossRef]
79. Wang, M.; Deng, W.H. Deep visual domain adaptation: A survey. Neurocomputing 2018, 312, 135–153. [CrossRef]
80. Zhang, T.Y.; Suen, C.Y. A Fast Parallel Algorithm for Thinning Digital Patterns. Commun. ACM 1984, 27, 236–239. [CrossRef]
81. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [CrossRef]
82. Shan, B.; Zheng, S.; Ou, J. A stereovision-based crack width detection approach for concrete surface assessment. KSCE J. Civ. Eng.
2016, 20, 803–812. [CrossRef]
83. Hinton, G.E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Improving neural networks by preventing co-
adaptation of feature detectors. arXiv 2012, arXiv:1207.0580.
84. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks
from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
85. Zhuang, F.Z.; Qi, Z.Y.; Duan, K.Y.; Xi, D.B.; Zhu, Y.C.; Zhu, H.S.; Xiong, H.; He, Q. A Comprehensive Survey on Transfer Learning.
Proc. IEEE 2021, 109, 43–76. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.