A Pavement Crack Detection Method via Deep Learning and a
Binocular-Vision-Based Unmanned Aerial Vehicle
Jiahao Zhang 1, Haiting Xia 1,*, Peigen Li 2, Kaomin Zhang 1,*, Wenqing Hong 3 and Rongxin Guo 4

1 Faculty of Civil Aviation and Aeronautics, Kunming University of Science and Technology,
Kunming 650500, China; [email protected]
2 International Joint Laboratory for Green Construction and Intelligent Maintenance of Yunnan Province,
Kunming 650500, China; [email protected]
3 Kunming Institute of Physics, Kunming 650223, China; [email protected]
4 Faculty of Civil Engineering and Mechanics,
Kunming University of Science and Technology, Kunming 650500, China; [email protected]
* Correspondence: [email protected] (H.X.); [email protected] (K.Z.)

Abstract: This study aims to enhance pavement crack detection methods by integrating unmanned
aerial vehicles (UAVs) with deep learning techniques. Current methods encounter challenges such as
low accuracy, limited efficiency, and constrained application scenarios. We introduce an innovative
approach that employs a UAV equipped with a binocular camera for identifying pavement surface
cracks. This method is augmented by a binocular ranging algorithm combined with edge detection
and skeleton extraction algorithms, enabling the quantification of crack widths without necessitating
a preset shooting distance—a notable limitation in existing UAV crack detection applications. We
developed an optimized model to enhance detection accuracy, incorporating the YOLOv5s network
with an Efficient Channel Attention (ECA) mechanism. This model features a decoupled head struc-
ture, replacing the original coupled head structure to optimize detection performance, and utilizes a
Generalized Intersection over Union (GIoU) loss function for refined bounding box predictions. Post
identification, images within the bounding boxes are segmented by the Unet++ network to accurately
quantify cracks. The efficacy of the proposed method was validated on roads in complex environ-
ments, achieving a mean Average Precision (mAP) of 86.32% for crack identification and localization
with the improved model. This represents a 5.30% increase in the mAP and a 6.25% increase in recall
compared to the baseline network. Quantitative results indicate that the measurement error margin
for crack widths was 10%, fulfilling the practical requirements for pavement crack quantification.

Keywords: pavement crack detection; YOLOv5s; U-Net++; binocular vision; unmanned aerial vehicle

Citation: Zhang, J.; Xia, H.; Li, P.; Zhang, K.; Hong, W.; Guo, R. A Pavement Crack Detection Method via Deep Learning and a Binocular-Vision-Based Unmanned Aerial Vehicle. Appl. Sci. 2024, 14, 1778. https://doi.org/10.3390/app14051778

Academic Editor: José António Correia

Received: 20 December 2023; Revised: 9 February 2024; Accepted: 19 February 2024; Published: 22 February 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

As road mileage increases, the need for meticulous road health inspections intensifies. Cracks, often the initial indicators of road deterioration, evolve dynamically due to material aging, traffic loading, and environmental factors. In road health monitoring, crack detection is pivotal, offering insights into a road's current state and forecasting potential safety hazards and deterioration patterns. Consequently, profound research into crack detection methodologies and the advancement of precise, efficient crack identification and analysis tools are crucial for intelligent road inspection [1–3]. The accurate detection, localization, and width measurement of cracks are vital, as different pavement conditions necessitate varied repair standards and urgency levels. Current crack detection methods, however, face significant challenges. Traditional manual inspections are labor-intensive, time-consuming, and often inefficient. The complexity and size of road networks further complicate accurate detection and quantitative analysis. In recent years, the development of unmanned aerial vehicle (UAV) technology and deep learning algorithms has opened new avenues for pavement crack detection [4–7]. UAVs, characterized by their speed, efficiency,
compactness, high passability, and low risk, have gained prominence in construction defect
detection. When combined with deep learning algorithms, such as convolutional neural
networks, they excel in complex detection environments and in hazard identification. These
technologies have sparked innovative detection approaches and have been substantiated
in numerous studies.
Deep learning models, with their intricate network structures, adeptly learn and recog-
nize various crack features from extensive datasets, exhibiting high flexibility and accuracy
in identifying diverse crack types and sizes. Compared with traditional manual methods,
deep learning substantially enhances the efficiency and speed of crack detection, reduces
human resource dependence, and minimizes the likelihood of errors. Zhang et al. [8]
pioneered the use of deep convolutional neural networks for crack detection and classifi-
cation. Cha et al. [9] introduced a combination of convolutional neural networks (CNNs)
and a sliding window technique for high-resolution crack imagery, advancing structural
health detection. Jiang et al. [10] employed a novel optimization strategy including deeply
differentiable convolution and inverse residual networks to significantly enhance concrete
crack detection accuracy. Yang et al. [11] developed the Feature Pyramid and Hierarchical
Boosting Network (FPHBN) for crack detection, which layers weights over nested samples
to balance simple and complex sample contributions. Yun et al. [12] utilized a Generative
Adversarial Network (GAN)-based data enhancement approach for crack images and
employed an improved VGG network for enhanced accuracy. Rao et al. [13] designed
a CNN-based method using non-overlapping windows to expedite crack detection and
streamline analysis. Yu et al. [14] improved the Dempster–Shafer algorithm for handling
conflicting CNN results, which boosted detection accuracy. Silva et al. [15] enhanced the
VGG16 model and investigated various parameters’ impacts on detection outcomes. UAVs
are in the spotlight due to their efficiency and flexibility in engineering inspections. They
cover extensive areas rapidly and are equipped with high-resolution cameras and sensors
for data precision. Duan et al. [16] developed a binocular-vision-based UAV platform for
improved image recognition. Ma et al. [17] combined UAV remote sensing with binocular
vision for precise power equipment detection. Shuai et al. [18] introduced a UAV-mounted
binocular vision system for obstacle detection in power infrastructure. Gopalakrishnan
et al. [19] utilized a transfer learning approach with a pre-trained VGG-16 network for
building crack detection via a UAV. Lei et al. [20] proposed the Crack Central Point Method
(CCPM), combining UAVs and digital image processing for robust bridge crack detection
with limited data. Liu et al. [21] employed UAVs for high-rise-building crack detection,
addressing motion blur with a GAN-based model. Although there have been significant
advancements, the task of detecting pavement cracks remains challenging. Variations in
image scale and complex background conditions contribute to missed detections and false detections.
Quantifying pavement crack sizes allows for an accurate assessment of a road’s current
condition, which facilitates the implementation of preventive maintenance measures. Such
measures not only prolong the road’s service life but also diminish long-term maintenance
expenses. Regarding crack width measurement, Kim et al. [22] employed a hybrid image
processing technique alongside a UAV equipped with ultrasonic displacement sensors,
enabling working distance calculation and crack width estimation. Liu et al. [23] employed
a deep learning approach to analyze signals from distributed fiber optic sensors, enhancing
the efficiency of detecting spatially distributed cracks. However, the application of fiber
optic sensors is not feasible for large-scale pavement crack detection. Park et al. [24]
utilized deep learning and laser sensors in a structured light application for detecting
and quantifying surface cracks in concrete structures based on the laser beam’s projection
position on the surface. Yu et al. [25] achieved bridge crack identification, segmentation,
and width calculation using a UAV with a monocular camera aided by a Mask R-CNN.
Peng et al. [26] introduced a two-stage crack identification method combining a UAV with a
laser range finder and image threshold segmentation for crack width determination. Zhou
et al. [27] employed Faster R-CNN for crack region detection and used maximum entropy
threshold segmentation and Canny edge detection to determine crack dimensions. Ding
et al. [28] developed a method to quantify cracks in various measurement poses using
a full-field-scale UAV gimbal camera, addressing the challenge of image measurement
relative to markers. However, current crack quantization methods face issues of either
insufficient accuracy or high equipment costs.
The existing research on pavement crack detection has achieved some progress, but
the quantitative detection of cracks continues to encounter challenges. Firstly, pavement
cracks exhibit a wide range of dimensional variations and often manifest in variable, com-
plex backgrounds, posing difficulties for deep learning-based target detection algorithms.
Secondly, in instances in which a photo contains multiple or exceedingly long cracks, using
a single measured distance as the representative shooting distance for width calculation
lacks precision. This paper introduces a novel crack-labeling method that employs a small
frame overlay technique to accurately label and train a crack dataset. Additionally, a
binocular ranging algorithm is used to determine the shooting distances to various crack
segments. By precisely measuring the shooting depth, this approach significantly enhances
the accuracy of crack width measurements, allowing for differentiated width calculations
across various crack sections.

2. Materials and Methods


2.1. Overview of the YOLOv5 Algorithm
YOLOv5, an efficient and precise single-stage target detection network, was introduced by Glenn Jocher in 2020. It processes images faster than two-stage networks and has become widely applied in various target detection tasks. Among the four versions of YOLOv5 (small, medium, large, and extra-large), we utilize the fastest, YOLOv5s, as the base model. The network's architecture comprises four main components, the input, backbone, neck, and head, as depicted in Figure 1.

Figure 1. Network structure of YOLOv5s.


The backbone, a critical component of YOLOv5s, efficiently extracts spatial features


from input images. It primarily consists of the CBS (convolutional layer, batch Normal-
ization, and SiLU activation function), C3 (three convolutional blocks), and SPPF (Spatial
Pyramid Pooling Fast) modules. The CBS module lays the foundation for feature extraction
and assists the C3 module. The C3 module further enhances the structural depth of the
network. The SPPF module, an optimized version of the SPP, refines feature representation
by reducing the pooling window scale and simplifying the process. Thus, the inference of
the network model is accelerated. This design aids in extracting features at multiple scales
while preserving the spatial hierarchy.
The neck structure, comprising the Feature Pyramid Network (FPN) and the Path
Aggregation Network (PAN), processes feature maps of varying sizes obtained from the
backbone. These features are fused and upsampled, generating new multi-scale feature
maps for detecting objects of different sizes. The detection head uses anchor boxes to depict
the confidence level and bounding box regression, and it generates a multi-dimensional
array containing the target category, confidence level, and bounding box dimensions.
The detection results are then refined using a confidence threshold and non-maximum
suppression (NMS).
The integration of the FPN and PAN in YOLOv5s facilitates an effective fusion of top-
down semantic and bottom-up positional information flows to enhance the network’s multi-
scale target detection capability. This design allows YOLOv5s to efficiently predict targets
of various sizes across different feature resolutions, making it suitable for computation-
limited environments.
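As a concrete illustration of the confidence-threshold and NMS step described above, the following minimal sketch filters one image's decoded predictions. The corner-format box layout and the use of torchvision's nms are illustrative assumptions, not the authors' implementation.

```python
import torch
from torchvision.ops import nms

def postprocess(pred, conf_thres=0.25, iou_thres=0.45):
    """Filter one image's decoded predictions by confidence, then apply NMS.

    pred: (N, 6) tensor with columns [x1, y1, x2, y2, confidence, class_id],
    decoded from the three feature scales of the detection head.
    """
    pred = pred[pred[:, 4] >= conf_thres]            # confidence threshold
    if pred.numel() == 0:
        return pred
    keep = nms(pred[:, :4], pred[:, 4], iou_thres)   # non-maximum suppression
    return pred[keep]
```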

2.2. Improvements in YOLOv5s Network Model


A challenge in pavement crack detection is the significant variation in crack shape
and size from image to image. The crack images in our constructed dataset have scale and
shape diversity. The original YOLOv5s model, despite its obvious advantage in detection
speed, does not perform well on targets with multiple scales and shapes. To solve this
problem, we train model weights on the channel dimension of the feature layer extracted by
the backbone network so that the network strengthens the part with pavement defects and
weakens the useless information in the image. This strategy reduces the network’s focus on
irrelevant image information, which improves the network’s feature extraction efficiency
in complex contexts. We employ an enhanced YOLOv5s framework that integrates the
ECA-Net (efficient channel attention network) [29], an improvement upon the Squeeze-
and-Excite Network (SE-Net) [30]. This integration serves to reduce model complexity and
dependence on dimensionality reduction.
We introduce a decoupled head structure for the independent optimization of di-
mension prediction and category confidence, enhancing model accuracy without extra
computational load. The standard CIoU (Complete Intersection over Union) loss function
is replaced with the GIoU (Generalized Intersection over Union) loss function to offer a
more general metric for assessing target box overlap. The GIoU is particularly beneficial
for targets with significant size variations. The above adjustments more effectively correct
discrepancies between predicted and actual boxes.
Integrating the ECA attention mechanism, implementing a decoupled head structure,
and utilizing the GIoU loss function result in a more efficient and robust YOLOv5s-based
model. The proposed model significantly improves the accuracy of multi-scale pavement
crack detection in complex real-world environments.

2.2.1. ECA Attention Mechanism

The ECA-Net captures cross-channel interactions with one-dimensional convolution instead of fully connected layers. The mechanism follows a principle of localized cross-channel interactions in which each channel only interacts with its k neighboring channels, and k is adaptively determined based on the number of channels. A one-dimensional convolution efficiently implements the process, which generates attention weights based on dependencies between channels. As illustrated in Figure 2, the ECA attention module initially conducts global average pooling on the feature maps of H × W × C, compressing each feature map of H × W × 1 into a single value, thereby yielding an output array of 1 × 1 × C. This process calculates the average response of each channel, capturing its global information. Subsequently, a 1D convolution with a kernel size of k is employed, effectively replacing the two fully connected layers found in the SE-Net.

Figure 2. Network structure of efficient channel attention module.

The convolution kernel size of the ECA-Net is derived as an adaptive function of the number of input channels, as described in Equation (1). This equation calculates the logarithm of the channel count (c), adds the offset (b), and divides by r to adjust the scale. The resultant value (k) is obtained by taking the absolute value to obtain a non-negative number and then converting it into the nearest odd number; the subscript odd in the equation denotes taking the nearest odd number. This approach effectively captures channel dependencies while maintaining a low level of computational complexity. It reduces the parameter count and the computational cost but preserves performance compared to the SE-Net.

$$ k = \varphi(c) = \left| \frac{\log_2 c + b}{r} \right|_{odd} \quad (1) $$

Here, b and r are constants, typically set to b = 1 and r = 2. Each channel's weight is determined via the sigmoid activation function following the convolution operation. Convolution-shared weights are employed to further enhance network performance. This method efficiently captures the information of locally interacting channels while reducing the network's parameters. The shared-weights approach is calculated in Equation (2).

$$ \omega_i = \sigma\left( \sum_{j=1}^{k} W_i^{\,j} Y_i^{\,j} \right), \quad Y_i^{\,j} \in \Omega_i^{k} \quad (2) $$

Here, σ(·) represents the sigmoid activation function. W_i^j denotes the jth local weight matrix within the ith grouped weight matrix of the c channels, and Y_i^j is derived similarly. The final step involves multiplying the obtained weights by the original input feature map to generate a feature map with attention weights. The ECA-Net can amplify the weights of the effective channels, thus minimizing the loss of important information during the convolutional dimensionality reduction process.
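The following is a minimal PyTorch sketch of an ECA-style block following Equations (1) and (2): global average pooling, an adaptively sized 1D convolution, a sigmoid, and channel-wise re-weighting. The class name, the b = 1 and r = 2 defaults, and the layer layout are illustrative rather than the authors' exact code.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: global average pooling followed by a 1D
    convolution over the channel dimension with an adaptively chosen kernel."""

    def __init__(self, channels: int, b: int = 1, r: int = 2):
        super().__init__()
        # Equation (1): k = |(log2(c) + b) / r|, rounded and forced to be odd.
        k = int(round(abs((math.log2(channels) + b) / r)))
        k = k if k % 2 else k + 1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # x: (B, C, H, W) -> channel descriptor (B, 1, C)
        y = self.pool(x).squeeze(-1).transpose(1, 2)
        # Local cross-channel interaction and sigmoid gating, Equation (2)
        y = self.sigmoid(self.conv(y)).transpose(1, 2).unsqueeze(-1)
        # Re-weight the input feature map channel-wise
        return x * y
```

In the improved network, such a block would sit after the backbone feature maps so that channels carrying crack evidence are amplified before the neck fuses them.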
2.2.2. Optimization of Network Structure and Loss Function

Object detection algorithms are usually divided into two different tasks: classification and localization. The classification task focuses on identifying texture and appearance features to determine objects' categories, while the localization task aims to pinpoint the objects' exact locations by accurately capturing edge information. Previous methods utilized a single feature map for both tasks, which may result in suboptimal performance due to the differing feature requirements of each task. The classification result depends on feature similarity to a specific class, whereas the localization task requires precise spatial coordinate predictions for bounding box adjustments. The previous approach often led to spatial misalignment between the tasks because their feature needs may be different.

To address the above problems, we implemented a decoupled head structure, which was initially introduced and validated in YOLOX [31]. As illustrated in Figure 3, this structure enables separate feature extraction pathways for classification and localization, and each pathway includes a custom-designed network layer. This design not only improves detection accuracy but also accelerates the convergence of network training and improves detection efficiency by focusing on the respective key features.

Figure 3. Decoupled head architecture.

Crack detection in complex pavement scenes especially benefits from this decoupled
head structure. The classification pathway can focus on identifying the essential features
of cracks, significantly reducing the interference of complex backgrounds. Although the
decoupled head provides different feature maps for the two tasks, it is designed to re-
main lightweight, which is critical for pavement crack detection systems that require
real-time responses. Compared to conventional coupled heads, decoupled heads exhibit
more efficient processing capabilities in terms of characterization requirements for different
tasks, enhancing the classification and localization of objects. Furthermore, this structure
efficiently preserves channel information through depth and breadth optimization, lower-
ing computational demands and boosting network speed. Consequently, the decoupled
head structure offers a solution for object detection, particularly in dynamic and intricate
pavement environments.
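A minimal sketch of a decoupled head in the spirit of YOLOX is given below: a shared stem followed by separate classification and regression/objectness branches. The channel widths, layer counts, and anchor count are assumptions for illustration only, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # 3x3 convolution + BatchNorm + SiLU, matching the CBS pattern used in YOLOv5s
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )

class DecoupledHead(nn.Module):
    """Separate pathways for classification and box regression/objectness."""

    def __init__(self, in_channels: int, num_classes: int, num_anchors: int = 3):
        super().__init__()
        self.stem = conv_block(in_channels, in_channels)
        self.cls_branch = conv_block(in_channels, in_channels)
        self.reg_branch = conv_block(in_channels, in_channels)
        self.cls_pred = nn.Conv2d(in_channels, num_anchors * num_classes, 1)
        self.box_pred = nn.Conv2d(in_channels, num_anchors * 4, 1)
        self.obj_pred = nn.Conv2d(in_channels, num_anchors, 1)

    def forward(self, x):
        x = self.stem(x)
        cls_out = self.cls_pred(self.cls_branch(x))   # class scores
        reg_feat = self.reg_branch(x)
        box_out = self.box_pred(reg_feat)             # box offsets
        obj_out = self.obj_pred(reg_feat)             # objectness
        return cls_out, box_out, obj_out
```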
In the original YOLOv5 model, the regression loss function of a bounding box is the CIoU (Complete Intersection over Union) loss. The U-Net++ network was chosen for further crack segmentation after object detection using YOLOv5s, and high recall is crucial. GIoU (Generalized Intersection over Union) shows superior performance with small-box annotations. Thus, the GIoU loss function is used in the improved YOLOv5s. These modifications enhance the model's compatibility with small-bounding-box annotations and boost precision and recall in crack identification. The loss functions are defined as shown in Equations (3)–(6):

$$ Loss_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v \quad (3) $$

$$ v = \frac{4}{\pi^2}\left( \tan^{-1}\frac{w^{gt}}{h^{gt}} - \tan^{-1}\frac{w}{h} \right)^2 \quad (4) $$

$$ \alpha = \frac{v}{(1 - IoU) + v} \quad (5) $$

$$ L_{GIoU} = 1 - IoU + \frac{|C - (A \cup B)|}{|C|} \quad (6) $$

where the variables are defined as follows:
IoU (Intersection over Union) calculates the ratio of intersection to union between the predicted and actual boxes;
b represents the centroid of the predicted box, and b^gt is the centroid of the actual box;
ρ denotes the Euclidean distance;
c is the length of the diagonal of the encompassing rectangle formed by both boxes;
α is the weight coefficient;
v measures the aspect ratio discrepancy between the predicted and actual boxes.

The introduction of aspect ratio considerations in the CIoU loss function emphasizes the bounding box's shape. However, this complexity leads to significant computational overhead during training and is less suited for crack prediction with the annotation of a small bounding box. Thus, we introduce GIoU loss (Equation (6)), which considers the smallest enclosing rectangle of both bounding boxes, resulting in a more stable loss value.

2.3. Binocular Distance Measurement Algorithm

Binocular vision leverages the parallax principle to ascertain the three-dimensional attributes of the object intended for measurement, utilizing images captured by left and right cameras. As illustrated in Figure 4, for any spatial point, both cameras simultaneously capture the point's position, denoted as W. Based on the spatial positions x_l and x_r read by each camera, the position of spatial point W is calculated.

Figure 4. Binocular disparity principle.
The 3D world coordinates of an object are projected onto the image plane and trans-
formed into 2D image coordinates during camera imaging. Conversely, in the binocular
vision 3D reconstruction process, the aim is to reverse this operation, reconstructing the ob-
ject’s 3D world coordinates from its 2D image coordinates. This reconstruction necessitates
utilizing parallax information from binocular cameras.
The positions of the imaging points captured by the left and right cameras in the
image coordinate system require a spatial transformation matrix to map them into 3D
world coordinates. This transformation hinges on the binocular cameras’ internal and

external parameters. The internal parameters include the focal length, optical center, and
lens aberration coefficient, characterizing the camera sensor’s imaging properties. The
external parameters describe the camera’s position and orientation relative to the world
coordinate system, incorporating the rotation matrix and the translation vector between the
two cameras. The above parameters, which are crucial for 3D reconstruction, are obtainable
through precise calibration. MATLAB 2020a’s Camera Calibration Toolbox is employed for
this standard calibration, which involves capturing a series of images of a fixed calibration
object (such as a checkerboard grid) to extract spatial geometric feature points and compute
the camera’s internal and external parameters.
W(X, Y, Z) is considered a spatial point for which the imaging point in the pixel coordinate system is (u, v) and the model plane is Z = 0. The parameters can be obtained by utilizing the checkerboard grid coordinates as the world coordinate system. They can be derived using Equation (7):

$$ s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = A\begin{bmatrix} r_1 & r_2 & r_3 & t \end{bmatrix}\begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix} = A\begin{bmatrix} r_1 & r_2 & t \end{bmatrix}\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} \quad (7) $$

where the variables are defined as follows:


s represents the scale factor, indicating the mapping scale;
r1 r2 r3 denotes the rotation matrix between the camera coordinate system and the
world coordinate system;
r1 and r2 are unit orthogonal vectors within the unit orthogonal matrix;
t symbolizes the translation vector of the camera coordinate system relative to the world
coordinate system;
A is the camera parameter matrix, as shown in Equation (8):

$$ A = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \quad (8) $$

Here, f_x represents the number of pixels focused in the x-direction, and f_y signifies the number of pixels focused in the y-direction. The coordinates (u_0, v_0) denote the camera's center point. Let H = [r_1  r_2  t] = [h_1  h_2  h_3] to obtain Equations (9) and (10):

$$ s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = H\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} \quad (9) $$

$$ h_1^{T} A^{-T} A^{-1} h_2 = 0, \qquad h_1^{T} A^{-T} A^{-1} h_1 = h_2^{T} A^{-T} A^{-1} h_2 \quad (10) $$

The matrices for both the internal and external parameters can be obtained using
Equations (7)–(10).
After calibration, we utilize functions from the OpenCV 4.5.5 library for image cor-
rection in the binocular system. Initially, parameters are fed into the cv2.stereoRectify()
function, enabling the correction of images from both cameras to a nearly coplanar 2D
plane. Subsequently, the cv2.initUndistortRectifyMap() function generates a mapping
matrix for both image and distortion correction. Finally, the cv2.remap() function is utilized
to accurately calibrate the images from the left and right cameras, ensuring the precision
of the subsequent stereo matching. The binocular stereo-matching process employs the
Semi-Global Block Matching (SGBM) algorithm [32], implemented in OpenCV. The SGBM
algorithm incorporates both local pixel correlations and global visual information in its
pixel-level matching, markedly enhancing the accuracy and robustness of the matching, particularly in areas with sparse or repetitive textures.
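The sketch below condenses the rectification and matching steps just described, using the OpenCV calls named in the text; the calibration inputs (K1, D1, K2, D2, R, T) and the SGBM parameter values are assumptions rather than the authors' exact settings.

```python
import cv2
import numpy as np

def depth_map(img_l, img_r, K1, D1, K2, D2, R, T, baseline_m, focal_px):
    """Rectify a stereo pair and estimate per-pixel depth (metres) with SGBM."""
    size = img_l.shape[1], img_l.shape[0]
    # 1) Rectification transforms from the stereo calibration
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
    map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
    rect_l = cv2.remap(img_l, map1x, map1y, cv2.INTER_LINEAR)
    rect_r = cv2.remap(img_r, map2x, map2y, cv2.INTER_LINEAR)

    # 2) Semi-Global Block Matching disparity (parameter values are illustrative)
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5,
                                 P1=8 * 3 * 5 ** 2, P2=32 * 3 * 5 ** 2,
                                 uniquenessRatio=10, speckleWindowSize=100)
    disp = sgbm.compute(rect_l, rect_r).astype(np.float32) / 16.0  # fixed-point

    # 3) Depth from disparity: Z = f * b / d
    disp[disp <= 0] = np.nan
    return focal_px * baseline_m / disp
```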
Since crack widths are usually narrow, the method of using similar triangles for width measurements after extracting feature points using a binocular camera may lead to significant errors. The accuracy of the measurements can be significantly improved by using a camera pinhole model. This model combines the focal length, shooting distance, and unit length of the sensor to calculate the physical dimensions of a pixel point at a specific shooting distance. Then, it calculates crack width based on the physical dimensions. The camera pinhole model is used to calculate the physical size of each pixel at the corresponding distance, as shown in Figure 5. The calculation is shown in Equations (11) and (12) after substituting the calibrated camera parameters:

$$ m = \frac{fM}{D} \quad (11) $$

$$ M = \frac{w_p D}{P_c f} \quad (12) $$

where the variables are defined as follows:
m represents the size of the object on the image plane;
f denotes the camera's focal length;
M is the actual size of the object in the x-axis direction;
D is the distance from the object to the camera;
w_p indicates the number of pixels in the object's width;
P_c refers to the number of pixels on the camera's sensor corresponding to 1 cm.

Figure 5. Camera pinhole model.

2.4. Crack Segmentation and Quantification Method

Before quantifying cracks, the U-Net++ network [33] is used for fine crack image segmentation. U-Net++ is a deep learning image segmentation network based on a prototype of U-Net. It is optimized by introducing nested skip connections and a deep supervision mechanism. It maintains U-Net's architecture, comprising an encoder and a decoder. The encoder features a 3 × 3 convolutional layer, a subsequent batch normalization layer, and a ReLU activation function for high-level semantic feature extraction. To reduce computational complexity, the encoder utilizes a 2 × 2 max pooling layer for downsampling. The decoder's design is distinguished by its nested and dense skip connections, enabling the encoder's outputs to connect not only to the corresponding decoder layer but also to all preceding decoder layers. The decoder upsamples the feature maps, matching the original image's resolution, to produce precise pixel-level segmentation labels.

The above designs create a dense feature transfer network, facilitating full-scale feature utilization and enhancing the fusion of low- and high-level features. U-Net++ introduces a predictive output module at the encoder's final layer to bolster model robustness and minimize over-segmentation risk. This module pre-determines the presence of target objects across the entire image region, thus reducing non-target region mis-segmentation. Furthermore, U-Net++ adopts a deeply supervised strategy with a custom composite loss function, which offers multi-scale training supervision to improve segmentation performance.

After image segmentation, potential edge contours are identified through a gradient strength and orientation analysis. A non-maximum suppression technique refines the edges, and a dual-thresholding approach distinguishes strong and weak edges, ensuring continuity and clarity. The distance transform algorithm calculates each pixel's proximity to the nearest background pixel for extracted crack edges. This transform indicates the distance to the crack's central region in crack detection and is equal to half the crack width. The crack width, as depicted in Figure 6, the shooting distance, and the binocular camera parameters can be input into Equation (12) to accurately calculate the actual crack width.
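To make the segmentation-to-width chain concrete, the following sketch takes a binary crack mask (for example, the U-Net++ output inside one detected box), applies edge detection, the distance transform, and skeletonization, and converts the pixel width to a physical width with Equation (12). The depth D, focal length, and sensor pixel density are assumed inputs supplied by the binocular ranging step, and scikit-image is used here for skeletonization as a stand-in for the skeleton extraction step.

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize

def crack_width_mm(mask, shoot_dist_mm, focal_mm, pixels_per_cm):
    """Estimate crack width statistics from a binary segmentation mask (1 = crack).

    shoot_dist_mm: shooting distance D from the binocular ranging step.
    focal_mm, pixels_per_cm: camera constants f and Pc used in Equation (12).
    """
    mask = (mask > 0).astype(np.uint8)
    edges = cv2.Canny(mask * 255, 50, 150)               # crack edge contours
    # Distance from every crack pixel to the nearest background pixel
    dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
    centreline = skeletonize(mask.astype(bool))          # one-pixel-wide skeleton
    # On the centreline the distance equals half the local crack width (pixels)
    width_px = 2.0 * dist[centreline]
    # Equation (12): M = w_p * D / (Pc * f); D / f is dimensionless, w_p / Pc is
    # centimetres on the sensor, and the factor of 10 converts cm to mm.
    width_mm = 10.0 * width_px * shoot_dist_mm / (pixels_per_cm * focal_mm)
    return edges, float(width_mm.mean()), float(width_mm.max())
```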
Figure 6. Crack segmentation process: (a) crack detection results, (b) extraction of crack images, (c) segmentation images, (d) edge detection results, and (e) crack skeletonization.

3. Implementation Details and Experimental Results

3.1. Production of Datasets
The dataset
The dataset included
included 400
400 images
images of
of pavement
pavement cracks
cracks captured
captured with
with the
the DJI
DJI Mavic
Mavic 3
3 Drone from China and 3266 open-source crack images collected by Zhu [33]. In addi-
Drone from China and 3266 open-source crack images collected by Zhu [33]. In addition,
tion, an innovative strategy of crack annotation was introduced for precise identification
an innovative strategy of crack annotation was introduced for precise identification and
and measurement.
measurement.
Labelimg, an open-source annotation tool, was used to label cracks according to the
Labelimg, an open-source annotation tool, was used to label cracks according to the
small-bounding-box overlay method, as shown in Figure 7. This annotation method al-
small-bounding-box
lowed us to cover the overlay method,
full length of a as shown
crack, in Figure
marking 7. This annotation
the various parts of themethod
crack byal-
lowed us to cover the full length of a crack, marking the various parts of
means of small, dense bounding boxes. This annotation strategy is designed to support the crack by
means of small, dense bounding boxes. This annotation strategy is designed
binocular-vision-based ranging algorithms. Accurate distance measurements of different to support
binocular-vision-based ranging
parts of a single long crack algorithms.
can improve Accurate distance
the quantification measurements
accuracy of different
of crack width. This
parts of a single long crack can improve the quantification accuracy of crack width.
meticulous annotation method allows for further analyses of the cracks’ spatial distribu- This
tion characteristics.
Appl. Sci. 2024, 14, x FOR PEER REVIEW 11 of 19
Appl. Sci.
Appl. Sci. 2024,
2024, 14,
14, xx FOR
FOR PEER
PEER REVIEW
REVIEW 11 of
11 of 19
19
Appl.
Appl. Sci.
Sci. 2024,
2024, 14,
14, x
x FOR
FOR PEER
PEER REVIEW
REVIEW 11
11 of
of 19
19
Appl. Sci. 2024, 14, x FOR PEER REVIEW 11 of 19

Appl. Sci. 2024, 14, 1778 11 of 19


meticulousannotation
meticulous annotationmethod
methodallows
allowsfor
forfurther
furtheranalyses
analysesofofthe
thecracks’
cracks’spatial
spatialdistribu-
distribu-
meticulous
meticulous annotation method
annotation method allows
allows for
for further
further analyses
analyses of
of the
the cracks’
cracks’ spatial
spatial distribu-
distribu-
tion characteristics.
meticulous annotation
tion characteristics.
characteristics. method allows for further analyses of the cracks’ spatial distribu-
meticulous
tion annotation method allows for further analyses of the cracks’ spatial distribu-
tion
tion characteristics.
characteristics.
tion characteristics.

Figure7.7.The
Figure Thedataset
datasetannotation
annotationmethod.
method.
Figure7.
Figure Thedataset
7.The datasetannotation
annotationmethod.
method.
Figure
Figure 7.
7. The
The dataset
dataset annotation
annotation method.
method.
Figure 7. The dataset annotation method.
Thecracks
The cracksneeded
cracks neededto
needed tobebesegmented
segmentedfor forquantitative
quantitativeanalysis,
analysis,and andthethealgorithm
algorithmused used
The
The
The cracks
cracks needed
needed totobe
to
be
be segmented
segmented
segmented
for
for
for
quantitative
quantitative
quantitative
analysis,
analysis,
analysis,
and
and
and
the
the
the
algorithm
algorithm
algorithm
used
used
used
was U-Net++.
The
wasU-Net++.
was cracks
U-Net++.
U-Net++. The
The
The crack
needed
crack
crack tosegmentation
be segmented
segmentation
segmentation dataset
for
dataset
dataset was prepared
quantitative
wasprepared
was prepared
prepared by
analysis,utilizing
and
byutilizing
by utilizing
utilizing the publicly
algorithm
publicly
publicly available
used
available
available
was The cracks needed
The crack to be segmented
segmentation for quantitative
dataset was analysis,
by and the algorithm
publicly used
available
was
was U-Net++.
datasets.
U-Net++. The The
The crack
dataset
crack segmentation
sources
segmentationare dataset
detailed
dataset inwas
Table
was prepared
1, and
prepared by utilizing
they
by comprised
utilizing publicly
aatotal
publicly available
total of 2851
available
datasets.
datasets.
was U-Net++.
datasets. The
The
The Thedataset
dataset
crack
dataset sources
sources
segmentation
sources are
are detailed
detailed
dataset
are detailed
detailed inin Table
Table
was 1,
1,
Tableprepared
in Table and
and
1, and
and they they
they comprised
comprised
by utilizing
they comprised a
publicly total
a total
total of
of 2851
2851
available
of 2851
2851
datasets.
images.
datasets.
images.The The dataset
images
Theimages
dataset
imagesin sources
in the
sources are
dataset underwent
are detailed
detailed in
in Table
Table 1,
preprocessing
1, and
and they comprised
to
theytoto align
comprised with a
withaathethe
total of
network’s
of 2851
2851
images.
datasets.
images. The
The dataset
images inin thedataset
the
sources
the dataset
dataset
are underwent
underwent
underwent in preprocessing
preprocessing
1,
preprocessing alignwith
align
tocomprised
align with the
the network’s
network’s
total of
network’s
images.
images. The
requirements,
The
requirements, images
which
images
which in
in the
included
the
included dataset
dataset underwent
cropping
underwent
cropping and
and preprocessing
resizing to
preprocessing
resizing to to
ensure
to
ensure align
a
align
a with
consistent
with
consistent the
the network’s
resolution
network’s
resolution of
of
requirements,
images. whichinincluded
The images
requirements, which included
the dataset cropping
underwent
cropping and resizing
and resizing to ensure
preprocessing
to ensureto aaalign
consistent
with the
consistent resolution
network’s
resolution of
of
requirements,
256 × 256
requirements,
256 × 256 which
pixels for
which
pixels for included
each
included
each image.
image. cropping
cropping and
Subsequently,
and
Subsequently, resizing
a
resizing
a to
subset
to
subset ensure
of 11,298
ensure
of 11,298aa consistent
images
consistent
images was
was resolution
chosen
resolution
chosen of
from
of
from
256 ×× 256
256 pixels
requirements,
256 pixels for each
which
for each
includedimage.cropping
image. Subsequently,
Subsequently, a subset
and resizing subsettoof
ofensure
11,298aimages
images
consistent wasresolution
chosen fromfromof
256
the××preprocessed
256
the 256 pixels
preprocessed
256 pixels for
pixels for
preprocessed each
images
forimages
each image.image.
image.
to Subsequently,
toconstruct
construct the
Subsequently,
the aaa subset
segmentation
subset of
segmentation of 11,298
11,298
dataset
of dataset
11,298
dataset images
forour
images
forimages
our our
was
was
was
model model
chosen
chosen
chosen from
training.
training. from
This
the
256 × 256
the preprocessed
preprocessed images images
each
images to to construct
to construct
construct the the
Subsequently, segmentation
a subset
the segmentation
segmentation dataset 11,298
dataset for for
for our model
was
our model chosentraining.
model training. from
training.
the
This
the
datasetdataset
preprocessed
was waswasthen
then then
images partitioned
partitioned to construct intotraining,
construct
into into training,
the
training, validation,
segmentation
validation, andsets,
dataset
anddataset
test testfor sets, following
ourfollowing
model
following an8:1:1
8:1:1
training.
an training.
8:1:1 ratio
This
the
This dataset
preprocessed
dataset was images
then partitioned
to
partitioned into the
training, validation,
segmentation
validation, and
and test
test forsets,
our
sets, model
following an
an 8:1:1
This
ratio
This dataset
to
dataset
to facilitate was
facilitate
was
modelthen
model
then partitioned
training
partitioned
training into
and training,
evaluation.
intoevaluation. validation,
training, validation,
and evaluation. validation, and and test
and test sets,
test sets, following
sets, following
following an an 8:1:1
an 8:1:1
8:1:1
ratio
This
ratio to facilitate
to facilitate
dataset was model
then
model training
partitioned
training and
into
and training,
evaluation.
ratio
ratio to
to facilitate
facilitate model
model training
training and
and evaluation.
evaluation.
ratio to facilitate model training and evaluation.
Table 1. Information on public datasets.
Table1.1.Information
Table Informationon onpublic
publicdatasets.
datasets.
Table
Table 1. Information
1. Information on on public
public datasets.
datasets.
Table
Table 1. Information
1. Information on public
on public datasets.
Dataset
Dataset
Dataset Name
Name Number
Name Number datasets.
Number ImageSize
Image
Image Size
Size ExampleImages
Example
Example Images
Images
Dataset
Dataset Name
Name Number
Number Image
Image Size
Size Example
Example Images
Images
Dataset Name
Dataset Name Number Number Image Size
Image Size Example Images
Example Images
2560
2560 ××1440
1440pixels
1440 pixels
CRACK500 [8] 500 2560 ×
2560 × 1440 pixels
pixels
CRACK500
CRACK500
CRACK500 [8] [8]
[8]
[8] 500
500
500 2560
2592
2560 × ×1440
××14401946
1440 pixels
pixels
pixels
CRACK500 500 2592 ×
2592
2560 1946 pixels
1946 pixels
CRACK500 [8] [8] 500 2592
2592 × 1946
1946 pixels
CRACK500 500 2592 ×× 1946 1946 pixels
pixels
2592
Cracktree200[34]
Cracktree200 [34] 206
206 800××600
800 600pixels
pixels
Cracktree200[34]
Cracktree200 [34] 206
206 800 ×
800 600 pixels
Cracktree200
Cracktree200 [34]
[34] 206
206 800
800 ××× 600
600
600
pixels
pixels
pixels
Cracktree200 [34] 206 800 × 600 pixels
CFD
CFD [35][35]
[35] 118
118 480××320
480 320pixels
pixels
CFD
CFDCFD [35]
[35] 118
118118 480
480 ×
480 ×× 320
320 pixels
320 pixels
pixels
CFD [35]
CFD [35] 118
118 480 ×× 320
480 320 pixels
pixels
311××462
311 462pixels
pixels
311 ×× 462
311 462 pixels
pixels
AEL
AEL [36][36]
[36] 58
58 768
311
311 × ×
× × 512
462 pixels
pixels
462 pixels
pixels
AEL 58 768
311
768 ×× 462
512
512 pixels
pixels
AEL
AELAEL [36]
[36]
[36] 58
58 58 768
700
768 ×× × 512
768 ××× 10001000
512 pixels
pixels
pixels
512 pixels
AEL [36] 58 700
768 512 pixels
700
700 ×
700 × 1000
× 1000pixels
1000 pixels
pixels
pixels
700 ×× 1000
700 1000 pixels
pixels
GAPs384
GAPs384 [37] [37]
[37] 1969
1969 1920
1920 ×× 1080× 1080 pixels
1080 pixels
pixels
GAPs384
GAPs384 [37] 1969
1969 1920
1920 × 1080 pixels
GAPs384
GAPs384 [37]
GAPs384[37] [37] 1969
19691969 1920
1920 ×
1920 ××1080
1080 pixels
1080 pixels
pixels
3.2.Experimental
3.2. ExperimentalEnvironment Environmentand andExperimental
ExperimentalSubjects Subjects
3.2.
3.2. Experimental Environment
Experimental Environment and and Experimental
Experimental Subjects Subjects
3.2. Experimental
All experiments
experimentsEnvironment and Experimental Subjects
3.2.
3.2.Experimental
All
Experimental
All experiments ininthis
Environment
Environment
in
thisand
this study were
study
study
were conducted
andExperimental
Experimental
conducted
were conducted Subjects
Subjectson
conducted on
on
on aa cloudcloud server server equipped
equipped with with
NVIDIAAll
All experiments
GeForce
experiments 4090 in
in this
GPUs,
this study
using
study were
a
wereCUDA parallel
conducted on aaa cloud
cloud server
computing
cloud
server equipped
architecture
server
equipped with
equipped within
with
with the
NVIDIA All GeForce 4090 4090inGPUs,
GPUs, using were a CUDA
CUDA parallel on computing architecture withinwith the
NVIDIA
NVIDIA
Pytorch Allexperiments
GeForce
experiments
GeForce
framework 4090 to
thisstudy
inGPUs,
this
construct
study
using
usingwere
the CUDA
conducted
aaadetection
conducted parallel
parallel
model.
a cloud
computing
on a computing
cloud
Python server
and
server
architecture
equipped
architecture
OpenCV
equipped
with
were
within
NVIDIA
within the
the
utilized
NVIDIA
Pytorch
NVIDIA GeForce
framework
GeForce 4090
4090 GPUs,
tousing
construct
GPUs,a using using CUDA
the detection
adetection
CUDA parallelparallel computing
model. Python
Python
computing architecture
andarchitecture
OpenCV werewithin
utilized
within the
the
Pytorch
GeForce
Pytorch framework
4090 GPUs, to construct the CUDA parallel computing architecture within the Pytorch framework and build the detection model. Python and OpenCV were utilized for data augmentation, and the environment configuration is detailed in Table 2. The optimizer Adam was employed in model training. The initial learning rate was set to 1 × 10⁻², the weight decay rate was set to 5 × 10⁻⁴, the number of training epochs was set to 250, and the batch size was set to 16. All input images were resized to 640 × 640 pixels to conform to the model.

The model's testing phase involved using a concrete road at Kunming University of Science and Technology to evaluate the accuracy and efficiency of both the crack identification and localization model and the image segmentation model, as depicted in Figure 8. For road image acquisition, we utilized a DJI Mavic3 UAV equipped with a Raspberry Pi 4B and a binocular camera; the shooting parameters are detailed in Table 3.
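To make the training configuration above concrete, the following minimal Pytorch sketch wires the reported hyperparameters together (Adam, a 1 × 10⁻² initial learning rate, 5 × 10⁻⁴ weight decay, and 640 × 640 inputs); the `model` argument and the data pipeline are placeholders rather than the authors' released code.

```python
import torch
from torch import optim
from torchvision import transforms

# Hyperparameters reported above; the detector and dataset loader are assumed.
EPOCHS, BATCH_SIZE, IMG_SIZE = 250, 16, 640

# Every input image is resized to 640 x 640 before entering the network.
preprocess = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
])

def build_optimizer(model: torch.nn.Module) -> optim.Adam:
    """Adam with the initial learning rate and weight decay used in training."""
    return optim.Adam(model.parameters(), lr=1e-2, weight_decay=5e-4)
```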
Table 2. Configuration of the deep learning computing environment.

Configured Contents        Type
Operating System           Linux
CPU                        Xeon(R) Platinum 8352V
GPU                        NVIDIA GeForce RTX 4090, 24G
Pytorch Version            2.0.1
CUDA Version               11.7
cuDNN Version              8.5.0
Python Version             3.8

Figure 8. Experimental process and site.
Table 3. Parameters of the image acquisition device.

Hardware                       Configured Contents
UAV type                       DJI Mavic3
Microcomputer                  Raspberry Pi 4B
Camera frame rate              30 fps
Camera pixels                  4 million pixels
Maximum camera resolution      2688 × 1520
Binocular camera baseline      70 mm
Camera focal length            3.0 mm

3.3. Comparison of Network Improvements
The baseline model utilizes the YOLOv5s network structure. The ECA module is incorporated into the Bottleneck structure based on the baseline model, optimizing feature transfer and reuse by fusing the ECA mechanism with the Bottleneck module, a modification we refer to as Baseline + ECA. The YOLOX network's decoupled head structure also replaces the YOLOv5s's coupled head structure and is called Baseline + DH. Concurrently, the CIoU localization loss is substituted with GIoU and is designated Baseline + GIoU. These three enhancements collectively form Baseline + GIoU + ECA + DH. This study conducted comparative experiments on datasets with fine-crack annotations to evaluate the efficacy of these improvements. The weight of the localization loss was adjusted in the loss function to improve the recall of crack detection. The training process is shown in Figure 9. The proposed model had the highest recall and the lowest loss value throughout the training process.
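As an illustration of the first modification, the minimal Pytorch sketch below fuses an ECA block with a YOLOv5-style Bottleneck. The layer widths, kernel size, and activation follow the common ECA-Net and YOLOv5 formulations and are assumptions, not the exact configuration used in this study.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: global average pooling, a 1-D convolution
    across channels (local cross-channel interaction), and a sigmoid gate."""
    def __init__(self, channels: int, k_size: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.pool(x)                                   # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2))     # 1-D conv over the channel axis
        y = self.gate(y.transpose(-1, -2).unsqueeze(-1))   # back to (B, C, 1, 1)
        return x * y                                       # re-weight the channels

class ECABottleneck(nn.Module):
    """YOLOv5-style Bottleneck with an ECA block fused onto its output (sketch)."""
    def __init__(self, c_in: int, c_out: int, shortcut: bool = True):
        super().__init__()
        hidden = max(c_out // 2, 1)
        self.cv1 = nn.Sequential(nn.Conv2d(c_in, hidden, 1, bias=False),
                                 nn.BatchNorm2d(hidden), nn.SiLU())
        self.cv2 = nn.Sequential(nn.Conv2d(hidden, c_out, 3, padding=1, bias=False),
                                 nn.BatchNorm2d(c_out), nn.SiLU())
        self.eca = ECA(c_out)
        self.add = shortcut and c_in == c_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.eca(self.cv2(self.cv1(x)))
        return x + y if self.add else y
```

A block of this shape can stand in wherever the plain Bottleneck appears in the backbone, which is how the attention mechanism reaches the intermediate crack features without adding many parameters.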

Figure 9. Comparison of recall and loss curves between the improved model and the original model.

As shown in Table 4, the baseline model's mean Average Precision (mAP) for crack detection tasks was 81.02%. Implementing the ECA mechanism at critical points within the network structure reduced the model's parameters and adeptly captured inter-channel dependencies through a localized cross-channel interaction strategy. This integration of the ECA module elevated the mAP to 82.82%, underscoring the mechanism's effectiveness in augmenting the model's crack feature recognition capabilities. Furthermore, introducing the YOLOX decoupled head structure refined the network training by distinctly separating the classification and localization tasks. This separation resulted in enhanced focus and precision, leading to a 2.10% increase in the mAP and an improvement in recall to 82.56%. By converting the bounding box loss function to GIoU loss, which accounted for both the coverage and the distance between bounding boxes, the model gained more comprehensive crack localization feedback, further raising the mAP to 82.33%. The combined application of these three improvements significantly boosted the network's recall to 86.82% and the mAP to 86.32%. These data emphatically demonstrate that the strategic enhancements to the YOLOv5s architecture improve its capability to detect pavement crack defects.
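For reference, the GIoU term used in place of CIoU can be computed as sketched below. Boxes are assumed to be in (x1, y1, x2, y2) form; this is a generic implementation of the loss, not the exact training code of this study.

```python
import torch

def giou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """1 - GIoU for box tensors of shape (N, 4) in (x1, y1, x2, y2) format."""
    # Intersection
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)

    # Union and IoU (coverage term)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)

    # Smallest enclosing box: penalizes predictions that are far from the target
    cx1 = torch.min(pred[:, 0], target[:, 0])
    cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2])
    cy2 = torch.max(pred[:, 3], target[:, 3])
    c_area = (cx2 - cx1) * (cy2 - cy1)

    giou = iou - (c_area - union) / (c_area + eps)
    return (1.0 - giou).mean()
```

Averaging over all matched box pairs gives the localization term whose weight was adjusted in the overall loss function.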
Table 4. Comparative results of network improvements.

Network Model                              Precision (%)    Recall (%)    mAP (%)
YOLOv5s baseline                           82.57            80.57         81.02
YOLOv5s + ECA                              80.66            83.68         82.82
YOLOv5s + decoupled head                   86.41            82.56         83.12
YOLOv5s + GIoU                             82.12            85.55         82.33
YOLOv5s + ECA + GIoU + decoupled head      85.96            86.82         86.32
Figure 10 exemplifies the results of pavement crack detection. The initial YOLOv5s model, prior to improvement, exhibited missed detections, particularly in areas with long cracks. The enhanced model demonstrated improved recall, comprehensively covering the cracks with small boxes and exhibiting higher confidence.
Figure 10. Comparison of actual detection effects between the improved model and the baseline model.
3.4. Comparison of Different Detection Models
The final improved model is compared with several other mainstream target detection models, SSD [38], Faster R-CNN [39], and RetinaNet [40], under the same experimental conditions to verify the detection advantages of this paper's model over the others in pavement crack recognition. SSD uses a VGG16 backbone network, and Faster R-CNN and RetinaNet use a ResNet50 backbone network. Figure 11 displays the detection performance of each model for pavement cracks. The model developed in this study demonstrates the highest detection rate, accurately delineating the entire crack with numerous small bounding boxes. This indicates the significant efficacy of the targeted improvement strategy for long cracks in pavement, as proposed in this paper. Conversely, the SSD model fails to identify some targets and inaccurately positions another defective target. The Faster R-CNN model incorrectly identifies expansion joints as cracks, lacking the precision to distinguish between actual cracks and similar background features effectively. Similarly, the RetinaNet model fails to localize defects accurately, with its detection bounding boxes not enclosing the crack targets adequately.

The methodology introduced in this research demonstrates a superior equilibrium between accuracy, computational efficiency, and model compactness, as depicted in Table 5. It achieves a mean Average Precision (mAP) of 86.32% and processes up to 152.7 frames per second (FPS) while maintaining a minimal model size of 15.3 MB, offering near-optimal accuracy alongside greater processing speed and reduced storage requirements compared to existing methods such as SSD, Faster R-CNN, and RetinaNet. The proposed method's efficiency makes it particularly apt for real-time processing applications and deployment on devices with constrained computational capacity, thereby providing an enhanced balance among detection precision, operational velocity, and ease of implementation.

Figure 11. Comparison of the actual detection effects of different models.

Table 5. Test results of different models.

Model             mAP (%)    FPS (frame·s⁻¹)    Weight (MB)
SSD               77.32      112.2              99.7
Faster R-CNN      87.25      33.7               602.4
RetinaNet         81.94      55.1               153.2
YOLOv5s           82.61      173.5              13.4
Our method        86.32      152.7              15.3
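The FPS values in Table 5 depend on the hardware and input size. A simple way to estimate throughput for any of the listed detectors is sketched below; the warm-up length, iteration count, and CUDA device are illustrative assumptions rather than the benchmarking protocol of this study.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model: torch.nn.Module, iters: int = 200, size: int = 640,
                device: str = "cuda") -> float:
    """Rough frames-per-second estimate for a single 640 x 640 input."""
    model = model.to(device).eval()
    x = torch.randn(1, 3, size, size, device=device)
    for _ in range(10):                 # warm-up passes
        model(x)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()
    return iters / (time.time() - t0)
```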
3.5. Comparison of Crack Measurement Results

The UAV hovered about 4 m in the air to capture images of cracks in concrete pavement. The crack width was calculated following the identification and segmentation of cracks using the aforementioned image processing method. Concurrently, crack widths were manually measured with calipers for comparison. The 25 extracted cracks were selected for a quantitative analysis and numbered #1-#25. The results of the UAV's measurements and the manual measurements are presented in Table 6. The table includes an original image, a segmented image, the measurements of the shooting distances, and the values of the crack widths measured by the UAV and manually. The results reveal that the absolute error remained below 4.14 mm when measuring cracks less than 2 cm wide. However, the binocular camera had resolution limitations, resulting in higher relative errors, with a maximum of 28.89%. Nevertheless, the absolute error remained low. For cracks wider than 2 cm, the absolute error did not exceed 3.53 mm, and the maximum relative error was 9.57%, which remained below the 10% criterion. These findings indicate that the measurements of cracks wider than 2 cm fell within acceptable error limits. These results provide valuable empirical support for the utilization of binocular cameras in crack measurement. They demonstrate that under specific conditions, these measurements remain informative despite the presence of relative errors.
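As a minimal illustration of how a skeleton-based pixel width and a stereo depth estimate combine into the physical widths reported in Table 6, the pinhole-camera sketch below uses the 70 mm baseline and 3.0 mm focal length from Table 3; the pixel size and the numbers in the example call are assumed values, not measurements from this study.

```python
def stereo_depth_m(disparity_px: float, baseline_m: float, focal_px: float) -> float:
    """Standard stereo relation Z = f * B / d, with the focal length in pixels."""
    return focal_px * baseline_m / disparity_px

def crack_width_mm(width_px: float, depth_m: float,
                   focal_mm: float = 3.0, pixel_size_mm: float = 0.0056) -> float:
    """Pinhole back-projection of a pixel width measured on the image plane.

    width_on_sensor = width_px * pixel_size; physical width = width_on_sensor * Z / f.
    The 3.0 mm focal length follows Table 3; the pixel size is an assumed value.
    """
    return width_px * pixel_size_mm * (depth_m * 1000.0) / focal_mm

# Illustrative call with hypothetical numbers (not a row of Table 6):
z = stereo_depth_m(disparity_px=40.0, baseline_m=0.07, focal_px=3.0 / 0.0056)
print(f"depth ~ {z:.2f} m, width ~ {crack_width_mm(5.0, z):.1f} mm")
```

Because the depth of each crack segment comes from the binocular disparity rather than a preset flight height, the conversion holds even when different portions of a long crack lie at different shooting distances.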
Table 6. The results of the UAV's measurements and manual measurements.
(Original and segmented crack images for each sample are not reproduced here.)

Sample No.    Max Width (Pixels)    Distance (m)    Crack Width, UAV (mm)    Crack Width, Manual (mm)    Abs. Error (mm)    Rel. Error (%)
#1            3.00                  4.45            23.27                    25.0                        -1.73              6.94
#2            4.00                  3.97            27.67                    25.5                        2.17               8.53
#3            4.24                  3.78            30.22                    33.0                        -2.78              8.42
#4            4.00                  3.70            25.79                    28.5                        -2.71              9.50
#5            2.00                  4.47            15.58                    13.5                        2.08               15.41
#6            2.83                  4.19            20.66                    22.5                        -1.84              8.16
#7            3.00                  3.98            20.81                    23.0                        -2.19              9.53
#8            3.00                  3.88            20.29                    21.5                        -1.21              5.65
#9            4.24                  3.71            27.41                    30.0                        -2.59              8.62
#10           4.00                  4.51            31.44                    34.5                        -3.06              8.87
#11           3.00                  4.29            22.43                    24.5                        -2.07              8.45
#12           4.24                  4.01            29.63                    32.5                        -2.87              8.83
#13           5.66                  4.09            40.34                    38.5                        1.84               4.79
#14           4.24                  3.98            29.41                    27.5                        1.91               6.94
#15           2.83                  4.37            21.55                    23.5                        -1.95              8.29
#16           2.83                  4.79            23.62                    26.0                        -2.38              9.14
#17           3.00                  5.53            28.91                    32.0                        -3.09              9.65
#18           2.00                  5.89            20.53                    18.5                        2.03               10.97
#19           2.83                  6.48            31.96                    30.0                        1.96               6.53
#20           3.00                  4.74            24.78                    27.0                        -2.22              8.22
#21           4.24                  4.53            33.47                    37.0                        -3.53              9.53
#22           3.00                  4.19            21.91                    24.0                        -2.09              8.73
#23           2.23                  3.98            15.47                    12.0                        3.47               28.89
#24           2.83                  3.88            19.14                    15.0                        4.14               27.57
#25           4.24                  4.00            29.56                    32.5                        -2.94              9.06
4. Discussion
In this study, a binocular-vision UAV was employed to identify and quantify pave-
ment cracks with success. However, the UAV’s payload limitations restricted our ability
to utilize a higher-performance, long-baseline binocular camera, subsequently impacting
the precision of distance measurements. This limitation became particularly pronounced
during changing lighting conditions, when the effects on ranging accuracy were signifi-
cantly magnified.
Considering these challenges, our future efforts will be directed toward overcoming
payload limitations. By increasing the UAV’s load capacity, we aim to integrate a more
advanced binocular camera system and a LiDAR unit. Such enhancements are expected
to not only augment the accuracy of distance measurements but also to refine overall
efficiency in quantifying pavement crack characteristics. Furthermore, we intend to inves-
tigate advanced image processing and deep learning algorithms to reduce the impact of
variations in ambient lighting on the accuracy of distance measurements. Through the
synergistic application of these technologies, we anticipate a significant enhancement in
both the performance and reliability of the system, all while maintaining its compactness
and efficiency.
In summary, although the current binocular vision system has its limitations, we are
optimistic that through systematic improvements and technological advancements, the
accuracy of ranging and crack detection can be effectively enhanced. Future efforts will be
concentrated on increasing the drone’s payload capacity, exploring higher-performance
sensing devices, and optimizing algorithms, with the goal of achieving higher pavement
crack recognition accuracy and crack quantification precision.

5. Conclusions
This paper introduces a deep learning model that integrates the YOLOv5s model,
efficient channel attention, and a decoupled head structure, aiming to enhance the accuracy
and speed of pavement crack detection, particularly for extra-long cracks on road surfaces.
Furthermore, by employing binocular camera technology, this study expands the applica-
tion scenarios for UAV-based crack analysis, overcoming the limitation of needing to preset
the shooting distance inherent in previous UAV crack-detection technologies. The main
findings and conclusions are as follows:
1. An improved crack detection algorithm is proposed, significantly enhancing pave-
ment crack recognition accuracy through the optimization of the detection network
structure. This algorithm meets stringent accuracy requirements for crack detection in
real-world applications, with notable increases in network recall to 86.82% and mAP
to 86.32%.
2. This study describes a method that uses a binocular-vision UAV to measure the shooting depths of different portions of a long crack in a roadway surface, which allows the widths of the various sections of the crack to be quantified within the images. The method removes the errors that arise from relying on an approximate, preset shooting distance for depth when UAVs detect pavement and bridge cracks.
3. The combined use of binocular UAV vision and deep learning algorithms for crack
detection is effectively applied to the quantitative analysis of pavement cracks. For
cracks wider than 2 cm, the absolute error does not exceed 3.53 mm, and the maximum
relative error is maintained at 9.57%, remaining below the 10% standard.
These outcomes furnish empirical support for the precision of crack measurement
techniques.

Author Contributions: Conceptualization, H.X.; methodology, J.Z.; software, W.H.; validation, P.L.
and K.Z.; investigation, P.L.; resources, R.G. and W.H.; writing—original draft preparation, J.Z.;
writing—review and editing, H.X.; visualization, J.Z.; supervision, R.G.; project administration, K.Z.;
funding acquisition, H.X. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by National Natural Science Foundation of China, grant num-
ber 12262015.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author. The data are not publicly available due to privacy.
Conflicts of Interest: The authors declare no conflicts of interest.

References
1. Hsieh, Y.-A.; Tsai, Y.J. Machine learning for crack detection: Review and model performance comparison. J. Comput. Civ. Eng.
2020, 34, 04020038. [CrossRef]
2. Ali, R.; Chuah, J.H.; Talip, M.S.A.; Mokhtar, N.; Shoaib, M.A. Structural crack detection using deep convolutional neural networks.
Autom. Constr. 2022, 133, 103989. [CrossRef]
3. Kheradmandi, N.; Mehranfar, V. A critical review and comparative study on image segmentation-based techniques for pavement
crack detection. Constr. Build. Mater. 2022, 321, 126162. [CrossRef]
4. Taha, B.; Shoufan, A. Machine learning-based drone detection and classification: State-of-the-art in research. IEEE Access 2019, 7,
138669–138682. [CrossRef]
5. Meng, S.; Gao, Z.; Zhou, Y.; He, B.; Djerrad, A. Real-time automatic crack detection method based on drone. Comput.-Aided Civ.
Infrastruct. Eng. 2023, 38, 849–872. [CrossRef]
6. Liu, Y.F.; Nie, X.; Fan, J.S.; Liu, X.G. Image-based crack assessment of bridge piers using unmanned aerial vehicles and three-
dimensional scene reconstruction. Comput.-Aided Civ. Infrastruct. Eng. 2020, 35, 511–529. [CrossRef]
7. Liu, Y.; Hajj, M.; Bao, Y. Review of robot-based damage assessment for offshore wind turbines. Renew. Sustain. Energy Rev. 2022,
158, 112187. [CrossRef]
8. Zhang, L.; Yang, F.; Zhang, Y.D.; Zhu, Y.J. Road crack detection using deep convolutional neural network. In Proceedings of the
2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3708–3712.
9. Cha, Y.J.; Choi, W.; Büyüköztürk, O. Deep learning-based crack damage detection using convolutional neural networks. Comput.-
Aided Civ. Infrastruct. Eng. 2017, 32, 361–378. [CrossRef]
10. Jiang, Y.; Pang, D.; Li, C. A deep learning approach for fast detection and classification of concrete damage. Autom. Constr. 2021,
128, 103785. [CrossRef]
11. Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature pyramid and hierarchical boosting network for pavement
crack detection. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1525–1535. [CrossRef]
12. Que, Y.; Dai, Y.; Ji, X.; Leung, A.K.; Chen, Z.; Tang, Y.; Jiang, Z. Automatic classification of asphalt pavement cracks using a novel
integrated generative adversarial networks and improved VGG model. Eng. Struct. 2023, 277, 115406. [CrossRef]
13. Rao, A.S.; Nguyen, T.; Palaniswami, M.; Ngo, T. Vision-based automated crack detection using convolutional neural networks for
condition assessment of infrastructure. Struct. Health Monit. 2021, 20, 2124–2142. [CrossRef]
14. Yu, Y.; Samali, B.; Rashidi, M.; Mohammadi, M.; Nguyen, T.N.; Zhang, G. Vision-based concrete crack detection using a hybrid
framework considering noise effect. J. Build. Eng. 2022, 61, 105246. [CrossRef]
15. Silva, W.R.L.d.; Lucena, D.S.d. Concrete cracks detection based on deep learning image classification. Proceedings 2018, 2, 489.
16. Duan, H.; Xin, L.; Chen, S. Robust cooperative target detection for a vision-based UAVS autonomous aerial refueling platform via
the contrast sensitivity mechanism of eagle’s eye. IEEE Aerosp. Electron. Syst. Mag. 2019, 34, 18–30. [CrossRef]
17. Ma, Y.; Li, Q.; Chu, L.; Zhou, Y.; Xu, C. Real-time detection and spatial localization of insulators for UAV inspection based on
binocular stereo vision. Remote Sens. 2021, 13, 230. [CrossRef]
18. Shuai, C.; Wang, H.; Zhang, W.; Yao, P.; Qin, Y. Binocular vision perception and obstacle avoidance of visual simulation system
for power lines inspection with UAV. In Proceedings of the 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28
July 2017; pp. 10480–10485.
19. Gopalakrishnan, K.; Khaitan, S.K.; Choudhary, A.; Agrawal, A. Deep convolutional neural networks with transfer learning for
computer vision-based data-driven pavement distress detection. Constr. Build. Mater. 2017, 157, 322–330. [CrossRef]
20. Lei, B.; Wang, N.; Xu, P.; Song, G. New crack detection method for bridge inspection using UAV incorporating image processing.
J. Aerosp. Eng. 2018, 31, 04018058. [CrossRef]
21. Liu, Y.; Yeoh, J.K.; Chua, D.K. Deep learning–based enhancement of motion blurred UAV concrete crack images. J. Comput. Civ.
Eng. 2020, 34, 04020028. [CrossRef]
22. Kim, H.; Lee, J.; Ahn, E.; Cho, S.; Shin, M.; Sim, S.-H. Concrete crack identification using a UAV incorporating hybrid image
processing. Sensors 2017, 17, 2052. [CrossRef]
23. Liu, Y.; Bao, Y. Intelligent monitoring of spatially-distributed cracks using distributed fiber optic sensors assisted by deep learning.
Measurement 2023, 220, 113418. [CrossRef]
24. Park, S.E.; Eem, S.-H.; Jeon, H. Concrete crack detection and quantification using deep learning and structured light. Constr. Build.
Mater. 2020, 252, 119096. [CrossRef]
25. Yu, J.-y.; Li, F.; Xue, X.-k.; Zhu, P.; Wu, X.-y.; Lu, P.-s. Intelligent Identification of Bridge Structural Cracks Based on Unmanned
Aerial Vehicle and Mask R-CNN. China J. Highw. Transp. 2021, 34, 80–90.
26. Peng, X.; Zhong, X.; Zhao, C.; Chen, A.; Zhang, T. A UAV-based machine vision method for bridge crack recognition and width
quantification through hybrid feature learning. Constr. Build. Mater. 2021, 299, 123896. [CrossRef]
27. Zhou, Q.; Ding, S.; Qing, G.; Hu, J. UAV vision detection method for crane surface cracks based on Faster R-CNN and image
segmentation. J. Civ. Struct. Health Monit. 2022, 12, 845–855. [CrossRef]
28. Ding, W.; Yang, H.; Yu, K.; Shu, J. Crack detection and quantification for concrete structures using UAV and transformer. Autom.
Constr. 2023, 152, 104929. [CrossRef]
29. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020;
pp. 11534–11542.
30. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
31. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430.
32. Hirschmuller, H. Accurate and efficient stereo processing by semi-global matching and mutual information. In Proceedings of the
2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25
June 2005; pp. 807–814.
33. Zhou, Z.; Siddiquee, M.; Tajbakhsh, N.; Liang, J. UNet++: A nested U-Net architecture for medical image segmentation. arXiv 2018, arXiv:1807.10165.
34. Zou, Q.; Cao, Y.; Li, Q.; Mao, Q.; Wang, S. CrackTree: Automatic crack detection from pavement images. Pattern Recognit. Lett.
2012, 33, 227–238. [CrossRef]
35. Shi, Y.; Cui, L.; Qi, Z.; Meng, F.; Chen, Z. Automatic road crack detection using random structured forests. IEEE Trans. Intell.
Transp. Syst. 2016, 17, 3434–3445. [CrossRef]
36. Amhaz, R.; Chambon, S.; Idier, J.; Baltazart, V. Automatic crack detection on two-dimensional pavement images: An algorithm
based on minimal path selection. IEEE Trans. Intell. Transp. Syst. 2016, 17, 2718–2729. [CrossRef]
37. Eisenbach, M.; Stricker, R.; Seichter, D.; Amende, K.; Debes, K.; Sesselmann, M.; Ebersbach, D.; Stoeckert, U.; Gross, H.-M. How to
get pavement distress detection ready for deep learning? A systematic approach. In Proceedings of the 2017 International Joint
Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 2039–2047.
38. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of
the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
39. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings
of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28.
40. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International
Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.