Sensors 2023, 23, 7395
Article
A Transformer-Optimized Deep Learning Network for Road
Damage Detection and Tracking
Niannian Wang 1 , Lihang Shang 1 and Xiaotian Song 2, *
1 School of Water Conservancy and Transportation, Zhengzhou University, Zhengzhou 450001, China
2 School of Engineering and Technology, China University of Geosciences (Beijing), Beijing 100083, China
* Correspondence: [email protected]
Abstract: To solve the problems of low accuracy and false counts of existing models in road damage
object detection and tracking, in this paper, we propose Road-TransTrack, a tracking model based on
transformer optimization. First, using the classification network based on YOLOv5, the collected road
damage images are classified into two categories, potholes and cracks, and made into a road damage
dataset. Then, the proposed tracking model is improved with a transformer and a self-attention
mechanism. Finally, the trained model is used to detect actual road videos to verify its effectiveness.
The proposed tracking network shows a good detection performance with an accuracy of 91.60%
and 98.59% for road cracks and potholes, respectively, and an F1 score of 0.9417 and 0.9847. The
experimental results show that Road-TransTrack outperforms current conventional convolutional
neural networks in terms of the detection accuracy and counting accuracy in road damage object
detection and tracking tasks.
Citation: Wang, N.; Shang, L.; Song, X. A Transformer-Optimized Deep Learning Network for Road Damage Detection and Tracking. Sensors 2023, 23, 7395. https://doi.org/10.3390/s23177395

Academic Editor: Biswanath Samanta

Received: 20 July 2023; Revised: 11 August 2023; Accepted: 23 August 2023; Published: 24 August 2023

1. Introduction

For economic development and social benefits, the health of roads is crucial. In daily life, repeated crushing by vehicles can damage the structural layer of the road, which in turn produces cracks, potholes and other defects. Road performance and load-carrying capacity suffer as a result of pavement degradation [1,2]. If pavement damage is not repaired in a timely manner, rain and snow, as well as vehicle loads, will deepen the degree of pavement damage, which will seriously affect people’s travel and safety and thus have an impact on social benefits. Therefore, regular maintenance of roads is very important. One of the main aspects of road maintenance lies in efficient and accurate road damage detection. Currently, manual inspection and analysis is the main method of detecting pavement damage in China; however, manual inspection is often tedious and inefficient [3]. Although manual inspection has obvious operational advantages, when the inspector is inexperienced, the assessment of the degree of damage can be inaccurate, thus adversely affecting the pavement evaluation process [4–6]. The drawbacks of these manual inspections mean that this method no longer meets the increasing requirements of modern society for road damage detection.
often unable to meet the actual needs in terms of recognition accuracy and speed, and
this type of equipment often incurs higher hardware costs, corresponding to an increase
in detection costs. For example, some vibration-based detection methods are suitable
for real-time assessment of pavement conditions [9], but they cannot measure pavement
damage in areas outside the vehicle wheel path or identify the size of pavement damage.
Laser-measurement-based inspection methods use special equipment, such as a laser
scanner, mounted on a separate inspection vehicle [10–13] to convert the pavement into
a three-dimensional object in a coordinate system, and this method allows for the direct
calculation of various metrics for an accurate evaluation of pavement condition. However,
real-time processing at high speeds is difficult and relatively expensive due to the increased
amount of computation required. Compared with the high cost of automatic detection,
the benefits of image processing technology include a great effectiveness and low cost.
As technology advances, its recognition accuracy also gradually improves. Therefore,
numerous researchers have chosen to use image processing methods for the detection of
pavement damage [14–16]. Traditional image processing methods use manually chosen
features, such as color, texture and geometric features, to first segment pavement faults,
and then machine learning algorithms are used to classify and match them for pavement
damage detection purposes. For instance, Cubero-Fernandez et al. [17] first preprocessed photographs of road cracks in order to highlight the major features of the cracks, then applied a decision tree heuristic algorithm and finally achieved classification of the images.
Rong G et al. [18] performed entropy and image dynamic threshold segmentation of
pavement crack pixels based on thresholds obtained from image histograms as a way to
classify cracked and non-cracked pixels. Bitelli et al. [19] proposed another application of image processing to crack recognition, focusing on obtaining additional noise-free images of specific cracks. Li et al. [20] presented an accurate and efficient image processing algorithm specifically for fast evaluation of pavement surface cracks.
Park et al. [21] proposed an innovative optimized two-phase calculation method for primary surface profiles to detect pavement crack damage. However, due to the complexity of the road environment, traditional image processing techniques with manually designed feature extraction cannot meet the requirements of generalization capability and robustness in real-world engineering. For example, it is often impossible to segment an image effectively under conditions such as uneven illumination.
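Threshold-based crack segmentation of the kind described above can be sketched as follows. This is an illustrative example using a global histogram threshold (Otsu's method), not the exact algorithm of any of the cited works; the helper names are our own.

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the global threshold that maximizes the between-class
    variance of the image histogram (Otsu's method)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var, w0, sum0 = 0, 0.0, 0.0, 0.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0 = sum0 / w0                      # mean of the dark class
        m1 = (sum_all - sum0) / (total - w0)  # mean of the bright class
        var = w0 * (total - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def segment_cracks(gray):
    """Cracks are darker than the surrounding pavement, so pixels at or
    below the threshold are labeled as crack candidates."""
    return gray <= otsu_threshold(gray)
```

A method like this fails exactly in the way the text notes: under uneven illumination, no single global threshold separates crack pixels from shadowed pavement.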
Using fully convolutional networks (FCN), Yang et al. [31] were able to successfully identify cracks at the pixel level in pavement and wall images, but detection of small cracks remained poor. Jeong et al. [32] improved a model based on You Only Look Once (YOLO)v5x with Test-Time Augmentation (TTA), which could generate new images for data enhancement and then combine the original photographs with the augmented images in the trained u-YOLO. Although this method achieved a high detection accuracy, the detection speed was unsatisfactory. Many other researchers have worked on lightweight models. Shim et al. [33] developed a compact semantic segmentation network; they reduced the network's parameter count, but this came at the expense of detection speed. Sheta et al. [34] developed a lightweight convolutional neural network model with a good crack detection effect. However, this model was still limited to a single application scenario and could not deal with multiple types of road damage. Guo et al. [35] improved a model based on YOLOv5s to detect a variety of road damage and achieved a high detection accuracy. However, the improved model had a somewhat larger weight file, and meeting the constraints of embedded devices proved difficult. In addition, Ma D. et al. [36] proposed an algorithm called YOLO-MF that combines an acceleration algorithm and median flow for intelligent recognition of pavement cracks, achieving a high recognition accuracy and a good PR curve. All of the above researchers have made valuable contributions to road damage detection, but some deficiencies remain: for example, the models only detect crack damage, they cannot find a reasonable balance between detection efficiency and accuracy, and they cannot effectively detect damage in road videos. These are problems that still need to be studied and solved.
YOLOv5 is a single-stage target detection algorithm. Four versions of the YOLOv5
single-stage target detection model exist: YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x.
For this study, the fastest and smallest model, YOLOv5s, with parameters of 7.0 M and
weights of 13.7 M, was selected. YOLOv5 makes the following improvements compared to YOLOv4: on the input side, the model training phase makes use of mosaic data augmentation, adaptive anchor box computation and adaptive picture scaling. The backbone network makes use of the Focus structure and the Cross Stage Partial (CSP) structure. In the Neck network, between the Backbone and the final Head output layer, the Feature Pyramid Network (FPN) plus Path Aggregation Network (PAN) structure is added. The Generalized Intersection over Union loss (GIOU_Loss) is used at the Head output layer during training, and Distance-IOU Non-Maximum Suppression (DIOU_NMS) is used to screen the prediction boxes.
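As a sketch of the GIOU term mentioned above (not the authors' implementation), GIOU extends plain IOU with a penalty based on the smallest box enclosing both boxes, so that non-overlapping boxes still receive a useful gradient; the training loss is then 1 − GIOU:

```python
def giou(box_a, box_b):
    """Generalized IoU for two boxes given as (x1, y1, x2, y2).
    GIOU = IOU - (enclosing area not covered by the union) / enclosing area."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # smallest axis-aligned box enclosing both boxes
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    c_area = (cx2 - cx1) * (cy2 - cy1)
    return iou - (c_area - union) / c_area
```

GIOU equals 1 for identical boxes and approaches −1 for distant, non-overlapping boxes, which is what makes 1 − GIOU usable as a regression loss.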
As shown in Figure 1, the algorithm framework is split into three major sections:
the backbone network (Backbone), the bottleneck network (Neck) and the detection layer
(Output). The Backbone consists of a focus module (focus), a standard convolution module
(Conv), a C3 module and a spatial pyramid pooling module (SPP). In YOLOv5, the network
architecture is the same for all four versions, and two variables determine the network
structure’s size: depth_multiple and width_multiple. For instance, the C3 operation of
YOLOv5s is performed just once, while YOLOv5l is three times as deep as YOLOv5s, so three C3 operations will be carried out. Since the one-stage YOLOv5s network leverages multilayer feature map prediction, it produces good results in terms of both detection speed and accuracy.
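The effect of depth_multiple and width_multiple can be illustrated with two small helpers. This is a sketch of the scaling rules under the assumption that channel counts are rounded up to a multiple of 8, as in YOLOv5's make_divisible helper; it is not the repository's code.

```python
import math

def scale_depth(n_repeats, depth_multiple):
    """Number of times a block (e.g. C3) is repeated after depth
    scaling; YOLOv5 rounds but keeps at least one repeat."""
    return max(round(n_repeats * depth_multiple), 1)

def scale_width(n_channels, width_multiple, divisor=8):
    """Channel count after width scaling, rounded up to a
    multiple of the divisor."""
    return math.ceil(n_channels * width_multiple / divisor) * divisor

# A stage defined with 3 repeats and 256 channels in the base config:
#   YOLOv5s (depth 0.33, width 0.50) -> 1 repeat,  128 channels
#   YOLOv5l (depth 1.00, width 1.00) -> 3 repeats, 256 channels
```

This is why the C3 operation runs once in YOLOv5s but three times in YOLOv5l even though both share the same architecture definition.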
Zhang et al. [38] evaluated the potential of using high-spatial-resolution (HSR) multispectral digital aerial photographs to estimate overall pavement deterioration using principal component analysis and linear least squares regression models. The images obtained from aerial photography can also be used to train models for pavement damage recognition. Ersoz et al. [39] developed a UAV-based pavement crack recognition system by processing UAV-based images for support vector machine (SVM) model training. Alzarrad et al. [40] demonstrated the effectiveness of combining AI and UAVs by combining high-resolution imagery with deep learning to detect damage on roofs. Long Ngo Hoang et al. [41] presented a methodology based on the Mask R-CNN model, which was coupled with the new object detection framework Detectron2 to train a model that utilizes roadway imagery acquired from an unmanned aerial system (UAS).
q_i = W_q x_i (3)
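Equation (3) is the query projection of the self-attention mechanism used in the transformer; with the analogous key and value projections k_i = W_k x_i and v_i = W_v x_i, scaled dot-product self-attention can be sketched as follows (an illustrative single-head example with random weights, not the model's actual layer):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of feature
    vectors x with shape (n, d). Each output vector is a weighted sum
    of the value vectors, weighted by query-key similarity."""
    q = x @ w_q.T                        # q_i = W_q x_i, as in Equation (3)
    k = x @ w_k.T                        # k_i = W_k x_i
    v = x @ w_v.T                        # v_i = W_v x_i
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # row-wise softmax (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))              # 4 tokens, 8-dim features
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)   # shape (4, 8)
```

In the tracking network, this mechanism lets every spatial feature attend to every other, which is what helps associate the same damage across frames.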
2.2. Road-TransTrack Detection and Tracking Model

Traditional deep-learning-based pavement damage detection algorithms are often effective in obtaining the class and location of damage. However, for sequences of consecutive frames, conventional detection algorithms cannot effectively identify the same impairment across frames and therefore cannot accurately count multiple impairments. The proposed detection and tracking model, Road-TransTrack, solves this problem. Detection is a static task that generally finds regions of interest based on a priori knowledge or salient features. Tracking, by contrast, is a dynamic task: it finds the same object in a series of successive frames by means of characteristics carried over from the earlier frame. The tracking task checks the image similarity of the previous and current frames to find the best matching position and thereby recover the target's dynamic path.
As illustrated in Figure 4, successive frames of the pavement video are first fed into the model; a defect is detected when it first appears in frame Ft, and the defect count is increased by one. The frames Ft and Ft+1 are then fed into the tracking model. The damage continues to be tracked until it vanishes from the video, and IOU (Intersection over Union) matching is performed between the tracked and detected frames to obtain the tracking result. Detection and counting then continue with the next damage. Finally, the overall number of discovered defects is determined. Meanwhile, the network is improved with the transformer to enhance its performance.
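The track-and-count loop described above can be sketched as follows. This is a simplified greedy IOU matcher under our own naming, not the actual Road-TransTrack implementation: a detection that matches a track from the previous frame is treated as the same defect, while an unmatched detection raises the count.

```python
def iou(a, b):
    """IOU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def count_defects(frames, iou_thresh=0.5):
    """Count distinct defects over a video, where frames is a list of
    per-frame detection lists. A defect is counted once, on its first
    appearance, and then tracked by IOU until it leaves the video."""
    count, tracks = 0, []              # tracks = boxes alive in previous frame
    for detections in frames:
        new_tracks = []
        for det in detections:
            if any(iou(det, t) >= iou_thresh for t in tracks):
                new_tracks.append(det)  # continues an existing track
            else:
                count += 1              # first appearance: count it
                new_tracks.append(det)
        tracks = new_tracks
    return count
```

For example, a box drifting slightly across consecutive frames is counted once, while a box appearing later in a different location starts a new track and is counted separately.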
3. Dataset Construction
3.1. Data Collection

Like deep convolutional neural network models, the transformer-improved network model requires a large amount of image data for training. Images in today's road damage datasets suffer from problems such as erratic resolution, inconsistent image capture equipment and extrinsic influences such as lighting and shadows. These have a significant impact on the quality of the datasets used to train the models. Therefore, this study used a pavement damage dataset that we collected and produced ourselves. The initial image acquisition device is an integrated vehicle used for pavement inspection, as shown in Figure 5. The parameters of the on-board camera are shown in Table 1. In line with the actual acquisition needs, the shooting height was set between 40 and 80 cm to ensure the right size of damage in the images. Images were captured under normal lighting on several asphalt as well as concrete roads, and then images with high clarity and a balanced amount of damage were manually retained for the next step of processing.
The images of potholes and cracks filtered by the classification network are shown in Figure 6. Data annotation was performed on these images to construct the dataset required for training. In total, there are 310 potholes and 300 cracks in the training set, 104 potholes and 101 cracks in the validation set, and 103 potholes and 100 cracks in the test set.
Figure 6. Damage images obtained from classification networks.

Figure 7. The decline curve of loss.
In order to show the comparison results more intuitively, the same frames were detected with our network and the traditional CNN networks, respectively, and the results were compared. As shown in Figure 15, for crack images, the detection images in group (a) demonstrate that the four networks achieve approximately the same effect when there is only one crack in the figure, while the detection images in group (b) show that when multiple cracks appear, our network achieves a better detection effect, without missed or wrong detections, and the cracks are counted. For potholes, the detection images in group (a) show that each network can accurately detect the two potholes present in the figure when the pothole size feature is obvious, and our network additionally counts the potholes; the detection images in group (b) show that our network detects a pothole correctly, while all other networks produce false detections, i.e., parts of the ground that are similar in shape to potholes are detected as potholes.
Figure 15. Comparison of different network detection results: (a1) comparison of individual crack detection results; (a2) comparison of multiple crack detection results; (b1) comparison of individual pothole detection results; (b2) comparison of multiple pothole detection results.
The above comparative tests show that the proposed model performs well in terms of detection accuracy and accuracy of damage statistics. However, the detection speed of the current model does not meet the requirement of real-time execution. From the establishment of the dataset to the subsequent model testing, the current study uses pavement images and videos taken on the ground, so the generalization of the model needs to be further investigated. For example, the collection of pavement damage images could be carried out using the UAS technique to enrich the dataset required for model training; the model could then be used for the detection of images and videos captured by the UAS for better and more efficient assessment of pavement damage.
6. Conclusions

In pavement damage video inspection, detection accuracy is often low, leading to missed detections and repeated counts. The main contribution of this study is the proposed tracking and counting network called Road-TransTrack. When damage first appears in a video, it is detected and tracked until the defect disappears, and the damage count increases by one. The tracking and counting model is improved with a transformer and a self-attention mechanism to improve the accuracy of damage detection and counting in road videos. Compared to classic CNN networks, the F1 score of the transformer-optimized detection network is 96.64, with an average accuracy of 95.10%, which are 12.49% and 2.74% higher than the optimal CNN model, respectively. A comparison of actual frame image detections shows that, unlike other classical CNN networks, the model does not suffer from missed or wrong detections. Additionally, the detection results for two road videos show that the model can track and count potholes and cracks correctly. All the above results indicate that the model in this study performs better in video detection and tracking of road damage. In the future, we will consider training and testing models for more types of road damage.
Author Contributions: Writing—original draft, L.S.; Writing—review and editing, N.W.; Investigation, X.S. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Key Research and Development Program of
China (No. 2022YFC3801000), the National Natural Science Foundation of China (No. 51978630), the
Program for Innovative Research Team (in Science and Technology) in University of Henan Province
(No. 23IRTSTHN004), the National Natural Science Foundation of China (No. 52108289), the Program
for Science & Technology Innovation Talents in Universities of Henan Province (No. 23HASTIT006),
the Postdoctoral Science Foundation of China (No. 2022TQ0306), the Key Scientific Research Projects
of Higher Education in Henan Province (No. 21A560013) and the Open Fund of Changjiang Institute
of Survey, Planning, Design and Research (No. CX2020K10).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data are contained within the article.
Conflicts of Interest: The authors declare that there are no conflicts of interest regarding the publication of this paper.
References
1. Yao, Y.; Tung, S.-T.E.; Glisic, B. Crack detection and characterization techniques-An overview. Struct. Control. Health Monit. 2014,
21, 1387–1413. [CrossRef]
2. Jahanshahi, M.R.; Masri, S.F. A new methodology for non-contact accurate crack width measurement through photogrammetry
for automated structural safety evaluation. Smart Mater. Struct. 2013, 22, 035019. [CrossRef]
3. Barreira, E.; de Freitas, V.P. Evaluation of building materials using infrared thermography. Constr. Build. Mater. 2007, 21, 218–224.
[CrossRef]
4. Wang, N.; Zhao, X.; Zhao, P.; Zhang, Y.; Zou, Z.; Ou, J. Automatic damage detection of historic masonry buildings based on
mobile deep learning. Autom. Constr. 2019, 103, 53–66. [CrossRef]
5. Gattulli, V.; Chiaramonte, L. Condition assessment by visual inspection for a bridge management system. Comput.-Aided Civ.
Infrastruct. Eng. 2005, 20, 95–107. [CrossRef]
6. O’Byrne, M.; Schoefs, F.; Ghosh, B.; Pakrashi, V. Texture Analysis Based Damage Detection of Ageing Infrastructural Elements.
Comput.-Aided Civ. Infrastruct. Eng. 2013, 28, 162–177. [CrossRef]
7. Torbaghan, M.E.; Li, W.; Metje, N.; Burrow, M.; Chapman, D.N.; Rogers, C.D.F. Automated detection of cracks in roads using
ground penetrating radar. J. Appl. Geophys. 2020, 179, 104118. [CrossRef]
8. Hadjidemetriou, G.M.; Vela, P.A.; Christodoulou, S.E. Automated Pavement Patch Detection and Quantification Using Support
Vector Machines. J. Comput. Civ. Eng. 2018, 32, 04017073. [CrossRef]
9. Chang, K.T.; Chang, J.R.; Liu, J.K. Detection of Pavement Distresses Using 3D Laser Scanning Technology. In Proceedings of the
International Conference on Computing in Civil Engineering, Cancun, Mexico, 12–15 July 2005.
10. Li, S.; Yuan, C.; Liu, D.; Cai, H. Integrated Processing of Image and GPR Data for Automated Pothole Detection. J. Comput. Civ.
Eng. 2016, 30. [CrossRef]
Sensors 2023, 23, 7395 17 of 18
11. Huang, Y.; Xu, B. Automatic inspection of pavement cracking distress. J. Electron. Imaging 2006, 15, 013017. [CrossRef]
12. Zou, Q.; Cao, Y.; Li, Q.; Mao, Q.; Wang, S. Crack Tree: Automatic crack detection from pavement images. Pattern Recogn. Lett.
2012, 33, 227–238. [CrossRef]
13. Oliveira, H.; Correia, P.L. Automatic Road Crack Segmentation Using Entropy and Image Dynamic Thresholding. In Proceedings
of the 2009 17th European Signal Processing Conference, Glasgow, UK, 24–28 August 2009.
14. Nguyen, T.S.; Bégot, S.; Duculty, F.; Avila, M. Free-form anisotropy: A new method for crack detection on pavement surface
images. In Proceedings of the IEEE International Conference on Image Processing, Brussels, Belgium, 11–14 September 2011.
15. Nguyen, H.T.; Nguyen, L.T.; Sidorov, D.N. A robust approach for road pavement defects detection and classification. Irkutsk. Natl.
Res. Tech. Univ. 2016, 3, 40–52. [CrossRef]
16. Safaei, N.; Smadi, O.; Masoud, A.; Safaei, B. An Automatic Image Processing Algorithm Based on Crack Pixel Density for
Pavement Crack Detection and Classification. Int. J. Pavement Res. Technol. 2021, 15, 159–172. [CrossRef]
17. Cubero-Fernandez, A.; Rodriguez-Lozano, F.J.; Villatoro, R.; Olivares, J.; Palomares, J.M. Efficient pavement crack detection and
classification. Eurasip J. Image Video Process. 2017, 2017, 1. [CrossRef]
18. Rong, G.; Xin, X.; Dejin, Z.; Hong, L.; Fangling, P.; Li, H.; Min, C. A Component Decomposition Model for 3D Laser Scanning
Pavement Data Based on High-Pass Filtering and Sparse Analysis. Sensors 2018, 18, 2294.
19. Bitelli, G.; Simone, A.; Girardi, F.; Lantieri, C. Laser Scanning on Road Pavements: A New Approach for Characterizing Surface
Texture. Sensors 2012, 12, 9110–9128. [CrossRef] [PubMed]
20. Li, Q.; Yao, M.; Yao, X.; Xu, B. A real-time 3D scanning system for pavement distortion inspection. Meas. Sci. Technol. 2010,
21, 015702. [CrossRef]
21. Park, S.E.; Eem, S.H.; Jeon, H. Concrete crack detection and quantification using deep learning and structured light. Constr. Build.
Mater. 2020, 252, 119096. [CrossRef]
22. Aloysius, N.; Geetha, M. A review on deep convolutional neural networks. In Proceedings of the 2017 International Conference
on Communication and Signal Processing (ICCSP), Chennai, India, 6–8 April 2017.
23. Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Tang, X. Residual Attention Network for Image Classification. In Proceedings of the 2017
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
24. Zhao, Z.Q.; Zheng, P.; Xu, S.T.; Wu, X. Object Detection with Deep Learning: A Review. arXiv 2018. [CrossRef]
25. Wang, P.; Chen, P.; Yuan, Y.; Liu, D.; Huang, Z.; Hou, X.; Cottrell, G. Understanding Convolution for Semantic Segmentation.
In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA,
12–15 March 2018.
26. Xu, Y.; Li, D.; Xie, Q.; Wu, Q.; Wang, J. Automatic defect detection and segmentation of tunnel surface using modified Mask
R-CNN. Measurement 2021, 178, 109316. [CrossRef]
27. Wang, W.; Wu, B.; Yang, S.; Wang, Z. Road Damage Detection and Classification with Faster R-CNN. In Proceedings of the 2018
IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018.
28. Zhang, K.; Zhang, Y.; Cheng, H.D. CrackGAN: Pavement Crack Detection Using Partially Accurate Ground Truths Based on
Generative Adversarial Learning. IEEE Trans. Intell. Transp. Syst. 2020, 22, 1306–1319. [CrossRef]
29. Lin, Y.; Nie, Z.; Ma, H. Dynamics-based cross-domain structural damage detection through deep transfer learning. Comput.-Aided
Civ. Infrastruct. Eng. 2022, 37, 24–54. [CrossRef]
30. Wang, Z.; Zhang, Y.; Mosalam, K.M.; Gao, Y.; Huang, S. Deep semantic segmentation for visual understanding on construction
sites. Comput.-Aided Civ. Infrastruct. Eng. 2022, 37, 145–162. [CrossRef]
31. Yang, X.; Li, H.; Yu, Y.; Luo, X.; Huang, T.; Yang, X. Automatic Pixel-Level Crack Detection and Measurement Using Fully
Convolutional Network. Comput.-Aided Civ. Infrastruct. Eng. 2018, 33, 1090–1109. [CrossRef]
32. Hegde, V.; Trivedi, D.; Alfarrarjeh, A.; Deepak, A.; Shahabi, C. Yet Another Deep Learning Approach for Road Damage Detection
using Ensemble Learning. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA,
10–13 December 2020.
33. Shim, S.; Kim, J.; Lee, S.-W.; Cho, G.-C. Road surface damage detection based on hierarchical architecture using lightweight
auto-encoder network. Autom. Constr. 2020, 130, 103833. [CrossRef]
34. Sheta, A.F.; Turabieh, H.; Aljahdali, S.; Alangari, A. Pavement Crack Detection Using Convolutional Neural Network.
In Proceedings of the Computers and Their Applications, San Francisco, CA, USA, 23–25 March 2020.
35. Guo, K.; He, C.; Yang, M.; Wang, S. A pavement distresses identification method optimized for YOLOv5s. Sci. Rep. 2022, 12, 3542.
[CrossRef]
36. Ma, D.; Fang, H.; Wang, N.; Zhang, C.; Dong, J.; Hu, H. Automatic Detection and Counting System for Pavement Cracks Based
on PCGAN and YOLO-MF. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22166–22178. [CrossRef]
37. Zhang, S.; Lippitt, C.D.; Bogus, S.M.; Neville, P.R.H. Characterizing Pavement Surface Distress Conditions with Hyper-Spatial
Resolution Natural Color Aerial Photography. Remote Sens. 2016, 8, 392. [CrossRef]
38. Zhang, S.; Bogus, S.M.; Lippitt, C.D.; Neville, P.R.H.; Zhang, G.; Chen, C.; Valentin, V. Extracting Pavement Surface Distress
Conditions Based on High Spatial Resolution Multispectral Digital Aerial Photography. Photogramm. Eng. Remote Sens. 2015,
81, 709–720. [CrossRef]
Sensors 2023, 23, 7395 18 of 18
39. Ersoz, A.B.; Pekcan, O.; Teke, T. Crack identification for rigid pavements using unmanned aerial vehicles. In Proceedings of the
International Conference on Building up Efficient and Sustainable Transport Infrastructure (BESTInfra), Prague, Czech Republic,
21–22 September 2017.
40. Alzarrad, A.; Awolusi, I.; Hatamleh, M.T.; Terreno, S. Automatic assessment of roofs conditions using artificial intelligence (AI)
and unmanned aerial vehicles (UAVs). Front. Built Environ. 2022, 8, 1026225. [CrossRef]
41. Long Ngo Hoang, T.; Mora, O.E.; Cheng, W.; Tang, H.; Singh, M. Deep Learning to Detect Road Distress from Unmanned Aerial
System Imagery. Transp. Res. Rec. 2021, 2675, 776–788.
42. Cheng, J.; Dong, L.; Lapata, M. Long Short-Term Memory-Networks for Machine Reading. arXiv 2016, arXiv:1601.06733.
43. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need.
arXiv 2017, arXiv:1706.03762.
44. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Houlsby, N. An Image is Worth 16x16 Words: Transformers for Image
Recognition at Scale. arXiv 2020, arXiv:2010.11929.
45. Arya, D.; Maeda, H.; Ghosh, S.K.; Toshniwal, D.; Mraz, A.; Kashiyama, T.; Sekimoto, Y. Transfer Learning-based Road Damage
Detection for Multiple Countries. arXiv 2020, arXiv:2008.13101.
46. Zhong, Q.; Li, C.; Zhang, Y.; Xie, D.; Yang, S.; Pu, S. Cascade Region Proposal and Global Context for Deep Object Detection.
Neurocomputing 2017, 395, 170–177. [CrossRef]
47. Tao, X.; Gong, Y.; Shi, W.; Cheng, D. Object Detection with Class Aware Region Proposal Network and Focused Attention
Objective. Pattern Recognit. Lett. 2018, 130, 353–361. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.