
www.nature.com/scientificreports

YOLOv8 and point cloud fusion for enhanced road pothole detection and quantification
Junkui Zhong1,2, Deyi Kong1, Yuliang Wei1 & Bin Pan1,2
Automatic detection of potholes is essential for effective road maintenance and is fundamental to
enhancing environmental perception for intelligent transportation systems. Reducing false positives
is essential for optimizing detection accuracy in this research domain. This paper introduces a novel
method for detecting irregular potholes on road surfaces by integrating depth camera images with
point cloud data. The proposed approach utilizes YOLOv8 for initial 2D object detection, identifying
candidate regions and corresponding 3D point clouds. The boundary contours of potholes are
subsequently determined through surface smoothness analysis, followed by the extraction of all point
clouds within these boundaries. To further refine detection accuracy, elevation thresholds are applied
to evaluate pothole depth, effectively filtering out false positives such as road surface stains and
patches. The experiments were conducted over a 4.7-kilometer road section, demonstrating that on
well-maintained road surfaces, the proposed method improves detection accuracy by 6.5% compared
to the standalone use of YOLOv8, achieving a precision of 95.8%, a recall of 93.3%, and an F1 score
of 94.53%. The model processes a single image in 0.23 seconds. Furthermore, the error rates for
perimeter, surface area, and depth detection are limited to within 4%, 5%, and 4%, respectively.

Keywords YOLOv8, Pothole detection, Point cloud data, Damage quantification, Depth camera

The longevity of road surfaces and overall traffic safety are significantly affected by road faults, underscoring the heightened importance of road condition evaluation in contemporary maintenance practices [1,2]. Among these issues, surface potholes are regarded as one of the most hazardous, posing potential risks to vehicle safety and compromising the structural integrity of road surfaces. As stated in [3], the primary approach for pothole detection remains manual visual inspection. However, this method suffers from limited accuracy due to subjective judgment and lacks detection efficiency. Thus, there is a critical need for the development of intelligent pothole detection techniques. For a more comprehensive evaluation of road hazards, detection must extend beyond mere presence confirmation to include precise localization and geometric characterization of potholes. Furthermore, expanding detection capabilities to allow real-time identification of potholes along the road ahead is critical for improving efficiency. Although advancements in intelligent pothole detection have enhanced both timeliness and accuracy, substantial challenges remain [4], which numerous researchers have sought to address [5-7].
Advancements in sensor technology and image processing techniques have led to a marked increase in academic and industry research focused on road damage identification. Numerous detection methods with varying levels of accuracy and applicability have been proposed [8-10]. The primary approaches include vibration-based techniques [11-13], image-based techniques [14-17], and point cloud-based techniques [18-23].
Vibration-based techniques assess road damage by measuring how vehicles dynamically respond to surface flaws. For instance, the authors of [11] used gyroscope and accelerometer data to develop a classifier that categorizes roads as either flat or containing potholes. Another study [12] proposed a pothole detection system with a hardware cost of approximately $60, utilizing triaxial accelerometers and GPS modules to collect raw data. While these methods are cost-effective and capable of real-time operation, they provide only qualitative results. Additionally, vehicles must traverse potholes to detect the associated vibrations, and uneven surfaces may be misidentified as potholes, which can diminish accuracy and performance over longer distances.
Image-based approaches for pothole identification typically involve capturing photos or videos with cameras, followed by processing with image analysis and machine learning algorithms. These methods enable rapid classification of potholes, significantly enhancing detection accuracy. Studies [24-26] applied YOLO and its advanced models to detect potholes in images, yielding promising results. However, image-based techniques

1 Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China. 2 University of Science and Technology of China, Hefei 230026, China. Email: [email protected]; [email protected]

Scientific Reports | (2025) 15:11260 | https://doi.org/10.1038/s41598-025-94993-0 1



often struggle in adverse weather or low-light conditions. For example, the dataset used in [27] leveraged the YOLOv3 computer vision model for automated pothole detection, incorporating images captured under various lighting and weather conditions. The study indicated that while environmental changes can significantly affect detection accuracy, incorporating images of potholes from harsh environments during the training phase allowed the model to achieve robust detection performance even in complex conditions. Nevertheless, image-based methods primarily focus on the presence of potholes rather than extracting detailed geometric information. In practical applications, stains on asphalt surfaces and road patches are often erroneously identified as potholes, thereby reducing detection accuracy [28]. Additionally, supervised learning techniques require large amounts of well-labeled training data for effective stereo matching, presenting challenges for real-world implementation [29].
Depth cameras and multi-line LiDAR (Light Detection and Ranging) are commonly used tools in point cloud-based techniques, though LiDAR's high cost remains a limitation. Studies [30,31] have demonstrated the effectiveness of depth cameras for road assessment, utilizing Kinect cameras to detect cracks and potholes. Point cloud-based techniques allow precise capture of pothole geometries. For example, a clustering approach for identifying cracks in laser-scanned point cloud data was introduced in [32]. Additionally, roughness descriptors have been applied to segment and classify stone and asphalt surfaces [33], and 3D fracture skeletons have been extracted from mobile LiDAR point clouds using the Otsu thresholding technique [34]. Choi et al. [35] applied a multi-resolution segmentation technique to identify road cracks in point cloud images, while Chen and Li [36] used a high-pass convolution method to locate cracks in asphalt. Although various methods have been developed to detect damage from laser-scanned data, detection effectiveness is highly dependent on the resolution of the point cloud data. Detecting potholes accurately relies on high-resolution point clouds, but extracting precise edge information remains challenging due to the discrete nature of sampled data. As noted in related studies [37], the computational efficiency of pothole detection from point clouds is generally lower than that of image-based methods, primarily due to the large data volume, which gives image-based techniques a comparative advantage in efficiency [38].
To address these challenges, we propose a pothole detection method that integrates point cloud and image
data. Our main contributions are as follows:

• A significant reduction in false positive rates for pothole identification, improving detection accuracy.
• Enhanced precision in determining the 3D dimensions of detected potholes.
• Development of a classification framework for real-time assessment of pothole severity, categorizing potholes
based on the extent of road damage.

Methodology
Research framework
This paper presents a novel pothole detection and measurement method that integrates YOLOv8 with 3D point
cloud data for rapid and precise road surface assessment. The proposed approach enables calculation of key
geometric metrics, including pothole perimeter, area, and depth, as outlined in Fig. 1. The method comprises five
primary stages: (1) Image and depth data acquisition: Using an RGB-D depth camera, road surface imagery and
depth data are collected to capture detailed spatial information; (2) Detection and localization: A deep learning-
based YOLOv8 model detects potholes and generates bounding boxes, providing initial regions of interest; (3)
3D point cloud extraction: Corresponding 3D point cloud data is extracted for each detected bounding box to
extract relevant pothole data; (4) Edge point filtering and component analysis: Exploiting the inevitable surface
irregularity caused by potholes, the point cloud data from Step 3 is filtered to identify sharp edge points along the
pothole perimeter; (5) Geometric computation of pothole dimensions: The boundary point cloud data extracted
in Step 4 is used to perform precise 3D geometric calculations, yielding metrics for perimeter, surface area, and
depth.

Fig. 1. Workflow of the proposed pothole detection and measurement method combining YOLOv8 and 3D
point cloud data.


Region of interest extraction


Automated pothole detection with YOLOv8
The primary task of pothole detection is to determine the presence and location of potholes in road surface
images, thereby providing a target region for subsequent 3D geometric measurements. In this study, YOLOv8 is
selected as the object detection model due to its effective balance between detection accuracy and speed, which
makes it suitable for applications requiring real-time performance. Compared to Faster R-CNN and earlier
versions of YOLO, YOLOv8 demonstrates enhanced performance in terms of both accuracy and processing
speed. Additionally, it incorporates an advanced Feature Pyramid Network (FPN) and cross-scale feature fusion
techniques, which enable it to better detect objects of varying scales, such as potholes in road surfaces.

RGB-Depth data integration


Following the detection of candidate pothole regions, the system extracts the specific area within each bounding
box from the RGB image and aligns it with depth data for further processing. The key to this step lies in accurately
matching the pixel coordinates of the bounding box in the 2D image with the corresponding 3D depth data,
allowing for precise acquisition of 3D information associated with each candidate pothole. This approach not
only improves processing accuracy but also significantly reduces the computational load for subsequent steps.
Assuming the detected bounding box has top-left pixel coordinates (xl , yl ) and bottom-right coordinates
(xr , yr ) in the RGB image, each pixel (x, y) within this region has a depth value d in the depth image, representing
the distance between that pixel and the camera. The transformation from the pixel coordinates of the bounding
box’s top-left and bottom-right corners to 3D coordinates can be described by the following equations:
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= D
\begin{bmatrix} \frac{1}{f_x} & 0 & 0 \\ 0 & \frac{1}{f_y} & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \tag{1}
\]

where x, y, z denote the spatial coordinates within the point cloud, D is the corresponding depth value, x', y' are the pixel coordinates within the RGB image plane, and f_x, f_y are the camera focal lengths. Following the transformation of these coordinates, points constrained by the bounding box (defined by the top-left and bottom-right vertices) are designated as the point cloud region of interest (ROI). The subset of points meeting these boundary conditions is represented as the set P, formulated as follows:

\[
P = \{ (X, Y, Z) \mid x_l < X < x_r,\; d_l < Y < d_r,\; y_l < Z < y_r \} \tag{2}
\]
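The back-projection of Eqs. (1)-(2) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name, the synthetic depth map, and the focal lengths fx = fy = 600 px are assumptions made for the example.

```python
import numpy as np

def roi_point_cloud(depth, bbox, fx, fy):
    """Back-project the depth pixels inside a bounding box to 3D (Eq. 1).

    depth : (H, W) array of per-pixel distances D.
    bbox  : (xl, yl, xr, yr) pixel corners of the detected region.
    fx, fy: focal lengths in pixels (assumed known from calibration).
    """
    xl, yl, xr, yr = bbox
    xs, ys = np.meshgrid(np.arange(xl, xr), np.arange(yl, yr))
    D = depth[yl:yr, xl:xr]
    valid = D > 0                       # drop pixels with no depth return
    X = D[valid] * xs[valid] / fx       # x = D * x' / fx
    Y = D[valid] * ys[valid] / fy       # y = D * y' / fy
    Z = D[valid]                        # z = D
    return np.column_stack([X, Y, Z])   # the ROI point set P of Eq. (2)

# Toy example: a flat plane 2 m away, camera with fx = fy = 600 px.
depth = np.full((480, 640), 2.0)
P = roi_point_cloud(depth, (100, 100, 200, 180), fx=600.0, fy=600.0)
print(P.shape)  # one 3D point per bounding-box pixel: (8000, 3)
```

Restricting the back-projection to the bounding box is what keeps the per-frame computational load low in the later stages.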

3D contour extraction
To enhance the accuracy of 3D contour and geometric feature detection for potholes, this study converts the
2D candidate regions identified in the image into corresponding 3D target regions. Within these regions, a
feature analysis of the pothole point cloud is performed. By leveraging the characteristic that potholes disrupt
the smoothness of surface geometry, the boundary points defining the pothole contour are extracted from the
point cloud.

Delaunay mesh optimization


The target point set P = {p1 , p2 , . . . , pn } is initially processed using Delaunay triangulation to construct
an optimized triangular mesh. Delaunay triangulation, a widely employed method for creating geometrically
optimized triangulations from point sets, is particularly suitable for 3D point cloud data. To maintain geometric
integrity and avoid distortions from excessive edge lengths, the resulting triangles are filtered based on edge
length constraints. For each triangle △ABC, the lengths of its edges, denoted as dAB , dBC , dCA , are computed,
where dij represents the distance between vertices i and j, with pi and pj as their coordinates. Triangles whose
edge lengths all fall below a defined threshold dmax are preserved within the mesh set τ . To optimize the quality
of the mesh, we set the threshold dmax to be twice the average point spacing davg of the point cloud data.
Specifically, davg is calculated as the average Euclidean distance between all pairs of points in the point cloud
data. The purpose of setting the threshold in this way is to balance the fineness of the mesh with computational
complexity and avoid the negative impact of excessively long edges on the mesh quality. The calculation formula
for the average point spacing is as follows:

\[
d_{avg} = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \| p_i - p_j \| \tag{3}
\]

where p_i = (x_i, y_i, z_i) and p_j = (x_j, y_j, z_j) are two points in the point cloud, and \| p_i - p_j \| represents the Euclidean distance between points p_i and p_j.
Subsequently, for each filtered triangle, the normal vector ni is calculated. Here, e1 and e2 denote the
edge vectors of two adjacent vertices in the triangle; the normal vector for the triangle is then determined by
computing the cross product of these two edge vectors, as described by the following equation:
\[
n_i = \frac{e_1 \times e_2}{\| e_1 \times e_2 \|} \tag{4}
\]
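The meshing, edge filtering, and normal computation steps above can be sketched with SciPy's Delaunay triangulation. This is an illustrative reading of the text rather than the authors' implementation: triangulation is performed on the xy projection, the O(n²) pairwise computation of d_avg is acceptable only for small ROIs, and the flat toy grid is invented for the demo.

```python
import numpy as np
from scipy.spatial import Delaunay

def filtered_mesh_normals(P):
    """Delaunay mesh on the xy projection, edge-length filtering (Eq. 3's
    d_avg threshold), and per-triangle unit normals (Eq. 4)."""
    n = len(P)
    # Average pairwise spacing d_avg; O(n^2), fine for small ROIs.
    dist = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    d_avg = dist[np.triu_indices(n, k=1)].mean()
    d_max = 2.0 * d_avg                        # threshold from the text

    tri = Delaunay(P[:, :2]).simplices         # (m, 3) vertex indices
    A, B, C = P[tri[:, 0]], P[tri[:, 1]], P[tri[:, 2]]
    edge_len = np.stack([np.linalg.norm(B - A, axis=1),
                         np.linalg.norm(C - B, axis=1),
                         np.linalg.norm(A - C, axis=1)])
    tri = tri[(edge_len < d_max).all(axis=0)]  # keep short-edged triangles

    # Unit normal from the cross product of two edge vectors (Eq. 4).
    e1 = P[tri[:, 1]] - P[tri[:, 0]]
    e2 = P[tri[:, 2]] - P[tri[:, 0]]
    normals = np.cross(e1, e2)
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    return tri, normals

# Toy example: a jittered 5x5 planar grid; every normal is (0, 0, +/-1).
rng = np.random.default_rng(0)
g = np.linspace(0.0, 1.0, 5)
xx, yy = np.meshgrid(g, g)
xy = np.column_stack([xx.ravel(), yy.ravel()]) + 0.01 * rng.standard_normal((25, 2))
P = np.column_stack([xy, np.zeros(25)])
tri, normals = filtered_mesh_normals(P)
print(np.abs(normals[:, 2]).min())  # close to 1.0 for a flat patch
```

On a flat patch all normals point along z, so the smoothness analysis in the next step flags nothing; a pothole breaks this uniformity.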


Next, the normal vectors of neighboring triangles are associated with each triangle. For any given triangle △i, its adjacent triangles are identified as T_i = {△1, △2, ..., △m}, and their normal vectors n_1, n_2, ..., n_m are associated with the central triangle. To detect potential boundary points, the angle θ_ij between neighboring normal vectors is calculated as follows:

\[
\theta_{ij} = \cos^{-1}\left( \frac{n_i \cdot n_j}{\| n_i \| \, \| n_j \|} \right) \tag{5}
\]

If the angle θ_ij between the normal vectors of adjacent triangles exceeds a specified threshold θ_th, this indicates a significant change in surface smoothness around △i, and the vertices of △i are marked as potential boundary points. All selected boundary points are then compiled into the set P_B:

\[
P_B = \{ p_i \mid \theta_{ij} > \theta_{th} \} \tag{6}
\]

At this stage, however, the set PB may contain some noise points, such as other surface irregularities that do not
correspond to the true pothole boundary and thus require further refinement.
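Eqs. (5)-(6) reduce to a scan over each triangle's neighbors. A small self-contained sketch follows; the adjacency list and the 30-degree threshold are illustrative, and building the adjacency from shared mesh edges is omitted for brevity.

```python
import numpy as np

def boundary_triangles(normals, neighbors, theta_th):
    """Flag triangles whose normal deviates sharply from a neighbor's (Eqs. 5-6).

    normals   : (m, 3) unit normals of the mesh triangles.
    neighbors : neighbors[i] lists the triangles adjacent to triangle i.
    theta_th  : angle threshold in radians.
    """
    flagged = []
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            cosang = np.clip(np.dot(normals[i], normals[j]), -1.0, 1.0)
            if np.arccos(cosang) > theta_th:  # smoothness is broken here
                flagged.append(i)
                break
    return flagged

# Toy example: three coplanar triangles, then one tilted by 90 degrees.
normals = np.array([[0, 0, 1], [0, 0, 1], [0, 0, 1], [1, 0, 0]], float)
neighbors = [[1], [0, 2], [1, 3], [2]]
print(boundary_triangles(normals, neighbors, theta_th=np.radians(30)))
# [2, 3]: the two triangles on either side of the crease
```

The clipping before arccos guards against floating-point dot products marginally outside [-1, 1].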

Boundary extraction
To filter out noise points and accurately extract pothole boundary points, we employed the DBSCAN (Density-
Based Spatial Clustering of Applications with Noise) clustering algorithm. DBSCAN is an unsupervised density-
based clustering method that automatically forms clusters based on the spatial distribution of data points,
without the need to predefine the number of clusters. The algorithm determines whether a point is a core point,
a border point, or a noise point by setting two key parameters: the neighborhood radius (ϵ) and the minimum
number of neighbors (minPts). This effectively removes isolated points and low-density noise, improving the
accuracy of pothole boundary extraction.
In this study, DBSCAN is used to identify pothole boundary points and filter out noise points from non-
target regions. The algorithm first searches for high-density areas in the point cloud data and classifies them
into the same cluster. Specifically, if a point has at least minPts neighbors within a radius ϵ, it is considered a
core point and forms a new cluster. If a point is not a core point but lies within the ϵ-neighborhood of a core
point, it is labeled as a boundary point and assigned to the cluster of the core point. If a point is neither a core
point nor within the neighborhood of any core point, it is considered a noise point and is removed. DBSCAN’s
performance heavily depends on two key parameters: the neighborhood radius (ϵ) and the minimum number of
neighbors (minPts). Therefore, it is essential to carefully select these parameters to ensure effective extraction of
pothole boundaries and the filtering of noise points.
The neighborhood radius (ϵ) determines the maximum distance at which two points in the point cloud are
considered neighbors. If ϵ is set too small, true pothole boundary points may be misclassified as noise points.
If ϵ is set too large, different pothole regions may be incorrectly merged into a single cluster, compromising
boundary extraction accuracy. In this study, ϵ is set based on the average point distance of the point cloud data,
ensuring that the formation of clusters matches the actual shape of the potholes. Specifically, we calculate the
average distance to the K-nearest neighbors for all points in the dataset and set ϵ to twice the average point
distance. This ensures that boundary points are properly grouped while effectively filtering out outlier noise
points.
The minimum number of neighbors (minPts) determines the minimum number of neighbors required for
a point to be classified as a core point and plays a significant role in clustering connectivity and noise point
identification. If minPts is set too low, isolated points may be mistakenly classified as boundary points, reducing
the robustness of the clustering. If minPts is set too high, sparse pothole boundary points may fail to form valid
clusters, affecting the completeness of boundary extraction. In this study, based on the density of the point cloud
data and the distribution characteristics of boundary points, minPts=10 is selected to ensure the continuity of
pothole boundary points, while effectively removing isolated noise points and improving detection accuracy.
By employing this parameter selection method, DBSCAN can adaptively extract pothole boundaries in
this study and efficiently filter out environmental noise points, thus providing high-quality data support for
subsequent pothole depth analysis and geometric feature extraction. Experimental results demonstrate that
this approach maintains efficient and stable performance in complex scenarios, enhancing the robustness and
accuracy of pothole detection.
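With scikit-learn, the adaptive parameter choice described above can be sketched as follows. The helper name, the use of k = 4 for the neighbor-distance average, and the toy cluster/outlier data are assumptions for the example, not details from the paper.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def filter_boundary_points(PB, k=4, min_pts=10):
    """Denoise candidate boundary points with DBSCAN.

    eps is set to twice the mean distance to the k nearest neighbors,
    mirroring the adaptive choice described above; DBSCAN's noise label
    (-1) marks the points to discard.
    """
    dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(PB).kneighbors(PB)
    eps = 2.0 * dists[:, 1:].mean()        # column 0 is the point itself
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(PB)
    return PB[labels != -1], labels

# Toy example: a dense patch of boundary points plus five far-away outliers.
rng = np.random.default_rng(1)
patch = rng.uniform(0.0, 0.1, size=(200, 3))
outliers = rng.uniform(5.0, 6.0, size=(5, 3))
kept, labels = filter_boundary_points(np.vstack([patch, outliers]))
print(len(kept))  # the five outliers are removed: 200
```

The outliers can never become core points (fewer than min_pts neighbors) and lie far outside any core point's eps-neighborhood, so DBSCAN labels them -1.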
We first perform dimensionality reduction on the filtered 3D point cloud data by projecting it onto the
xoy plane, thereby obtaining a planar point cloud dataset without height information. Next, the Alpha-Shape
algorithm is applied for boundary extraction to identify the contours of the pit areas. The Alpha parameter
(α) plays a crucial role in the Alpha-Shape algorithm, as it determines the tightness and smoothness of the
boundary. An appropriate value of α ensures the accuracy of the boundary extraction, while too large or too
small an α can result in incomplete or overfitted boundaries. Typically, the choice of α depends on the density
and spatial distribution characteristics of the point cloud data, ensuring the rationality of the boundary.
Let the point set of the region of interest be p = {(x_1, y_1), ..., (x_n, y_n)}, whose n points form n(n-1)/2 candidate line segments. For any two points p_1(x_1, y_1) and p_2(x_2, y_2) in the point set p, we draw circles of radius α passing through both points and calculate the two possible circle centers p_c(x_c, y_c), expressed as follows:

\[
\begin{cases}
x_c = x_1 + \frac{1}{2}(x_2 - x_1) \pm H\,(y_2 - y_1) \\[2pt]
y_c = y_1 + \frac{1}{2}(y_2 - y_1) \mp H\,(x_2 - x_1)
\end{cases} \tag{7}
\]


where

\[
H = \sqrt{ \frac{\alpha^2}{(x_1 - x_2)^2 + (y_1 - y_2)^2} - \frac{1}{4} } \tag{8}
\]

After obtaining the two circle centers, the distance from every other point to each center is compared with the radius α to determine whether any other point lies inside the circle. If one of the circles contains no other points, the points p_1 and p_2 are identified as boundary points of the pit, and the line segment connecting p_1 and p_2 is considered a boundary segment. By iterating over the point set p, the set of boundary points p_L is obtained.
After extracting the boundary points, the elevation value zi of each point pL (xi , yi ) in pL is remapped,
resulting in a new 3D point cloud dataset pLT (xLT , yLT , zLT ). This method not only preserves the features
of the pit but also effectively extracts the upper and lower boundary point cloud information, providing the
necessary data support for subsequent point cloud completion tasks. Figure 2 illustrates the boundary extraction
process.
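The empty-circle test of Eqs. (7)-(8) can be prototyped directly, albeit naively in O(n³). This sketch follows the description above rather than any reference implementation; production alpha-shape code would normally work from the Delaunay triangulation instead, and the square-with-interior-point data is invented for the demo.

```python
import numpy as np

def alpha_shape_edges(pts, alpha):
    """Boundary segments of a 2D point set via the empty-circle test
    of Eqs. (7)-(8). Brute-force O(n^3); a sketch, not production code."""
    n = len(pts)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            d2 = float(np.sum((pts[j] - pts[i]) ** 2))
            if d2 == 0.0 or d2 > 4.0 * alpha ** 2:
                continue                   # no radius-alpha circle fits
            mid = (pts[i] + pts[j]) / 2.0
            h = np.sqrt(alpha ** 2 / d2 - 0.25)          # Eq. (8)
            # Perpendicular offset to the two circle centers (Eq. 7).
            perp = np.array([pts[i][1] - pts[j][1], pts[j][0] - pts[i][0]])
            others = np.ones(n, bool)
            others[[i, j]] = False
            for c in (mid + h * perp, mid - h * perp):
                dist = np.linalg.norm(pts[others] - c, axis=1)
                if np.all(dist >= alpha - 1e-9):
                    edges.add((i, j))      # an empty circle exists
                    break
    return edges

# Toy example: a unit square plus an interior point; only the four sides
# of the square survive as boundary segments.
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]], float)
print(sorted(alpha_shape_edges(pts, alpha=0.6)))
# [(0, 1), (0, 3), (1, 2), (2, 3)]
```

Segments touching the interior point are rejected because both of their circles contain a corner of the square, which matches the role of α as a tightness control.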

Geometric feature computation


After defining the final contour of the pothole, the computation of its geometric features becomes essential. The surface area is determined using the Shoelace Theorem, a reliable method for calculating the area of polygons with known vertex coordinates. For a polygon defined by the vertices A_1(x_1, y_1), A_2(x_2, y_2), ..., A_N(x_N, y_N), the area can be expressed as:

\[
S = \frac{1}{2} \left| \sum_{i=1}^{N} (x_i y_{i+1} - x_{i+1} y_i) \right| \tag{9}
\]

where x_{N+1} = x_1 and y_{N+1} = y_1.
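Eq. (9) translates directly into code. A straightforward rendering, with the rectangle example invented for illustration:

```python
def shoelace_area(vertices):
    """Polygon area from ordered boundary vertices (Eq. 9)."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap-around: vertex N+1 = vertex 1
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# A 3 x 2 rectangle has area 6.
print(shoelace_area([(0, 0), (3, 0), (3, 2), (0, 2)]))  # 6.0
```

The absolute value makes the result independent of whether the boundary is traversed clockwise or counterclockwise.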
To calculate the depth, an iteration is performed over all data points within the pothole point set. The maximum absolute elevation among these points represents the pothole's maximum depth, denoted H_max:

\[
H_{max} = \left| \max_i (z_i) - \bar{z}_{plane} \right|, \quad z_i \in P_{LT} \tag{10}
\]

Here, P_LT represents the pothole point set, z_i is the elevation value of each point within the pothole, and \bar{z}_{plane} is the average height of points on the segmentation plane. The perimeter of the pothole can be calculated by summing the distances between consecutive boundary points. The expression is as follows:

\[
C = \sum_{i=1}^{n} \sqrt{ (x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2 + (z_{i+1} - z_i)^2 } \tag{11}
\]

Fig. 2. Feature point extraction: (a) Standard pothole. (b) Triangulation processing. (c) Feature point
extraction based on normal vectors. (d) Boundary point extraction based on Alpha-Shape.


This formula represents the sum of Euclidean distances between consecutive points, providing the perimeter
length of the pothole boundary in three-dimensional space.
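Eqs. (10)-(11) can likewise be computed in a few lines. A sketch, under the assumptions that the boundary points are ordered around the contour and that depths appear as signed z offsets from the road plane; the function name and the toy square pothole are illustrative.

```python
import numpy as np

def pothole_depth_and_perimeter(P_LT, z_plane_mean):
    """Maximum depth (Eq. 10) and 3D boundary perimeter (Eq. 11).

    P_LT         : (n, 3) boundary points, ordered around the contour.
    z_plane_mean : mean elevation of the surrounding road plane.
    """
    # Largest absolute elevation offset from the road plane.
    depth = np.max(np.abs(P_LT[:, 2] - z_plane_mean))
    closed = np.vstack([P_LT, P_LT[:1]])   # close the boundary loop
    perimeter = np.sum(np.linalg.norm(np.diff(closed, axis=0), axis=1))
    return depth, perimeter

# Toy example: a 1 m square contour sunk 5 cm below the road plane.
square = np.array([[0, 0, -0.05], [1, 0, -0.05], [1, 1, -0.05], [0, 1, -0.05]])
depth, perimeter = pothole_depth_and_perimeter(square, z_plane_mean=0.0)
print(depth, perimeter)
```

Closing the loop before differencing ensures the segment from the last boundary point back to the first is included in the perimeter sum.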

Experiments
To validate the effectiveness of the proposed YOLOv8 and point cloud fusion-based road pothole detection
method, two experimental scenarios were designed. The first involved real-world road pothole testing, while the
second utilized foam-based simulated potholes. These experiments aimed to evaluate the algorithm’s accuracy,
robustness, and stability. Pothole characteristics such as perimeter, area, and depth were detected and quantified
across multiple dimensions. In addition, the algorithm was compared with existing detection methods to
provide a comprehensive performance analysis. The primary goal of the experiments was to highlight the
advantages of integrating point cloud and image data, while assessing the algorithm’s effectiveness under varying
environmental conditions.

Data acquisition setup


This study employed a RealSense D435i depth camera to capture comprehensive road information. Mounted on
a detection vehicle at a 45-degree downward angle and positioned 1.2 meters above the ground, as depicted in
Fig. 3, the D435i camera integrates a 2-megapixel RGB sensor and a stereo depth sensor. It operates at a frame rate
of 30 frames per second, enabling simultaneous acquisition of high-resolution images and 3D point cloud data.
Featuring a global shutter, the camera effectively mitigates motion artifacts, making it suitable for capturing data
from fast-moving vehicles. The image and point cloud data were collected under favorable daylight conditions
with the vehicle traveling at a speed of 30 km/h.
The experiments were conducted on an Ubuntu 18.04 operating system with hardware configurations
including an AMD Ryzen 7 5800H CPU and an Nvidia RTX 3070 GPU. The deep learning framework utilized in
this study was PyTorch 1.8.1, paired with CUDA version 11.1. YOLOv8n was selected as the base model due to
its computational efficiency and suitability for practical applications. The training process was configured with a
batch size of 16 and a total of 100 epochs. To improve the model’s generalization capability, the dataset included
5,000 high-resolution (640 × 640) images of road potholes. Data augmentation techniques, such as horizontal
flipping and exposure adjustment, were applied to enhance dataset diversity. The dataset was partitioned into
training, testing, and validation subsets with a ratio of 7:2:1.

Pothole measurement and accuracy evaluation


To evaluate the accuracy of the proposed method, experiments were conducted in both controlled indoor and
real-world outdoor environments. In the indoor setting, an artificial pothole fabricated from polystyrene material
was utilized for accuracy testing, as depicted in Fig. 2a. This artificial pothole featured a standard circular shape
with a diameter of 17 cm and a depth of 4.78 cm. The calculated perimeter and surface area served as benchmark
data for comparison and validation of the quantification process.
Outdoor experiments, on the other hand, involved potholes with irregular shapes to further test the method’s
robustness. Following the guidelines of the FHWA-RD-03-031 standard, potholes representing low, medium,
and high damage types were documented. The geometric dimensions of these potholes were manually measured
on-site using the procedures illustrated in Fig. 4. Maximum depth was determined using an electronic caliper
with a precision of 0.01 mm, positioned perpendicularly to the pothole surface. Multiple measurements along
the diameter were performed to identify the deepest point. The upper surface area of each pothole was traced
onto grid paper with a precision of 1 mm2 , and its area and perimeter were subsequently calculated. To minimize

Fig. 3. Data collection setup.


Fig. 4. Illustration of pothole quantification: (a) Depth measurement; (b) Area and perimeter measurement.

Fig. 5. Test road. The satellite imagery was obtained from Baidu Maps (https://map.baidu.com). The left side is
an asphalt road with lower roughness, while the right side is a gravel road with higher roughness.

errors, all manual measurements were repeated three times, and the average values were used as the ground
truth for validating the method’s multidimensional metrics.

Results and analysis


This section presents a comprehensive evaluation of the proposed pothole detection and quantification method,
focusing on its detection accuracy, reliability in quantification, and practical applicability across diverse scenarios.
The performance of the algorithm was assessed by comparing three approaches: image-based methods, point
cloud-based detection methods, and the fusion-based method proposed in this study. The evaluation was
conducted on a test road segment spanning 4.7 kilometers, encompassing both asphalt and cement-based dirt
roads, as illustrated in Fig. 5. A total of 2,750 images were collected, with 1,342 images selected for testing.

Evaluation of pothole detection results


The detection results are recorded in Table 1, and the Precision, Recall, and F1 scores are calculated using Eq. (12):


Methods   TP     FP    FN   Precision   Recall   F1
YOLOv8    1199   143   87   89.3%       93.3%    91.2%
Ours      1199   52    87   95.8%       93.3%    94.5%

Table 1. Recorded results of pothole detection on the test road segment.

Fig. 6. Types of false positives: (a) Caused by manhole covers and road patches; (b) Caused by stains; (c)
Caused by road patches; (d) Caused by weeds; (e) Excessive roughness; (f) Caused by distant vehicles.

\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{12}
\]
In this evaluation, TP (true positives) represents the number of correctly detected potholes, FP (false positives)
indicates the number of incorrectly detected potholes, and FN (false negatives) denotes the number of missed
potholes. A detection is classified as a true positive (TP) if the intersection over union (IoU) between the detected
bounding box and the annotated pothole box exceeds 0.5. Conversely, if the detected bounding box does not
overlap with the annotated pothole or the IoU is less than 0.5, it is classified as a false positive (FP).
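Plugging the Table 1 counts for the fused method into Eq. (12) reproduces the reported figures up to rounding:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from Eq. (12)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Table 1 counts for the fused method: TP = 1199, FP = 52, FN = 87.
p, r, f1 = detection_metrics(1199, 52, 87)
print(f"{p:.1%} {r:.1%} {f1:.1%}")  # 95.8% 93.2% 94.5%
```

Precision and F1 match the reported 95.8% and 94.5% exactly; recall computes to 93.2%, versus the 93.3% quoted in the text, a difference attributable to rounding.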
From Table 1, it is evident that the precision achieved by the proposed method surpasses that of YOLOv8
alone. This enhancement is primarily due to the incorporation of 3D point cloud data, which effectively reduces
false positive (FP) detections. The experiments identified two primary types of errors in pothole detection, with
representative misdetection examples presented in Fig. 6. The first error type involves misclassifying non-ground
objects, such as vehicles, as potholes. The second type pertains to the incorrect identification of ground features,
including road patches, manhole covers, or stains, as potholes.
To address misdetections of non-ground objects, we leverage the 3D point cloud data within the bounding box and
apply an elevation threshold, recognizing that potholes are confined to the road surface. Detections containing
points above this elevation threshold are rejected as non-road objects. For misdetections on the ground, a maximum depth
threshold is utilized. After extracting the 3D point cloud data of a potential pothole, its maximum depth is
compared against predefined damage levels. Detections with a maximum depth below the threshold for small
potholes are flagged as false detections, effectively eliminating road patches, stains, shadows, and other similar
anomalies. Table 2 provides a more detailed classification of correctly detected potholes, building upon the data
presented in Table 1.
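The two filtering rules can be sketched as a single predicate over the bounding-box point cloud; the threshold values and function names below are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

def filter_false_positives(points_z, road_elevation, elev_thresh=0.15, min_depth=0.01):
    """Decide whether a YOLOv8 candidate survives the point-cloud checks.

    points_z       : z-values (m) of the 3D points inside the 2D bounding box
    road_elevation : estimated road-surface height (m)
    elev_thresh, min_depth : illustrative values, not the paper's thresholds
    """
    points_z = np.asarray(points_z, dtype=float)
    # Rule 1: potholes lie on the road surface, so points well above the
    # road indicate a non-ground object (e.g. a distant vehicle).
    if np.any(points_z > road_elevation + elev_thresh):
        return False
    # Rule 2: patches, stains, and shadows have negligible depth; require the
    # maximum depth to reach at least the small-pothole threshold.
    max_depth = road_elevation - points_z.min()
    return bool(max_depth >= min_depth)
```

A candidate passes only if all of its points sit at or below the road surface and its deepest point exceeds the minimum pothole depth.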


| Damage severity | Low | Moderate | High | Total |
| BBox count      | 435 | 683      | 81   | 1199  |

Table 2. Classification results of pothole damage severity.

| Methods                  | Precision | Recall | F1    |
| Point cloud [6]          | 86.0%     | 91.2%  | 88.5% |
| Image processing [28]    | 92.4%     | 93.8%  | 93.0% |
| Multi-sensor fusion [39] | 87.8%     | 85.4%  | 86.4% |
| Our                      | 95.8%     | 93.3%  | 94.5% |

Table 3. Comparison of results with other detection methods.

|       |         | Perimeter |       |       | Area   |         |       | Depth |       |       |
| Level | Pothole | M(cm)     | C(cm) | E(%)  | M(cm²) | C(cm²)  | E(%)  | M(cm) | C(cm) | E(%)  |
| L     | P1      | 15.58     | 16.04 | 2.95  | 16.94  | 17.58   | 3.78  | 2.33  | 2.36  | 1.29  |
| L     | P2      | 16.81     | 16.26 | -3.27 | 18.23  | 18.46   | 1.26  | 2.07  | 2.05  | -0.97 |
| L     | P3      | 30.09     | 31.22 | 3.76  | 62.86  | 66.53   | 5.84  | 1.39  | 1.37  | -1.44 |
| M     | P4      | 142.45    | 145.62| 2.23  | 1089.19| 1124.08 | 3.20  | 4.66  | 4.57  | -1.93 |
| M     | P5      | 205.59    | 216.49| 5.30  | 3149.42| 3288.62 | 4.42  | 4.89  | 4.82  | -1.43 |
| M     | P6      | 53.41     | 54.23 | 1.54  | 226.95 | 228.94  | 0.88  | 4.78  | 4.63  | -3.14 |
| H     | P7      | 171.88    | 190.44| 10.80 | 2199.87| 2488.48 | 13.12 | 5.94  | 5.79  | -2.53 |
| H     | P8      | 184.93    | 198.89| 7.55  | 2577.39| 2831.78 | 9.87  | 5.10  | 4.91  | -3.73 |
| H     | P9      | 214.37    | 232.55| 8.48  | 3586.14| 4030.43 | 12.39 | 6.21  | 5.99  | -3.54 |

Table 4. Detection results of pothole geometric dimensions.

Table 2 offers an intuitive visualization of the quantity, size, and classification of detected potholes along
the test sections, providing actionable insights for road inspection and maintenance. Specifically, potholes are
classified into three categories based on depth: low (less than 25 mm), moderate (25-50 mm), and high (greater
than 50 mm). To further validate the effectiveness and applicability of the proposed method, we compared its
performance with several widely adopted pothole detection algorithms, including point cloud-based methods [6],
traditional image processing techniques [28], and multi-sensor fusion algorithms [39]. The comparison results are
detailed in Table 3.
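The depth-based severity banding behind Table 2 can be sketched as follows (the band boundaries are those stated above; the function name is ours):

```python
def damage_severity(max_depth_mm):
    """Severity bands from the paper: low (<25 mm), moderate (25-50 mm), high (>50 mm)."""
    if max_depth_mm < 25:
        return "low"
    if max_depth_mm <= 50:
        return "moderate"
    return "high"
```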
A comparison of the detection results in Table 3 reveals that the proposed method outperforms existing
approaches in both precision and recall. Notably, the method demonstrates a significant advantage in precision,
attributed to the integration of YOLOv8’s high accuracy with a classification step that categorizes detected
potholes into low, moderate, and high damage severity using 3D point cloud features. This additional classification effectively
eliminates false positives, such as road patches and stains, by excluding detections where depth differences fall
below the specified thresholds. The proposed method achieves an overall detection accuracy of 95.8% and a
recall rate of 93.3%, with a single-frame detection speed of 0.24 seconds, meeting the requirements for real-time
applications. Moreover, it surpasses other recent advanced methods in terms of detection effectiveness.

Geometric accuracy evaluation and damage quantification


To further evaluate the geometric characteristics of potholes with varying damage levels, we conducted
quantification analyses by measuring their perimeter, surface area, and depth. These metrics were used to
quantify the extent of damage. As summarized in Table 4, the results represent the averages of 30 repeated
measurements for each pothole. Damage severity was classified into three levels: low, medium, and high, with
three representative potholes analyzed for each category.
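The E(%) columns in Table 4 are consistent with a signed relative error of the computed value C against the measured value M; a minimal sketch, assuming that column convention:

```python
def relative_error(measured, computed):
    """Signed relative error (%) of a computed dimension against its measurement."""
    return (computed - measured) / measured * 100.0

# Pothole P1 perimeter from Table 4: M = 15.58 cm, C = 16.04 cm -> approx. 2.95 %
```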
The results presented in Table 4 demonstrate the algorithm’s ability to accurately capture the geometric
dimensions of potholes and classify them based on damage severity. The classification is independent of the
pothole’s shape but is influenced by road roughness and the clarity of the pothole boundary. Potholes 1, 2,
4, and 5 exhibit relatively low road roughness and well-defined boundary profiles. In contrast, Pothole 6, a
manufactured foam standard circular pothole (Fig. 4) with distinct boundaries, achieved the most accurate
detection results. For Pothole 6, the relative error in perimeter was 1.54%, corresponding to an absolute error
of 0.82 cm. Similarly, the relative error in surface area was 0.88%, with an absolute error of 1.99 cm², while the
relative error in depth was −3.14%, resulting in an absolute error of 0.15 cm. For other potholes, the relative
error in perimeter remains below 4%, the surface area error is under 5%, and the depth error stays within 4%.
Pothole 3, located in an area with low road roughness and ambiguous boundary contours, demonstrates a
relatively higher surface area error of 5.84%. Potholes 7 through 9, situated in regions with greater road roughness


Fig. 7. Detection results.

| Methods                                                      | Depth precision | Area precision |
| Deep learning (GA-DenseNet) and binocular stereo vision [40] | 93%             | 88%            |
| Laser point cloud [6]                                        | ≥ 94%           | ≥ 93%          |
| RGB-D camera and instance segmentation algorithm [41]        | –               | 87% - 91%      |
| Our                                                          | ≥ 96%           | 87% - 95%      |

Table 5. Comparative analysis of geometric dimension accuracy with other detection methods.

and less distinct boundary profiles, show less accurate detection results, with surface area errors of approximately
13% and perimeter errors of around 11%. However, regardless of the road roughness or the clarity of the boundary
contours, depth values are consistently accurately detected, with relative errors not exceeding 4%.
Some of the detection results for the potholes presented in Table 4 are illustrated in Fig. 7. The first row shows
the RGB image of the pothole, the second row displays the triangulated point cloud, the third row presents the
extracted feature point cloud, and the fourth row shows the corresponding point cloud of the pothole.
To further assess the accuracy of the proposed method, a comparative analysis was conducted against recent
related studies. The results of this comparison are summarized in Table 5.
In Li et al. [40], the GA-DenseNet classification model is applied to categorize road surface damage types, where
point cloud data is processed and converted into binary images to extract the depth and area features of potholes.
However, this approach, which relies on a transformation ratio to estimate the damage area, produces larger
errors compared to our boundary extraction method based on smoothness. Similarly, Rufei et al. [6] employ the
integral invariants principle based on laser point clouds for pothole detection. This direct method of extracting
pothole dimensions from point cloud contours demonstrates higher accuracy in measuring pothole sizes. In


contrast, Lin et al. [41] focus on the surface area and volume of potholes, rather than their maximum depth. Their
findings indicate that as the severity of damage increases, the errors in calculated area and volume decrease, with
error rates ranging from 9% to 13.2%. The accuracy of these measurements is highly dependent on the size of
the pothole area.
The method proposed in this paper, however, performs consistently across potholes of varying sizes.
Detection accuracy is influenced primarily by road roughness and the clarity of pothole boundaries. The highest
detection precision is achieved with foam potholes due to their distinct contours and pronounced curvature
changes, which enhance boundary extraction accuracy. In contrast, natural potholes on cement roads often have
smoother boundary planes and less variation in internal structure. Additionally, prolonged vehicle pressure
causes the boundary localization of these potholes to become less distinct, leading to higher errors in detection.

Conclusion
This paper presents a method for road pothole detection by integrating image and point cloud data. The proposed
approach begins with YOLOv8, which detects potholes in captured images and marks the 2D bounding boxes.
The top-left and bottom-right corner coordinates of these bounding boxes are matched with corresponding depth
data to determine the 3D coordinates, designating the target region within the point cloud data. Subsequently,
the pothole boundary contour is identified by analyzing smoothness variations, and the point clouds within
the contour are extracted to calculate geometric features, including average depth, surface area, and perimeter.
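The corner-matching step can be sketched with standard pinhole back-projection; the intrinsics and the depth-lookup helper below are assumptions for illustration, not the paper's interface:

```python
def pixel_to_3d(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth (m) to camera-frame coordinates
    using the standard pinhole model (fx, fy, cx, cy are camera intrinsics)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def bbox_to_3d_region(top_left, bottom_right, depth_at, intrinsics):
    """Map the two bounding-box corners to 3D, delimiting the target region
    of the point cloud. depth_at(u, v) looks up the aligned depth image."""
    fx, fy, cx, cy = intrinsics
    (u1, v1), (u2, v2) = top_left, bottom_right
    p1 = pixel_to_3d(u1, v1, depth_at(u1, v1), fx, fy, cx, cy)
    p2 = pixel_to_3d(u2, v2, depth_at(u2, v2), fx, fy, cx, cy)
    return p1, p2
```

Point-cloud points falling between the two back-projected corners form the candidate region on which boundary extraction and geometric measurement proceed.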
The proposed method was validated under real-world road conditions. YOLOv8 effectively identified
candidate potholes, which were then classified by damage severity. Furthermore, surface stains and patches, often
misclassified as potholes, were successfully filtered out, addressing a limitation of YOLO-based methods alone. While the
recall rate remained consistent at 93.3% compared to YOLOv8’s results, the proposed method improved precision
from 89.3% to 95.8%. To evaluate the geometric accuracy of the detected potholes, multiple experiments were
conducted. The results showed that under road conditions with lower roughness, the detection accuracies for
perimeter, surface area, and depth were 96%, 95%, and 96%, respectively. Furthermore, the model processed
one image in 0.23 seconds, demonstrating its suitability for practical applications. However, the experiments
revealed that the proposed method relies on a complete pothole point cloud for accurate detection. In cases
where occlusion leads to an incomplete point cloud representation of the pothole, the accuracy of size estimation
is compromised.
Future research should focus on enhancing the recall rate of pothole detection, expanding the dataset to
include diverse environmental conditions, and improving the model’s adaptability to complex and harsh
environments. Additionally, developing a pothole detection approach that can handle incomplete point cloud
data is a key direction for future studies. Further efforts are also needed to advance the sampling frequency and
quality of data acquisition equipment, broadening the applicability of this method in engineering practice.

Data availability
The datasets generated and analyzed during the current study are available from the corresponding author on
reasonable request.

Received: 21 November 2024; Accepted: 18 March 2025

References
1. Arya, D. et al. Deep learning-based road damage detection and classification for multiple countries. Automation in Construction
132, 103935 (2021).
2. Zhao, L., Wu, Y., Luo, X. & Yuan, Y. Automatic defect detection of pavement diseases. Remote Sensing 14, 4836 (2022).
3. Fan, R. & Liu, M. Road damage detection based on unsupervised disparity map segmentation. IEEE Transactions on Intelligent
Transportation Systems 21, 4906–4911 (2019).
4. Dhiman, A. & Klette, R. Pothole detection using computer vision and learning. IEEE Transactions on Intelligent Transportation
Systems 21, 3536–3550 (2019).
5. Fan, R. et al. Long-awaited next-generation road damage detection and localization system is finally here. In 2021 29th European
Signal Processing Conference (EUSIPCO), 641–645 (IEEE, 2021).
6. Rufei, L., Jiben, Y., Hongwei, R., Bori, C. & Chenhao, C. Research on a pavement pothole extraction method based on vehicle-
borne continuous laser scanning point cloud. Measurement Science and Technology 33, 115204 (2022).
7. Ravi, R., Bullock, D. & Habib, A. Pavement distress and debris detection using a mobile mapping system with 2d profiler lidar.
Transportation research record 2675, 428–438 (2021).
8. Ma, N. et al. Computer vision for road imaging and pothole detection: a state-of-the-art review of systems and algorithms.
Transportation safety and Environment 4, tdac026 (2022).
9. Kim, Y.-M. et al. Review of recent automated pothole-detection methods. Applied Sciences 12, 5320 (2022).
10. Tang, K. et al. Decision fusion networks for image classification. IEEE Transactions on Neural Networks and Learning Systems
(2022).
11. Allouch, A., Koubâa, A., Abbes, T. & Ammar, A. Roadsense: Smartphone application to estimate road conditions using
accelerometer and gyroscope. IEEE Sensors Journal 17, 4231–4238 (2017).
12. Ren, J. & Liu, D. Pads: A reliable pothole detection system using machine learning. In International Conference on Smart Computing
and Communication, 327–338 (Springer, 2016).
13. Ghadge, M., Pandey, D. & Kalbande, D. Machine learning approach for predicting bumps on road. In 2015 International Conference
on Applied and Theoretical Computing and Communication Technology (iCATccT), 481–485 (IEEE, 2015).
14. Kortmann, F. et al. Detecting various road damage types in global countries utilizing faster r-cnn. In 2020 IEEE International
Conference on Big Data (Big Data), 5563–5571 (IEEE, 2020).
15. Javed, A. et al. Pothole detection system using region-based convolutional neural network. In 2021 IEEE 4th International
Conference on Computer and Communication Engineering Technology (CCET), 6–11 (IEEE, 2021).


16. Cano-Ortiz, S., Iglesias, L. L., del Árbol, P. M. R., Lastra-González, P. & Castro-Fresno, D. An end-to-end computer vision system
based on deep learning for pavement distress detection and quantification. Construction and Building Materials 416, 135036
(2024).
17. Cano-Ortiz, S., Iglesias, L. L., del Árbol, P. M. R. & Castro-Fresno, D. Improving detection of asphalt distresses with deep learning-
based diffusion model for intelligent road maintenance. Developments in the Built Environment 17, 100315 (2024).
18. Haq, M. U. U., Ashfaque, M., Mathavan, S., Kamal, K. & Ahmed, A. Stereo-based 3d reconstruction of potholes by a hybrid, dense
matching scheme. IEEE Sensors Journal 19, 3807–3817 (2019).
19. Ahmed, A. et al. Pothole 3d reconstruction with a novel imaging system and structure from motion techniques. IEEE Transactions
on Intelligent Transportation Systems 23, 4685–4694 (2021).
20. Guan, J. et al. Automated pixel-level pavement distress detection based on stereo vision and deep learning. Automation in
Construction 129, 103788 (2021).
21. Wu, R. et al. Scale-adaptive pothole detection and tracking from 3-d road point clouds. In 2021 IEEE International Conference on
Imaging Systems and Techniques (IST), 1–5 (IEEE, 2021).
22. Tang, K. et al. Deep manifold attack on point clouds via parameter plane stretching. In Proceedings of the AAAI Conference on
Artificial Intelligence 37, 2420–2428 (2023).
23. Tang, K. et al. Manifold constraints for imperceptible adversarial attacks on point clouds. In Proceedings of the AAAI Conference on
Artificial Intelligence 38, 5127–5135 (2024).
24. Dharneeshkar, J., Aniruthan, S., Karthika, R., Parameswaran, L. et al. Deep learning based detection of potholes in indian roads
using yolo. In 2020 international conference on inventive computation technologies (ICICT), 381–385 (IEEE, 2020).
25. Suong, L. K. & Jangwoo, K. Detection of potholes using a deep convolutional neural network. Journal of universal computer science
24, 1244–1257 (2018).
26. Omar, M. & Kumar, P. Detection of roads potholes using yolov4. In 2020 international conference on information science and
communications technologies (ICISCT), 1–6 (IEEE, 2020).
27. Bučko, B., Lieskovská, E., Zábovská, K. & Zábovskỳ, M. Computer vision based pothole detection under challenging conditions.
Sensors 22, 8878 (2022).
28. Anand, S., Gupta, S., Darbari, V. & Kohli, S. Crack-pot: Autonomous road crack and pothole detection. In 2018 digital image
computing: techniques and applications (DICTA), 1–6 (IEEE, 2018).
29. Wang, H., Fan, R., Cai, P. & Liu, M. Pvstereo: Pyramid voting module for end-to-end self-supervised stereo matching. IEEE
Robotics and Automation Letters 6, 4353–4360 (2021).
30. Zhang, Y. et al. A kinect-based approach for 3d pavement surface reconstruction and cracking recognition. IEEE Transactions on
Intelligent Transportation Systems 19, 3935–3946 (2018).
31. Kamal, K. et al. Performance assessment of kinect as a sensor for pothole imaging and metrology. International Journal of Pavement
Engineering 19, 565–576 (2018).
32. Chang, K., Chang, J. & Liu, J. Detection of pavement distresses using 3d laser scanning technology. In Computing in civil engineering
2005, 1–11 (2005).
33. Díaz-Vilariño, L., González-Jorge, H., Bueno, M., Arias, P. & Puente, I. Automatic classification of urban pavements using mobile
lidar data and roughness descriptors. Construction and Building Materials 102, 208–215 (2016).
34. Yu, Y., Li, J., Guan, H. & Wang, C. 3d crack skeleton extraction from mobile lidar point clouds. In 2014 IEEE Geoscience and Remote
Sensing Symposium, 914–917 (IEEE, 2014).
35. Choi, J., Zhu, L. & Kurosu, H. Detection of cracks in paved road surface using laser scan image data. The International Archives of
the Photogrammetry, Remote Sensing and Spatial Information Sciences 41, 559–562 (2016).
36. Chen, X. & Li, J. A feasibility study on use of generic mobile laser scanning system for detecting asphalt pavement cracks. The
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 41, 545–549 (2016).
37. Wu, H. et al. Road pothole extraction and safety evaluation by integration of point cloud and images derived from mobile mapping
sensors. Advanced Engineering Informatics 42, 100936 (2019).
38. Yoon, S. & Cho, J. Convergence of stereo vision-based multimodal yolos for faster detection of potholes. Computers, Materials &
Continua 73 (2022).
39. Chen, L. et al. Gocomfort: Comfortable navigation for autonomous vehicles leveraging high-precision road damage crowdsensing.
IEEE Transactions on Mobile Computing 22, 6477–6494 (2022).
40. Li, J., Liu, T. & Wang, X. Advanced pavement distress recognition and 3d reconstruction by using ga-densenet and binocular stereo
vision. Measurement 201, 111760 (2022).
41. Lin, W., Li, X., Han, H., Yu, Q. & Cho, Y.-H. A novel approach for pavement distress detection and quantification using rgb-d
camera and deep learning algorithm. Construction and Building Materials 407, 133593 (2023).

Author contributions
Junkui Zhong: Conceptualization, Methodology, Data Collection, Analysis, Writing-Original Draft, Writing-Review
and Editing. Deyi Kong: Methodology, Supervision, Writing-Review and Editing. Yuliang Wei: Methodology,
Supervision, Review-editing and validation. Bin Pan: Data Analysis, Review and Editing. All authors have
read and agreed to the published version of the manuscript.

Funding
This paper was supported by the Anhui Provincial Natural Science Foundation (No. 2308085QA22), the Hefei
Institute of Technology Innovation Engineering (Project No. KY-2023-SC-01), and the Anhui Provincial Major
Science and Technology Project (Project No. 202203a06020002).

Declarations

Competing interests
The authors declare no competing interests.

Additional information
Correspondence and requests for materials should be addressed to D.K. or Y.W.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.


Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives


4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in
any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide
a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have
permission under this licence to share adapted material derived from this article or parts of it. The images or
other third party material in this article are included in the article’s Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence
and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to
obtain permission directly from the copyright holder. To view a copy of this licence, visit
http://creativecommons.org/licenses/by-nc-nd/4.0/.

© The Author(s) 2025
