Image Super Resolution ESR-GAN
Automation in Construction
journal homepage: www.elsevier.com/locate/autcon
Keywords: Multi-sensor fusion SLAM; Image super-resolution; Bridge inspection; 3D reconstruction; Crack assessment

Abstract

The inspection of bridges is increasingly dependent on advanced equipment and algorithms like digital cameras and SfM (Structure from Motion). However, many existing SfM-based bridge inspection methods lack efficiency due to lengthy 3D reconstruction computation times, and digital image resolution often falls short in detecting fine cracks and calculating their widths, mainly influenced by the acquisition equipment. This paper describes a fast and accurate crack assessment method that leverages multi-sensor fusion SLAM (Simultaneous Localization and Mapping) and image super-resolution. Through multi-sensor fusion SLAM, textured point clouds of the bridge structure can be obtained directly, significantly improving efficiency. Furthermore, deep learning-based image super-resolution enhances the precision of crack width calculation. Field tests demonstrate the effectiveness of the proposed methods, showcasing a 94% reduction in scene reconstruction time and a 16% improvement in crack width calculation accuracy.
* Corresponding author at: Department of Civil Engineering, Tsinghua University, Beijing 100084, China.
E-mail address: [email protected] (Y.-F. Liu).
https://doi.org/10.1016/j.autcon.2023.105047
Received 30 March 2023; Received in revised form 30 July 2023; Accepted 1 August 2023
Available online 9 August 2023
0926-5805/© 2023 Elsevier B.V. All rights reserved.
C.-Q. Feng et al. Automation in Construction 155 (2023) 105047
learning model, which improved the efficiency and accuracy of real-time bridge crack detection and effectively addressed challenges associated with complex image backgrounds. Fu et al. [10] applied the improved Deeplabv3+ semantic segmentation algorithm to the crack segmentation task, remarkably enhancing the accuracy of crack segmentation details. Zhang et al. [11] proposed an encoder–decoder crack segmentation network that incorporates multi-scale contextual information enhancement, enabling effective differentiation between cracks and background noise.

Crack quantification mainly focuses on calculating parameters such as length and width. According to China's industry standard "Technical Specifications for Field Inspections of Existing Highway Bridges" [12], concrete crack detection should include the location, distribution, direction, width, depth, and quantity of cracks. The location of the crack is a crucial factor in determining its nature (whether it is a bending crack or a shearing crack). Meanwhile, the crack width serves as an important indicator for assessing the severity of the crack. Lee et al. [13] defined the crack width as the distance between the intersection points of the crack skeleton's normal line and the crack's edge line. Liu [14] proposed a stable four-step calculation method for the crack width based on this approach. Furthermore, Nishikawa et al. [15] introduced a sub-pixel interpolation method to enhance the accuracy of crack width measurement. Mirzazade et al. [16] conducted 3D reconstruction of the structure, generated an orthophoto image of the crack region, and leveraged the GSD information of the picture to measure the width of the crack.

The difficulty of UAV inspection in China lies in the need to detect cracks measuring 0.1 mm in width [17]. As a result, the images captured by the UAV often only represent a small portion of the overall scene. This limited perspective poses a challenge to the effective identification of damage to bridge surfaces within large-scale scenes. Therefore, localizing images that contain the areas where damage has occurred becomes formidable. To tackle this issue, it is often necessary to build a 3D model of the bridge and associate the location information of the cracks with the model [18]. Moreover, the 3D reconstruction of bridges can be utilized to evaluate the service status of bridges in other respects, such as bridge deformation detection [19]. Different inspection devices can collect various types of data for bridge damage assessment, with camera-based and LiDAR-based systems playing a crucial role in the 3D reconstruction of bridges [20]. One of the most commonly used camera-based 3D reconstruction methods is SfM (Structure from Motion). Scholars have successfully obtained 3D models of bridges or their components using the SfM method [21–25]. This method requires ensuring that the captured images have a large overlapping area and precise trajectory planning to control the working distance. Photogrammetry based on the SfM method can generate high-precision point clouds that meet the requirements of bridge 3D reconstruction.

However, using the SfM approach to generate 3D models of bridges for inspection presents several challenges. Firstly, it is often necessary to shoot at fixed points and conduct meticulous trajectory planning to avoid motion blur during data acquisition and ensure successful reconstruction. This time-consuming process leads to inefficiency. Secondly, SfM typically requires an extended computing time when dealing with a large number of images [26]. Moreover, the resulting point cloud model cannot be directly utilized for crack detection. This is because bridge cracks are usually narrow, measured in millimeters, while the accuracy of the point cloud is at the centimeter level, which is greater than the width of the crack. Additionally, it is difficult to identify continuous cracks from discontinuous point cloud or mesh features using existing technical means. Therefore, spending a significant amount of time to obtain surface textures using SfM is not cost-effective, since the crack information cannot be directly obtained through the 3D model. All of the above situations reduce the efficiency of acquiring the 3D model. Alternatively, besides SfM, SLAM (simultaneous localization and mapping) is often used for trajectory measurement of objects and 3D reconstruction of scenes. For camera-based systems, visual SLAM can be used for defect inspection. Kim and Eustice [27] implemented a monocular SLAM algorithm for automated underwater hull inspection. For LiDAR-based systems, LiDAR SLAM can also obtain good results in localization and mapping, such as Cartographer [28], LOAM [29], Fast-LIO [30], etc. In recent years, substantial progress has been made in the field of SLAM based on multi-sensor fusion [31–36]. The fusion of LiDARs and RGB cameras can better improve the mapping process to simulate the real environment [20]. In this regard, multi-sensor fusion SLAM can efficiently obtain high-precision surface textures and provide sufficient information for crack localization.

Furthermore, existing studies have rarely focused on the calculation accuracy of relatively fine cracks. As mentioned above, the size of bridge cracks is very small, which naturally increases the difficulty of crack quantification. The classical crack width calculation method requires the crack to occupy at least 3 pixels in the image so that the skeleton and edge lines can be extracted and the width can be calculated, as shown in Fig. 2(a). However, in the scene of bridge inspection, cracks usually occupy <1 pixel in the image due to the working distance and crack scale, which is manifested as a light gray sub-pixel feature, as shown in Fig. 2(b). Although there is a crack width calculation method based on gray value [37], it is very sensitive to working distance and illumination, and the calculation results are not stable. Recovering the features of fine cracks in images is therefore of great significance for the accurate evaluation of crack parameters.

Aiming at the above problems, this paper proposes a fast and accurate crack assessment method using multi-sensor fusion SLAM and image super-resolution. Firstly, multi-sensor fusion SLAM is applied to quickly obtain 3D models and sensor trajectories. Secondly, a crack image super-resolution model is trained using a deep learning network. This model helps restore crack features in the image, ensuring the accuracy of crack parameter calculation. Thirdly, image-based crack detection and segmentation based on deep learning are implemented to obtain pixel representations of cracks. Fourthly, crack width feature pair sequences can be obtained through digital image processing. Finally, the crack information is projected and located on the 3D model. Field test investigations on a real bridge pier are conducted to illustrate and validate the proposed method.

2. Framework of the proposed method

The contribution of this study is to improve the efficiency of 3D model generation and the accuracy of crack parameter calculation. The method proposed in this paper is shown in Fig. 3, which consists of 4
interrelated parts: data acquisition, multi-sensor fusion SLAM, crack data processing, and crack localization. In the data acquisition part, the determination method of the maximum working distance and the principle of data collection based on multi-sensor fusion SLAM are proposed. By using multi-sensor fusion SLAM and triangular meshing, the textured point clouds and mesh models are obtained accordingly. In the crack data processing part, an object detection algorithm based on deep learning is utilized to crop the crack region. Furthermore, a crack high-definition image generation method based on super-resolution and deep learning is proposed. The information of crack feature points is then obtained through digital image processing. Finally, the information from the 3D model, camera odometry, and crack features is combined for projection to obtain the location of cracks.

The method and equipment proposed in this paper can be integrated on different platforms as a module in bridge inspection, as shown in Fig. 3. Handheld devices, vehicle-mounted equipment, and UAV platforms can be selected for data acquisition; each scheme requires corresponding additional hardware development. The data results processed by the proposed method can be used to generate or correct BIM models and issue inspection reports.

3. Data acquisition and 3D reconstruction based on multi-sensor fusion SLAM

Multi-sensor fusion SLAM was initially proposed to solve the problem of degradation of single-sensor SLAM in scenarios with little texture, drastic changes in illumination, long corridors, situations requiring strenuous movements, etc. Research and practice have demonstrated that multi-sensor fusion SLAM exhibits strong robustness and high accuracy in localization and mapping, making it suitable for bridge inspection. Additionally, the VIO (Visual Inertial Odometry) subsystem in multi-sensor fusion SLAM allows for the acquisition of surface texture, which can be visually displayed to assist in localization. In this paper, Inertial-Visual-LiDAR fusion SLAM is used for data acquisition. The point cloud and camera odometry are obtained through state estimation, enabling the generation of a continuous surface model via triangular meshing.

3.1. Brief introduction to multi-sensor fusion SLAM

SLAM is essentially a state estimation problem that involves solving for the position of the robot and the surrounding map points using measurement data from motion and observation sensors. The observation process requires various sensors, including cameras, IMUs, and LiDAR, and the information from these sensors can be used simultaneously for state estimation.

Multi-sensor fusion SLAM contains subsystems such as VO (visual odometry), LO (LiDAR odometry), VIO (visual inertial odometry), and LIO (LiDAR inertial odometry). Depending on whether sensor data are jointly optimized, multi-sensor fusion SLAM can be classified as loosely coupled or tightly coupled. In detail, loosely coupled SLAM involves individual subsystems performing motion estimation separately and then fusing and updating the estimation results. On the other hand, tightly coupled SLAM utilizes sensor information from different subsystems for motion estimation simultaneously and optimizes them jointly. In general, tightly coupled SLAM improves the robustness and accuracy of the system while using measurement information more efficiently [38]. For illustration, Fig. 4 shows the basic framework of multi-sensor fusion SLAM, summarized from references [35,36,38–40].
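The loosely coupled scheme described above can be illustrated with a toy example: each subsystem (e.g., LiDAR odometry and visual odometry) first produces its own position estimate, and the results are then fused, here by inverse-variance weighting. This is a schematic sketch for intuition only, with made-up numbers; it is not the fusion logic of any of the cited systems.

```python
import numpy as np

def fuse_loosely(estimates, variances):
    """Loosely coupled fusion (schematic): each odometry subsystem
    produces its own estimate independently; the results are then
    combined by inverse-variance weighting."""
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    fused = (weights[:, None] * estimates).sum(axis=0) / weights.sum()
    fused_var = 1.0 / weights.sum()
    return fused, fused_var

# Hypothetical position estimates (meters) from two subsystems:
lidar_xyz = [10.02, 5.01, 1.98]   # LiDAR odometry, variance 0.01
visual_xyz = [10.10, 4.95, 2.05]  # visual odometry, variance 0.04
fused, var = fuse_loosely([lidar_xyz, visual_xyz], [0.01, 0.04])
# The fused variance is smaller than either input variance.
```

A tightly coupled system would instead form one joint optimization over the raw residuals of all sensors, which is why it can use the measurement information more efficiently.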
3.2. Data acquisition strategy

It is important to note that industrial cameras with relatively low resolution (e.g., 640 × 480, 800 × 600, 1280 × 720, etc.) are commonly used in visual or fusion SLAM. However, in this paper, a high-resolution (3072 × 2048) industrial camera is employed to collect visual data specifically for crack detection purposes. Although this choice may slightly affect the algorithm's real-time performance, it guarantees better crack detection outcomes.

Before acquiring data, it is necessary to determine the working distance of the system in order to obtain photographs that clearly depict cracks of a certain width. We introduce the concept of ground sampling distance (GSD) from geomatics, which represents the actual length corresponding to one pixel in the image. When employing the camera's pinhole imaging model (as shown in Fig. 5), GSD can be calculated using the following formula:

GSD = (l · u) / (f · N)   (1)

where l refers to the size of the camera sensor; u refers to the working distance; f refers to the focal length of the camera; and N refers to the number of image pixels corresponding to l.

The GSD of an image is directly related to the result of crack detection. When collecting images for crack assessment, it is crucial to ensure that the GSD is no greater than the width of the cracks to be detected. According to Eq. (1), the maximum working distance can be determined. Furthermore, since fine cracks may not occupy 1 pixel in an image, image super-resolution is implemented to enhance the details of the cracks. A super-resolution factor, denoted as α, is introduced to signify that the information of 1 pixel is amplified to α pixels after super-resolution. Accounting for these considerations, the maximum working distance can be determined as follows:

u_max = (f · N · α · GSD) / l   (2)

where u_max refers to the maximum working distance corresponding to a given GSD, and α refers to the super-resolution factor. Note that since the working distance is less than u_max most of the time during data acquisition, cracks smaller than the GSD value can still be calculated, but the reliability is reduced.

Fine path planning is often required when collecting data based on SfM to ensure sufficient overlapping zones and prevent reconstruction failure. For example, for data collection on a cylindrical bridge pier, SfM necessitates locating the shooting points near an envelope circle at a distance u (u ≤ u_max) from the pier surface. The trajectory usually follows a regular pattern, depending on the surface's shape. In contrast, SLAM does not impose strict requirements on the shooting path. It only requires that the maximum distance from the surface to be measured during data acquisition does not exceed u_max, as illustrated in Fig. 6. Consequently, this reduces the difficulty of data acquisition and significantly improves efficiency.

In addition, several factors should be carefully considered during the data acquisition process: (i) Avoid placing the sensor too close to the object to prevent sensor degradation caused by limited features. When conducting data collection, it is advisable to include as many structural features in the scene as possible. (ii) Reasonably control the camera's exposure time, gain, white balance, and other parameters to obtain better surface texture in 3D reconstruction. (iii) Since the collected
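Returning to the working-distance formulas: as a numerical check of Eqs. (1) and (2), the field-test configuration reported in Section 6.1 (6 mm focal length, 3072 pixels, target GSD of 0.4 mm, α = 4.0) reproduces the stated maximum working distance of about 4.11 m. The 7.18 mm sensor width used below is an assumed typical value for the 1/1.8″ format and is not stated explicitly in the text.

```python
def gsd(l_mm, u_mm, f_mm, n_pixels):
    """Eq. (1): ground sampling distance, the real-world length covered by one pixel."""
    return (l_mm * u_mm) / (f_mm * n_pixels)

def u_max(l_mm, f_mm, n_pixels, alpha, gsd_mm):
    """Eq. (2): maximum working distance for a target GSD and super-resolution factor."""
    return (f_mm * n_pixels * alpha * gsd_mm) / l_mm

# Field-test values from Section 6.1; the 7.18 mm sensor width is an
# ASSUMED typical value for the 1/1.8-inch format (not stated in the text).
L_MM, F_MM, N_PX = 7.18, 6.0, 3072
umax_mm = u_max(L_MM, F_MM, N_PX, alpha=4.0, gsd_mm=0.4)
print(round(umax_mm / 1000, 2))  # -> 4.11 (m), matching Section 6.1

# Consistency check: at u = u_max the raw GSD equals alpha times the target GSD.
assert abs(gsd(L_MM, umax_mm, F_MM, N_PX) - 4.0 * 0.4) < 1e-9
```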
In this paper, the tightly coupled algorithm R3LIVE [36] is adopted for 3D reconstruction. R3LIVE consists of two subsystems, namely the LIO and VIO subsystems. The LIO subsystem utilizes the measurement information to construct a global map, while the VIO subsystem fuses visual data by minimizing the frame-to-map photometric error. A dense colored point cloud of the scene is obtained by the application of R3LIVE.

Since SLAM can estimate the state of the sensors, the camera odometry can be obtained accordingly. In the subsequent process of crack localization, the position of the camera's optical center can be directly determined from the camera odometry. Triangulation is performed using the offline tool provided by R3LIVE, which is based on Delaunay triangulation and graph cuts [41]. The resulting mesh is textured using vertex colors.

The applicability of the method described above is demonstrated through its successful implementation on a light rail line bridge. The data acquisition and subsequent 3D reconstruction process are presented. For the bridge scene shown in Fig. 7, multi-sensor fusion SLAM was utilized, and the acquisition process took 149 s, resulting in a point cloud containing 3,198,550 points.

4. Crack data processing

The goal of crack data processing is to obtain crack shape and width parameters from images accurately. In this section, crack detection for the acquired images is carried out. Image cropping is utilized to reduce the interference of background noise. A crack image super-resolution model

Image super-resolution is one of the classic tasks in computer vision, aimed at enlarging the image and enhancing texture details. Traditional approaches to image super-resolution, such as linear interpolation, bilinear interpolation, and bicubic interpolation, are simple and fast to implement, but struggle to recover effective pixel-level features. The introduction of deep learning methods, such as SRCNN [43], treated the problem as a learning task that updates the parameters of the convolution kernels. However, the generated images often suffer from excessive smoothness. SRGAN [44] introduced the use of generative adversarial networks (GAN) into the field of image super-resolution. It changes the input of the generator from random noise to low-resolution images and adopts residual connections. Following SRGAN, ESRGAN [45] removed the batch norm layer and replaced residual blocks with dense blocks to reduce the training difficulty of the network. Real-ESRGAN [46] further improved upon this by simulating the degradation process of high-resolution images in real life and employing purely synthetic data for training, which performs better in real-photo super-resolution.

In this section, a deep network with residual-in-residual dense blocks (RRDB) proposed by ESRGAN is used as part of the generator network for training. On this basis, by referring to the idea of transfer learning, the layers before the upsampling layer are frozen, and the collected dataset is used for fine-tuning, as shown in Fig. 9. The convolution layer and RRDB blocks before the upsampling layer are mainly used to extract image features. Therefore, the ability to extract image features can be better guaranteed by using the model parameters pre-trained with a
large amount of data. The upsampling layer and the layers after it are used for image generation. In this task, fine-tuning with targeted crack data can achieve a better super-resolution effect. The pre-trained model used in this section is the X4 model for general images provided by Real-ESRGAN [46], which was trained using open-source datasets including DIV2K, Flickr2K, and OutdoorSceneTraining.

Twelve hundred surface photos of 3 concrete bridges are used for the training of the image super-resolution model. These photos are processed by the second-order degradation model proposed by Real-ESRGAN and trained together with the original photos as a training set. Fig. 10 shows the crack areas in the same images before and after image super-resolution. To demonstrate the effectiveness of image super-resolution, two images representing the same area of a crack are resized using linear interpolation to the same size, as depicted in Fig. 10(a)(b). It becomes apparent that image super-resolution is generally capable of restoring the fine details of cracks more clearly, consequently enhancing the accuracy of crack width and other parameter calculations. For relatively fine cracks, image super-resolution can effectively increase the number of pixels occupied by the cracks, enabling the calculation of previously unmeasurable crack widths (as shown in Fig. 10(c) and (d)). This improvement has significant implications for the evaluation of cracks of varying sizes.

5. Crack localization

After obtaining the 3D model and crack information, it is necessary to project the vector information of the crack onto the model to localize the cracks. Liu et al. [23] project the crack information based on collision detection and an axis-aligned bounding box (AABB) tree [47]. Based on the work of Liu et al. [23], this paper proposes a crack projection method based on multi-sensor fusion SLAM, including determining the camera pose and projecting the information of cracks.

5.1. Determination of the camera's pose

The camera pose can be obtained using the camera odometry output by SLAM. The quaternion in the odometry is first converted into a rotation matrix, which represents the rotation from the world coordinate system to the camera coordinate system. Considering a crack width point p, the relationship between the world coordinate system and the camera coordinate system is given by Eq. (3), and the translation matrix from the world coordinate system to the camera coordinate system can be obtained by Eq. (4):

X_Cp = R_WC · X_Wp + T_WC   (3)

T_WC = −R_WC · X_WC   (4)

where X_Cp and X_Wp denote the coordinates of point p in the camera and world coordinate systems, respectively, and X_WC denotes the position of the camera's optical center in the world coordinate system.
of point p from the pixel coordinate system to the world coordinate system is:

K · (R_WC · X_Wp + T_WC) = Z · [u, v, 1]^T   (5)

where K denotes the calibrated camera intrinsic parameter matrix; X_Wp denotes the coordinate of point p in the world coordinate system; Z denotes the coordinate of this point in the z direction of the camera coordinate system; and u, v denote the coordinates of point p in the pixel coordinate system.

Since Z is unknown, world coordinates cannot be determined from pixel coordinates alone. Considering the normalized plane where Z = 1, the world coordinates of the points on the normalized plane can be obtained according to Eq. (6), and then the projection ray can be
obtained by Eq. (7):

X_n = R_WC^(-1) · (K^(-1) · [u, v, 1]^T − T_WC)   (6)

p = X_n − X_WC   (7)

where X_n denotes the world coordinate of point p on the normalized plane, and p denotes the projection vector, which determines the orientation of the projection ray.

The projection ray intersects the 3D model to obtain the world coordinates of the crack feature points. The absolute distance between the corresponding crack width feature points can represent the real crack width at that point.

6. Field tests on a bridge pier

To illustrate and validate the proposed method, inspection tests of a bridge pier are carried out using a handheld device integrated with multi-sensor fusion SLAM.

6.1. Equipment, site, and data acquisition

The parameters of the equipment are listed in Table 1. The Livox Avia is a lightweight, high-performance solid-state LiDAR with a FOV of 70.4° × 77.2° and a weight of 498 g, which is suitable for on-site inspection. The BMI088 IMU is built into the LiDAR. The Hikvision MV-CE060-10UC is a high-resolution, high-performance CMOS industrial camera that supports USB 3.0 for data transmission. The camera sensor size is 1/1.8″, and the image's pixel resolution is 3072 × 2048. A lens of 6 mm focal length is used with the camera. The equipment used for the field tests is shown in Fig. 12(a).

Table 1
Main parameters of equipment.
Equipment          Type                       Specification
LiDAR              Livox Avia                 FOV: 70.4° × 77.2°; Range precision: 2 cm; Angular precision: <0.5°
Industrial camera  Hikvision MV-CE060-10UC    Resolution: 3072 × 2048
Lens               Hikvision MVL-HF0628M-6MP  Focal length: 6 mm

A bridge pier near the Yongding River on the Fengsha second line, Beijing, was selected for the field tests, as shown in Fig. 12(b). The pier in question features a truncated cone-shaped bottom half with a distribution of continuous narrow spatial cracks. Data acquisition was conducted in the field rather than in a laboratory environment, without any artificial intervention. Therefore, the results of the field tests can largely reflect the real bridge inspection situation. The field tests were conducted under good weather conditions with a cloudless sky and suitable temperature to ensure uniform lighting on the pier surface during image acquisition. Test parameters are listed in Table 2. The current bridge inspection standards and codes in China rarely specify the required accuracy for crack detection but provide information on the relationship between crack width and the degree of structural damage. Considering the current standard specifications [48–50] and the working distance limitation of the LiDAR, we choose the GSD to be 0.4 mm to ensure that the quantification of cracks wider than 0.4 mm in all pictures is reliable. When the super-resolution factor α is 4.0, the maximum working distance u_max can be controlled as 4.11 m.

The data acquisition strategy described in Section 3.2 is adopted to ensure the reliability of the results. Zhang's method [51] and livox_camera_calib [52] are adopted for intrinsic and extrinsic calibration of the LiDAR and the camera. The trajectory of data acquisition is computed by SLAM and shown in Fig. 12(c), where a random trajectory is followed to keep the working distance within the inspection zone less than u_max. Cracks with relatively narrow widths (from 0.1 mm to 2.0 mm) were carefully selected on the pier for measurement (as shown in Fig. 12(e)) to verify the robustness of the proposed method in practical applications. The above cracks are marked as #1–#13 for subsequent analysis.

6.2. Results of crack data processing

The crack data were processed using the technical route described in Section 4. Firstly, within the inspection zone that satisfied the maximum working distance constraint, 9 surface photos that could cover the crack regions on the pier surface were selected for processing. Following the procedure outlined in Section 4.2, the crack images underwent crack region detection and background region cropping. Subsequently, the processed images were subjected to 4× super-resolution. Afterwards, adaptive threshold segmentation was applied to transform the images into binary images. Finally, the width features of these images were identified. The entire process of crack data processing is illustrated in Fig. 13. It is important to note that due to the complexity of the real environment, some noise and misidentification are inevitably introduced during crack detection and segmentation. To mitigate this concern, certain overtly irrelevant error information is manually eliminated during data processing to emphasize the focus of the proposed methodology more effectively. At the same time, the authors believe that in the engineering application and practice of this technology, some manual intervention may be necessary for crack segmentation.

6.3. Results of 3D reconstruction and crack localization

The textured point cloud calculated and output by SLAM is shown in Fig. 14(a). Furthermore, a mesh model that can be used for projection is generated after triangular meshing, as shown in Fig. 14(b). The proposed method can obtain textured 3D models efficiently and reproduce the engineering scene intuitively.

After obtaining the 3D model of the bridge pier, the pose information of the aforementioned 9 photos is first obtained according to the method described in Section 5.1. The relationship between the photos and the bridge pier is shown in Fig. 15(a). Then, according to the projection method in Sections 5.2 and 5.3, the crack feature points are projected onto the 3D model, and crack widths are calculated, as shown in Fig. 15(b)(c)(d).

6.4. Discussion of proposed methods

To demonstrate the contributions of the proposed method, 3D reconstruction based on SfM and crack width calculation based on bilinear interpolation are carried out for comparison.

6.4.1. Faster reconstruction compared with SfM

To demonstrate the efficiency of the proposed method, we compare it with other methods based on SfM for 3D reconstruction of the scene using the data collected above. We crop out the part of the bridge pier that we are interested in and show the texture details obtained by different reconstruction methods in Fig. 16. Based on the study by Chen et al. [53], a comparison of the point cloud quality for the inspection zone was conducted from three aspects: nonuniform distribution, surface deviation, and geometric accuracy, as shown in Table 3. In the nonuniform distribution comparison, the number of neighboring points within a radius of 0.01 m of each point is calculated, and its mean and standard deviation are reported. In the comparison of surface deviation, the thickness of the point cloud was measured at 5 randomly selected points on the surface of the pier, and the averaged result was obtained. In the comparison of geometric accuracy, the diameter of the bottom of the pier is selected as the reference value. The data shown in Table 3 represent: point cloud measurement value/reference true value/the
inspection zone. The point cloud quality obtained by SLAM-based reconstruction is slightly inferior, but the difference is relatively small. As can be seen from Figs. 16 and 17, for the same inspection zone, the 3D model obtained based on SfM has superior texture accuracy, but the scope of its reconstruction scene is strictly limited by the photos used in the SfM calculation. Conversely, due to its inherent characteristics, the SLAM-based method can achieve both engineering precision requirements and a broader range of reconstruction results. Moreover, the SfM-based method necessitates meticulous consideration of photo overlap to ensure successful reconstruction. In the example
presented in this section, reconstruction failure occurs despite an average overlapping rate of 50% among the photos. Only when the average overlapping rate is increased to around 60% and the photos are carefully selected can the reconstruction process be successful. According to the reconstruction results in Table 4, the SLAM-based method takes approximately 94% less time than the SfM-based method to complete the same large-scene reconstruction task. In engineering practice, the advantage of the SLAM-based method becomes more evident as the size of the scene increases.

From the above analysis, it can be concluded that: (i) In terms of data acquisition, the method based on multi-sensor fusion SLAM has stronger flexibility and convenience, making it more in line with the actual situation of bridge inspection. (ii) SLAM significantly improves inspection efficiency by simultaneously obtaining point cloud results during mapping, unlike the time-consuming process of obtaining point cloud results through feature matching in 3D reconstruction based on SfM. (iii) Both methods can obtain the texture features of the structure surface; the texture obtained by the SfM method is more refined, but its refinement comes at the cost of data processing speed. (iv) The method based on SfM requires precise trajectory planning before data acquisition, whereas the SLAM method only requires controlling the working distance to less than u_max.
11
C.-Q. Feng et al. Automation in Construction 155 (2023) 105047
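The 94% saving quoted above follows directly from the timings reported in Table 4; a quick arithmetic check (a minimal sketch, using only the published figures):

```python
# Verify the ~94% time saving from the Table 4 timings:
# SfM pipeline = SfM (10 min 54 s) + MVS (8 min 24 s) + meshing (1 min 4 s);
# the SLAM pipeline needs only meshing (1 min 12 s), because the point
# cloud is produced simultaneously during mapping.

def seconds(minutes: int, secs: int) -> int:
    """Convert an 'X min Y s' timing into seconds."""
    return minutes * 60 + secs

sfm_total = seconds(10, 54) + seconds(8, 24) + seconds(1, 4)   # 1222 s = 20 min 22 s
slam_total = seconds(1, 12)                                    # 72 s = 1 min 12 s

saving = 1.0 - slam_total / sfm_total
print(f"time saving: {saving:.1%}")  # time saving: 94.1%
```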
Fig. 16. Comparison of pier reconstruction based on SfM and multi-sensor fusion SLAM.
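The overlap requirement discussed above (reconstruction failing at an average overlap near 50% but succeeding near 60%) translates into a bound on the spacing between consecutive exposures. A minimal sketch of that relationship; the 2 m image footprint is an illustrative assumption, not a value from the field test:

```python
# Maximum spacing between consecutive exposures so that adjacent images
# still share the required fraction of their footprint. The footprint
# value below is assumed for illustration only.

def max_spacing(footprint_m: float, overlap: float) -> float:
    """Largest capture spacing that preserves the given overlap ratio."""
    return footprint_m * (1.0 - overlap)

footprint = 2.0  # metres covered by one image along the capture path (assumed)
for overlap in (0.5, 0.6):
    print(f"overlap {overlap:.0%} -> spacing <= {max_spacing(footprint, overlap):.1f} m")
# overlap 50% -> spacing <= 1.0 m
# overlap 60% -> spacing <= 0.8 m
```

Tighter overlap therefore means denser, more carefully planned photo stations, which is exactly the planning burden the SLAM-based acquisition avoids.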
Table 4
Comparison between SfM-based and SLAM-based methods in scene reconstruction.

Method                           SfM-based             SLAM-based
Number of points                 5,766,525             7,653,839
Time taken in 3D reconstruction  SfM: 10 min 54 s      Point cloud: simultaneous
                                 MVS: 8 min 24 s       Meshing: 1 min 12 s
                                 Meshing: 1 min 4 s    Total: 1 min 12 s
                                 Total: 20 min 22 s
Texture quality                  High                  Medium
Requirements of path planning    Fine planning         Few restrictions
Scope of reconstructed scene     Medium                Large

calculated. Here, deep learning-based image super-resolution proves to be adept at restoring the intricate details of fine cracks and obtaining relatively reasonable width values.

The analysis shows that when the crack width is small, the relative error of calculation is generally large. This is because when the crack width is very small, the baseline for comparison becomes small, and measurement error, SLAM positioning error, projection error, etc. are more likely to affect the width calculation. However, in this case, the crack calculation values are higher than the actual values, indicating a bias towards safety in engineering applications. The experiment results demonstrate that for cracks with a width >0.9 mm, the relative error of calculation can be controlled within 20%. Hence, the proposed method can obtain relatively accurate crack width calculation values within this range.

According to the code "Technical specification for engineering structures inspection by digital image method" [17] promulgated by the China Association for Engineering Construction Standardization, the relative accuracy of digital image method crack detection for width calculation should be controlled within 20%, so the proposed super-resolution method can meet the industry requirements. The minimum detectable crack width is 0.147 mm (not illustrated in Table 5) in the case study. This is close to the 0.1 mm suggested in the code [17]. However, the reliability of the calculation is reduced when the crack width is small, as demonstrated in the field tests. This is due to accumulated errors in a series of steps such as data acquisition, crack segmentation, 3D reconstruction, and crack projection. Therefore, further research is warranted to improve the precise calculation of fine crack widths.

7. Conclusions

In this paper, a crack assessment method based on multi-sensor fusion SLAM and image super-resolution is described, which can significantly improve the efficiency and accuracy of automated bridge inspection while meeting engineering needs. It is applicable to various platforms including handheld devices, vehicle-mounted equipment, and UAVs. The paper provides guidance on data acquisition using multi-sensor fusion SLAM and explains how SLAM is used to obtain the textured point cloud and 3D model of the bridge. Moreover, the paper

Table 5
Comparison of the accuracy of crack width calculation with two interpolation methods.

            Measurement   Super-resolution based on ESRGAN                Bilinear interpolation
Crack No.   Width         Calculation  Absolute error  Relative error    Calculation  Absolute error  Relative error

Fig. 18. Relationship between crack width and calculation relative error.
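The trend captured by Fig. 18 can be reproduced numerically. The constant 0.1 mm over-estimate below is a hypothetical assumption chosen to mimic the observed safety bias, not a measured figure from the field tests:

```python
# Why narrow cracks inflate the relative error (hypothetical numbers):
# a roughly constant absolute over-estimate is divided by an ever smaller
# measured baseline as the true crack width shrinks.

def relative_error(measured_mm: float, calculated_mm: float) -> float:
    """Relative error of a calculated width against the measured baseline."""
    return abs(calculated_mm - measured_mm) / measured_mm

bias_mm = 0.1  # assumed constant over-estimate (bias towards safety)
for width_mm in (0.3, 0.9, 2.0):
    err = relative_error(width_mm, width_mm + bias_mm)
    print(f"width {width_mm:.1f} mm -> relative error {err:.0%}")
# width 0.3 mm -> relative error 33%
# width 0.9 mm -> relative error 11%
# width 2.0 mm -> relative error 5%
```

Under this assumed bias, the relative error falls below the 20% code limit once the width exceeds roughly 0.5 mm, consistent in shape with the 0.9 mm threshold observed experimentally.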
highlights the use of deep learning-based image super-resolution to recover the details of cracks clearly and calculate the width feature accurately. In addition, the method of feature point projection is utilized to locate the cracks. Field tests on a bridge pier are conducted to showcase the effectiveness of the proposed method. The main conclusions of this paper are as follows:

1. Multi-sensor fusion SLAM can significantly reduce the acquisition time of textured point clouds in bridge inspection. In the comparative experiment of this paper, SLAM saves 94% of the time in this process compared with SfM, which dramatically improves inspection efficiency. The texture quality is slightly lower than that of the SfM method but is sufficient to support the needs of engineering inspection.
2. Multi-sensor fusion SLAM does not necessitate detailed path planning in the process of data acquisition. It is only necessary to ensure that the working distance does not exceed the calculated maximum distance, which reduces the complexity of inspection operation.
3. Image super-resolution based on deep learning can significantly enhance the accuracy of crack width calculation and ensure that relatively small cracks are included in the width calculation. Image super-resolution can be considered when a high-accuracy GSD cannot be achieved due to constraints of the acquisition equipment and environment; it makes it possible to expand the range of detectable crack widths. The improvement from image super-resolution depends on high-quality data acquisition, such as favorable lighting conditions and minimal motion blur.

In the future, certain technical aspects of this method can be enhanced. Firstly, it is important to address the issue of equipment selection in order to accommodate the varying scales of bridge structure inspection. Secondly, the crack detection algorithm can be further improved to acquire the width point features directly at the pixel level. Finally, there is an urgent need to study how to assess the accuracy of crack localization results in the world coordinate system, as the processes of point cloud generation and crack projection are independent.

In conclusion, the crack assessment scheme based on multi-sensor fusion SLAM and image super-resolution in bridge inspection exhibits good universality and great potential in engineering applications. The framework proposed in this paper can be used in the engineering practice of automated bridge inspection and adapted to different crack detection and localization algorithms to achieve better results.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgment

This research is supported by the National Natural Science Foundation of China (52121005, 52192662). The authors express their sincere appreciation for their support.

References

[1] L. Sun, Z. Shang, Y. Xia, S. Bhowmick, S. Nagarajaiah, Review of bridge structural health monitoring aided by big data and artificial intelligence: from condition assessment to damage detection, J. Struct. Eng. 146 (2020) 04020073, https://doi.org/10.1061/(ASCE)ST.1943-541X.0002535.
[2] C. Eschmann, T. Wundsam, Web-based georeferenced 3D inspection and monitoring of bridges with unmanned aircraft systems, J. Surv. Eng. 143 (2017) 04017003, https://doi.org/10.1061/(ASCE)SU.1943-5428.0000221.
[3] Z. Ameli, Y. Aremanda, W.A. Friess, E.N. Landis, Impact of UAV hardware options on bridge inspection mission capabilities, Drones 6 (2022) 64, https://doi.org/10.3390/drones6030064.
[4] H. Zakeri, F.M. Nejad, A. Fahimifar, Image based techniques for crack detection, classification and quantification in asphalt pavement: a review, Archiv. Comp. Methods Eng. 24 (2017) 935–977, https://doi.org/10.1007/s11831-016-9194-z.
[5] C.M. Yeum, S.J. Dyke, Vision-based automated crack detection for bridge inspection, Comp. Aid. Civ. Infrastruct. Eng. 30 (2015) 759–770, https://doi.org/10.1111/mice.12141.
[6] I. Abdel-Qader, O. Abudayyeh, M.E. Kelly, Analysis of edge-detection techniques for crack identification in bridges, J. Comput. Civ. Eng. 17 (2003) 255–263, https://doi.org/10.1061/(ASCE)0887-3801(2003)17:4(255).
[7] Q. Li, X. Liu, Novel approach to pavement image segmentation based on neighboring difference histogram method, in: Congress on Image and Signal Processing, 2008, pp. 792–796, https://doi.org/10.1109/CISP.2008.13.
[8] Y.Z. Ayele, M. Aliyari, D. Griffiths, E.L. Droguett, Automatic crack segmentation for UAV-assisted bridge inspection, Energies 13 (2020) 6250, https://doi.org/10.3390/en13236250.
[9] Z. Yu, Y. Shen, C. Shen, A real-time detection approach for bridge cracks based on YOLOv4-FPM, Autom. Constr. 122 (2021) 103514, https://doi.org/10.1016/j.autcon.2020.103514.
[10] H. Fu, D. Meng, W. Li, Y. Wang, Bridge crack semantic segmentation based on improved Deeplabv3+, J. Mar. Sci. Eng. 9 (2021) 671, https://doi.org/10.3390/jmse9060671.
[11] L. Zhang, Y. Liao, G. Wang, J. Chen, H. Wang, A multi-scale contextual information enhancement network for crack segmentation, Appl. Sci. 12 (2022) 11135, https://doi.org/10.3390/app122111135.
[12] Ministry of Transport of the People's Republic of China, Technical Specifications for Field Inspections of Existing Highway Bridges, JTG/T 5214–2022, People's Transportation Press, 2022.
[13] B.Y. Lee, Y.Y. Kim, S.-T. Yi, J.-K. Kim, Automated image processing technique for detecting and analysing concrete surface cracks, Struct. Infrastruct. Eng. 9 (2013) 567–577, https://doi.org/10.1080/15732479.2011.593891.
[14] Y. Liu, Multi-scale Structural Damage Assessment Based on Model Updating and Image Processing, PhD thesis, Tsinghua University, 2015, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=CDFD&dbname=CDFDLAST2022&filename=1016712246.nh&uniplatform=NZKPT&v=JULOg9yIjl3Q2sUG97jwbVZHSaOW7XTp0NCAP8DB3uC3cVLrbrh6NEyL-unvltt8 (accessed July 13, 2023).
[15] T. Nishikawa, J. Yoshida, T. Sugiyama, Y. Fujino, Concrete crack detection by multiple sequential image filtering, Comp. Aid. Civ. Infrastruct. Eng. 27 (2012) 29–47, https://doi.org/10.1111/j.1467-8667.2011.00716.x.
[16] A. Mirzazade, C. Popescu, J. Gonzalez-Libreros, T. Blanksvärd, B. Täljsten, G. Sas, Semi-autonomous inspection for concrete structures using digital models and a hybrid approach based on deep learning and photogrammetry, J. Civ. Struct. Heal. Monit. (2023) 1–20, https://doi.org/10.1007/s13349-023-00680-x.
[17] China Association for Engineering Construction Standardization, Technical Specification for Engineering Structures Inspection by Digital Image Method, T/CECS 1114–2022, China Architecture & Building Press, 2022.
[18] G. Morgenthal, N. Hallermann, J. Kersten, J. Taraben, P. Debus, M. Helmrich, V. Rodehorst, Framework for automated UAS-based structural condition assessment of bridges, Autom. Constr. 97 (2019) 77–95, https://doi.org/10.1016/j.autcon.2018.10.006.
[19] G. Cha, S. Park, T. Oh, A terrestrial LiDAR-based detection of shape deformation for maintenance of bridge structures, J. Constr. Eng. Manag. 145 (2019) 04019075, https://doi.org/10.1061/(ASCE)CO.1943-7862.0001701.
[20] S. Guan, Z. Zhu, G. Wang, A review on UAV-based remote sensing technologies for construction and civil applications, Drones 6 (2022) 117, https://doi.org/10.3390/drones6050117.
[21] Y. Liu, S. Cho, B.F. Spencer, J.-S. Fan, Concrete crack assessment using digital image processing and 3D scene reconstruction, J. Comput. Civ. Eng. 30 (2016) 04014124, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000446.
[22] Y. Xu, J. Zhang, UAV-based bridge geometric shape measurement using automatic bridge component detection and distributed multi-view reconstruction, Autom. Constr. 140 (2022) 104376, https://doi.org/10.1016/j.autcon.2022.104376.
[23] Y. Liu, X. Nie, J. Fan, X. Liu, Image-based crack assessment of bridge piers using unmanned aerial vehicles and three-dimensional scene reconstruction, Comp. Aid. Civ. Infrastruct. Eng. 35 (2020) 511–529, https://doi.org/10.1111/mice.12501.
[24] A. Khaloo, D. Lattanzi, K. Cunningham, R. Dell'Andrea, M. Riley, Unmanned aerial vehicle inspection of the Placer River Trail Bridge through image-based 3D modelling, Struct. Infrastruct. Eng. 14 (2018) 124–136, https://doi.org/10.1080/15732479.2017.1330891.
[25] M. Pepe, D. Costantino, UAV photogrammetry and 3D modelling of complex architecture for maintenance purposes: the case study of the Masonry Bridge on the Sele River, Italy, Periodica Polytechnica Civ. Eng. 65 (2021) 191–203, https://doi.org/10.3311/PPci.16398.
[26] J.L. Carrivick, M.W. Smith, D.J. Quincey, Structure from Motion in the Geosciences, John Wiley & Sons, 2016. ISBN: 1118895843.
[27] A. Kim, R.M. Eustice, Real-time visual SLAM for autonomous underwater hull inspection using visual saliency, IEEE Trans. Robot. 29 (2013) 719–733, https://doi.org/10.1109/TRO.2012.2235699.
[28] W. Hess, D. Kohler, H. Rapp, D. Andor, Real-time loop closure in 2D LIDAR SLAM, in: IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 1271–1278, https://doi.org/10.1109/ICRA.2016.7487258.
[29] J. Zhang, S. Singh, LOAM: Lidar odometry and mapping in real-time, in: Robotics: Science and Systems, vol. 2, no. 9, 2014, pp. 1–9, https://doi.org/10.15607/RSS.2014.X.007.
[30] W. Xu, F. Zhang, FAST-LIO: a fast, robust LiDAR-inertial odometry package by tightly-coupled iterated Kalman filter, IEEE Robot. Autom. Lett. 6 (2021) 3317–3324, https://doi.org/10.1109/LRA.2021.3064227.
[31] S. Jung, D. Choi, S. Song, H. Myung, Bridge inspection using unmanned aerial vehicle based on HG-SLAM: hierarchical graph-based SLAM, Remote Sens. 12 (2020) 3022, https://doi.org/10.3390/rs12183022.
[32] A. Gupta, X. Fernando, Simultaneous localization and mapping (SLAM) and data fusion in unmanned aerial vehicles: recent advances and challenges, Drones 6 (2022) 85, https://doi.org/10.3390/drones6040085.
[33] T. Du, Y.H. Zeng, J. Yang, C.Z. Tian, P.F. Bai, Multi-sensor fusion SLAM approach for the mobile robot with a bio-inspired polarised skylight sensor, IET Radar Sonar Navigat. 14 (2020) 1950–1957, https://doi.org/10.1049/iet-rsn.2020.0260.
[34] C. Debeunne, D. Vivet, A review of Visual-LiDAR fusion based simultaneous localization and mapping, Sensors 20 (2020) 2068, https://doi.org/10.3390/s20072068.
[35] J. Lin, C. Zheng, W. Xu, F. Zhang, R2LIVE: a robust, real-time, LiDAR-inertial-visual tightly-coupled state estimator and mapping, IEEE Robot. Autom. Lett. 6 (2021) 7469–7476, https://doi.org/10.1109/LRA.2021.3095515.
[36] J. Lin, F. Zhang, R3LIVE: a robust, real-time, RGB-colored, LiDAR-inertial-visual tightly-coupled state estimation and mapping package, in: International Conference on Robotics and Automation (ICRA), 2022, pp. 10672–10678, https://doi.org/10.1109/ICRA46639.2022.9811935.
[37] S. Yang, H. Chen, X. Li, Research on crack width judging by image gray scale, Highway Traffic Technol. (Appl. Technol. Ed.) 14 (2018) 71–72. https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7i0-kJR0HYBJ80QN9L51zrP6slJd751YSWU5Ue1k-U4fQC_5y7IZdi7X7PXU6DmCDM&uniplatform=NZKPT (accessed July 13, 2023).
[38] J. Mao, H. Fu, C. Chu, X. He, C. Chen, A review of simultaneous localization and mapping based on inertial-visual-lidar fusion, Navig. Position. Timing 9 (2022) 17–30, https://doi.org/10.19306/j.cnki.2095-8110.2022.04.003.
[39] J. Wang, X. Zuo, X. Zhao, J. Lyu, Y. Liu, Review of multi-source fusion SLAM: current status and challenges, J. Image Graph. 27 (2022) 368–389, https://doi.org/10.11834/jig.210547.
[40] S. Zhao, H. Zhang, P. Wang, L. Nogueira, S. Scherer, Super odometry: IMU-centric LiDAR-visual-inertial estimator for challenging environments, in: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 8729–8736, https://doi.org/10.1109/IROS51168.2021.9635862.
[41] P. Labatut, J.-P. Pons, R. Keriven, Efficient multi-view reconstruction of large-scale scenes using interest points, Delaunay triangulation and graph cuts, in: IEEE 11th International Conference on Computer Vision, 2007, pp. 1–8, https://doi.org/10.1109/ICCV.2007.4408892.
[42] B. Li, Y. Qi, J. Fan, Y. Liu, C. Liu, A grid-based classification and box-based detection fusion model for asphalt pavement crack, Comp. Aid. Civ. Infrastruct. Eng. (2022), https://doi.org/10.1111/mice.12962.
[43] C. Dong, C.C. Loy, K. He, X. Tang, Learning a deep convolutional network for image super-resolution, in: Computer Vision–ECCV 2014: 13th European Conference, 2014, pp. 184–199, https://doi.org/10.1007/978-3-319-10593-2_13.
[44] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, W. Shi, Photo-realistic single image super-resolution using a generative adversarial network, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 105–114, https://doi.org/10.1109/CVPR.2017.19.
[45] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, C. Change Loy, ESRGAN: enhanced super-resolution generative adversarial networks, in: Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018. https://openaccess.thecvf.com/content_eccv_2018_workshops/w25/html/Wang_ESRGAN_Enhanced_Super-Resolution_Generative_Adversarial_Networks_ECCVW_2018_paper.html (accessed July 13, 2023).
[46] X. Wang, L. Xie, C. Dong, Y. Shan, Real-ESRGAN: training real-world blind super-resolution with pure synthetic data, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 1905–1914. https://openaccess.thecvf.com/content/ICCV2021W/AIM/html/Wang_Real-ESRGAN_Training_Real-World_Blind_Super-Resolution_With_Pure_Synthetic_Data_ICCVW_2021_paper.html (accessed July 13, 2023).
[47] C. Ericson, Real-time Collision Detection, CRC Press, 2004. ISBN: 1558607323.
[48] Ministry of Transport of the People's Republic of China, Standards for Technical Condition Evaluation for Highway Bridges, JTG/T H21–2011, People's Transportation Press, 2011.
[49] Ministry of Transport of the People's Republic of China, Specifications for Maintenance of Highway Bridges and Culverts, JTG 5120–2021, People's Transportation Press, 2021.
[50] Ministry of Transport of the People's Republic of China, Technical Specifications for Structural Monitoring of Highway Bridges, JT/T 1037–2022, People's Transportation Press, 2022.
[51] Z. Zhang, Flexible camera calibration by viewing a plane from unknown orientations, in: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 1, 1999, pp. 666–673, https://doi.org/10.1109/ICCV.1999.791289.
[52] C. Yuan, X. Liu, X. Hong, F. Zhang, Pixel-level extrinsic self calibration of high resolution LiDAR and camera in targetless environments, IEEE Robot. Autom. Lett. 6 (2021) 7517–7524, https://doi.org/10.1109/LRA.2021.3098923.
[53] S. Chen, D.F. Laefer, E. Mangina, S.M.I. Zolanvari, J. Byrne, UAV bridge inspection through evaluated 3D reconstructions, J. Bridg. Eng. 24 (2019) 05019001, https://doi.org/10.1061/(ASCE)BE.1943-5592.0001343.