
2024, DCNAM - Automatic Detection of Pixel Level Fine Crack Using A Densely Connected - 'Beyene Et Al' (Structures)

The document presents DCNAM, an advanced algorithm for automatic detection of fine cracks in structures using a densely connected network with an attention mechanism. It addresses the gap in existing research by focusing on tiny cracks, which are often overlooked but critical for structural integrity. The proposed method demonstrates superior performance in detecting both common and tiny cracks, achieving significant improvements in evaluation metrics compared to baseline models.


Structures 68 (2024) 107073

Contents lists available at ScienceDirect

Structures
journal homepage: www.elsevier.com/locate/structures

DCNAM: Automatic detection of pixel level fine crack using a densely connected network with attention mechanism
Daniel Asefa Beyene a ,1 , Huangrui a ,1 , Kassahun Demissie Tola b , Fitsum Emagnenehe Yigzew c ,
Minsoo Park d ,∗, Seunghee Park a,c ,∗∗
a Department of Global Smart City, Sungkyunkwan University, Suwon 16419, South Korea
b Resilient Eco Smart City Education Research Group, Sungkyunkwan University, 2066, Seobu-ro, Jangan-gu, Suwon 16419, South Korea
c School of Civil, Architectural Engineering and Landscape Architecture, Sungkyunkwan University, Suwon 16419, South Korea
d Center for Built Environment, Sungkyunkwan University, Suwon 16419, South Korea

ARTICLE INFO

Keywords: Crack detection, Deep learning, FC-DenseNet, Attention block, Structural health monitoring, Tiny cracks

ABSTRACT

Deep-learning-based crack identification has emerged as a prominent research area in structural health monitoring. Although the detection of common cracks has been the predominant focus in previous studies, the identification of tiny cracks has often been neglected. Efficiently managing thin cracks is vital, because they can threaten the overall structural integrity over time if left unaddressed. We address this gap by targeting thin cracks within a broad category of crack types. We introduce a fine-crack-detection algorithm that efficiently detects both common and tiny cracks. Owing to the limited availability of publicly accessible datasets specifically focused on thin cracks, we collect images of fine cracks to train and evaluate our algorithm. To validate the efficiency of our method, we conduct experiments on three publicly available crack datasets and our private dataset. Compared with the baseline neural network, our proposed approach demonstrates superior performance across all evaluation metrics. Furthermore, our model exhibits impressive generalization ability across the datasets, with the F1 score and mean intersection over union improving by 22.42% and 28.07%, respectively. Notably, our observations indicate that the advantages of the proposed method become more pronounced as the dataset size increases.

1. Introduction

In recent years, the construction of infrastructure such as bridges, roads, and buildings has significantly advanced. Maintaining such infrastructure is crucial, and structural health monitoring has emerged as a key task in contemporary maintenance practices. Among structural defects, cracks are often the earliest signs of structural deterioration. If not addressed promptly, cracks can lead to reduced local stiffness or structural deficiencies [1]. Crack identification, which involves detecting and locating cracks, is essential for effectively refining maintenance plans.

The visual technique involves onsite personnel examining cracks and making judgments. However, this method is labor-intensive and time-consuming and poses safety risks, particularly for large infrastructures or those in challenging environments [2]. On the other hand, traditional image-processing-based techniques have been applied to detect cracks; these encompass capturing images, processing them, extracting features [3], and identifying cracks [4]. Various techniques, including threshold segmentation [5,6], edge detection [7–9], and minimal path methods [10], have been introduced for crack detection. However, due to their reliance on a single feature acquisition approach, these methods are susceptible to various challenges, such as noise, variations in illumination, and shadows. These challenges can significantly impact the performance and robustness of detection methods and lead to unstable accuracy in real-world applications. Another approach is machine-learning-based crack detection, which involves extracting relevant features from images and identifying crack regions. Techniques such as graph-cut segmentation [11], support vector machines [12], and random forests [13,14] have been employed. However, these methods often require a large number of labeled examples and can exhibit

Abbreviations: CNN, Convolutional Neural Networks; DCNAM, Densely Connected Network with Attention Mode
∗ Corresponding author.
∗∗ Corresponding author at: Department of Global Smart City, Sungkyunkwan University, Suwon 16419, South Korea.
E-mail addresses: [email protected] (M. Park), [email protected] (S. Park).
1 These authors contributed equally to this work.

https://doi.org/10.1016/j.istruc.2024.107073
Received 2 April 2024; Received in revised form 4 July 2024; Accepted 10 August 2024
Available online 22 August 2024
2352-0124/© 2024 Institution of Structural Engineers. Published by Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and
similar technologies.
D.A. Beyene et al. Structures 68 (2024) 107073

unsatisfactory performance in the case of complex backgrounds; thus, their performance and robustness are affected.

Recently, deep learning has demonstrated remarkable performance in computer vision, and its use for detecting concrete cracks has been explored in numerous studies. For example, Zhang et al. [15] proposed a convolutional neural network (CNN)-based method employing a sliding-window technique to classify each subregion of input images as cracked or non-cracked. Similarly, Li et al. [14] proposed a deep bridge crack classification model based on CNNs, incorporating an optimized sliding-window algorithm for detecting bridge cracks. However, these methods typically only identify the approximate location of cracks and lack the capability to extract more detailed crack information.

An alternative approach is pixel-level crack detection. Schmugge et al. [16] proposed a method for remote video crack detection using semantic segmentation networks. To connect hierarchical features, Yang et al. [17] introduced feature fusion operations. These methods can improve network performance to a certain extent, as low-level features provide intricate crack details and high-level features encompass more abstract semantic information. However, as the neural network architecture becomes more complex and the number of layers increases, acquiring crack features becomes more challenging, with an increased risk of vanishing gradients. These factors complicate the detection of tiny cracks. Gao et al. [18] proposed the DenseNet classification network to enhance feature propagation and mitigate the vanishing gradient problem. The DenseNet allows each layer direct access to the gradients from the loss function and original input and provides implicit deep supervision. The DenseNet enhances feature extraction with densely connected feature maps. However, integrating feature maps of various scales and re-weighting them are crucial owing to the semantic feature distribution of cracks and the minimal variability between fine cracks and the background. To address these two challenges, we introduce a fully convolutional network for pixel-level segmentation based on FC-DenseNet [19] and the squeeze-and-excitation (SE) block [20]. The main contributions of this work are as follows:

• The FC-DenseNet is employed to extract feature maps from raw images at various scales. Dense blocks enhance crack extraction effectiveness, and skip connections fuse feature maps across different scales.
• The SE block is utilized to reweight the channel dimensions of feature maps at various scales, thereby enabling dynamic adjustments to channel values.
• A dataset consisting of 51 images captured using a GoPro camera is collected. These images have a resolution of 6000 × 3384 pixels and specifically focus on fine cracks.

2. Related work

In this review, we categorize the existing methodologies into three main approaches: image-processing-based, machine-learning-based, and deep-learning-based methods.

2.1. Image processing techniques-based crack detection

In traditional crack identification, several image-processing techniques are utilized to detect cracks. Among these techniques, thresholding, minimal path selection, and edge detection methods are commonly employed for crack detection [2].

Threshold-based methods rely on the observation that cracks are darker than the surrounding pixels [5,6,21,22]. By employing a threshold value, these methods distinguish cracks from the background. However, their effectiveness is often compromised by shadows, noise, and uneven illumination, which can lead to fragmented detection outcomes. To address these limitations, minimal path selection techniques have been introduced [10,23,24]. These techniques aim to suppress noise and enhance the continuity of crack detection. However, they are often used in scenarios with complex crack topologies, which can limit their applicability.

Regarding edge detection techniques, Canny [7], Sobel [25], and morphological filters [26] are employed due to the similarity between cracks and edges. These methods identify edges in images that may correspond to crack lines. However, their performance is highly susceptible to noise, and the selection of appropriate hyperparameters is critical for varying environmental conditions.

2.2. Machine learning techniques-based crack detection

Machine learning has significantly enhanced crack detection by shifting from basic feature extraction to sophisticated classification algorithms. This evolution marked a pivotal shift in the identification and analysis of cracks in various images.

Early machine learning endeavors, as exemplified by Moon et al. [27] and Moselhi et al. [28], focused on extracting geometric features from connected regions, such as the lengths of the major and minor axes and their ratios. These features were then classified using a back-propagation (BP) neural network. The effectiveness of these early methods lies in their simplicity and direct approach to feature extraction. Subsequent research has delved into texture-based features, recognizing their potential in capturing intricate details of crack surfaces. Michael et al. [29] and Kapela et al. [30] demonstrated the effectiveness of utilizing the pixel intensity values and histograms of oriented gradient features, respectively. Local binary patterns have also been widely adopted for the efficient representation of textural features [31,32].

Classifiers are essential in these methods, with various algorithms such as artificial neural networks [33], support vector machines [29,32], and random structured forests [34] being employed. Each classifier has its own strengths and weaknesses, which influence the overall effectiveness of the crack-detection process. However, machine-learning methods in this domain require a large number of structured labels, and their performance and robustness are still lacking, particularly in complex backgrounds.

2.3. Deep learning techniques based crack detection

The remarkable feature extraction capabilities of CNNs have driven numerous studies into the application of deep learning for crack detection.

Crack detection, an extension of crack classification, involves determining the precise location of cracks in an image. Deep-learning approaches have shown remarkable robustness and adaptability, often outperforming traditional and machine-learning-based methods in various scenarios. For example, Cha et al. [35] developed a CNN model that uses a 256 × 256 window to scan images. Zhang et al. [15] and Pauly et al. [36] further emphasized the advantages of deep learning, particularly in handling complex crack patterns. To address the challenge of detecting cracks of varying sizes, methodologies like R-CNN and YOLO have been introduced [37,38]. For road damage identification, Tsuchiya et al. [39] employed YOLOv3, illustrating the benefits of data augmentation in improving detection accuracy. However, these methods generally provide only approximate crack locations, lacking the detail needed for thorough analysis. This limitation underscores the need for continued research to develop techniques that offer more precise crack detection and characterization.

Another alternative solution is a semantic-segmentation-based technique that segments cracks at the pixel level. For example, the holistically nested edge detection (HED) framework introduced by Xie et al. [40] marked a pivotal shift toward more accurate edge detection, influencing subsequent models in crack detection. Liu et al. [41]


developed the Deepcrack model based on the HED framework, integrating conditional random fields and guided filtering to refine segmentation accuracy. Despite these advancements, such models can still produce false-positive pixels due to background interference in the encoder's features. The Fully Convolutional Network (FCN) [42] enhances semantic segmentation by introducing fully convolutional layers, which benefits crack segmentation tasks. This approach allows the model to capture detailed features without the limitation of fully connected layers. The UNet architecture [43] further improves segmentation accuracy through its innovative use of skip connections and upsampling techniques, resulting in a more detailed and accurate representation of crack features. Extensions to UNet, as seen in models like Deepcrack [44] and UHDN [45], introduce novel methods for feature fusion at various levels.

Accurate boundary localization remains a challenge in crack detection. To address this, Guo et al. [46] proposed a model that combines the original image edge with the output of a base predictor module and refines the results with a separate refinement module. Although this model effectively enhances boundary precision, it comes at the cost of increased complexity and processing requirements. In response to the need for speed and efficiency, Choi et al. [47] introduced a semantic damage-detection network that employed separable and dilated convolutions to expedite the segmentation process without compromising accuracy. Ensemble methods, such as the one proposed by Fan et al. [48], utilize multiple network outputs to improve detection accuracy. Most of the aforementioned methods employ concatenation operations for feature fusion or merging, in which features are equally weighted, and equal weighting can overlook minor crack details. In these operations, the information from tiny cracks may be neglected because of their weak representation. This study addresses the challenge arising from the limited variability between tiny cracks and the background, as well as the semantic distribution of cracks, for the precise segmentation of both coarse and fine cracks.

3. Research methodology

3.1. Proposed method

3.1.1. Overview of the method

We consider crack identification through crack segmentation, where each pixel is classified as either crack or background. The data-processing workflow involves obtaining, labeling, and preprocessing data. After preprocessing, we created a target dataset that met the experimental requirements. The main structure of the proposed network, named the Densely Connected Network with Attention Mode (DCNAM), is presented in Fig. 1. Our network architecture utilizes FC-DenseNet [19] with the addition of SE blocks [20]. SE blocks are integrated after each convolutional module and dense block, except for the final convolutional layer.

A dense block was used to capture fracture characteristics and ensure effective gradient propagation. Fig. 2 shows how the modules are tightly connected. The dense connections differ slightly from those in DenseNet [18], as shown in Fig. 3.

During the training phase, crack images were initially fed into the encoder-decoder structure to generate multi-level feature maps. In the encoder, the cracked image passed through a convolution module, three SE blocks, three dense blocks, and two transition-down modules. In the decoder, the feature map was processed through three SE blocks, two transition-up modules, two dense blocks, and one convolution module. Each SE block was connected after every dense block and the initial convolution module. Finally, a sigmoid classifier and class-balance loss were employed to train the network. Fig. 4 provides a comprehensive overview of our proposed method, offering a visual representation that clarifies the overall approach.

3.1.2. FC-DenseNet

Simon Jégou et al. [19] proposed the FC-DenseNet, which is a variant of the DenseNet architecture specifically designed for semantic segmentation tasks. The FC-DenseNet preserves the dense connectivity pattern of the DenseNet while adapting it to enable dense predictions at each spatial location. The key components include dense blocks, transition-down blocks, and skip connections for multi-scale feature fusion. The FC-DenseNet demonstrates excellent performance in pixel-wise segmentation by effectively capturing and propagating features, thereby allowing for accurate and detailed segmentation results. It has been widely adopted in the field of semantic segmentation and serves as a foundation for advancements in this field. Owing to the advantages of the dense connectivity modules and encoder-decoder structure, which enable better attention to fine crack features, we chose the FC-DenseNet as the foundational framework for our model.

3.1.3. SE block

The SE block [20] is a widely used architectural component in deep neural networks, particularly for computer vision tasks. The SE block enhances the representational power of intermediate feature maps within a CNN by learning channel-wise dependencies and adaptively recalibrating feature responses. This allows the network to emphasize informative features and suppress less relevant ones. The SE block achieves this through two key operations: the squeeze and excitation stages. The structure of the SE block is shown in Fig. 5.

In the squeeze stage, a fully connected hidden layer is employed to compress channel-wise feature descriptors into a lower-dimensional space. This mapping operation utilizes a nonlinear activation function such as the rectified linear unit to learn the correlation and importance of each channel. In the excitation stage, another fully connected hidden layer is utilized to map the compressed feature descriptors back to the original channel dimensions. This mapping operation employs a sigmoid function to generate a weight vector ranging from 0 to 1, which indicates the importance of each channel.

The two core stages of the SE attention module, squeeze and excitation, work together to enhance the representational power of the CNN by adaptively adjusting channel weights. This process emphasizes important feature channels, leading to significant performance improvements in various computer vision tasks. By enhancing the representation of fine cracks, the SE block increases the contrast between fine cracks and the background. This reweighting capability improves the network's ability to detect and represent fine cracks.

3.1.4. Loss function

In our experiments, we employed three types of loss functions: binary cross-entropy (BCE) loss, Dice loss, and class-balanced loss. These functions were utilized to train our model and evaluate its performance in crack detection.

(a) BCE loss function. The cross-entropy loss function is commonly used in segmentation tasks to measure the discrepancy between the predicted distribution (P) and true distribution (Q) of the cracks. For each sample, the BCE loss is computed as the negative logarithm of the predicted probability of the true label.

$$\mathrm{BCE}(P, Q) = -\sum_{i=1}^{HW} \left( Q_i \log(P_i) + (1 - Q_i) \log(1 - P_i) \right) \tag{1}$$

where Q is the true binary label (zero or one), P is the predicted probability of the positive class, H is the height of the feature map, and W is the width of the feature map.
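To make the loss definitions concrete, the following is a minimal sketch in plain Python of the BCE loss of Eq. (1), together with the Dice and class-balanced losses that Section 3.1.4 defines in Eqs. (2)–(4). It is not the authors' implementation: the function names, the flattened-list representation of the H × W maps, and the small clamp `EPS` that guards against log(0) are assumptions made for illustration.

```python
import math

EPS = 1e-7  # assumed clamp to avoid log(0); the text does not specify one


def _clamp(p):
    return min(max(p, EPS), 1.0 - EPS)


def bce_loss(pred, target):
    """Eq. (1): binary cross-entropy summed over all H*W pixels."""
    return -sum(q * math.log(_clamp(p)) + (1 - q) * math.log(1 - _clamp(p))
                for p, q in zip(pred, target))


def dice_loss(pred_mask, target_mask):
    """Eq. (2): 1 - 2|A n B| / (|A| + |B|) for binary masks A and B."""
    inter = sum(a * b for a, b in zip(pred_mask, target_mask))
    total = sum(pred_mask) + sum(target_mask)
    return 1.0 - 2.0 * inter / total if total else 0.0


def class_balanced_loss(pred, target):
    """Eqs. (3)-(4): BCE with crack terms weighted by beta = N_n / (N_n + N_c)."""
    n_c = sum(target)              # number of crack pixels
    n_n = len(target) - n_c        # number of non-crack pixels
    beta = n_n / (n_n + n_c)
    return -sum(beta * q * math.log(_clamp(p))
                + (1 - beta) * (1 - q) * math.log(1 - _clamp(p))
                for p, q in zip(pred, target))


# Hypothetical 2 x 2 map flattened to four pixels, one of them a crack.
target = [1, 0, 0, 0]
pred = [0.8, 0.1, 0.2, 0.1]
print(bce_loss(pred, target), class_balanced_loss(pred, target))
print(dice_loss([1, 1, 0, 0], target))
```

Because crack pixels are rare, beta is close to 1, so errors on crack pixels dominate the class-balanced loss while the abundant background pixels are down-weighted.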


Fig. 1. DCNAM main structure. SE: Squeeze-and-Excitation block, TU: transition up modules, TD: transition down modules.
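The channel reweighting performed by the SE blocks in Fig. 1 can be illustrated with a short sketch. It assumes the standard formulation of [20], in which global average pooling first reduces each channel to a single descriptor and two fully connected layers (ReLU, then sigmoid) produce the channel weights. The function name, the nested-list tensor layout, the omission of biases, and the random weights are illustrative assumptions, not details taken from DCNAM.

```python
import math
import random


def se_block(x, w_reduce, w_expand):
    """Apply squeeze-and-excitation to x, a C x H x W nested list.

    w_reduce has shape (C // r) x C and w_expand has shape C x (C // r);
    biases are left out to keep the sketch short.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # First fully connected layer with ReLU compresses the descriptors.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w_reduce]
    # Second fully connected layer with sigmoid restores C weights in (0, 1).
    scale = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Recalibration: every pixel of channel c is multiplied by scale[c].
    out = [[[s * px for px in row] for row in ch] for ch, s in zip(x, scale)]
    return out, scale


# Hypothetical sizes: C = 4 channels, reduction ratio r = 2, 3 x 3 maps.
random.seed(0)
C, R, H, W = 4, 2, 3, 3
x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w_reduce = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // R)]
w_expand = [[random.uniform(-1, 1) for _ in range(C // R)] for _ in range(C)]
out, scale = se_block(x, w_reduce, w_expand)
print([round(s, 3) for s in scale])
```

In a trained network the two weight matrices are learned, so channels that respond to fine cracks can receive weights near 1 while background-dominated channels are attenuated.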

Fig. 2. Dense connection in our method.

Fig. 3. Dense connection in FC-Densenet [19].

(b) Dice loss function. The Dice loss function, also known as the Sørensen–Dice coefficient loss, is a similarity-based loss commonly used in image segmentation tasks. It measures the agreement between the predicted and ground truth segmentation masks. The Dice loss is defined as

$$\text{Dice Loss} = 1 - \frac{2\,|A \cap B|}{|A| + |B|} \tag{2}$$

where A represents the predicted segmentation mask (typically a binary mask), B represents the ground-truth segmentation mask (typically a binary mask), |A ∩ B| denotes the intersection between A and B, that is, the number of pixels where both masks are positive, |A| represents the number of positive pixels in A, and |B| represents the number of positive pixels in B.

(c) Class-balance loss function. In real-world scenarios, the number of cracked pixels is significantly less than the number of non-cracked pixels. To address this imbalance, we adjusted the cross-entropy loss function to better balance cracked and non-cracked pixels using the following equation:

$$L_p = -\sum_{i=1}^{HW} \left( \beta\, Q_i \log(P_i) + (1 - \beta)(1 - Q_i) \log(1 - P_i) \right) \tag{3}$$

$$\beta = \frac{N_n}{N_n + N_c} \tag{4}$$

where N_c is the number of crack pixels, N_n is the number of non-crack pixels, β is the class-balancing weight, Q is the ground-truth label, P is the predicted probability map, H is the height of the feature map, and W is the width of the feature map.

3.2. Implementation details

The proposed network was implemented using the PyTorch framework. Prior to training, the input images and corresponding labels were cropped to a resolution of 480 × 360 pixels. For comparison, we selected three classical models: U-Net [43], FCN [42], and FPHBN [44]. The batch size was set to 1, and the number of training epochs was set to 200. The initial learning rate was set to 0.0001 and was adjusted according to the following formula:

$$lr = lr \times 0.995^{\,cur\_epoch / 200} \tag{5}$$

where cur_epoch denotes the current epoch number. An RMSprop optimizer was used with a weight decay of 0.0001 to optimize the neural network. All neural network training and testing were performed on a CPU (Intel(R) Core(TM) i5, 4.10 GHz × 12) and a single GeForce RTX 3060 GPU with 12 GB of memory.

3.3. Dataset

We evaluated the performance of our model on three publicly available datasets, namely Crack500 [49], DeepCrack537 [41], and CFD [34], as well as on a private dataset. All datasets were labeled at the pixel level. The specific characteristics of these datasets are listed in Table 1.

In addition to public datasets, we collected a private concrete crack image dataset from the abutment surfaces of local bridges with different backgrounds and tiny cracks. The surfaces of most local bridges are not well-plastered, resulting in numerous spots and rough textures. They also contain shadows and complex backgrounds. Additionally, local bridges often exhibit more cracks owing to limited maintenance budgets. Furthermore, different backgrounds, cluttered surfaces, and heavy shadows introduce distractions in the detection of the cracks. To address these challenges, we captured 51 images of concrete cracks


Fig. 4. Workflow of our approach for crack segmentation.

Fig. 5. Structure of the SE block.

from bridges around the Natural Sciences Campus of Sungkyunkwan University, South Korea.

The images were captured at different ISO settings: 25 images at ISO 200, 13 images at ISO 400, and 13 images at ISO 800. The original size of these images is 6000 × 3368 pixels. Some representative images, shown in Fig. 6, display varying backgrounds, lighting conditions, and surface textures. Due to the predominance of tiny cracks, the ratio of crack pixels to non-crack pixels is lower than in the other datasets. In addition, the contrast between cracked and non-cracked pixels varies across images. Annotation files were prepared by experts using Darwin V7 software to label cracks at the pixel level. After labeling, the final annotations were converted into binary format. To enlarge the dataset and reduce the computational cost, we segmented the images into 600 × 600 pixel tiles, following the approach used for the Crack500 dataset. A total of 258 non-overlapping segments were selected and randomly divided into training, validation, and testing sets, consisting of 141, 63, and 54 images, respectively.

Table 1
Information on the datasets used in this study. Train Num: number of training images; Val Num: number of validation images; Test Num: number of testing images.

Dataset      Crack500    Deepcrack537   CFD         Ours
Train Num    1896        300            72          141
Val Num      348         0              0           63
Test Num     1124        257            46          54
Resolution   360 × 640   554 × 384      320 × 480   600 × 600

3.4. Evaluation metrics

In this study, we utilized pixel accuracy (PA), F1 score, precision, recall, and mean intersection over union (mIoU) as metrics to evaluate and compare the alignment between ground-truth labeling and model predictions. Among these, the mIoU is a commonly used metric in crack detection. Precision and recall are standard performance measures for assessing the capabilities of neural networks. The formulas for these five evaluation metrics are as follows:

(a) Pixel accuracy (PA) is the ratio of the correctly predicted pixels to the total number of pixels:

$$PA = \frac{TP + TN}{TP + FP + TN + FN} \tag{6}$$

(b) Mean intersection over union (mIoU) is the average ratio of the intersection to the union between the predictions and the ground truth for each class:

$$mIoU = \frac{1}{2} \left( \frac{TP}{TP + FP + FN} \right) \tag{7}$$

(c) Precision (P) is the proportion of correctly predicted instances to all predicted instances:

$$P = \frac{TP}{TP + FP} \tag{8}$$

(d) Recall (R) is the proportion of correctly predicted instances to all positive instances:

$$R = \frac{TP}{TP + FN} \tag{9}$$

(e) F-measure (F1) represents the balance between the precision and recall:

$$F1 = 2\,\frac{P \times R}{P + R} \tag{10}$$

where FP, FN, TP, and TN represent false positives, false negatives, true positives, and true negatives, respectively.

4. Experimental results

We conducted four different types of experiments: a comparison study, a cross-dataset generalization experiment, an ablation study, and an experiment on special cases in which we analyzed the segmentation results of the proposed model under complex scenarios.

4.1. Comparison result

4.1.1. Results of Crack500 dataset

We compare the detection results of the models using the Crack500 dataset. The experimental results for precision, recall, F1-score, accuracy, and mIoU are listed in Table 2. Our proposed method, DCNAM, demonstrates superior performance across all five evaluation metrics,


Fig. 6. Representative samples from the private dataset.

Fig. 7. Segmentation results on the Crack500 dataset. From left to right: original images, ground truth, segmentation results from FCN, U-Net, FPHBN, and our model.

with scores of 87.23% for precision, 79.60% for recall, 83.24% for F1-score, 96.83% for accuracy, and 71.11% for mIoU. Specifically, in terms of F1-score, DCNAM achieved improvements of 10.61, 23.84, and 13.21% over U-Net, FCN, and FPHBN, respectively. For mIoU, DCNAM outperformed U-Net, FCN, and FPHBN by 14.83, 27.90, and 17.05%, respectively.

Table 2
Evaluation results of models on the Crack500 dataset.

                      Metrics
Class Balance loss    P       R       F1      Acc     mIoU
U-Net [43]            69.82   75.68   72.63   96.48   56.28
FCN [42]              56.36   62.74   59.40   95.96   43.21
FPHBN [44]            66.75   73.65   70.03   96.37   54.06
DCNAM                 87.23   79.60   83.24   96.83   71.11

Fig. 7 shows the segmentation results of randomly selected images from the Crack500 dataset. Our approach achieves segmentation results that closely match the ground truth in both crack shape and propagation orientation, which is beneficial for crack quantification. In contrast, the U-Net and FPHBN tended to misidentify some non-crack pixels as cracks, whereas the FCN lost some crack details.

4.1.2. Results of CFD dataset

Table 3 presents the inference results for precision, recall, F1-score, accuracy, and mIoU on the CFD dataset. The proposed method consistently achieved the highest values for precision, recall, F1-score, accuracy, and mIoU, with scores of 67.69, 86.49, 75.94, 98.71, and 54.61%, respectively. The CFD dataset's limited size, with only approximately a hundred images across all training, validation, and testing sets, likely explains the lack of significant performance difference among the four neural networks. Specifically, the proposed method outperformed the U-Net, FCN, and FPHBN in terms of the F1-score by 8.87, 7.18, and 14.89%, respectively. Similarly, DCNAM achieved mIoU values 4.03, 0.93, and 10.75% higher than those of the U-Net, FCN, and FPHBN, respectively.

Table 3
Comparative evaluation of models on CFD dataset.

                      Metrics
Class Balance loss    P       R       F1      Acc     mIoU
U-Net [43]            64.48   69.88   67.07   98.58   50.58
FCN [42]              67.03   70.58   68.76   84.06   53.68
FPHBN [44]            53.91   70.38   61.05   98.54   43.86
DCNAM                 67.69   86.49   75.94   98.71   54.61

Segmentation results of randomly selected images from the CFD dataset are depicted in Fig. 8. Our method's segmentation output closely aligns with the ground truth in terms of crack width and overall shape. In contrast, FPHBN tends to misclassify non-crack pixels as cracks and segments cracks wider than the actual ground truth. U-Net's results suffered from a loss of detail and distortion in the overall shape


of the cracks. Similarly, FCN’s segmentation results also exhibited a loss of crack details.

Fig. 8. Segmentation results of the CFD dataset. From left to right: original images, ground truth, and the segmentation results of FCN, U-Net, FPHBN, and our model.

4.1.3. Results of Deepcrack537 dataset
Table 4 presents the experimental results for precision, recall, F1-score, accuracy, and mIoU on the Deepcrack537 dataset. Our proposed method outperforms the baselines on all metrics, achieving precision, recall, F1-score, accuracy, and mIoU values of 84.67, 93.45, 88.84, 98.22, and 74.69%, respectively. Specifically, the proposed method’s F1-scores are 5.26, 7.04, and 9.32% higher than those of the U-Net, FCN, and FPHBN, respectively. Additionally, the proposed method shows substantial improvements in mIoU, surpassing the U-Net, FCN, and FPHBN by 3.44, 5.43, and 6.49%, respectively.
The segmentation results for three randomly selected images from the Deepcrack537 dataset are shown in Fig. 9. Our proposed method exhibits the least deviation from the ground truth segmentation. In contrast, the segmentation maps produced by the other methods exhibit noticeable detail loss and distortions.

Table 4
Comparative evaluation of models on the Deepcrack537 dataset.
Class Balance loss    Metrics (%)
                      P        R        F1       Acc      mIoU
U-Net [43]            82.48    82.68    83.58    98.09    71.25
FCN [42]              82.38    81.23    81.80    96.24    69.26
FPHBN [44]            83.22    76.13    79.52    98.02    68.20
DCNAM                 84.67    93.45    88.84    98.22    74.69

Fig. 9. Segmentation results of the Deepcrack537 dataset. From left to right: original images, ground truth, segmentation results of FCN, U-Net, FPHBN, and our model.

4.1.4. Result of our dataset
Table 5 presents the inference results of the models in terms of the precision, recall, F1-score, accuracy, and mIoU on our dataset. Our proposed model achieved the highest precision, F1-score, accuracy, and mIoU, which are 85.21, 82.16, 98.69, and 74.35%, respectively. In terms of the F1-score, DCNAM outperforms the U-Net, FCN, and FPHBN by 1.04, 0.69, and 4.42%, respectively. In terms of the mIoU, our proposed method achieved 0.49, 1.88, and 4.76% higher values than those of the U-Net, FCN, and FPHBN, respectively.
The segmentation results of the dataset are presented in Fig. 10. Our model achieved more accurate segmentation of fine cracks compared to other models, especially in terms of crack length and propagation orientations. In contrast, other neural networks exhibit distortions in the segmentation of fine crack lengths and their integrity.

Table 5
Comparison results on our dataset.
Class Balance loss    Metrics (%)
                      P        R        F1       Acc      mIoU
U-Net [43]            83.19    79.16    81.12    98.35    73.85
FCN [42]              77.38    79.22    78.30    98.67    71.56
FPHBN [44]            80.16    75.46    77.74    96.84    69.59
DCNAM                 85.21    79.33    82.16    98.69    74.35

Fig. 10. Segmentation results of our dataset. From left to right: original images, ground truth, segmentation results from FCN, U-Net, FPHBN, and our model.

4.2. Generalization on cross data

To assess the generalizability of the proposed model, we conducted cross-dataset validation experiments. Initially, we trained the models on the Deepcrack537 dataset and then tested them on the CFD dataset. Table 6 lists the experimental results of various models for this cross-dataset validation. Our proposed method surpassed the baseline models U-Net, FCN, and FPHBN in all evaluation metrics. Specifically, our model achieved precision, recall, F1-score, accuracy, and mIoU values of 73.04, 89.42, 80.40, 99.09, and 68.80%, respectively. In terms of F1-score, our model outperformed U-Net, FCN, and FPHBN by 22.21, 34.98, and 30.34%, respectively. Furthermore, our method demonstrated a significant improvement in mIoU, surpassing U-Net, FCN, and FPHBN by 27.76, 34.95, and 29.93%, respectively.

Table 6
Comparison results of models on cross data (Deepcrack to CFD).
Class Balance loss    Metrics (%)
                      P        R        F1       Acc      mIoU
U-Net [43]            58.97    57.43    58.19    98.67    41.04
FCN [42]              46.47    55.50    50.59    98.25    33.85
FPHBN [44]            54.22    59.08    56.55    98.53    39.41
DCNAM                 72.35    90.99    80.61    99.14    69.11

Fig. 11 shows the segmentation results of different types of structural surfaces, including asphalt, concrete, and masonry, under various backgrounds. The first, second, and seventh rows show pavement cracks: the first features a shallow bifurcation crack, the second row shows an alligator crack, and the seventh row depicts a complex background pavement crack. Concrete cracks are shown in the third to fifth rows, with shallow and deep bifurcation cracks shown under various backgrounds. The sixth row shows a masonry crack.
In the first row, despite the disconnected prediction on the left side, our model achieved segmentation results that were closest to the ground-truth labeling. The segmentation result generated by FPHBN was better than those of U-Net and FCN; however, it displayed disconnected and missing predictions on the right side of the cracks. FCN
failed to detect cracks on the left side, whereas U-Net failed to produce crack lines on both the left and right sides.
In the third row, our model showed the best segmentation, which was closest to the ground truth labeling. The segmentation results produced by FPHBN and FCN have similar performances, with FPHBN having disconnected predictions on the right side and FCN having them on both the left and right sides. U-Net is the least accurate compared to the other models. In the fifth row, despite disconnected crack predictions, our model demonstrated the best performance for background surface-like cracks on a concrete surface. The predictions of FCN, U-Net, and FPHBN are disconnected and incomplete towards the edges. In the sixth row, our model predicts segmentation results similar to the ground truth labeling of the masonry wall. FCN misses the predictions at the joints, whereas both U-Net and FPHBN have missing predictions at the bottom edge. In addition, the U-Net exhibited disconnected cracks.

Fig. 11. Segmentation results of models on structural surfaces under different material types and environmental conditions.

To evaluate the stability of the models against small changes or domain shifts in the input data, we generated images from raw images using different mechanisms. Fig. 12 presents the segmentation predictions for these generated images. In the first row, a wet surface was created on a shadowed pavement surface. The third row features a non-target image inserted into the background images, with an image containing cracks used as the background. In the fourth and fifth rows, the background color was changed to resemble red marble, stone, sandstone walls, and wood surfaces.
In the first row, our model’s predicted segmentation results align closely with the ground truth and the segmentation results shown in the same row of Fig. 11. In contrast, segmentations produced by FCN, U-Net, and FPHBN have disconnected cracks and struggle to detect entire cracks. In the third row, all models except ours fail to detect the crack on the right side of the non-target image. Additionally, in the fourth and fifth rows, our model successfully detects cracks despite changes in the background resembling different material surfaces. However, FCN, U-Net, and FPHBN struggle to detect complete cracks, resulting in misleading results. The predictions from FCN and FPHBN show shorter and disconnected segments, respectively. Overall, the segmentation predictions of U-Net were the least accurate compared to the other models. Our model demonstrates strong generalization to small domain shifts or gaps, which can be further enhanced by incorporating a dataset with varied lighting conditions, backgrounds, and crack widths into the training and validation process.

Fig. 12. Segmentation results of models on generated images.

4.3. Ablation study

We conducted an ablation experiment in two parts: 1) comparing different loss functions, including BCE, Dice loss, and class-balanced loss, and 2) evaluating model performance with and without an attention block to examine its impact on segmentation performance.
The results of the entire ablation study are presented in Table 7. Incorporating attention with class-balanced loss outperformed models
without attention blocks across different loss functions. To provide a clearer view of the impact of these factors on the experimental results, we divided the contents of Table 7 into Figs. 13 and 14. Fig. 13 shows the comparison of loss functions on the Crack500 dataset across all evaluation metrics. Overall, employing the class-balanced loss function yields better results in terms of both the F1 score and mIoU metrics. Specifically, the class-balanced loss function leads to enhancements of 3.64 and 1.13% in F1 score compared to the BCE and Dice loss functions, respectively. Regarding mIoU, the class-balanced loss function enhances results by 4.62 and 2.10% compared to the BCE and Dice loss functions, respectively. Fig. 14 provides a comparative analysis between results obtained with and without the SE attention module to explore the impact of attention mechanisms. Utilizing the SE attention module enhances performance compared to models without it. In particular, integrating the SE attention module into the model resulted in enhancements of 0.84 and 2.06% in F1 score and mIoU, respectively.

Table 7
Ablation study results for loss functions and attention blocks.
DCNAM                 Metrics on Crack500 (%)
                      P        R        F1       Acc      mIoU
BCE Loss              72.16    88.75    79.60    96.40    66.49
Dice Loss             84.24    80.08    82.11    97.47    69.01
Class-Balance Loss    87.23    79.60    83.24    96.83    71.11
Without SE            81.61    83.20    82.40    95.17    69.05

Fig. 13. Ablation results for different loss functions.

Fig. 14. Ablation results with and without SE attention blocks.

4.4. Result of special cases

To assess the impact of different lighting conditions and backgrounds on crack identification and the robustness of our model, we randomly selected crack images from our dataset with ISO values of 200, 400, and 800. Fig. 15 presents the segmentation results of the proposed model. Overall, the predictions demonstrated a high level of accuracy in segmenting cracks, particularly in successfully identifying small cracks against different backgrounds. Moreover, despite varying ISO conditions, our model maintained consistency with the ground truth segmentation in terms of crack shape and length. This indicates that our model is effective at detecting fine cracks, even in dimly lit environments.
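
The F1 values and ablation gains discussed in Section 4.3 can be rechecked directly from the rows of Table 7. The following Python sketch does that arithmetic under stated assumptions: the dictionary layout and the helper names `f1_from` and `gains` are illustrative choices made here, not part of the authors' code, and the "Without SE" row is taken to be the class-balanced configuration with the attention module removed.

```python
# Rows of Table 7 (DCNAM on Crack500), in percent:
# precision, recall, F1, accuracy, mIoU.
TABLE7 = {
    "BCE Loss": (72.16, 88.75, 79.60, 96.40, 66.49),
    "Dice Loss": (84.24, 80.08, 82.11, 97.47, 69.01),
    "Class-Balance Loss": (87.23, 79.60, 83.24, 96.83, 71.11),
    "Without SE": (81.61, 83.20, 82.40, 95.17, 69.05),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


def gains(reference, baseline):
    """Return (F1 gain, mIoU gain) of `reference` over `baseline`."""
    ref, base = TABLE7[reference], TABLE7[baseline]
    return round(ref[2] - base[2], 2), round(ref[4] - base[4], 2)


# Consistency check: recompute F1 from the tabulated precision and recall.
for name, (p, r, f1, _acc, _miou) in TABLE7.items():
    print(f"{name}: recomputed F1 = {f1_from(p, r):.2f}, tabulated F1 = {f1:.2f}")

# Gains of the class-balanced configuration over each alternative.
for baseline in ("BCE Loss", "Dice Loss", "Without SE"):
    f1_gain, miou_gain = gains("Class-Balance Loss", baseline)
    print(f"vs {baseline}: F1 +{f1_gain}, mIoU +{miou_gain}")
```

Keeping the table as plain data and deriving every difference from it makes the reported gains easy to audit whenever a row is updated.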
Furthermore, analyzing the prediction results revealed that decreasing the ISO value led to greater loss of detail in crack detection. In other words, when the lighting conditions were brighter, the model performed better, resulting in fewer details being lost during crack detection. This observation suggests that environmental lighting conditions significantly influence crack detection outcomes.
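
The SE attention module evaluated in the ablation study (Section 4.3) follows the squeeze-and-excitation idea of reweighting feature-map channels [20]. The pure-Python sketch below illustrates only that general mechanism: the function name `se_reweight`, the toy feature map, the weights, and the reduction ratio are all hypothetical, biases are omitted for brevity, and none of it is taken from the DCNAM implementation.

```python
import math


def se_reweight(feature_map, w_reduce, w_expand):
    """Squeeze-and-excitation style channel reweighting (biases omitted).

    feature_map: list of C channels, each an H x W list of lists.
    w_reduce:    (C // r) x C weights of the first fully connected layer.
    w_expand:    C x (C // r) weights of the second fully connected layer.
    All weights used below are hypothetical, not taken from DCNAM.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: reduce dimensionality, then apply ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, step 2: expand back to C values, then apply a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Scale: multiply every pixel of a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: two 2 x 2 channels, the first with a stronger average response.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
w_reduce = [[1.0, -1.0]]    # 2 channels -> 1 hidden unit (reduction ratio 2)
w_expand = [[2.0], [-2.0]]  # 1 hidden unit -> 2 channel gates
scaled, gates = se_reweight(features, w_reduce, w_expand)
print([round(g, 3) for g in gates])
```

With these toy weights the first channel receives the larger gate, so its response is attenuated less than that of the second channel. In a trained network the gates are learned, which is how weak but informative channels can be emphasized relative to background responses.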
Fig. 15. Segmentation results of our model for randomly selected tiny cracks. Mask: Ground Truth.

Crack detection tasks often involve identifying tiny cracks that exhibit subtle features in images, making accurate detection and localization challenging. To address this, high-resolution images, advanced feature extraction methods, and sensitive algorithms are essential for capturing and analyzing these subtle crack lines. Existing methods struggle to effectively segment fine cracks in complex environments. For instance, methods employing addition operations in the FCN and concatenation in the U-Net and FPHBN for feature fusion tend to equally weight features, which may lead to neglecting information from tiny cracks due to their weak representation. Our method, however, incorporates an attention mechanism, specifically the SE attention module. This module reweights the channel values of feature maps across various image scales, enhancing subtle differences and amplifying the


contrast between fine cracks and the background, especially in dim environments. This capability is a distinctive advantage of our approach and is not available in other algorithms.
In addition, the upsampling path built from dense blocks in DCNAM performs better than the upsampling paths with more standard operations [19], such as those in U-Net and FCN. Therefore, our model addresses the challenges in crack detection arising from the limited variability between fine cracks and the background as well as the semantic feature distribution of cracks.

4.5. Computational efficiency

To evaluate the computational efficiency of the models, three indicators, namely, floating-point operations (FLOPs), model parameters (Params), and inference time, are utilized. FLOPs measure the complexity of a model by counting the floating-point operations required for a single forward pass. Model Params refer to the number of trainable parameters in the model and determine its size. The inference time is the average time it takes for a model to process an input image and generate an output. Both FLOPs and model Params are computed for a single forward pass, and higher FLOPs result in longer inference times. Additionally, more Params increase the FLOPs required to perform a task, further increasing the inference time. We employed an image size of 480 × 360 to evaluate the computational efficiency.
Table 8 shows the computational efficiencies of all the models. Among the models, DCNAM achieved the lowest number of Params, with a value of 8.05M (Params in millions). This represents reductions of 74.07%, 87.21%, and 87.41% compared to U-Net, FCN, and FPHBN, respectively. In terms of FLOPs, our proposed model also achieved the lowest value, 16.89G (where 1 GFLOP = 1 billion floating-point operations). The reduced number of Params and FLOPs is due to the advantages of the simplified and efficient FC-DenseNet model. In terms of inference time, DCNAM performed the best, with times 57.14%, 90.24%, and 91.43% lower than those of the U-Net, FCN, and FPHBN, respectively. This improvement is attributed to the model’s lower number of Params. A smaller number of Params contributes to a more compact model, which is advantageous for developing crack inspection mobile applications for real-time structural health monitoring.

Table 8
Computational efficiency of different models.
Model         Params (M)    FLOPs (G)    Time (s)
U-Net [43]    31.04         260.40       0.28
FCN [42]      62.92         226.44       1.23
FPHBN [44]    63.96         226.80       1.40
DCNAM         8.05          16.89        0.12

5. Conclusion

In this study, we proposed an innovative model, DCNAM, to enhance crack identification capabilities. By incorporating attention modules, we strengthened the model’s ability to represent tiny cracks and reduced interference from complex backgrounds. DCNAM primarily consists of two components: the FC-DenseNet framework and the SE attention module. The FC-DenseNet framework includes dense modules, an encoder, and a decoder. The dense modules enhance the representation capability of feature maps, while the encoder and decoder structures, equipped with skip connections, fuse feature maps of different scales to further improve representation. The SE attention module further enhanced the expression of tiny cracks while reducing background interference by reweighting the channel values.
Extensive experiments demonstrated the effectiveness of DCNAM. Across nearly all datasets, DCNAM outperformed baseline neural networks in all evaluation metrics, showing excellent robustness and generalization capability. Our proposed method consistently achieved the highest mIoU values across all datasets. Notably, on the Crack500 and Cross datasets, DCNAM’s mIoU values exceeded those of all comparative neural networks by an impressive margin of 15% or more, showcasing its exceptional segmentation capabilities. Ablation experiments further validated the contributions of the SE attention module and the class-balanced loss function. The incorporation of SE modules and the use of a class-balanced loss function led to improvements in F1 and mIoU scores. Our method excelled in segmenting tiny cracks and cracks under low-light conditions. Separate experiments conducted on specific cases confirmed that DCNAM performs well even in dimly lit environments with varying ISO values.
Moreover, we found that the quantity and distribution of data in the dataset significantly affected results. For example, experiments on the CFD and Crack500 datasets revealed that a larger quantity of data benefited crack segmentation assessment. The Crack500 dataset, with over 1000 samples for training and testing, allowed clear observation of performance differences among models. In contrast, the CFD dataset, with approximately 118 images, made it challenging to distinguish performance variations. These findings highlight the significance of the data quantity in evaluating the model performance. In terms of data distribution, our method performed better with a balanced dataset. For example, in terms of F1 and mIoU metrics, our method achieved scores of 75.94 and 54.61% on the CFD dataset and 80.40 and 68.80% on the Cross dataset, respectively. This difference is attributed to the more balanced data distribution in the Cross dataset’s training set compared to the CFD dataset. The Cross dataset had a more even distribution of images depicting both fine and coarse cracks, enabling better segmentation performance. Finally, we provided a dedicated dataset focusing on dark and tiny cracks, reflecting real-life crack occurrence in buildings more accurately than public datasets.
In future work, we will focus on further enhancing the algorithm to develop a lightweight model robust against different backgrounds and crack types. In addition, we will integrate this lightweight model into portable edge applications to enable real-time automated structural monitoring.

CRediT authorship contribution statement

Daniel Asefa Beyene: Writing – original draft, Visualization, Software, Methodology, Conceptualization. Huangrui: Writing – original draft, Visualization, Software, Methodology, Conceptualization. Kassahun Demissie Tola: Writing – review & editing, Data curation. Fitsum Emagnenehe Yigzew: Writing – review & editing, Data curation. Minsoo Park: Supervision. Seunghee Park: Supervision, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding

This research was supported by a grant [2022-MOIS38-002 (RS-2022-ND630021)] from the Korea Ministry of Interior and Safety (MOIS)’s project for proactive technology development safety accident for vulnerable groups. This research was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2024-00336270).

Data availability

Data will be made available on request.


References

[1] König J, Jenkins M, Mannion M, Barrie P, Morison G. What’s cracking? A review and analysis of deep learning methods for structural crack segmentation, detection and quantification. 2022, arXiv preprint arXiv:2202.03714.
[2] Beyene DA, Maru MB, Kim T, Park S, Park S, et al. Unsupervised domain adaptation-based crack segmentation using transformer network. J Build Eng 2023;80:107889.
[3] Zalama E, Gómez-García-Bermejo J, Medina R, Llamas J. Road crack detection using visual features extracted by Gabor filters. Comput-Aided Civ Infrastruct Eng 2014;29(5):342–58.
[4] Al-Amri SS, Kalyankar NV, et al. Image segmentation by using threshold techniques. 2010, arXiv preprint arXiv:1005.4020.
[5] Liu F, Xu G, Yang Y, Niu X, Pan Y. Novel approach to pavement cracking automatic detection based on segment extending. In: 2008 international symposium on knowledge acquisition and modeling. IEEE; 2008, p. 610–4.
[6] Oliveira H, Correia PL. Automatic road crack segmentation using entropy and image dynamic thresholding. In: 2009 17th European signal processing conference. IEEE; 2009, p. 622–6.
[7] Zhao H, Qin G, Wang X. Improvement of canny algorithm based on pavement edge detection. In: 2010 3rd international congress on image and signal processing. Vol. 2, IEEE; 2010, p. 964–7.
[8] Rong W, Li Z, Zhang W, Sun L. An improved CANNY edge detection algorithm. In: 2014 IEEE international conference on mechatronics and automation. IEEE; 2014, p. 577–82.
[9] Canny J. A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 1986;(6):679–98.
[10] Amhaz R, Chambon S, Idier J, Baltazart V. Automatic crack detection on two-dimensional pavement images: An algorithm based on minimal path selection. IEEE Trans Intell Transp Syst 2016;17(10):2718–29.
[11] Boykov YY, Jolly M-P. Interactive graph cuts for optimal boundary & region segmentation of objects in ND images. In: Proceedings eighth IEEE international conference on computer vision. ICCV 2001. Vol. 1, IEEE; 2001, p. 105–12.
[12] Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intell Syst Appl 1998;13(4):18–28.
[13] Breiman L. Random forests. Mach Learn 2001;45:5–32.
[14] Li H, Zong J, Nie J, Wu Z, Han H. Pavement crack detection algorithm based on densely connected and deeply supervised network. IEEE Access 2021;9:11835–42.
[15] Zhang L, Yang F, Zhang YD, Zhu YJ. Road crack detection using deep convolutional neural network. In: 2016 IEEE international conference on image processing. ICIP, IEEE; 2016, p. 3708–12.
[16] Schmugge SJ, Rice L, Lindberg J, Grizziy R, Joffey C, Shin MC. Crack segmentation by leveraging multiple frames of varying illumination. In: 2017 IEEE winter conference on applications of computer vision. WACV, IEEE; 2017, p. 1045–53.
[17] Zhou Q, Qu Z, Cao C. Mixed pooling and richer attention feature fusion for crack detection. Pattern Recognit Lett 2021;145:96–102.
[18] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2017, p. 4700–8.
[19] Jégou S, Drozdzal M, Vazquez D, Romero A, Bengio Y. The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2017, p. 11–9.
[20] Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2018, p. 7132–41.
[21] Kamaliardakani M, Sun L, Ardakani MK. Sealed-crack detection algorithm using heuristic thresholding approach. J Comput Civ Eng 2016;30(1):04014110.
[22] Yamaguchi T, Nakamura S, Saegusa R, Hashimoto S. Image-based crack detection for real concrete surfaces. IEEJ Trans Electr Electron Eng 2008;3(1):128–35.
[23] Kaul V, Yezzi A, Tsai Y. Detecting curves with unknown endpoints and arbitrary topology using minimal paths. IEEE Trans Pattern Anal Mach Intell 2011;34(10):1952–65.
[24] Amhaz R, Chambon S, Idier J, Baltazart V. A new minimal path selection algorithm for automatic crack detection on pavement images. In: 2014 IEEE international conference on image processing. ICIP, IEEE; 2014, p. 788–92.
[25] Ayenu-Prah A, Attoh-Okine N. Evaluating pavement cracks with bidimensional empirical mode decomposition. EURASIP J Adv Signal Process 2008;2008:1–7.
[26] Maode Y, Shaobo B, Kun X, Yuyao H. Pavement crack detection and analysis for high-grade highway. In: 2007 8th international conference on electronic measurement and instruments. IEEE; 2007, p. 4–548.
[27] Moon H-G, Kim J-H, et al. Intelligent crack detecting algorithm on the concrete crack image using neural network. In: Proceedings of the 28th ISARC. 2011, 2011, p. 1461–7.
[28] Moselhi O, Shehab-Eldeen T. Classification of defects in sewer pipes using neural networks. J Infrastruct Syst 2000;6(3):97–104.
[29] O’Byrne M, Schoefs F, Ghosh B, Pakrashi V. Texture analysis based damage detection of ageing infrastructural elements. Comput-Aided Civ Infrastruct Eng 2013;28(3):162–77.
[30] Kapela R, Śniatała P, Turkot A, Rybarczyk A, Pożarycki A, Rydzewski P, et al. Asphalt surfaced pavement cracks detection based on histograms of oriented gradients. In: 2015 22nd international conference mixed design of integrated circuits & systems. MIXDES, IEEE; 2015, p. 579–84.
[31] Quintana M, Torres J, Menéndez JM. A simplified computer vision system for road surface inspection and maintenance. IEEE Trans Intell Transp Syst 2015;17(3):608–19.
[32] Gavilán M, Balcones D, Marcos O, Llorca DF, Sotelo MA, Parra I, et al. Adaptive road crack detection system by pavement classification. Sensors 2011;11(10):9628–57.
[33] Zakeri H, Nejad FM, Fahimifar A, Torshizi AD, Zarandi MF. A multi-stage expert system for classification of pavement cracking. In: 2013 joint IFSA world congress and NAFIPS annual meeting (IFSA/NAFIPS). IEEE; 2013, p. 1125–30.
[34] Shi Y, Cui L, Qi Z, Meng F, Chen Z. Automatic road crack detection using random structured forests. IEEE Trans Intell Transp Syst 2016;17(12):3434–45.
[35] Cha Y-J, Choi W, Büyüköztürk O. Deep learning-based crack damage detection using convolutional neural networks. Comput-Aided Civ Infrastruct Eng 2017;32(5):361–78.
[36] Pauly L, Hogg D, Fuentes R, Peel H. Deeper networks for pavement crack detection. In: Proceedings of the 34th ISARC. IAARC; 2017, p. 479–85.
[37] Cha Y-J, Choi W, Suh G, Mahmoudkhani S, Büyüköztürk O. Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Comput-Aided Civ Infrastruct Eng 2018;33(9):731–47.
[38] Du Y, Pan N, Xu Z, Deng F, Shen Y, Kang H. Pavement distress detection and classification based on YOLO network. Int J Pav Eng 2021;22(13):1659–72.
[39] Tsuchiya H, Fukui S, Iwahori Y, Hayashi Y, Achariyaviriya W, Kijsirikul B. A method of data augmentation for classifying road damage considering influence on classification accuracy. Procedia Comput Sci 2019;159:1449–58.
[40] Xie S, Tu Z. Holistically-nested edge detection. In: Proceedings of the IEEE international conference on computer vision. 2015, p. 1395–403.
[41] Liu Y, Yao J, Lu X, Xie R, Li L. DeepCrack: A deep hierarchical feature learning architecture for crack segmentation. Neurocomputing 2019;338:139–53.
[42] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015, p. 3431–40.
[43] Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. Springer; 2015, p. 234–41.
[44] Yang F, Zhang L, Yu S, Prokhorov D, Mei X, Ling H. Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Trans Intell Transp Syst 2019;21(4):1525–35.
[45] Fan Z, Li C, Chen Y, Wei J, Loprencipe G, Chen X, et al. Automatic crack detection on road pavements using encoder-decoder architecture. Materials 2020;13(13):2960.
[46] Guo J-M, Markoni H, Lee J-D. BARNet: Boundary aware refinement network for crack detection. IEEE Trans Intell Transp Syst 2021;23(7):7343–58.
[47] Choi W, Cha Y-J. SDDNet: Real-time crack segmentation. IEEE Trans Ind Electron 2019;67(9):8016–25.
[48] Fan Z, Li C, Chen Y, Di Mascio P, Chen X, Zhu G, et al. Ensemble of deep convolutional neural networks for automatic pavement crack detection and measurement. Coatings 2020;10(2):152.
[49] Yang F, Zhang L, Yu S, Prokhorov D, Mei X, Ling H. Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Trans Intell Transp Syst 2019;21(4):1525–35.