Article
A Method for Extracting Joints on Mountain Tunnel Faces Based
on Mask R-CNN Image Segmentation Algorithm
Honglei Qiao 1 , Xinan Yang 1 , Zuquan Liang 1 , Yu Liu 1 , Zhifan Ge 1 and Jian Zhou 2,3, *
1 The Key Laboratory of Road and Traffic Engineering, Ministry of Education, Tongji University,
Shanghai 201804, China; [email protected] (H.Q.); [email protected] (X.Y.)
2 Department of Civil Engineering, Hangzhou City University, Hangzhou 310015, China
3 Key Laboratory of Safe Construction and Intelligent Maintenance for Urban Shield Tunnels of Zhejiang
Province, Hangzhou City University, Hangzhou 310015, China
* Correspondence: [email protected]
Abstract: The accurate distribution of joints on the tunnel face is crucial for assessing the stability and
safety of surrounding rock during tunnel construction. This paper introduces the Mask R-CNN image
segmentation algorithm, a state-of-the-art deep learning model, to achieve efficient and accurate
identification and extraction of joints on tunnel face images. First, digital images of tunnel faces were
captured and stitched, resulting in 286 complete images suitable for analysis. Then, the joints on
the tunnel face were extracted using traditional image processing algorithms, the commonly used
U-net image segmentation model, and the Mask R-CNN image segmentation model introduced in
this paper to address the lack of recognition accuracy. Finally, the extraction results obtained by the
three methods were compared. The comparison results show that the joint extraction method based
on the Mask R-CNN image segmentation deep learning model introduced in this paper achieved
the best joint extraction effect with a Dice similarity coefficient of 87.48%, outperforming traditional
methods and the U-net model, which scored 60.59% and 75.36%, respectively, realizing accurate and
efficient acquisition of tunnel face rock joints. These findings suggest that the Mask R-CNN model
can be effectively implemented in real-time monitoring systems for tunnel construction projects.
Keywords: mountain tunnel; tunnel construction safety; rock mass joints; image processing; deep learning; Dice similarity coefficient

1. Introduction

The development degree of joints on the tunnel face reflects rock mass integrity, which is crucial for dynamic evaluation during tunnel construction. Currently, joint development descriptions rely on hand-drawn records and qualitative judgments, limiting accuracy and efficiency. Digital image capturing is primarily used for records, yet these images contain valuable rock mass information. If an effective method can be established to digitally extract and obtain complete rock mass information from tunnel face images, it would significantly improve the efficiency of real-time dynamic grading of the surrounding rock on construction sites.

Early methods, such as manual counting, were inefficient and prone to human error. Ross-Brown and Atkinson [1] first used camera images for rock mass characterization, while subsequent studies introduced digital image processing techniques [2] for better accuracy and efficiency. Advancements in image processing, such as Fourier and Hough transforms [3,4], grayscale elevation methods [5], and structural analysis techniques [6], have improved joint extraction but still rely heavily on manual intervention and experience. Recent studies [7,8] developed algorithms to overcome these limitations but faced challenges in complex environments. However, traditional image processing methods heavily rely on experience, and their processing effectiveness still needs improvement.
Figure 1. Image acquisition equipment: (a) EOS 6D Mark II camera; (b) 50 mm f/1.4 lens; (c) DSLR tripod.
2.1.2. Principles for Selecting Light Sources

To improve image quality under low-light conditions, various lighting equipment such as flashes, reflector lamps, and mechanical equipment light sources are used. Each has its advantages and limitations.
Flash is the most direct complementary light source for digital cameras. It produces a strong lighting effect at the moment of exposure. However, in the dusty environment of tunnel engineering, it can easily cause diffuse reflection off dust particles, which affects the imaging quality. Reflector lamps are a type of spotlight with a wide lighting range and a stable light source. They provide better lighting for the tunnel working surface but are less portable and require a power supply that may be inconvenient at the construction site. Mechanical equipment at the tunnel construction site, such as loaders, wet spray trucks, and dump trucks, generally has lighting systems. These light sources have wide coverage and stable illumination, which can provide a good lighting effect for the tunnel working surface. Although they may be interfered with by mechanical shadows, these shadows can be effectively avoided in practice. Therefore, in this study, the mechanical equipment shown in Figure 2 is selected for lighting and fill light.
Figure 2. Tunnel-lined platform car light source.
2.1.3. Partitioned Shooting Plan and Timing for Tunnel Face Photography

To obtain high-quality images, the camera should be placed 10–20 m in front of the tunnel face, perpendicular to it. The tunnel face is divided into sections to ensure comprehensive coverage.
Various construction activities can obstruct the tunnel face and complicate photography. During drilling and charging, the drilling jumbo and the tunnel-lined platform car can block the view. During mucking, rubble covers the tunnel face, and the high dust concentration makes photographing difficult. During the installation of steel arches and shotcrete application, the tunnel-lined platform car can again block the tunnel face, and the shotcrete process reduces visibility, affecting photo quality. Therefore, the optimal times for photography are after mucking and before installing steel arches, avoiding the adverse interferences shown in Figure 3 and ensuring clear visibility.
Figure 3. Adverse interferences in tunnel face photography: (a) tunnel-lined platform car shadow; (b) tunnel-lined platform obstruction; (c) rubble obstruction; (d) shotcrete coverage. (The red boxes mark where the tunnel face is obscured.)
This article relies on the Luanchuan–Lushi Expressway Tunnel in Henan and the Hangzhou–Wenzhou Railway Tunnel in Zhejiang (shown in Figure 4), where the tunnel face area is generally less than 100 square meters. Considering the onsite shooting conditions, the shooting plan shown in Figure 5 is adopted: the tunnel face is divided into six sections, and the camera is placed 10 m in front of the face. The optimal time for photography is after mucking and before installing steel arches. During this period, uniform lighting can be provided using the tunnel-lined platform car light source, improving the lighting quality for photographing the tunnel face.

Figure 4. Map of tunnel locations.
Figure 5. Onsite digital image shooting plan. (Each number corresponds to one part of the divided tunnel face.)

2.2. Stitching and Fusion of Partitioned Photography Images

After obtaining the six partitioned photographic images of the tunnel face, stitching them together to form a complete and clear tunnel face image is a prerequisite for the next step of joint extraction.

When taking the images, to ensure complete coverage of each partition, the area covered by each partition image is often slightly larger than the actual corresponding partition area. This inevitably leads to overlapping images in adjacent regions, making it impossible to achieve image stitching through simple positioning. The following example illustrates the stitching and fusion algorithm for partitioned images of the tunnel face, using the right arch foot region and the floor region of a tunnel face as examples.
2.2.1. Image Stitching of Tunnel Work Face Partitions

As shown in Figure 6, the blue part of the floor partition image overlaps with the red part of the right arch foot partition image. This overlapping area needs to be stitched together.

Figure 6. Image overlapping area.
The image stitching process uses the SURF (Speeded-Up Robust Features) algorithm, which performs faster compared to other algorithms [33]. First, the Hessian matrix of the image is calculated according to Equation (1):

$$H(f(x,y)) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\ \dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix} \quad (1)$$

where H(·) is the Hessian matrix, and f(x, y) is the color value at the image coordinates (x, y).

Next, the determinant of the Hessian matrix is calculated using Equation (2) to obtain the local extremum points of the pixels, which are used as the SURF feature points of the image:

$$\det(H) = \frac{\partial^2 f}{\partial x^2}\,\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 \quad (2)$$

where H is the Hessian matrix, and f is the color value at the image coordinates (x, y).

After obtaining the feature points of the reference image and the matching image, the similarity of the feature points is calculated using the Euclidean distance criterion shown in Equation (3):

$$l = \sqrt{\sum_{i=1}^{n}\bigl(X_1(i) - X_2(i)\bigr)^2} \quad (3)$$

where l is the distance between the two points, n is the dimension of the feature points, X1 is the descriptor vector of the feature point in the reference image, and X2 is the descriptor vector of the feature point in the matching image.

When the distance is less than the set threshold (found to be optimally between 0.6 and 0.8), the two feature points are considered successfully matched.
Figure 7. Unnatural edges in image stitching. (As the yellow box marks.)
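Before these seams can be removed, the partition images must first be registered from the matched feature points described above. The following is a minimal sketch of that SURF detection and matching step, assuming an opencv-contrib build that still ships SURF; the file names, Hessian threshold, and the 0.7 distance cutoff are illustrative assumptions rather than values fixed by this paper.

```python
import cv2

# Two adjacent partition images (placeholder file names), loaded in grayscale.
ref = cv2.imread("floor_partition.jpg", cv2.IMREAD_GRAYSCALE)
mov = cv2.imread("right_arch_foot_partition.jpg", cv2.IMREAD_GRAYSCALE)

# SURF keypoints are local extrema of the Hessian determinant (Equations (1) and (2)).
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp1, des1 = surf.detectAndCompute(ref, None)
kp2, des2 = surf.detectAndCompute(mov, None)

# Brute-force matching with the Euclidean distance of Equation (3).
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.match(des1, des2)

# Keep pairs whose descriptor distance is below the threshold (0.6-0.8 in this paper).
good = [m for m in matches if m.distance < 0.7]
print(f"{len(good)} of {len(matches)} feature pairs matched successfully")
```

The retained pairs can then be passed to a robust homography estimator (for example, cv2.findHomography with RANSAC) to align the overlapping regions before fusion.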
To eliminate these unnatural seams, the overlapping region of the registered images is fused by weighted averaging, as given in Equation (4):

$$X(a,b) = \begin{cases} x_1(a,b), & (a,b) \in x_1 \\ (1-\gamma)\,x_1(a,b) + \gamma\,x_2(a,b), & (a,b) \in x_1 \cap x_2 \\ x_2(a,b), & (a,b) \in x_2 \end{cases} \quad (4)$$

where x1 and x2 are the images to be stitched, X is the stitched image, and γ is the weighting factor, γ = wd/w ∈ (0, 1); w is the horizontal width of the overlapping part of the stitched images, and wd is the horizontal distance of a pixel in the overlapping part from the start of the overlapping section.

The final complete tunnel work face fusion image is compared with the full tunnel work face captured image in Figure 8. Stitched and fused images significantly improve quality and restore geological structure information, laying a solid foundation for subsequent joint extraction (see Figure 8).
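A minimal NumPy sketch of the linear-ramp fusion in Equation (4), assuming the two partition images are already registered so that their overlap occupies the last columns of the left image and the first columns of the right image; the function name and overlap width are illustrative.

```python
import numpy as np

def fuse_overlap(x1: np.ndarray, x2: np.ndarray, overlap: int) -> np.ndarray:
    """Blend two horizontally adjacent, registered images over an `overlap`-pixel strip.

    Implements Equation (4): gamma = wd / w rises linearly from 0 to 1 across the
    overlapping section, so the output transitions smoothly from x1 to x2.
    """
    h, w1 = x1.shape[:2]
    w2 = x2.shape[1]
    out = np.zeros((h, w1 + w2 - overlap, *x1.shape[2:]), dtype=np.float64)

    out[:, : w1 - overlap] = x1[:, : w1 - overlap]        # pixels belonging only to x1
    out[:, w1:] = x2[:, overlap:]                         # pixels belonging only to x2

    gamma = np.linspace(0.0, 1.0, overlap).reshape(1, overlap)
    if x1.ndim == 3:                                      # add a channel axis for color images
        gamma = gamma[..., None]
    out[:, w1 - overlap : w1] = (1.0 - gamma) * x1[:, w1 - overlap :] + gamma * x2[:, :overlap]
    return out.astype(x1.dtype)
```

For two 512-pixel-wide strips with a 60-pixel overlap, fuse_overlap(left, right, 60) returns a 964-pixel-wide image without the hard seam shown in Figure 7.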
3. Joint Extraction from Tunnel Face Based on Traditional Image Processing Methods

This section employs traditional computer image processing methods for joint extraction from tunnel face images. The main process includes grayscale processing, spatial filtering, image binarization, morphological processing, noise removal, and finally, outputting the joint extraction image of the tunnel face.

The following demonstrates the image processing procedure using the complete tunnel face image obtained through stitching and fusion in Section 2.2, as shown in Figure 8b.

3.1. Grayscale Processing

Grayscale processing reduces image dimensions, facilitating feature extraction by converting RGB images to grayscale using Equation (5). The result is shown in Figure 9.

Gray = (R + G + B)/3    (5)

where Gray is the calculated grayscale value of the pixel, R is the red component value of the pixel, G is the green component value of the pixel, and B is the blue component value of the pixel.

3.2. Spatial Filtering

Spatial filtering is performed with a bilateral filter, in which ω(x, y, k, l) is the weighting coefficient for the neighboring pixel (k, l) centered at the point (x, y). This coefficient is determined by the product of the spatial kernel and the range kernel, with the expression given by Equation (7):

$$\omega(x,y,k,l) = \exp\!\left(-\frac{(x-k)^2 + (y-l)^2}{2\sigma_d^2} - \frac{\lVert f(x,y) - f(k,l)\rVert^2}{2\sigma_r^2}\right) \quad (7)$$

where σd is the filter radius of the spatial domain kernel, and σr is the filter radius of the range domain kernel.

The effect of the tunnel face image after bilateral filtering is shown in Figure 10.

Figure 10. Bilateral filtering effect on tunnel face image.
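A minimal OpenCV sketch of the grayscale conversion in Equation (5) and the bilateral filtering of Equation (7); the kernel diameter and the two sigma values are illustrative assumptions, not parameters reported in the paper.

```python
import cv2
import numpy as np

face = cv2.imread("tunnel_face_fused.jpg")        # stitched tunnel face image (placeholder name)

# Equation (5): equal-weight average of the three color channels.
gray = np.mean(face.astype(np.float32), axis=2).astype(np.uint8)

# Equation (7): the bilateral filter weights each neighbor by a spatial Gaussian (sigma_d)
# and a range Gaussian on intensity differences (sigma_r), smoothing noise while
# preserving joint edges.
smoothed = cv2.bilateralFilter(gray, d=9, sigmaColor=50, sigmaSpace=7)

cv2.imwrite("tunnel_face_bilateral.jpg", smoothed)
```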
Figure 13. Schematic diagram of morphological processing.

After applying morphological processing to the joints with breakpoints shown in Figure 12a, the result is as shown in Figure 12b. It can be seen that morphological processing connects pixels in joints by applying dilation and erosion operations, effectively addressing disconnected points due to lighting or filling materials.

3.5. Noise Removal

As shown in Figure 14a, after morphological processing of the joints, a large number of noise pixels and significant pixel interference from the surrounding rock still exist in the image. Noise removal is required to address these issues.
Figure 14. Image noise removal process: (a) image with noise points; (b) surrounding rock area removed; (c) small noise points removed; (d) non-joint areas removed and contours added.

Noise removal involves eliminating large surrounding rock areas (as shown in Figure 14b), small noise points (as shown in Figure 14c), and non-joint areas (as shown in Figure 15) through region-growing algorithms and geometric shape analysis.

Figure 15. Comparison of non-joint and joint areas.

After removing the non-joint areas and importing the tunnel contour curve, the final recorded structure of the tunnel face is obtained, as shown in Figure 14d.
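A minimal sketch of the post-processing described above, assuming a binarized joint image as input: morphological closing (dilation followed by erosion) reconnects broken joint pixels, and connected-component analysis with a simple elongation rule stands in for the region-growing and geometric shape analysis used to discard noise and surrounding-rock regions. The structuring-element size, area limits, and elongation threshold are illustrative assumptions.

```python
import cv2
import numpy as np

binary = cv2.imread("joints_binary.png", cv2.IMREAD_GRAYSCALE)   # binarized joint image (placeholder)

# Closing bridges breakpoints caused by lighting or filling materials.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

# Label connected regions and keep only elongated, joint-like ones.
num, labels, stats, _ = cv2.connectedComponentsWithStats(closed, connectivity=8)
cleaned = np.zeros_like(closed)
for i in range(1, num):                                   # label 0 is the background
    x, y, w, h, area = stats[i]
    elongation = max(w, h) / max(1, min(w, h))
    if 50 <= area <= 50000 and elongation > 3:            # drop small noise and blocky rock areas
        cleaned[labels == i] = 255

cv2.imwrite("joints_cleaned.png", cleaned)
```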
4. Joint Extraction on Tunnel Faces Based on Image Segmentation Neural Network Models

As can be seen from the tunnel face structure catalog obtained in Section 3, traditional image processing methods for extracting joints are generally ineffective, involve substantial manual intervention, have a complex processing workflow, and result in some loss of joint information. This makes it difficult to meet the requirements for quick and accurate identification of joints on mountain tunnel engineering faces. To address this, recent image segmentation algorithms have been introduced to achieve more intelligent and accurate extraction of face joints. In this section, based on digital image samples of the face obtained through onsite shooting and stitching, the U-Net convolutional neural network algorithm and the Mask R-CNN convolutional neural network algorithm are employed for learning and recognition extraction. The extraction results are then analyzed and compared.

4.1. Data Collection, Annotation, and Augmentation

4.1.1. Onsite Data Collection

In image recognition, the dataset is the foundation for training and evaluation, and selecting an appropriate dataset is crucial for the algorithm's performance and accuracy. The onsite tunnel face image collection was carried out as described in Section 2.1, and the collected partitioned digital images were stitched and fused using the algorithm described in Section 2.2. The onsite collection resulted in 1,716 partitioned photographs, which were stitched into 286 complete images.

4.1.2. Data Annotation

Data annotation with the interactive segmentation annotation software EISeg 1.1.1 (Efficient Interactive Segmentation 1.1.1) (shown in Figure 16) is crucial for accurately marking joint areas and facilitating precise model training. The effect of segmentation is shown in Figure 17.

Figure 16. Main interface view of EISeg annotation software.
Figure 18. Dataset augmentation operations: (a) original image; (b) left–right flip; (c) up–down flip; (d) rotation; (e) translation. (The orange line is added later to indicate the orientation of the picture.)

4.2. Joint Extraction of Tunnel Face Based on U-Net Deep Learning Architecture

4.2.1. U-Net Convolutional Neural Network Architecture

The U-Net convolutional neural network was proposed in 2015 [21] and has achieved good results in the field of medical image cell segmentation. The U-Net network, suitable for joint extraction, classifies all pixels in an image. Its convolutional neural network structure is shown in Figure 19.

Figure 19. U-Net convolutional neural network architecture.

4.2.2. U-Net Convolutional Neural Network Parameter Selection

The input image size of this convolutional neural network is 512 × 512. After four down-sampling and four up-sampling processes, the output image size remains 512 × 512, the same as the input. The U-Net uses a 3 × 3 convolution kernel, the ReLU activation function, and 2 × 2 max pooling for down-sampling.

(1) Convolution layer parameter selection

The convolution kernel size is 3 × 3, and its convolution processing principle is shown in Figure 20. The sliding step of the convolution kernel is 1. To ensure that the image size after convolution remains consistent with the original image, the original image needs to be padded with a value of 0. The number of output image channels depends on the number of convolution kernels in the convolution layer.

(2) Activation function selection

Figure 21. ReLU function graph.

The ReLU function is chosen for several reasons. Firstly, it increases the non-linearity of the network, which is essential for learning complex patterns. Secondly, it improves computational speed due to its simple mathematical operation. Lastly, unlike the sigmoid function, ReLU does not suffer from the vanishing gradient problem. The vanishing gradient problem occurs when gradients used for updating neural network weights diminish, making training ineffective. ReLU avoids this issue by allowing gradients to flow through the network without significant diminishment, making it particularly suitable for large-scale convolution operations. This justification highlights why ReLU is preferred in deep learning applications, particularly in convolutional neural networks (CNNs).

(3) Pooling method selection

Pooling is a down-sampling method that can reduce the image size and help prevent overfitting. There are two main types of pooling: max pooling and average pooling. The principle of max pooling is shown in Figure 22, where the maximum pixel value within the neighborhood is taken as the center pixel value.

Figure 22. Max pooling diagram.

The principle of average pooling is shown in Figure 23, where the average pixel value within the neighborhood is taken as the center pixel value.

Figure 23. Average pooling diagram.

This paper introduces the U-Net network to identify the structural information of the tunnel face. To maximize the distinction between structural information and background information, a 2 × 2 max pooling method is used for image down-sampling.

4.2.3. Analysis of U-Net Convolutional Neural Network for Tunnel Face Joint Extraction

The 8580 sample images were divided into training, validation, and test sets in a 60%, 20%, and 20% ratio. The preprocessed dataset was input into the U-Net convolutional neural network, written in PyTorch using Python 3.7, for training. By calculating the loss and accuracy, the network parameters were iteratively updated to minimize the loss on the validation set. Once the minimum loss value stabilizes, the model converges.

Under this U-Net convolutional neural network structure, the changes in the loss function and accuracy for the training and validation sets are shown in Figures 24 and 25, respectively.

Figure 24. Changes in loss values for training and validation sets.

Figure 25. Changes in accuracy for training and validation sets.

As shown in Figures 24 and 25, the U-Net achieved an accuracy of 82.2% on the training set and 82.6% on the validation set, stabilizing at epoch 29.

The trained U-Net convolutional neural network was used to test the test set. A comparison of randomly selected predicted images and their corresponding labeled images is shown in Figure 26.

Figure 26. Comparison of U-Net prediction results: (a) original image; (b) labeled image; (c) U-Net predicted image.
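A minimal PyTorch sketch of the U-Net configuration described in Section 4.2.2 (3 × 3 convolutions with zero padding, ReLU activations, 2 × 2 max pooling, and matching up-sampling) together with one pixel-wise training step; only a single encoder/decoder level is shown, and the channel widths, loss, and optimizer are illustrative assumptions rather than the authors' exact training code.

```python
import torch
import torch.nn as nn

def double_conv(cin, cout):
    # Two 3x3 convolutions with stride 1 and zero padding keep the spatial size unchanged.
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """One-level U-Net: encoder, 2x2 max-pooled bottleneck, up-sampling, skip connection."""
    def __init__(self):
        super().__init__()
        self.enc = double_conv(3, 32)
        self.pool = nn.MaxPool2d(2)                      # 2x2 max pooling for down-sampling
        self.mid = double_conv(32, 64)
        self.up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec = double_conv(64, 32)
        self.head = nn.Conv2d(32, 1, kernel_size=1)      # per-pixel joint/background logit

    def forward(self, x):
        e = self.enc(x)                                  # 512x512 feature maps
        m = self.mid(self.pool(e))                       # 256x256 feature maps
        u = self.up(m)                                   # back to 512x512
        return self.head(self.dec(torch.cat([u, e], dim=1)))

model = TinyUNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()                       # pixel-by-pixel classification loss

images = torch.rand(2, 3, 512, 512)                      # stand-in batch of face images
masks = torch.randint(0, 2, (2, 1, 512, 512)).float()    # stand-in binary joint labels

optimizer.zero_grad()
loss = criterion(model(images), masks)
loss.backward()
optimizer.step()
print(f"training loss: {loss.item():.4f}")
```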
As seen in Figure 26, despite good overall segmentation, the U-Net struggled with 'rough edges' and smaller targets.

Comparing the predicted groups, it is evident that when the segmented target occupies a larger proportion of the total image, the overall segmentation effect is better. However, in the fourth and fifth groups, where the segmented target occupies a smaller proportion of the total image, non-target regions appear after segmentation. This issue is related to the principle of U-Net, which calculates classification loss pixel by pixel. When the target segmentation object occupies a small portion of the entire image, the iterative loss value can easily drop very low, making it difficult for the target to be fully segmented.

To address the shortcomings of semantic segmentation methods such as the U-Net neural network, the author uses an instance segmentation algorithm that combines object detection and semantic segmentation, Mask R-CNN, to extract joints. This approach allows for precise segmentation of object edges based on bounding boxes from object detection, achieving more accurate segmentation results.
4.3. Joint Extraction of Tunnel Face Based on Mask R-CNN Deep Learning Architecture

4.3.1. Mask R-CNN Convolutional Neural Network Architecture

The Mask R-CNN convolutional neural network [26] was proposed by He et al. in 2017. It adds an FCN (Fully Convolutional Network) structure to the Faster R-CNN network, achieving precise segmentation while detecting objects. Its network architecture is shown in Figure 27.

Figure 27. Mask R-CNN network architecture.

As shown in Figure 27, the Mask R-CNN combines object detection and instance segmentation, using ResNet and FPN for feature extraction and ROIAlign for accurate pooling.

4.3.2. Mask R-CNN Convolutional Neural Network Parameter Selection

The input image size for this convolutional neural network is 512 × 512. After feature map generation, region proposal, and region extraction, the final output is an image of size 512 × 512 with a mask overlay, class labels, and target region positions. This study uses ResNet101 + FPN for the backbone and ROIAlign for pooling, enhancing small object recognition accuracy.
(1) Backbone architecture parameter selection
The backbone architecture of the Mask R-CNN convolutional neural network consists
of ResNet + FPN. The commonly used configurations are ResNet50 + FPN and ResNet101
+ FPN. The network structures of ResNet50 and ResNet101 are compared in Table 1.
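A minimal torchvision sketch of a Mask R-CNN built on a ResNet101 + FPN backbone with two classes (background and joint); the builder functions shown are torchvision's generic ones, and the pretrained flag (named weights= in newer torchvision releases), as well as the toy target, are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# ResNet101 + FPN backbone; the Mask R-CNN heads pool region features with ROIAlign.
backbone = resnet_fpn_backbone("resnet101", pretrained=True)
model = MaskRCNN(backbone, num_classes=2)                   # background + joint ("fissure")

# In training mode the model returns a loss dictionary whose classification,
# bounding-box, and mask terms correspond to the components summed in Equation (11).
model.train()
images = [torch.rand(3, 512, 512)]
targets = [{
    "boxes": torch.tensor([[50.0, 60.0, 200.0, 180.0]]),    # one illustrative joint box
    "labels": torch.tensor([1]),
    "masks": torch.zeros(1, 512, 512, dtype=torch.uint8),   # illustrative (empty) mask
}]
losses = model(images, targets)
total_loss = sum(losses.values())
print(sorted(losses.keys()))
```

At inference time, calling model.eval() and passing only images returns, for each image, the predicted boxes, class labels, confidence scores, and soft masks, matching the detection-plus-mask output described above.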
Figure 28. Bilinear interpolation effect.

4.3.3. Analysis of Mask R-CNN Convolutional Neural Network for Tunnel Face Joint Extraction

In this experiment, the dataset is the same as in Section 4.1.3. The 8580 sample images of size 512 × 512 are divided into training and test sets in an 80% to 20% ratio. The preprocessed dataset is input into the Mask R-CNN convolutional neural network for training. The initial learning rate is set to 0.001, and the maximum number of iterations (max epochs) is set to 200. The classification loss (loss_cls), localization loss (loss_bbox), segmentation loss (loss_mask), and total loss are calculated, as shown in Equation (11):

loss = loss_cls + loss_bbox + loss_mask    (11)

When the loss function reaches its minimum value and stabilizes, the model converges. The changes in the loss functions are shown in Figure 29.

Figure 29. Changes in loss values.

As shown in Figure 29, the Mask R-CNN achieved stable loss values at epoch 35, with localization loss lower than classification and segmentation losses. The trained Mask R-CNN convolutional neural network is then used to test the test set. The comparison of the five groups of predicted images with their corresponding labeled images, as in Section 4.1.3, is shown in Figure 30.
(a) Original Image (b) Labeled Image (c) Object Detection (d) Object Mask
Figure 30. Comparison of Mask R-CNN prediction results. (The red boxes in subfigure (c) are the identified joints).
As shown in Figure 30, after classification, bounding box selection, and mask calcula-
tion, the Mask R-CNN network prediction results achieve good joint segmentation effects
compared to the annotated results of the original images. Additionally, comparing the
prediction results of the U-Net network shows that the joint segmentation effect is not
affected by the proportion of the segmentation target in the image. Both the overall and
local details are accurately segmented.
4.4. Comparison of Tunnel Face Joint Recognition Effect and Acquisition of Joint
Morphology Parameters
Figure 31 presents the prediction results of five test sample images after traditional
image processing, the U-Net convolutional neural network, and the Mask R-CNN convolu-
tional neural network.
(a) Original Image (b) Annotated image (c) Image Processing (d) U-Net (e) Mask R-CNN
Figure 31. Comparison of prediction results.
From Figure 31, it is evident that overall, all three image segmentation methods achieve certain joint segmentation effects. Specifically, the Mask R-CNN convolutional neural network demonstrates the best segmentation results, followed by the U-Net convolutional neural network, and traditional image processing shows the least effective results. To further quantitatively compare the segmentation effectiveness of these three methods, appropriate metrics will be selected for subsequent comparative analysis.
(1) Dice similarity coefficient
Dice = 2TP/(2TP + FN + FP) (12)
where TP (true positives) denotes samples predicted as positive that are actually positive; FN (false negatives) denotes samples predicted as negative that are actually positive; and FP (false positives) denotes samples predicted as positive that are actually negative.
(2) Precision
Precision represents the proportion of predicted positive samples that are actually positive, calculated as shown in Equation (13).
Precision = TP/(TP + FP) (13)
(3) Recall
Recall represents the proportion of actual positive samples that are predicted correctly, calculated as shown in Equation (14).
Recall = TP/(TP + FN) (14)
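To make Equations (12)–(14) concrete, the short NumPy sketch below computes the three metrics from a pair of binary masks; the toy 4 × 4 masks are illustrative only and are not data from this study.

# Hypothetical metric computation for binary segmentation masks (not the authors' code).
import numpy as np

def segmentation_metrics(pred, gt):
    """Dice, Precision and Recall for binary masks, following Equations (12)-(14)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # predicted joint pixels that are joints
    fp = np.logical_and(pred, ~gt).sum()   # predicted joint pixels that are background
    fn = np.logical_and(~pred, gt).sum()   # joint pixels that were missed
    return {
        "Dice": 2 * tp / (2 * tp + fn + fp),
        "Precision": tp / (tp + fp),
        "Recall": tp / (tp + fn),
    }

# Toy 4 x 4 example: Dice ~ 0.667, Precision = 1.0, Recall = 0.5.
gt = np.array([[0, 1, 1, 0]] * 4)
pred = np.array([[0, 1, 0, 0]] * 4)
print(segmentation_metrics(pred, gt))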
Table 2. Comparison of joint segmentation effects. (The bolded portion is the highest value.)

Method                         Dice (%)   Precision (%)   Recall (%)
Traditional image processing   60.59      57.31           70.03
U-Net                          75.36      70.85           85.58
Mask R-CNN                     87.48      89.74           84.73
As shown in Table 2, the Dice similarity coefficient, Precision, and Recall of traditional
image processing are 60.59%, 57.31%, and 70.03%, respectively. For the U-Net network, the
Dice similarity coefficient, Precision, and Recall are 75.36%, 70.85%, and 85.58%, respectively.
For the Mask R-CNN network, these values are 87.48%, 89.74%, and 84.73%, respectively.
Among the three metrics, the Dice similarity coefficient most accurately reflects the
true effect of target segmentation. Although the Recall value of U-Net is slightly higher
than that of Mask R-CNN, its Precision is significantly lower, indicating that the U-Net
network's segmentation results are rougher and contain more non-target pixels. Comprehensive comparison and analysis show that the Mask R-CNN network has the best segmentation effect on the tunnel face joints.
5. Conclusions
Based on the digital images of the tunnel face obtained through sectional shooting,
this paper obtained complete and clear images of the tunnel face through image stitching
and fusion algorithms. Then, the tunnel face joint information was extracted using three
methods: traditional image processing, U-Net convolutional neural network, and Mask
R-CNN convolutional neural network. The extraction effects were compared, and the main
conclusions are as follows:
(1) Using the SURF algorithm and weighted fusion, sectional images of the tunnel face
were stitched into complete, high-clarity images suitable for deep learning algorithms.
(2) Traditional image processing methods, including grayscale processing, spatial filter-
ing, binarization, morphological processing, and noise removal, produced suboptimal
results with a Dice similarity coefficient of 60.59%. These methods are inefficient,
involve significant manual intervention, and lose joint information, making them
unsuitable for tunnel engineering applications.
(3) The U-Net convolutional neural network achieved relatively good segmentation
results with a Dice similarity coefficient of 75.36%. However, it lacked precision and
lost target details, indicating room for improvement.
(4) The Mask R-CNN model excelled in both overall and detailed segmentation, achieving
a Dice similarity coefficient of 87.48%. This model demonstrated efficient and accurate
extraction of tunnel face joints, outperforming traditional and U-Net methods.
Author Contributions: Data curation, Y.L.; formal analysis, H.Q.; funding acquisition, X.Y.; investiga-
tion, H.Q.; methodology, H.Q.; project administration, X.Y.; resources, Y.L.; software, H.Q. and Z.L.;
supervision, J.Z.; validation, H.Q.; visualization, Z.G.; writing—original draft, H.Q.; writing—review
& editing, J.Z. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: This study did not require ethical approval.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Ross-Brown, D.M.; Atkinson, K. Terrestrial photogrammetry in open-pits: 1-description and use of the Phototheodolite in mine
surveying. Inst. Min. Metall. 1972, 81, 7–11.
2. Huang, S.L.; Speck, R.C. Digital image processing for rock joint surface studies. Photogramm. Eng. Remote Sens. 1988, 54, 395–400.
3. Krishnan, R.; Sommer, H.J. Estimation of Rock Face Stability; The Pennsylvania State University: University Park, PA, USA, 1994.
4. Fitton, N.; Cox, S. Optimising the application of the Hough transform for automatic feature extraction from geoscientific images.
Comput. Geosci. 1998, 24, 933–951. [CrossRef]
5. Reid, T.R.; Harrison, J.P. A semi-automated methodology for discontinuity trace detection in digital images of rock mass exposures.
Int. J. Rock Mech. Min. Sci. 2000, 37, 1–5. [CrossRef]
6. Holden, E.-J.; Dentith, M.; Kovesi, P. Towards the automated analysis of regional aeromagnetic data to identify regions prospective
for gold deposits. Comput. Geosci. 2008, 34, 1505–1513. [CrossRef]
7. Liu, C.; Wang, B.; Shi, B.; Tang, C. Analytic method of morphological parameters of cracks for rock and soil based on image
processing and recognition. Chin. J. Geotech. Eng. 2008, 30, 1383–1388.
8. Chen, B.; Wang, Y.; Wang, H.; Zhu, C.; Fu, J. Identification of tunnel surrounding rock joint and fracture based on SLIC super
pixel segmentation and combination. J. Highw. Transp. Res. Dev. 2022, 39, 139–146.
9. Jung, S.Y.; Lee, S.K.; Park, C.I.; Cho, S.Y.; Yu, J.H. A method for detecting concrete cracks using deep-learning and image
processing. J. Archit. Inst. Korea Struct. Constr. 2019, 35, 163–170.
10. Bhowmick, S.; Nagarajaiah, S.; Veeraraghavan, A. Vision and deep learning-based algorithms to detect and quantify cracks on
concrete surfaces from UAV videos. Sensors 2020, 20, 6299. [CrossRef] [PubMed]
11. Yu, Y.; Rashidi, M.; Samali, B.; Yousefi, A.M.; Wang, W. Multi-image-feature-based hierarchical concrete crack identification
framework using optimized SVM multi-classifiers and D-S fusion algorithm for bridge structures. Remote Sens. 2021, 13, 240.
[CrossRef]
12. Zhao, S.; Zhang, D.; Xue, Y.; Zhou, M.; Huang, H. A deep learning-based approach for refined crack evaluation from shield
tunnel lining images. Autom. Constr. 2021, 132, 103934. [CrossRef]
13. Dang, L.M.; Wang, H.; Li, Y.; Park, Y.; Oh, C.; Nguyen, T.N.; Moon, H. Automatic tunnel lining crack evaluation and measurement
using deep learning. Tunn. Undergr. Space Technol. 2022, 124, 104472. [CrossRef]
14. Zhou, Z.; Zhang, J.; Gong, C. Hybrid semantic segmentation for tunnel lining cracks based on Swin Transformer and convolutional
neural network. Comput.-Aided Civ. Infrastruct. Eng. 2023, 38, 2491–2510. [CrossRef]
15. Song, F.; Liu, B.; Yuan, G.X. Pixel-level crack identification for bridge concrete structures using unmanned aerial vehicle
photography and deep learning. Struct. Control. Health Monit. 2024, 2024, 1299095. [CrossRef]
16. Wang, F.; Chen, T.; Gai, M. A dual-tree-complex wavelet transform-based infrared and visible image fusion technique and its
application in tunnel crack detection. Appl. Sci. 2024, 14, 114. [CrossRef]
17. Liu, H.X.; Li, W.S.; Zha, Z.Y.; Jiang, W.J.; Xu, T. Method for surrounding rock mass classification of highway tunnels based on
deep learning technology. Chin. J. Geotech. Eng. 2018, 40, 1809–1817.
18. Chen, J.; Zhou, M.; Huang, H.; Zhang, D.; Peng, Z. Automated extraction and evaluation of fracture trace maps from rock tunnel
face images via deep learning. Int. J. Rock Mech. Min. Sci. 2021, 142, 104745. [CrossRef]
19. Lee, Y.-K.; Kim, J.; Choi, C.-S.; Song, J.-J. Semi-automatic calculation of joint trace length from digital images based on deep
learning and data structuring techniques. Int. J. Rock Mech. Min. Sci. 2022, 149, 104981. [CrossRef]
20. Peng, L.; Wang, H.; Zhou, C.; Hu, F.; Tian, X.; Hongtai, Z. Research on intelligent detection and segmentation of rock joints based
on deep learning. Adv. Civ. Eng. 2024, 2024, 8810092. [CrossRef]
21. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the
18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015.
22. Li, G.; Ma, B.; He, S.; Ren, X.; Liu, Q. Automatic tunnel crack detection based on U-Net and a convolutional neural network with
alternately updated clique. Sensors 2020, 20, 717. [CrossRef]
23. Chang, H.; Rao, Z.; Zhao, Y.; Li, Y. Research on tunnel crack segmentation algorithm based on improved U-Net network. Comput.
Eng. Appl. 2021, 57, 215–222.
24. Zhao, S.; Zhang, G.; Zhang, D.; Tan, D.; Huang, H. A hybrid attention deep learning network for refined segmentation of cracks
from shield tunnel lining images. J. Rock Mech. Geotech. Eng. 2023, 15, 3105–3117. [CrossRef]
25. Shi, Y.; Ballesio, M.; Johansen, K.; Trentman, D.; Huang, Y.; McCabe, M.F.; Bruhn, R.; Schuster, G. Semi-universal geo-crack
detection by machine learning. Front. Earth Sci. 2023, 11, 1073211. [CrossRef]
26. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 16th IEEE International Conference on Computer
Vision (ICCV), Venice, Italy, 22–29 October 2017.
27. Lin, Z.; Ji, K.F.; Leng, X.G.; Kuang, G. Squeeze and excitation rank faster R-CNN for ship detection in SAR images. IEEE Geosci.
Remote Sens. Lett. 2019, 16, 751–755. [CrossRef]
28. Yu, Y.; Zhang, K.L.; Yang, L.; Zhang, D. Fruit detection for strawberry harvesting robot in non-structural environment based on
Mask-RCNN. Comput. Electron. Agric. 2019, 163, 104846. [CrossRef]
29. Jia, W.; Tian, Y.; Luo, R.; Zhang, Z.; Lian, J.; Zheng, Y. Detection and segmentation of overlapped fruits based on optimized mask
R-CNN application in apple harvesting robot. Comput. Electron. Agric. 2020, 172, 105380. [CrossRef]
30. Hao, Z.; Lin, L.; Post, C.J.; Mikhailova, E.A.; Li, M.; Chen, Y.; Yu, K.; Liu, J. Automated tree-crown and height detection in a young
forest plantation using mask region-based convolutional neural network (Mask R-CNN). ISPRS J. Photogramm. Remote Sens. 2021,
178, 112–123. [CrossRef]
31. Xu, X.Y.; Zhao, M.; Shi, P.X.; Ren, R.; He, X.; Wei, X.; Yang, H. Crack detection and comparison study based on faster R-CNN and
mask R-CNN. Sensors 2022, 22, 1215. [CrossRef] [PubMed]
32. Qin, J.; Zhang, Y.; Zhou, H.; Yu, F.; Sun, B.; Wang, Q. Protein crystal instance segmentation based on Mask R-CNN. Crystals 2021,
11, 157. [CrossRef]
33. Bay, H.; Tuytelaars, T.; van Gool, L. SURF: Speeded up Robust Features; Springer: Berlin/Heidelberg, Germany, 2006.
34. Otsu, N. Threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66. [CrossRef]
35. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In
Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28
June 2014.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.