
GIScience & Remote Sensing

ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/tgrs20

RoadVecNet: a new approach for simultaneous road network segmentation and vectorization from aerial and google earth imagery in a complex urban set-up

Abolfazl Abdollahi, Biswajeet Pradhan & Abdullah Alamri

To cite this article: Abolfazl Abdollahi, Biswajeet Pradhan & Abdullah Alamri (2021) RoadVecNet: a new approach for simultaneous road network segmentation and vectorization from aerial and google earth imagery in a complex urban set-up, GIScience & Remote Sensing, 58:7, 1151-1174, DOI: 10.1080/15481603.2021.1972713

To link to this article: https://doi.org/10.1080/15481603.2021.1972713

Published online: 30 Aug 2021.

GISCIENCE & REMOTE SENSING
2021, VOL. 58, NO. 7, 1151-1174
https://doi.org/10.1080/15481603.2021.1972713

RoadVecNet: a new approach for simultaneous road network segmentation and vectorization from aerial and google earth imagery in a complex urban set-up

Abolfazl Abdollahi (a), Biswajeet Pradhan (a,b) and Abdullah Alamri (c)

(a) University of Technology Sydney (UTS), CAMGIS, Sydney, Australia; (b) Earth Observation Center, Institute of Climate Change, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia; (c) Department of Geology & Geophysics, College of Science, King Saud University, Riyadh, Saudi Arabia

ABSTRACT

In this study, we present a new automatic deep learning-based network named Road Vectorization Network (RoadVecNet), which comprises interlinked UNet networks to simultaneously perform road segmentation and road vectorization. Particularly, RoadVecNet contains two UNet networks. The first network, with powerful representation capability, can obtain more coherent and satisfactory road segmentation maps even under a complex urban set-up. The second network is linked to the first network to vectorize road networks by utilizing all of the previously generated feature maps. We utilize a loss function called focal loss weighted by median frequency balancing (MFB_FL) to focus on the hard samples, fix the training data imbalance problem, and improve the road extraction and vectorization performance. A new module named dense dilated spatial pyramid pooling, which combines the benefit of cascaded modules with atrous convolution and atrous spatial pyramid pooling, is designed to produce more scale features over a broader range. Two types of high-resolution remote sensing datasets, namely, aerial and Google Earth imagery, were used for the road segmentation and road vectorization tasks. Classification results indicate that RoadVecNet outperforms the state-of-the-art deep learning-based networks with 92.51% and 93.40% F1 scores for road surface segmentation and 89.24% and 92.41% F1 scores for road vectorization from the aerial and Google Earth road datasets, respectively. In addition, the proposed method outperforms the other comparative methods in terms of qualitative results and produces high-resolution road segmentation and vectorization maps. In conclusion, the presented method demonstrates that considering topological quality may result in improvement of the final road network, which is essential in various applications, such as GIS database updating.

ARTICLE HISTORY
Received 22 April 2021
Accepted 20 August 2021

KEYWORDS
Deep learning; RoadVecNet; remote sensing; road segmentation; GIS; road vectorization

CONTACT Biswajeet Pradhan [email protected]


© 2021 Informa UK Limited, trading as Taylor & Francis Group
1. Introduction

Automatic extraction of road networks from high-resolution remote sensing imagery (HRSI) has been an active research topic in the remote sensing field (Zhang et al. 2018). The urban road network is one of the major components of a city that plays a significant role in its development and expansion. Non-spatial attributes can be integrated into spatial information with the aid of vectorized roads, and they can then be used to efficiently model traffic information and assist traffic management (Hong et al. 2018). However, urban road traffic networks and digital maps are incredibly tedious and time-consuming to produce and update via manual digitization of HRSI because the task has long production cycles and involves large workloads. Thus, up-to-date traffic maps are difficult to maintain (Hong et al. 2018). Automatically extracting roads from HRSI and obtaining road information with the aid of advanced image processing techniques, artificial intelligence, and machine learning tools is an efficient and economical approach. The road information can be utilized in various Geospatial Information System applications, such as vehicle navigation (Luo et al. 2019), vector map database updating (Hong et al. 2018), citizen tourism planning, and image registration (Abdollahi and Pradhan 2021). However, accurate road extraction from HRSI is always a challenging task. This difficulty is because of the existence of complicated features in the HRSI, such as trees, building roofs, shadows, cars, and road marking lines, which result in low geometric precision in the extracted road information (Gao et al. 2018). The HRSI has a large number of mixed pixels, obscuring the borders between other objects and roads. Accordingly, incorrect boundary information can easily be produced by urban roads with rich spectral information in the image data (Hormese and Saravanan 2016). Several recent works have been suggested in the literature to address these challenging issues in road extraction (Kaur and Singh 2015); however, they are far from ideal.

Traditional studies on this topic are divided into three categories: knowledge, object, and feature levels (Hong et al. 2018).

1) The knowledge-level group includes road characteristics (Movaghati, Moghaddamjoo, and Tavakoli 2010) and multi-source data fusion (Qiaoping and Couloigner 2004). Road characteristic theories rely on roads' own features, such as context and spectral characteristics, for road extraction. The multi-source data fusion approach uses existing road databases, such as vector maps, to assist the extraction of roads. The efficiency of such methods is not ideal, and they have complex designs.

2) The object-level group consists of regional statistics (Yi et al. 2010) and multi-resolution analysis (Shen, Luo, and Gao 2010). In the regional statistics approach, the image is initially segmented into objects, and a "word-theme" method is then built to extract roads. The multi-resolution analytical approach combines a single image at various scales or resolutions of remote sensing imagery. The initial segmentation of images by the region-based models leads to the "adhesion" phenomenon (Maboudi et al. 2017).

3) The feature-level group contains edge and parallel lines (Unsalan and Sirmacek 2012), the filter approach (Chaudhuri, Kushwaha, and Samal 2012), and template matching (Miao et al. 2012). The filter approach utilizes a special filter to enhance road pixels and extract roads. However, this approach achieves low extraction accuracy in complex areas and leads to the "salt and pepper" phenomenon. The edge and parallel lines approach utilizes the fact that road borders are generally parallel lines. In Unsalan and Sirmacek (2012), the initial road edge was extracted, and graph theory and the Binary Balloon (BB) method were used to extract roads. The template matching approach utilizes a specific template or seed pixels to form roads initially and then extract them.

Recently, deep learning models have been widely utilized in the remote sensing field for tasks such as object detection (Ševo and Avramović 2016), image retrieval (Zhou et al. 2017), and image classification (Zhong et al. 2017). These models have been adopted for extracting roads from HRSI by embedding multi-level and high-level information to reduce false predictions, unlike the traditional methods that only utilize low-level information for road extraction (Abdollahi, Pradhan et al. 2020). Mnih and Hinton (2010) applied a deep belief network for road extraction from airborne imagery. Sarhan, Khalifa, and Nabil (2011) proposed a convolutional neural network (CNN), which takes full advantage of the geometric and spectral road characteristics, and achieved an overall accuracy (OA) of 92.50% for road extraction
from the IKONOS satellite imagery. Saito and Aoki (2015) presented a CNN model and achieved 88.66% accuracy for road extraction from Massachusetts aerial imagery. Zhao, Du, and Emery (2017) introduced an object-based deep learning model with 89.59% OA for extracting roads from Worldview-2 imagery. Li et al. (2016) applied a CNN model to extract roads from Pleiades-1A and GeoEye images; they then used a post-processing step to smoothen the results and obtain the road centerline, achieving 80.59% accuracy. Although certain outcomes have been achieved in road extraction by using CNNs, many errors still persist. For instance, the proposed approaches were not efficient in accurately detecting roads in complex areas, and the extracted roads still have imperfect fragments and patches (Xie et al. 2019; Zhao, Shi et al. 2017; Sarhan, Khalifa, and Nabil 2011).

Fully convolutional neural networks (FCNN) can extract high-level features with more abstract semantic information (Hu et al. 2015). Zhang et al. (2018) combined the UNet model with residual learning to extract road areas from the Massachusetts road dataset. The presented technique achieved 91.87% accuracy for the precision metric; however, it was insufficient for road detection in sections where road networks are covered by trees and parking lots. Cheng et al. (2017) proposed a cascaded end-to-end network that contains two networks to simultaneously detect the road surface and centerline from Google Earth imagery. They obtained 88.84% for the quality measure, but found that the method could not detect roads under large areas of obstructions. The proposed approach also could not obtain precise information about road width. In Zhong et al. (2016), building and road features were simultaneously extracted using FCNN from the Massachusetts aerial images with 68% F1 accuracy. Buslaev et al. (2018) presented a deep learning model based on vanilla UNet and ResNet-34 to detect the road class from DigitalGlobe's satellite imagery with 0.5 m per pixel spatial resolution. Although a loss function based on Intersection Over Union (IOU) and binary cross entropy (BCE) was also introduced for performance improvement, the model could not achieve high accuracy for IOU (64%) in road extraction. Liu et al. (2019) extracted road centerlines from the Massachusetts and EPFL datasets based on a CNN model, edge-preserving filtering, shape features, morphological filtering, and Gabor filters. They obtained 89% accuracy for the quality metric; however, the proposed model could not achieve a single-pixel width for some road centerlines. Li et al. (2019) used a Y-Net network for road extraction from Jilin-1 satellite imagery and the public Massachusetts dataset. The proposed model comprises two feature extraction and fusion modules. They applied the feature extraction module, which runs from downsampling to upsampling, to extract features in detail, and applied a fusion module to mix all features for road segmentation. The presented model achieved 67.75% accuracy for mean region intersection over union (mean IU), while it required more time for training and did not exhibit good results for narrow road sections when the image has a small number of road pixels. In another study by Xu et al. (2018), roads were extracted from WorldView-2 imagery by using a guided filter and a deep residual network (Res-UNet). The experimental outcomes demonstrated that the model obtained a 92.77% F1 score; however, it did not perform well for road detection in areas containing other objects with a spatial distribution and spectral values similar to the road class. Yang et al. (2019) applied a recurrent convolutional neural network UNet (RCNN-UNet) to detect roads and extract road centerlines. They used Google Earth imagery and the Roadtracer dataset to test their model. The proposed model was a supervised multitask learning network for road segmentation and road centerline extraction and obtained 81.74% for the completeness metric.

In the above literature review, although numerous approaches have been applied for road class identification and road centerline extraction, they have some shortcomings. Specifically, the roads in complex areas are covered by obstructions, such as cars, shadows, and trees, and the existing approaches cannot efficiently detect the road parts in heterogeneous areas. The existing approaches for road centerline extraction could not achieve accurate information about road width and location. In this study, we present a new deep learning model called RoadVecNet to simultaneously extract the road surface and then vectorize the road network. In the extraction part, we want to deal with the road segmentation issues and detect consistent road parts. We also want to vectorize the road network by determining and extracting the road vector rather than the road centerline to obtain accurate information about the road network's width and location. The proposed approach comprises two convolutional UNet
networks that are interlinked into one architecture. The first framework is used to identify road surfaces, while the second framework is utilized to vectorize roads to obtain road location and width information. In the proposed model, we used two encoders, two decoders, and two novel modules, namely, dense dilated spatial pyramid pooling (DDSPP) (Yang et al. 2018) and squeeze-and-excite (SE) (Hu, Shen, and Sun 2018). The DDSPP module is used to achieve a bigger receptive field and create feature pyramids with denser scale variability. The SE module is employed to consider the interdependencies between feature channels and extract more valuable information. We also used a loss function named focal loss weighted by the median frequency balancing (MFB_FL) to overcome highly unbalanced datasets where positive cases are rare. MFB_FL lessens the burden of simple samples, allowing more time to be spent on difficult samples, and improves the road extraction and road vectorization results. Accordingly, we can achieve consistent road surface identification outcomes and complete and smooth road vectorization results with accurate information on road width and location, even under obstructions of shadows, trees, and complicated environments, compared with other comparative deep learning-based techniques. The significant contributions of the suggested technique are as follows: 1) A new RoadVecNet that contains interlinked UNet networks is introduced to bridge the two subtasks of road surface segmentation and road vectorization. To the best of the authors' knowledge, this work is the first to apply the proposed cascaded model to the given task. 2) Road vectorization is formulated as a binary classification issue (i.e. non-edge and edge) by using the convolutional network. Next, the Sobel approach is used to achieve a smooth and complete vectorized road. 3) Two challenging large-size road datasets, namely, Ottawa and Massachusetts, are used to test the proposed method. 4) More consistent road surface segmentation and smooth road vectorization results can be achieved by the proposed model even under complex backgrounds, compared with the other existing methods, when the DDSPP and SE modules, the MFB_FL loss, and the encoder and decoder layers are used in the framework. The experimental results prove the overall geometric quality of the road segmentation and vectorization with accurate road location and width information.

The rest of the manuscript is organized as follows. The details of the suggested RoadVecNet framework for road surface segmentation and vectorization are presented in Section 2. The detailed explanations of the datasets, the evaluation metrics, and the experimental results are presented in Section 3. The detailed quantitative comparisons of the suggested network with the other comparative models are presented in Section 4. Finally, the conclusion and main findings are explained in Section 5.

2. Methodology

This study implemented interlinked UNet networks, called RoadVecNet, for simultaneous road surface segmentation and vectorization from HRSI. The main steps for applying the suggested method are as follows: (i) dataset preparation was performed to produce the testing, training, and validation imagery for road surface segmentation and vectorization; (ii) the presented framework was then trained and validated based on the training and validation images; (iii) the trained framework was then applied to the test images to produce road surface and vectorized road maps; (iv) the performance of the presented framework was evaluated on the basis of the evaluation metrics, and the results were compared with some preexisting deep learning methods.

2.1. RoadVecNet architecture

An overview of the suggested RoadVecNet framework is shown in Figure 1. The proposed network comprises the road surface segmentation and road vectorization networks (Figure 1). Each UNet model includes a contracting encoder arm, where the resolution decreases and the feature depth increases, and an expanding decoder arm, where the resolution increases and the feature depth decreases. We utilized filter sizes of 32, 64, 128, and 256 for the number of feature maps in the encoder-decoder. The skip connections characteristic of the U-Net framework (Ronneberger, Fischer, and Brox 2015) connect each upsampled feature map at the decoder arm to the encoder arm's feature map with an identical spatial resolution. Accordingly, the probability map that indicates the likelihood of every road and non-road pixel is obtained with the sigmoid classifier.
Figure 1. Flowchart of the RoadVecNet framework containing (a) road surface segmentation and (b) road vectorization UNet networks.

Figure 2. DDSPP structure. Each dilated convolutional layer’s output is concatenated (C) with the input feature map and then fed to
the subsequent dilated layer.

1) Road surface segmentation architecture: The detailed configuration of this network is shown in Figure 2(a). This network was first applied to detect the road surface, which is categorized into two classes: road and background. In this network, a pretrained VGG-19 (Simonyan and Zisserman 2014) was used as the encoder because VGG-19 can be easily transferred to another task, given that it has previously learned features from ImageNet. The key advantages of adopting the VGG-19 network are as follows: (1) its design is similar to UNet, making it easier to combine with UNet, and (2) it allows much deeper networks to produce superior segmentation and vectorization results. We also used the DDSPP module to extract high-resolution feature maps and capture contextual information within the architecture, and the SE module to pass more relevant data and reduce redundant information. Every block in the decoder part implements 2 × 2 bilinear upsampling on the input features to double the dimension of the input feature maps. This avoids artifacts and the use of slow deconvolution layers and hence decreases the number of learning parameters, which also contributes to a faster total training and inference time. Then, the proper skip connections of the encoder feature maps to the output feature maps are concatenated. Thereafter, two 3 × 3 convolutional layers were applied, followed by batch normalization (BN) and the Rectified Linear Unit (ReLU) function. The distribution of activations varies in the intermediate layers during the training step, which is a problem: it slows down the training phase because every layer in every training phase must learn to adjust to a new distribution. Thus, BN (Ioffe and Szegedy 2015), which standardizes the inputs to a layer in the network by subtracting the batch mean and dividing by the batch standard deviation, is used to improve the stability of the neural network. The speed of a neural network's training process can be accelerated by BN (Ioffe and Szegedy 2015). Furthermore, the model's performance is improved in some cases due to the modest regularization influence. Subsequently, the SE module was used, and the mask was generated by applying a convolutional layer with the sigmoid function.
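For concreteness, the following is a minimal Keras sketch of one decoder block in the style just described (bilinear upsampling, skip concatenation, two 3 × 3 convolutions with BN and ReLU); the function name, argument layout, and filter handling are our own illustrative assumptions, not the authors' released code.

from tensorflow.keras import layers

def decoder_block(x, skip, filters):
    # 2x2 bilinear upsampling doubles the spatial dimension without
    # the artifacts (and extra parameters) of deconvolution layers.
    x = layers.UpSampling2D(size=(2, 2), interpolation="bilinear")(x)
    x = layers.Concatenate()([x, skip])  # skip connection from the encoder arm
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)  # stabilizes activation distributions
        x = layers.Activation("relu")(x)
    # An SE block (sketched in Section 2.2 below) would follow here.
    return x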
In remote sensing imagery, the road samples face the class imbalance issue because of the skewed dispensation of ground objects (Abdollahi, Pradhan, and Alamri 2020). The cross-entropy loss does not adequately account for the imbalanced classes because it is calculated by summing over all of the pixels. A typical approach for handling imbalanced classes is to use a weighting factor (Eigen and Fergus 2012). The class loss is weighted using median frequency balancing, that is, by the ratio of the training set's median class frequency to the real class frequency (Eigen and Fergus 2012). Such a weighting factor treats the simple and the hard samples identically; however, it balances the contributions of positive and negative samples. Therefore, the focal loss function was introduced by Lin et al. (2017) to lessen the burden of simple samples, allowing the network to focus more on the hard samples. We used the focal loss weighted by the median frequency balancing (MFB_FL) to address the imbalance issue of the training data and train the road surface segmentation network; it is denoted as follows:

MFB\_FL_{seg}(g, f(o), \delta_1) = \alpha \, (1 - l_c(I_i^j))^{\gamma} \cdot BCE_{seg} \quad (1)

where

BCE_{seg} = \sum_{i=1}^{S} \sum_{j=1}^{P} \sum_{c=1}^{C} w_c \, (g_i^j = C) \log l_c(I_i^j) \quad (2)

w_c = \mathrm{median}(m_c \mid c \in C) / m_c

where median(m_c) is the median value over every m_c, m_c is the number of pixels in class c, w_c is the class weight, f(I_i^j) is the output of the final convolutional layer at pixel I_i^j, g_i^j is the surface ground truth label, I_i^j is the jth pixel in the ith patch, C is the number of classes, P is the number of pixels in every patch, S is the batch size, \delta_1 denotes the road segmentation model parameters, and l_c(I_i^j) is defined as the road surface likelihood of pixel I_i^j:

l_c(I_i^j) = \frac{\exp(f_c(I_i^j))}{\sum_{l=1}^{C} \exp(f_l(I_i^j))} \quad (3)

2) Road vectorization architecture: The detailed configuration of this network is shown in Figure 2(b). This network was implemented to vectorize roads and extract the accurate width and location of the road network. The architecture is similar to the road surface segmentation architecture in that it has a contracting arm, an expanding arm, skip connections, and a sigmoid layer; however, it is much smaller than the road surface segmentation model. A relatively small architecture was chosen for this part for the following reasons. First, the training network has fewer positive pixels (vectorized road pixels) compared with the road segmentation framework. Thus, applying a relatively deep network may cause overfitting. In addition, the feature maps generated by the final convolutional layer in the decoder arm of the road segmentation framework have less complex backgrounds compared with the original image. A relatively small architecture is therefore sufficient to deal with the vectorization task. In Figure 2, the inputs of the vectorization model are the feature maps generated by the final convolutional layer of the decoder arm in the road segmentation model. In every encoder block, two 3 × 3 convolutional layers were implemented, followed by batch normalization and ReLU. Thereafter, the SE block is used to enhance the feature map's quality. Then, a 2 × 2 max-pooling layer with stride 2 was applied to decrease the spatial dimension of the feature maps. All the components in the decoder arm are comparable to those of the decoder arm of the road segmentation network. To train the road vectorization model, its MFB_FL is denoted as follows:

MFB\_FL_{vec}(y, h(I), \delta_2) = \alpha \, (1 - l_c(f(I_i^j)))^{\gamma} \cdot BCE_{vec} \quad (4)

where

BCE_{vec} = \sum_{i=1}^{S} \sum_{j=1}^{P} \sum_{c=1}^{C} w_c \, (y_i^j = C) \log l_c(f(I_i^j)) \quad (5)

where y_i^j is the vectorized ground truth label, h(f(I_i^j)) is the output of the final convolutional layer in the road vectorization network, f(I_i^j) is the output of the final convolutional layer at pixel I_i^j in the road segmentation model, C is the number of classes, P is the number of pixels in every patch, S is the batch size, \delta_2 denotes the road vectorization network parameters, and l_c(f(I_i^j)) is denoted as the vectorized road likelihood of pixel I_i^j:

l_c(f(I_i^j)) = \frac{\exp(h_c(f(I_i^j)))}{\sum_{l=1}^{C} \exp(h_l(f(I_i^j)))} \quad (6)

We employed an end-to-end strategy to concurrently train the proposed road segmentation network and road vectorization network and utilized a distinct training dataset for every subtask. Moreover, we used the main RGB (red, green, and blue) images and the corresponding ground truth surface images for the road surface segmentation task, and the main images and their corresponding ground truth vectorized images for the road vectorization task.
Finally, the overall loss function in RoadVecNet, which is a combination of losses (1) and (4), can be expressed as follows:

MFB\_FL(\delta_1 + \delta_2) = MFB\_FL_{seg}(g, f(o), \delta_1) + MFB\_FL_{vec}(y, h(I), \delta_2) = \alpha \, (1 - l_c(I_i^j))^{\gamma} \cdot BCE_{seg} + \alpha \, (1 - l_c(f(I_i^j)))^{\gamma} \cdot BCE_{vec} \quad (7)

where the last convolutional layer's output in the road vectorization network is h(f(\cdot)), and the last convolutional layer's output in the road segmentation model is f(\cdot). The focal loss is parameterized by \gamma and \alpha, which control the degree of downweighting of easy examples and the class weights, respectively. The FL simplifies to BCE when \gamma = 0. In this work, we set \gamma = 2 and \alpha = 0.25 because the degree of concentrating on hard rather than easy samples can be increased by higher values of \gamma and lower values of \alpha.
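Read together, Eqs. (1)-(7) translate into roughly the following TensorFlow/Keras loss sketch. The helper name mfb_focal_loss and the expectation of one-hot labels are our assumptions; the per-class weights w_c are presumed to be precomputed from training-set pixel counts, and this is an illustrative reading rather than the authors' released implementation.

import tensorflow as tf

def mfb_focal_loss(class_weights, alpha=0.25, gamma=2.0):
    # class_weights: sequence of length C with w_c = median(m_c) / m_c,
    # where m_c is the pixel count of class c over the training set.
    # Expects one-hot y_true and softmax/sigmoid y_pred of shape
    # (batch, H, W, C).
    weights = tf.constant(class_weights, dtype=tf.float32)

    def loss(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        # median-frequency-weighted cross entropy (Eqs. 2 and 5)
        wce = -tf.reduce_sum(weights * y_true * tf.math.log(y_pred), axis=-1)
        # focal modulation on the true-class likelihood (Eqs. 1 and 4)
        p_t = tf.reduce_sum(y_true * y_pred, axis=-1)
        return alpha * tf.pow(1.0 - p_t, gamma) * wce

    return loss

The overall objective of Eq. (7) is then simply the sum of one such term per output head (segmentation and vectorization) of RoadVecNet.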
2.2. SE module

The SE module (Hu, Shen, and Sun 2018) was used to improve the model's representation power through a context gating mechanism and to attain a clear relationship between the convolutional layer channels. The module encodes feature maps by allocating a weight to every channel in the feature map. The SE module includes two major parts, called squeeze and excitation. The first operation is squeeze. The input feature maps to the SE block are accumulated to generate a channel descriptor by applying global average pooling (GAP) over the entire context of channels. We have X_d^{up} = [X_1^{up}, X_2^{up}, \ldots, X_F^{up}], in which the input data to the SE module are X_f^{up} \in \mathbb{R}^{W \times H}, and the spatial squeeze is calculated as follows:

z_f = F_{sq}(X_f^{up}) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_f^{up}(i, j) \quad (8)

where H \times W is the size of this channel, X_f^{up}(i, j) is a spatial location of the fth channel, and F_{sq} is the spatial squeeze module. The second operation is excitation, which takes the global information produced in the squeeze stage. This operation includes two fully connected (FC) layers. The pooled vector is first encoded and then decoded to shapes 1 \times 1 \times F/r and 1 \times 1 \times F, respectively, to generate an excitation vector as s = F_{ex}(z, W) = \sigma(W_2 \mathcal{R}(W_1 z)), where W_1 \in \mathbb{R}^{(F/r) \times F} and W_2 \in \mathbb{R}^{F \times (F/r)} denote the parameters of the two FC layers, r is the reduction ratio, \mathcal{R} is ReLU, and \sigma denotes the sigmoid function. The output of the SE block is generated as \tilde{X}_f^{up} = F_{scale}(X_f^{up}, s_c) = s_c X_f^{up}, where \tilde{X}_d^{up} = [\tilde{X}_1^{up}, \tilde{X}_2^{up}, \ldots, \tilde{X}_F^{up}] is the channel-wise multiplication between the feature map and the channel attention, s_c is the scale factor, and F_{scale} denotes the rescaling of the input feature map.
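A minimal sketch of the squeeze-and-excite operation just described, assuming channel-last Keras tensors; the layer choices follow Hu, Shen, and Sun (2018), but the reduction ratio r = 16 is a common default rather than a value reported in this paper.

from tensorflow.keras import layers

def se_block(x, reduction=16):
    # Squeeze: global average pooling yields one descriptor per channel (Eq. 8).
    channels = x.shape[-1]
    z = layers.GlobalAveragePooling2D()(x)
    # Excitation: encode to F/r with ReLU (W1), decode to F with sigmoid (W2).
    s = layers.Dense(channels // reduction, activation="relu")(z)
    s = layers.Dense(channels, activation="sigmoid")(s)
    s = layers.Reshape((1, 1, channels))(s)
    # Channel-wise rescaling of the input feature map.
    return layers.Multiply()([x, s])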
2.3. DDSPP module

In this work, the DDSPP module was applied to the feature maps generated by the encoder arms to elicit further multi-scale contextual information and produce a greater number of scale features over a broader range. Atrous spatial pyramid pooling (ASPP) was first utilized in DeepLab (Chen et al. 2017) to enhance the suggested networks' performance. ASPP is a mixture of spatial pyramid pooling and atrous convolution with various atrous rates. This tool is effective in adjusting the receptive field to catch multi-scale information and in controlling the resolution of the features computed by deep learning networks. In particular, ASPP includes (a) an image-level feature that is generated by global average pooling and (b) one convolution with a 1 × 1 filter size and four parallel convolutions of a 3 × 3 filter size with different rates of 2, 4, 8, and 12, as illustrated in Figure 2. Then, bilinear upsampling was applied to upsample the resulting features from all branches to the input size, and the concatenated result underwent another 1 × 1 convolution. However, we used a new module named DDSPP (Yang et al. 2018), which combines the benefit of cascaded modules with atrous convolution and ASPP to produce more scale features over a broader range and exploit further multi-scale contextual features. The receptive field of an atrous convolution can be defined as follows:

F = [(K - 1)(R - 1) + K] \times [(K - 1)(R - 1) + K] \quad (9)

where R is the rate, and K is the convolution kernel size. For example, when R = 2 and K = 3, F is equal to 5 × 5. However, we can obtain a bigger receptive field and create feature pyramids with denser scale variability by using dense connections between stacked dilated layers. Assuming that we have two convolutional operations with effective kernel sizes K_1 and K_2, the receptive field can be defined as follows:

F = (K_1 + K_2 - 1) \times (K_1 + K_2 - 1) \quad (10)

The new receptive field size will result in 13 × 13 when the rates are 2 and 4.
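To illustrate both the dense stacking and the receptive-field arithmetic of Eqs. (9) and (10), a hedged Keras sketch of a DDSPP-style block follows; the rates match the text (2, 4, 8, 12), but the filter count and exact wiring of the published module are our assumptions.

from tensorflow.keras import layers

def ddspp_block(x, filters=256, rates=(2, 4, 8, 12)):
    # Each dilated 3x3 convolution sees the input concatenated with all
    # previous outputs, so effective receptive fields compound (Eq. 10):
    # stacking rates 2 and 4 (5x5 and 9x9 fields by Eq. 9) yields
    # 5 + 9 - 1 = 13, i.e. a 13x13 field.
    features = [x]
    for r in rates:
        y = layers.Concatenate()(features) if len(features) > 1 else x
        y = layers.Conv2D(filters, 3, dilation_rate=r, padding="same",
                          activation="relu")(y)
        features.append(y)
    # Fuse all scales with a 1x1 convolution, as in ASPP.
    return layers.Conv2D(filters, 1, padding="same",
                         activation="relu")(layers.Concatenate()(features))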
2.4. Inference stage

The road surface segmentation and road vectorization can be concurrently implemented through the proposed RoadVecNet in the inference stage (Figure 2). A probability road map is obtained by using the road segmentation network. Then, the road vectorization network transforms the feature maps of the final convolutional layer generated by the road segmentation model into vector-based possibility maps. Finally, the Sobel algorithm is applied to achieve a complete and smooth road vectorization network with precise road width information (Vincent and Folorunso 2009). The Sobel algorithm is an instance of the gradient approach. In the gradient method, edges are detected by looking for the minima and maxima in the image's first derivative. The Sobel method computes an estimation of the image intensity gradient function and is a discrete differentiation method (Vincent and Folorunso 2009).
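As a sketch of this post-processing step, the Sobel gradient can be applied to the vector-based possibility map with OpenCV as follows; the normalization and thresholding choices here are our assumptions rather than the paper's exact procedure.

import cv2
import numpy as np

def sobel_edges(prob_map, threshold=0.5):
    # Estimate the intensity gradient of the road possibility map with
    # the discrete Sobel operator and keep the strong edges.
    img = (prob_map * 255).astype(np.uint8)
    gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)  # horizontal derivative
    gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)  # vertical derivative
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    magnitude /= (magnitude.max() + 1e-7)           # normalize to [0, 1]
    return (magnitude > threshold).astype(np.uint8) # binary edge map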
3. Experiments and Assessment

The experimental settings of the suggested approach are first introduced in this section. Subsequently, we describe the Massachusetts and Ottawa datasets used for road segmentation and vectorization. Next, the evaluation metrics and the quantitative results achieved by the proposed approach and other comparative techniques for the road surface segmentation and road vectorization tasks are described.

3.1. Experimental setting

We utilized data augmentation strategies, such as flipping the images vertically and horizontally as well as rotating them 90°, 180°, and 270°, to expand the size of our training and validation sets and train a proper model. Moreover, to curb the overfitting difficulty, we appended a dropout of 0.5 (Srivastava et al. 2014) to the deeper convolutional layers of the road segmentation network and road vectorization network. A computationally affordable yet strong regularization of the model can be provided using this strategy. The adaptive moment estimation (Adam) optimizer with a 0.001 learning rate was also utilized in this work to learn the model parameters, such as weights and biases, via optimizing the loss function. The presented RoadVecNet was trained with batch size 2 from scratch, except for the backbone network, for which we used pretrained weights. The trained network was then applied to the test data for road surface segmentation and road vectorization. We ran the optimization of the networks for 100 epochs through the datasets until no further performance improvements were seen. We applied the suggested network for road surface segmentation and road vectorization on an Nvidia Quadro RTX 6000 GPU with a memory of 24 GB and a computing capability of 7.5, under the Keras framework with TensorFlow backend.

Figure 3. Demonstration of three representative imagery, their segmentation ground truth, and vectorized ground truth maps for the
Massachusetts road imagery. (a), (b), and (c) illustrate the original RGB imagery, corresponding segmentation ground truth maps, and
superposition between vectorized and segmentation ground truth maps, respectively.

3.2. Dataset descriptions

Two types of remote sensing datasets, namely, the Massachusetts road imagery (Mnih 2013), containing aerial images with 0.5 m spatial resolution, and the Ottawa road imagery (Liu et al. 2018), containing Google Earth images with 0.21 m spatial resolution, were used to test the proposed network on road segmentation and vectorization. We selected these two different datasets, which contain various road widths in pixels, to show the proposed architecture's superiority in road segmentation and vectorization. Each dataset includes two sub-datasets, namely, road surface segmentation and road vectorization. The detailed information of each dataset is highlighted as follows:

1) Massachusetts datasets: In this dataset, we used 766 images, which are split into 690 training, 48 validation, and 28 test images with a dimension of 512 × 512 and road widths of approximately 6-9 pixels. Figure 3 demonstrates some samples of the original images in the first column, the corresponding reference maps in the second column, and a superposition between vectorized road and road segmentation ground truth maps in the last column.

2) Ottawa datasets: We utilized 652 images divided into 598 training, 34 validation, and 20 test images with a dimension of 512 × 512 and road widths of almost 24-28 pixels. Figure 4 illustrates some examples of the main imagery, the corresponding reference maps, and the superposition between vectorized road and road segmentation ground truth maps in the first, second, and last columns, respectively.

3.3. Evaluation factors

We utilized the F1 score (14) and the Matthews correlation coefficient (MCC) (15) to assess the ability of the presented RoadVecNet for road surface segmentation and vectorization. The IOU factor (13) is calculated by dividing the total number of mutual pixels between the real and the classified masks by the total number of present pixels in both masks. A correlation coefficient between the predicted and the identified binary classification is denoted as MCC (15), providing a value between -1 and +1, and a mixture of the recall (12) and precision (11) factors is denoted as the F1 score (Abdollahi, Pradhan et al. 2020). These measurement metrics can be computed from the numbers of false negative (FN), false positive (FP), true positive (TP), and true negative (TN) pixels as follows:

Figure 4. Demonstration of three representative imagery and their segmentation ground truth and vectorized ground truth maps for
the Ottawa road imagery. (a), (b), and (c) demonstrate the main RGB images, corresponding segmentation ground truth maps, and
superposition between vectorized and segmentation ground truth maps, respectively.

\mathrm{Precision} = \frac{TP}{TP + FP} \quad (11)

\mathrm{Recall} = \frac{TP}{TP + FN} \quad (12)

\mathrm{IOU} = \frac{TP}{TP + FP + FN} \quad (13)

\mathrm{F1\;score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (14)

\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \quad (15)

A buffer width p should be defined because of the differences between the real road width map and the manually annotated road width map (Shao et al. 2021). The matching areas TP are defined as areas in the anticipated results that are within a p pixel range. We followed the work of Shao et al. (2021) and set the buffer width p = 2. The same indices were used to assess the outcomes of vectorized roads.
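The metrics of Eqs. (11)-(15) can be computed from binary masks as sketched below; approximating the buffered matching by dilating the reference mask by p pixels is one plausible reading of Shao et al. (2021), not necessarily their exact protocol.

import numpy as np
from scipy.ndimage import binary_dilation

def evaluate(pred, truth, buffer_px=2):
    # pred and truth are boolean road masks of the same shape.
    # TPs are counted within a buffer_px tolerance band around the
    # reference road, via binary dilation.
    tolerant = binary_dilation(truth, iterations=buffer_px)
    tp = int(np.sum(pred & tolerant))
    fp = int(np.sum(pred & ~tolerant))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    iou = tp / max(tp + fp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    denom = np.sqrt(float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / max(denom, 1e-12)
    return {"precision": precision, "recall": recall,
            "iou": iou, "f1": f1, "mcc": mcc}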
3.4. Qualitative comparison of road surface segmentation

We compared the presented RoadVecNet architecture with some other state-of-the-art classification-based deep learning networks to investigate the capability of the network in road surface segmentation from HRSI. These networks are as follows: the UNet architecture provided by Ronneberger, Fischer, and Brox (2015); the SegNet network implemented by Badrinarayanan, Kendall, and Cipolla (2017); the DeepLabV3 framework performed by Chen et al. (2017); the VNet model applied by Abdollahi, Pradhan, and Alamri (2020); ResUNet provided by Diakogiannis et al. (2019); and the FCN architecture developed by Long, Shelhamer, and Darrell (2015). For denoting segmentation, we utilized the suffix "-S" after each method's name. The visualization outcomes obtained by the presented RoadVecNet architecture and the other comparative networks for road surface segmentation from the Massachusetts and Ottawa datasets are demonstrated in Figures 5, 6, and 7.

Figure 5. Visual performance attained by Ours-S against the other comparative networks for road surface segmentation from the
Massachusetts imagery. The cyan, green and blue colors denote the TPs, FPs, and FNs, respectively.

Figure 6. Visual performance attained by the comparative networks for road surface segmentation from the Ottawa imagery. The cyan, green, and blue colors denote the TPs, FPs, and FNs, respectively.

The figures illustrate that the SegNet-S, ResUNet-S, and DeepLabV3-S networks were sensitive to the barriers of trees and shadows and predicted more FN pixels (depicted in blue) and FP pixels (depicted in green), thereby producing low-quality road segmentation maps for both datasets. Meanwhile, the FCN-S, UNet-S, and VNet-S architectures could improve the results and generate more coherent and satisfactory road segmentation maps. However, none of the abovementioned models achieved better qualitative results than Ours-S. Ours-S could generate high-resolution road segmentation maps for both datasets by alleviating the effect of obstacles, predicting fewer FP pixels, and preserving the road border information. The reason is that we used the DDSPP module to create feature pyramids with denser scale variability and a bigger receptive field. We also utilized the SE module to extract more valuable information by considering the interdependencies between feature channels. In addition, we applied the MFB_FL loss

Figure 7. Visual performance attained by Ours-S against VNet-S network for road surface segmentation from the Ottawa and
Massachusetts imagery. The cyan, green, and blue colors denote the TPs, FPs, and FNs, respectively.

function to overcome highly unbalanced datasets and allow more attention on the hard samples. Therefore, we could obtain more consistent and smoother road segmentation and vectorization results.

3.5. Qualitative comparison of road vectorization

Here, we compared the results attained by the presented RoadVecNet architecture for road vectorization from the Massachusetts and Ottawa datasets

Figure 8. Comparison outcomes of various approaches for road vectorization in visual performance for Ottawa imagery. The first and
second columns demonstrate the original RGB and corresponding reference imagery, respectively. The third, fourth, fifth, sixth, and
last columns demonstrate the results of FCN-V, SegNet-V, UNet-V, DeepLabV3-V, and ResUNet-V. More details can be seen in the
zoomed-in view.

Figure 9. Comparison of the outcomes of the VNet-V approach and Ours-V for road vectorization in terms of visual performance for
Ottawa imagery. The first and second columns demonstrate the original RGB and corresponding reference imagery, respectively. The
third and fourth columns demonstrate the results of VNet-V and Ours-V. More details can be seen in the zoomed-in view.

with the same comparative deep learning methods applied in the road surface segmentation part: the UNet architecture (Ronneberger, Fischer, and Brox 2015), the DeepLabV3 framework (Chen et al. 2017), the SegNet network (Badrinarayanan, Kendall, and Cipolla 2017), VNet applied by Abdollahi, Pradhan, and Alamri (2020), ResUNet provided by Diakogiannis et al. (2019), and FCN (Long, Shelhamer, and Darrell 2015). We utilized the suffix "-V" after every approach's name to denote road vectorization. Figures 8 and 9 demonstrate the comparison outcomes of the various approaches and the presented RoadVecNet for road vectorization in visual performance for the Ottawa imagery. The vectorized road ground truth map is also included in the second column of the figures to better display the contrast. We also used blue rectangular boxes in the figures to show the FP and FN pixels to facilitate comparison. Figures 8 and 9 illustrate that although the FCN-V, SegNet-V, ResUNet-V, and DeepLabV3-V architectures could generate relatively complete road vectorization networks, they brought in spurs and produced some FPs in the homogenous regions where the road was covered by occlusions and around the intersections, reducing the correctness and smoothness of the road vectorization network. The UNet-V and VNet-V methods could improve the results and generate a complete road vectorization network; however, they failed to vectorize the road in the intersection parts and brought in some discontinuity and FPs. Figures 10 and 11 demonstrate the visual performance of the comparative models for the Massachusetts imagery. In this dataset, the complexity of obstacles and backgrounds is higher, and the road width is smaller than in the Ottawa dataset. Accordingly, all the above-mentioned comparative models, including VNet-V, could not accurately vectorize the road, resulting in non-complete and non-

Figure 10. Comparison of the outcomes of various approaches for road vectorization in terms of visual performance for Massachusetts
imagery. The first and second columns demonstrate the original RGB and corresponding reference imagery, respectively. The third,
fourth, and fifth columns demonstrate the results of FCN-V, SegNet-V, and DeepLabV3-V, respectively. More details can be seen in the
zoomed-in view.

smooth vectorized road networks, especially for complex backgrounds and intersection areas, where they brought in more discontinuity and FPs. By contrast, Ours-V could detect a complete and non-spur vectorized road network even from the Massachusetts dataset with its narrow road widths and complex backgrounds. Our vectorized road map is more similar to the actual ground truth vectorized road than those of the other comparative models.

4. Discussion

We obtained the quantitative results for the presented technique and the other comparative networks applied to the Massachusetts and Ottawa datasets for road segmentation, which are summarized in Tables 1 and 2, respectively. The first four columns in both tables are the performance on four test sample images, and the final column is the average accuracy over the whole test imagery. The bold value is the best in the F1 score metric, while the underlined values are the second-best. Tables 1 and 2 illustrate that Ours-S, along with the other comparative convolutional networks, could attain satisfactory outcomes for road segmentation from both datasets. However, the DeepLabV3-S, ResUNet-S, and SegNet-S architectures achieved the lowest F1 score accuracy with 85.83%, 86.97%, and 87% for Massachusetts and 90.54%, 90.72%, and 91.48% for Ottawa. The SegNet-S model could slightly improve the accuracy because it utilizes the max-pooling indices at the encoder and corresponding decoder paths to upsample the layers in the decoding process. The model does not need to learn the upsampling weights again, which makes the training process more straightforward.

Tables 1 and 2 also show that the VNet-S framework was the second-best approach in road surface segmentation, with 91.45% for Massachusetts and 92.02% for Ottawa. By contrast, the accuracy of the F1 score metric for Ours-S was higher than that of all the comparative approaches. In fact, the presented model could improve the F1 score accuracy by 1.06% for Massachusetts and 1.38% for Ottawa

Figure 11. Comparison outcomes of our approach and the other comparative models for road vectorization in visual performance for
Massachusetts imagery. The first column demonstrates the original RGB imagery. The second, third, fourth, and last columns
demonstrate the results of ResUNet-V, UNet-V, VNet-V, and Ours-V, respectively. More details can be seen in the zoomed-in view.

Table 1. Percentage of F1 score, MCC, and IOU attained by Ours-S and other comparative networks for road
segmentation from Massachusetts imagery. The bold and underline F1 scores demonstrate the best and second-
best, respectively.
Image1 Image2 Image3 Image4 Average
FCN-S F1 score 0.9104 0.9117 0.9028 0.9007 0.9064
MCC 0.9008 0.9037 0.8910 0.8901 0.8964
IOU 0.8338 0.8360 0.8212 0.8176 0.8272
SegNet-S F1 score 0.8680 0.8909 0.8701 0.8511 0.8700
MCC 0.8554 0.8838 0.8573 0.8324 0.8572
IOU 0.7654 0.8017 0.7686 0.7394 0.7688
UNet-S F1 score 0.9128 0.9141 0.9075 0.9073 0.9104
MCC 0.9017 0.9057 0.8984 0.8942 0.9000
IOU 0.8378 0.8570 0.8289 0.8286 0.8381
VNet-S F1 score 0.9122 0.9192 0.9084 0.9173 0.9145
MCC 0.9023 0.9108 0.8965 0.9067 0.9040
IOU 0.8385 0.8504 0.8322 0.8473 0.8421
ResUNet-S F1 score 0.8632 0.8882 0.8668 0.8609 0.8697
MCC 0.8493 0.8806 0.8539 0.8453 0.8572
IOU 0.7593 0.7988 0.7649 0.7557 0.7696
DeeplabV3-S F1 score 0.8564 0.8798 0.8468 0.8503 0.8583
MCC 0.8383 0.8693 0.8294 0.8303 0.8418
IOU 0.7475 0.7839 0.7330 0.7382 0.7507
Ours-S F1 score 0.9243 0.9239 0.9282 0.9240 0.9251
MCC 0.9143 0.9168 0.9190 0.9128 0.9157
IOU 0.8574 0.8740 0.8641 0.8568 0.8631

compared with the VNet-S network, which was the second-best model. Furthermore, we compared the quantitative results achieved by the proposed model with more deep learning-based models, such as the CNN-based segmentation method (Wei, Zhang, and Ji 2020), the road structure-refined CNN (RSRCNN) technique (Wei, Wang, and Xu 2017), and the FCNs approach (Zhong et al. 2016) applied for road segmentation

Table 2. Percentage of F1 score, MCC, and IOU attained by Ours-S and other comparative networks for road
segmentation from Ottawa imagery. The bold and underline values demonstrate the best and second-best,
respectively.
Image1 Image2 Image3 Image4 Average
FCN-S F1 score 0.8829 0.9150 0.9375 0.9282 0.9159
MCC 0.8453 0.8887 0.9103 0.8796 0.8810
IOU 0.7888 0.8415 0.8803 0.8641 0.8437
SegNet-S F1 score 0.8816 0.9302 0.9371 0.9103 0.9148
MCC 0.8432 0.9053 0.9102 0.8572 0.8790
IOU 0.7867 0.8676 0.8797 0.8336 0.8419
UNet-S F1 score 0.8849 0.9329 0.9321 0.9231 0.9183
MCC 0.8477 0.9108 0.9028 0.8711 0.8831
IOU 0.7921 0.8722 0.8709 0.8554 0.8477
VNet-S F1 score 0.8933 0.9294 0.9390 0.9191 0.9202
MCC 0.8597 0.9070 0.9137 0.8678 0.8870
IOU 0.8072 0.8681 0.8850 0.8502 0.8526
ResUNet-S F1 score 0.8761 0.9160 0.9372 0.8995 0.9072
MCC 0.8137 0.8887 0.9110 0.8159 0.8573
IOU 0.7484 0.8450 0.8818 0.7949 0.8175
DeeplabV3-S F1 score 0.8731 0.9274 0.9330 0.8884 0.9054
MCC 0.8101 0.9016 0.9027 0.8229 0.8593
IOU 0.7427 0.8627 0.8725 0.7759 0.8135
Ours-S F1 score 0.8992 0.9412 0.9434 0.9520 0.9340
MCC 0.8666 0.9202 0.9186 0.9190 0.9061
IOU 0.8152 0.8869 0.8909 0.9062 0.8748

Table 3. Percentage of F1 score and MCC attained by Ours-V and other comparative networks for road vectorization
from the Ottawa imagery. The bold and underline values denote the best and second-best, respectively.
Image1 Image2 Image3 Image4 Average
FCN-V F1 score 0.8643 0.9017 0.8893 0.8893 0.8862
MCC 0.8551 0.8953 0.8825 0.8821 0.8788
SegNet-V F1 score 0.8622 0.8702 0.8820 0.8776 0.8730
MCC 0.8563 0.8658 0.8782 0.8722 0.8681
UNet-V F1 score 0.8999 0.9134 0.9072 0.9288 0.9123
MCC 0.8931 0.9076 0.9015 0.9241 0.9066
DeeplabV3-V F1 score 0.8566 0.8699 0.8804 0.8742 0.8703
MCC 0.8513 0.8650 0.8763 0.8695 0.8655
VNet-V F1 score 0.9045 0.9129 0.9038 0.9297 0.9127
MCC 0.8985 0.9071 0.8973 0.9254 0.9070
ResUNet-V F1 score 0.8614 0.8734 0.8839 0.8874 0.8765
MCC 0.8529 0.8659 0.8771 0.8807 0.8691
Ours-V F1 score 0.9203 0.9237 0.9164 0.9358 0.9241
MCC 0.9149 0.9187 0.9110 0.9315 0.9190

Table 4. Percentage of F1 score and MCC attained by Ours-V and other comparative networks for road vectorization
from Massachusetts imagery. The bold and underline values denote the best and second-best, respectively.
Image1 Image2 Image3 Image4 Average
FCN-V F1 score 0.7982 0.8350 0.8204 0.8503 0.8260
MCC 0.8004 0.8273 0.8095 0.8510 0.8221
SegNet-V F1 score 0.7917 0.8326 0.8047 0.8424 0.8179
MCC 0.7763 0.8232 0.7911 0.8333 0.8060
UNet-V F1 score 0.8129 0.8458 0.8237 0.8514 0.8335
MCC 0.7994 0.8478 0.8244 0.8421 0.8284
DeeplabV3-V F1 score 0.7539 0.8263 0.7749 0.8237 0.7947
MCC 0.7350 0.8176 0.7586 0.8114 0.7807
VNet-V F1 score 0.8206 0.8470 0.8208 0.8619 0.8373
MCC 0.8069 0.8388 0.8083 0.8535 0.8268
ResUNet-V F1 score 0.7771 0.8307 0.7814 0.8298 0.8047
MCC 0.7596 0.8218 0.7653 0.8180 0.7911
Ours-V F1 score 0.8854 0.8834 0.8878 0.9129 0.8924
MCC 0.8762 0.8754 0.8794 0.9066 0.8844

from Massachusetts imagery. The presented method was built and evaluated on an experimental dataset, while the outcomes for the other three works were taken from previously published studies. The F1 score accuracies achieved by the CNN-based approach, RSRCNN, and FCNs were 82%, 66.2%, and 68%, respectively, while that of the Ours-S approach is 92.51%. The results confirmed that the richer supervised information in the presented model obtains better outcomes against the other preexisting deep learning approaches in road surface segmentation from HRSI.

We calculated the F1 score and MCC metrics to better probe the capability of Ours-V and the other comparative models in road vectorization. The quantitative outcomes for the Ottawa (Google Earth) and Massachusetts (aerial) imagery are demonstrated in Tables 3 and 4, respectively. Tables 3 and 4 show that VNet-V could achieve satisfactory results for road vectorization from the Ottawa imagery with 91.27% F1 score accuracy, which improved upon the results of the other comparative models, such as FCN-V, DeepLabV3-V, ResUNet-V, UNet-V, and SegNet-V, and it was ranked as the second-best model. Nevertheless, this method could not perform well in road vectorization using the Massachusetts imagery (Table 4) and predicted more FPs and fewer FNs, resulting in a lower F1 score accuracy of 83.73%. This phenomenon is attributed to the aerial images having more complex backgrounds and occlusions and narrower road widths. The other methods that could not achieve a higher F1 score accuracy than VNet-V for the Ottawa images could not obtain a higher accuracy for the Massachusetts images either. By contrast, Ours-V was able to achieve better results than the others for both datasets. Ours-V achieved F1 score accuracy rates of 92.41% and 89.24% for the Ottawa and Massachusetts imagery, respectively. Ours-V could improve on the results of VNet-V (the second-best method) by 1.14% for Ottawa and 5.51% for Massachusetts, which confirmed its validity for road vectorization from Google Earth and aerial imagery.

The average F1 score accuracy attained by our approach and the other comparative approaches in road surface segmentation and road vectorization from both datasets is plotted in Figure 12(a,b), respectively.

Figure 12. Average percentage of the F1 score metric of our method and other methods for road surface segmentation (a) and road
vectorization (b) from Ottawa and Massachusetts imagery.

Figure 13. Performance of the proposed model for road segmentation and vectorization through training epochs: training and
validation losses for the (a) Ottawa and (b) Massachusetts datasets.

The approaches and the average percentage of the F1 score metric are shown on the horizontal and vertical axes, respectively. Figure 12 depicts that the Ours-S and Ours-V methods achieved the highest F1 scores, affirming the superiority of the proposed technique for road segmentation and vectorization from Google Earth and aerial imagery. Figure 13(a,b) display the training and validation losses of the presented approach over 100 epochs for the Ottawa and Massachusetts imagery, respectively. Based on the decrease in model loss, the method has learned efficient features for road surface segmentation and vectorization. The training and validation losses are close together in the learning curve for both datasets. The model reduced overfitting, and the variance of the method is negligible.

Table 5. Percentage of the F1 score, IOU, and MCC attained by the Ours-V network for road segmentation and vectorization from the Massachusetts and Ottawa imagery after changing several settings.

Road Segmentation    Ottawa:        F1 score 0.9177, MCC 0.8984, IOU 0.8684
                     Massachusetts: F1 score 0.8775, MCC 0.8662, IOU 0.7818
Road Vectorization   Ottawa:        F1 score 0.9002, MCC 0.9048
                     Massachusetts: F1 score 0.8532, MCC 0.8460

4.1. Ablation study

We conducted some tests to see how different settings affected the model's performance in road surface segmentation and vectorization. In this case, we used a PSPNet backbone (Zhao, Shi et al. 2017), stochastic gradient descent with a 0.01 learning rate, and a batch size of 4. The quantitative results for both tasks are shown in Table 5. Meanwhile, the visualization results for the road segmentation and vectorization tasks are depicted in Figures 14 and 15, respectively. Table 5 illustrates that the F1 score accuracy decreased to 91.77% and 87.75% for road segmentation and 90.02% and 85.32% for road vectorization for the Ottawa and Massachusetts images, respectively, after changing these settings. Figures 14 and 15 also show that the proposed model brought in spurs and produced some FPs in the homogenous regions, thereby considerably decreasing the smoothness of the road vectorization network.

4.2. Failure case analysis

In this case, we conducted a failure case analysis by reducing the size of the images to 256 × 256 to check the model's performance on road segmentation and

Figure 14. Visual performance attained by Ours-S network for road surface segmentation from the Ottawa and Massachusetts imagery
after changing several settings. The cyan, green, and blue colors denote the TPs, FPs, and FNs, respectively.

Figure 15. Visual performance attained by Ours-V for road vectorization from the Massachusetts and Ottawa imagery after changing
several settings. The blue rectangle shows the predicted FPs and FNs. More details can be seen in the zoomed-in view.

Table 6. Percentage of the F1 score, IOU, and MCC attained by the Ours-S and Ours-V networks for road segmentation and vectorization from the Ottawa and Massachusetts imagery after analyzing a failure case.

Task               | Dataset       | F1 score | MCC    | IOU
Road segmentation  | Ottawa        | 0.8887   | 0.8291 | 0.7997
Road segmentation  | Massachusetts | 0.8444   | 0.8226 | 0.7308
Road vectorization | Ottawa        | 0.8702   | 0.8637 | n/a
Road vectorization | Massachusetts | 0.8161   | 0.8045 | n/a
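For reference, the three metrics reported in Tables 5 and 6 can be computed from pixel-level confusion counts as in the following sketch; this is a NumPy implementation of the standard definitions, not code from the released repository.

import numpy as np

def f1_iou_mcc(pred, gt):
    # `pred` and `gt` are boolean masks of the same shape.
    tp = float(np.sum(pred & gt))
    fp = float(np.sum(pred & ~gt))
    fn = float(np.sum(~pred & gt))
    tn = float(np.sum(~pred & ~gt))
    f1 = 2.0 * tp / (2.0 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return f1, iou, mcc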

Table 6 shows the quantitative results for both tasks. Meanwhile, Figures 16 and 17 illustrate the visualization results for the road segmentation and vectorization tasks, respectively. Table 6 indicates that, when the image size was halved, the F1 score decreased to 88.87% and 84.44% for road segmentation and to 87.02% and 81.61% for road vectorization for the Ottawa and Massachusetts imagery, respectively. In addition, Figures 16 and 17 depict that the proposed model produced more noise and confused adjacent lanes with each other for road segmentation on both datasets. Moreover, the model produced an incomplete vectorized road network, especially in complicated and intersection areas, where it brought in more FPs when we decreased the image size. The model could learn considerably less, and the cases were identified as failures owing to overfitting when the image size was reduced. Accordingly, the detection accuracy was greatly diminished. Therefore, reducing the input image size was ineffective for producing high-quality road segmentation and vectorization maps.

5. Conclusion

A new interlinked end-to-end UNet framework called RoadVecNet was proposed in this study to simultaneously implement road surface segmentation and road vectorization. The first network in the RoadVecNet architecture was used to produce feature maps. Meanwhile, the second network was applied to perform road vectorization. The Sobel method was utilized to achieve a complete and smooth vectorized road with accurate road-width information. Two separate datasets, namely, road surface segmentation and road vectorization datasets, were used to train the model.
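As a rough sketch of the Sobel step mentioned above, road edges can be extracted from a predicted road mask with horizontal and vertical gradient filters; the snippet below uses OpenCV and is only an approximation of the post-processing in the released code.

import cv2
import numpy as np

def sobel_road_edges(mask):
    # `mask` is an 8-bit road map (0-255). Gradients in x and y are
    # combined into a magnitude image whose non-zero pixels mark edges.
    gx = cv2.Sobel(mask, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(mask, cv2.CV_64F, 0, 1, ksize=3)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    return (magnitude > 0).astype(np.uint8) * 255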

Figure 16. Visual performance attained by Ours-S network for road surface segmentation from the Ottawa and Massachusetts imagery
after analyzing a failure case. The cyan, green, and blue colors denote the TPs, FPs, and FNs, respectively.

Figure 17. Visual performance attained by Ours-V for road vectorization from the Massachusetts and Ottawa imagery after analyzing a
failure case. The blue rectangle shows the predicted FPs and FNs. More details can be seen in the zoomed-in view.

Figure 18. The vectorized road superimposed on the original aerial (Massachusetts) and Google Earth (Ottawa) imagery to show the overall geometric quality of the vectorized outcomes. The first and second rows show the aerial images, and the third and last rows show the Google Earth images. The last column shows the superimposed vectorized road. More details can be seen in the zoomed-in view.

The advantage of the proposed model was verified with rigorous experiments: 1) Two different road datasets, namely, the Ottawa (Google Earth) and Massachusetts (aerial) datasets, which comprise the original RGB images, the corresponding ground-truth segmentation maps, and the corresponding ground-truth vector maps, were employed to test the model for road segmentation and vectorization. 2) In the road surface segmentation task, the proposed RoadVecNet could achieve more consistent and smooth road segmentation outcomes than all the comparative models in terms of both visual and quantitative performance. 3) In the road vectorization task, RoadVecNet also showed better performance than the other comparative state-of-the-art deep convolutional architectures. Figure 18 demonstrates the vectorized road results overlaid on the original Google Earth and aerial imagery to prove the overall geometric quality of the road segmentation and vectorization by the model. We calculated the root-mean-square (RMS) error of the road widths as the quadratic mean distance between the matched reference and extracted widths. The vectorization of the classified outcomes achieved width RMS values of 1.47 and 0.63 m for the Massachusetts and Ottawa images, respectively, proving that the proposed model could recover precise information about road width. Moreover, the proposed network could extract the precise location of the road network because the vectorized road maps are well superimposed on the original imagery. The proposed RoadVecNet model showed robustness against obstacles to a certain extent; however, it could not segment and vectorize roads well where obstacles covered large, continuous areas, which resulted in discontinuities. These issues are the primary drawbacks of the suggested technique for road surface segmentation and vectorization. Future studies can address these constraints by incorporating topological criteria and gap-filling methods into our proposed road extraction and vectorization method to improve its accuracy.
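The width RMS figure quoted above is the quadratic mean of the differences between matched reference and extracted road widths; a minimal sketch is given below, assuming the matching of widths has been done upstream.

import numpy as np

def width_rms(reference_widths, extracted_widths):
    # Both inputs are 1-D sequences of matched road widths in metres.
    ref = np.asarray(reference_widths, dtype=float)
    ext = np.asarray(extracted_widths, dtype=float)
    return float(np.sqrt(np.mean((ref - ext) ** 2)))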
Highlights

● A new RoadVecNet is applied for road segmentation and vectorization simultaneously.
● SE and DDSPP modules are used to improve the accuracy.
● The Sobel edge detection method is used to obtain complete and smooth road edge networks.
● Two different datasets, Massachusetts and Ottawa, are used for road vectorization.
● The MFB_FL loss function is utilized to overcome the highly unbalanced training datasets.

Data availability

The Massachusetts and Ottawa datasets and the developed code uploaded to GitHub can be downloaded from the online versions at https://www.cs.toronto.edu/~vmnih/data/, https://github.com/gismodelling/RoadVecNet, and https://github.com/yhlleo/RoadNet.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Funding

The Centre for Advanced Modelling and Geospatial Information Systems, Faculty of Engineering and IT, University of Technology Sydney funded this research. This research was also supported by Researchers Supporting Project number RSP-2021/14, King Saud University, Riyadh, Saudi Arabia.

ORCID

Biswajeet Pradhan http://orcid.org/0000-0001-9863-2054

Author contributions

A.A. carried out the investigations, analyzed the data, and drafted the article; B.P. conceptualized, supervised, visualized, administered the project, allocated resources, and wrote, reviewed, edited, and reorganized the article; B.P. and A.A.A. effectively enhanced the article, along with the funding. All authors have read and consented to the published version of the article.

References

Abdollahi, A., and B. Pradhan. 2021. "Integrated Technique of Segmentation and Classification Methods with Connected Components Analysis for Road Extraction from Orthophoto Images." Expert Systems with Applications 176: 114908. doi:10.1016/j.eswa.2021.114908.
Abdollahi, A., B. Pradhan, and A. Alamri. 2020. "VNet: An End-to-end Fully Convolutional Neural Network for Road Extraction from High-resolution Remote Sensing Data." IEEE Access 8: 179424–179436. doi:10.1109/ACCESS.2020.3026658.
Abdollahi, A., B. Pradhan, N. Shukla, S. Chakraborty, and A. Alamri. 2020. "Deep Learning Approaches Applied to Remote Sensing Datasets for Road Extraction: A State-of-the-art Review." Remote Sensing 12 (9): 1444. doi:10.3390/rs12091444.
Badrinarayanan, V., A. Kendall, and R. Cipolla. 2017. "SegNet: A Deep Convolutional Encoder-decoder Architecture for Image Segmentation." IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12): 2481–2495. doi:10.1109/TPAMI.2016.2644615.
Buslaev, A., S. Seferbekov, V. Iglovikov, and A. Shvets. 2018. "Fully Convolutional Network for Automatic Road Extraction from Satellite Imagery." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 207–210. Salt Lake City, UT, USA.
Chaudhuri, D., N. Kushwaha, and A. Samal. 2012. "Semi-automated Road Detection from High Resolution Satellite Images by Directional Morphological Enhancement and Segmentation Techniques." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 5 (5): 1538–1544. doi:10.1109/JSTARS.2012.2199085.
Chen, L., G. Papandreou, I. Kokkinos, K. Murphy, and A. Yuille. 2017. "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs." IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4): 834–848. doi:10.1109/TPAMI.2017.2699184.
Cheng, G., Y. Wang, S. Xu, H. Wang, S. Xiang, and C. Pan. 2017. "Automatic Road Detection and Centerline Extraction via Cascaded End-to-end Convolutional Neural Network." IEEE Transactions on Geoscience and Remote Sensing 55 (6): 3322–3337. doi:10.1109/TGRS.2017.2669341.
Diakogiannis, F., F. Waldner, P. Caccetta, and C. Wu. 2019. "ResUNet-a: A Deep Learning Framework for Semantic Segmentation of Remotely Sensed Data." 1–24. https://arxiv.org/abs/1904.00592.
Eigen, D., and R. Fergus. 2012. "Nonparametric Image Parsing Using Adaptive Neighbor Sets." In 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2799–2806. Providence, RI, USA.
Gao, X., X. Sun, Y. Zhang, M. Yan, G. Xu, H. Sun, J. Jiao, and K. Fu. 2018. "An End-to-end Neural Network for Road Extraction from Remote Sensing Imagery by Multiple Feature Pyramid Network." IEEE Access 6: 39401–39414. doi:10.1109/ACCESS.2018.2856088.
Hong, Z., D. Ming, K. Zhou, Y. Guo, and T. Lu. 2018. "Road Extraction from a High Spatial Resolution Remote Sensing Image Based on Richer Convolutional Features." IEEE Access 6: 46988–47000. doi:10.1109/ACCESS.2018.2867210.
Hormese, J., and C. Saravanan. 2016. "Automated Road Extraction from High Resolution Satellite Images." Procedia Technology 24: 1460–1467. doi:10.1016/j.protcy.2016.05.180.
Hu, F., G. Xia, J. Hu, and L. Zhang. 2015. "Transferring Deep Convolutional Neural Networks for the Scene Classification of High-resolution Remote Sensing Imagery." Remote Sensing 7 (11): 14680–14707. doi:10.3390/rs71114680.
Hu, J., L. Shen, and G. Sun. 2018. "Squeeze-and-excitation Networks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7132–7141. Salt Lake City, UT, USA.
Ioffe, S., and C. Szegedy. 2015. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift." 448–456. https://arxiv.org/abs/1502.03167.
Kaur, A., and R. Singh. 2015. "Various Methods of Road Extraction from Satellite Images: A Review." International Journal of Research 2 (2): 1025–1032.
Li, P., Y. Zang, C. Wang, J. Li, M. Cheng, L. Luo, and Y. Yu. 2016. "Road Network Extraction via Deep Learning and Line Integral Convolution." In IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 1599–1602. doi:10.1109/IGARSS.2016.7729408.
Li, Y., L. Xu, J. Rao, L. Guo, Z. Yan, and S. Jin. 2019. "A Y-Net Deep Learning Method for Road Segmentation Using High-resolution Visible Remote Sensing Images." Remote Sensing Letters 10 (4): 381–390. doi:10.1080/2150704X.2018.1557791.
Lin, T., P. Goyal, R. Girshick, K. He, and P. Dollár. 2017. "Focal Loss for Dense Object Detection." In Proceedings of the IEEE International Conference on Computer Vision, 2980–2988. Venice, Italy.
Liu, R., Q. Miao, J. Song, Y. Quan, Y. Li, P. Xu, and J. Dai. 2019. "Multiscale Road Centerlines Extraction from High-resolution Aerial Imagery." Neurocomputing 329: 384–396. doi:10.1016/j.neucom.2018.10.036.
Liu, Y., J. Yao, X. Lu, M. Xia, X. Wang, and Y. Liu. 2018. "RoadNet: Learning to Comprehensively Analyze Road Networks in Complex Urban Scenes from High-resolution Remotely Sensed Images." IEEE Transactions on Geoscience and Remote Sensing 57 (4): 2043–2056. doi:10.1109/TGRS.2018.2870871.
Long, J., E. Shelhamer, and T. Darrell. 2015. "Fully Convolutional Networks for Semantic Segmentation." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6810–6818. Boston, MA, USA.
Luo, Y., J. Li, C. Yu, B. Xu, Y. Li, L. Hsu, and N. El-Sheimy. 2019. "Research on Time-correlated Errors Using Allan Variance in a Kalman Filter Applicable to Vector-tracking-based GNSS Software-defined Receiver for Autonomous Ground Vehicle Navigation." Remote Sensing 11 (9): 1026. doi:10.3390/rs11091026.
Maboudi, M., J. Amini, M. Hahn, and M. Saati. 2017. "Object-based Road Extraction from Satellite Images Using Ant Colony Optimization." International Journal of Remote Sensing 38 (1): 179–198. doi:10.1080/01431161.2016.1264026.
Miao, Z., W. Shi, H. Zhang, and X. Wang. 2012. "Road Centerline Extraction from High-resolution Imagery Based on Shape Features and Multivariate Adaptive Regression Splines." IEEE Geoscience and Remote Sensing Letters 10 (3): 583–587. doi:10.1109/LGRS.2012.2214761.
Mnih, V. 2013. "Machine Learning for Aerial Image Labeling." Ph.D. dissertation, Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada.
Mnih, V., and G. E. Hinton. 2010. "Learning to Detect Roads in High-resolution Aerial Images." Berlin, Heidelberg, 210–223. doi:10.1007/978-3-642-15567-3_16.
Movaghati, S., A. Moghaddamjoo, and A. Tavakoli. 2010. "Road Extraction from Satellite Images Using Particle Filtering and Extended Kalman Filtering." IEEE Transactions on Geoscience and Remote Sensing 48 (7): 2807–2817. doi:10.1109/TGRS.2010.2041783.
Qiaoping, Z., and I. Couloigner. 2004. "Automatic Road Change Detection and GIS Updating from High Spatial Remotely-sensed Imagery." Geo-Spatial Information Science 7 (2): 89–95. doi:10.1007/BF02826642.
Ronneberger, O., P. Fischer, and T. Brox. 2015. "U-Net: Convolutional Networks for Biomedical Image Segmentation." In International Conference on Medical Image Computing and Computer-Assisted Intervention, 234–241. Munich, Germany.
Saito, S., and Y. Aoki. 2015. "Building and Road Detection from Large Aerial Imagery." Image Processing: Machine Vision Applications 9405: 94050.
Sarhan, E., E. Khalifa, and A. M. Nabil. 2011. "Road Extraction Framework by Using Cellular Neural Network from Remote Sensing Images." In 2011 International Conference on Image Information Processing, 1–5. Shimla, India.
Ševo, I., and A. Avramović. 2016. "Convolutional Neural Network Based Automatic Object Detection on Aerial Images." IEEE Geoscience and Remote Sensing Letters 13 (5): 740–744. doi:10.1109/LGRS.2016.2542358.
Shao, Z., Z. Zhou, X. Huang, and Y. Zhang. 2021. "MRENet: Simultaneous Extraction of Road Surface and Road Centerline in Complex Urban Scenes from Very High-resolution Images." Remote Sensing 13 (2): 239. doi:10.3390/rs13020239.
Shen, Z., J. Luo, and L. Gao. 2010. "Road Extraction from High-resolution Remotely Sensed Panchromatic Image in Different Research Scales." In 2010 IEEE International Geoscience and Remote Sensing Symposium, 453–456. Honolulu, HI, USA.
Simonyan, K., and A. Zisserman. 2014. "Very Deep Convolutional Networks for Large-scale Image Recognition." https://arxiv.org/abs/1409.1556.
Srivastava, N., G. Hinton, A. Krizhevsky, and R. Salakhutdinov. 2014. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." Journal of Machine Learning Research 15: 1929–1958.
Unsalan, C., and B. Sirmacek. 2012. "Road Network Detection Using Probabilistic and Graph Theoretical Methods." IEEE Transactions on Geoscience and Remote Sensing 50 (11): 4441–4453. doi:10.1109/TGRS.2012.2190078.
Vincent, O., and O. Folorunso. 2009. "A Descriptive Algorithm for Sobel Image Edge Detection." In Proceedings of the Informing Science & IT Education Conference (InSITE) 40: 97–107. Macon, United States.
Wei, Y., K. Zhang, and S. Ji. 2020. "Simultaneous Road Surface and Centerline Extraction from Large-scale Remote Sensing Images Using CNN-based Segmentation and Tracing." IEEE Transactions on Geoscience and Remote Sensing 58 (12): 8919–8931. doi:10.1109/TGRS.2020.2991733.
Wei, Y., Z. Wang, and M. Xu. 2017. "Road Structure Refined CNN for Road Extraction in Aerial Image." IEEE Geoscience and Remote Sensing Letters 14 (5): 709–713. doi:10.1109/LGRS.2017.2672734.
Xie, Y., F. Miao, K. Zhou, and J. Peng. 2019. "HsgNet: A Road Extraction Network Based on Global Perception of High-order Spatial Information." ISPRS International Journal of Geo-Information 8 (12): 571. doi:10.3390/ijgi8120571.
Xu, Y., Y. Feng, Z. Xie, A. Hu, and X. Zhang. 2018. "A Research on Extracting Road Network from High Resolution Remote Sensing Imagery." In 26th International Conference on Geoinformatics, Kunming, China, 1–4. doi:10.1109/GEOINFORMATICS.2018.8557042.
Yang, M., K. Yu, C. Zhang, Z. Li, and K. Yang. 2018. "DenseASPP for Semantic Segmentation in Street Scenes." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3684–3692. Salt Lake City, UT, USA.
Yang, X., X. Li, Y. Ye, R. Y. K. Lau, X. Zhang, and X. Huang. 2019. "Road Detection and Centerline Extraction via Deep Recurrent Convolutional Neural Network U-Net." IEEE Transactions on Geoscience and Remote Sensing 57 (9): 7209–7220. doi:10.1109/TGRS.2019.2912301.
Yi, W., Y. Chen, H. Tang, and L. Deng. 2010. "Experimental Research on Urban Road Extraction from High-resolution RS Images Using Probabilistic Topic Models." In IEEE International Geoscience and Remote Sensing Symposium, 445–448. Honolulu, HI, USA.
Zhang, Z., L. Qingjie, and W. Yunhong. 2018. "Road Extraction by Deep Residual U-Net." IEEE Geoscience and Remote Sensing Letters 15 (5): 749–753. doi:10.1109/LGRS.2018.2802944.
Zhao, H., J. Shi, X. Qi, X. Wang, and J. Jia. 2017. "Pyramid Scene Parsing Network." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2881–2890. Honolulu, HI, USA.
Zhao, W., S. Du, and W. J. Emery. 2017. "Object-based Convolutional Neural Network for High-resolution Imagery Classification." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 10 (7): 3386–3396. doi:10.1109/JSTARS.2017.2680324.
Zhong, Y., F. Fei, Y. Liu, B. Zhao, H. Jiao, and L. Zhang. 2017. "SatCNN: Satellite Image Dataset Classification Using Agile Convolutional Neural Networks." Remote Sensing Letters 8 (2): 136–145. doi:10.1080/2150704X.2016.1235299.
Zhong, Z., J. Li, W. Cui, and H. Jiang. 2016. "Fully Convolutional Networks for Building and Road Extraction: Preliminary Results." In IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 1591–1594.
Zhou, W., S. Newsam, C. Li, and Z. Shao. 2017. "Learning Low Dimensional Convolutional Neural Networks for High-resolution Remote Sensing Image Retrieval." Remote Sensing 9 (5): 489. doi:10.3390/rs9050489.
