RoadVecNet
To cite this article: Abolfazl Abdollahi, Biswajeet Pradhan & Abdullah Alamri (2021) RoadVecNet: a new approach for simultaneous road network segmentation and vectorization from aerial and Google Earth imagery in a complex urban set-up, GIScience & Remote Sensing, 58:7, 1151-1174, DOI: 10.1080/15481603.2021.1972713
from the IKONOS satellite imagery. Saito and Aoki (2015) presented a CNN model and achieved 88.66% accuracy for road extraction from Massachusetts aerial imagery. Zhao, Du, and Emery (2017) introduced an object-based deep learning model with 89.59% OA for extracting roads from Worldview-2 imagery. Li et al. (2016) applied a CNN model to extract roads from Pleiades-1A and GeoEye images; they then used a post-processing step to smoothen the results and obtain the road centerline, achieving 80.59% accuracy. Although certain outcomes have been achieved in road extraction by using CNNs, many errors still persist. For instance, the proposed approaches were not efficient in accurately detecting roads in complex areas, and the extracted roads still have imperfect fragments and patches (Xie et al. 2019; Zhao, Shi et al. 2017; Sarhan, Khalifa, and Nabil 2011).

Fully convolutional networks (FCNNs) can extract high-level features with more abstract semantic information (Hu et al. 2015). Zhang et al. (2018) combined the UNet model with residual learning to extract road areas from the Massachusetts road dataset. The presented technique achieved 91.87% accuracy for the precision metric; however, it was insufficient for road detection in sections where road networks are covered by trees and parking lots. Cheng et al. (2017) proposed a cascaded end-to-end network that contains two networks to simultaneously detect the road surface and centerline from Google Earth imagery. They obtained 88.84% for the quality measure, but found that the method could not detect roads under large areas of obstructions and could not obtain precise information about road width. In Zhong et al. (2016), building and road features were simultaneously extracted using an FCNN from the Massachusetts aerial images with 68% F1 accuracy. Buslaev et al. (2018) presented a deep learning model based on vanilla UNet and ResNet-34 to detect the road class from DigitalGlobe's satellite imagery with 0.5 m per pixel spatial resolution. Although a loss function based on Intersection Over Union (IOU) and binary cross entropy (BCE) was also introduced for performance improvement, the model could not achieve high accuracy for IOU (64%) in road extraction. Liu et al. (2019) extracted road centerlines from the Massachusetts and EPFL datasets based on a CNN model, edge-preserving filtering, shape features, morphological filtering, and Gabor filters. They obtained 89% accuracy for the quality metric; however, the proposed model could not achieve a single-pixel width for some road centerlines. Li et al. (2019) used a Y-Net network for road extraction from Jilin-1 satellite imagery and the public Massachusetts dataset. The proposed model comprises two modules for feature extraction and fusion. They applied the feature extraction module, which contains downsampling to upsampling, to extract features in detail, and applied a fusion module to mix all features for road segmentation. The presented model achieved 67.75% accuracy for mean region intersection over union (mean IU), but it requires more time for training and did not exhibit good results for narrow road sections when the image has a small number of road pixels. In another study by Xu et al. (2018), roads were extracted from WorldView-2 imagery by using a guided filter and a deep residual network (Res-UNet). The experimental outcomes demonstrated that the model obtained a 92.77% F1 score; however, it did not perform well for road detection in areas containing other objects with a spatial distribution and spectral values similar to the road class. Yang et al. (2019) applied a recurrent convolutional neural network UNet (RCNN-UNet) to detect roads and extract road centerlines. They used Google Earth imagery and the Roadtracer dataset to test their model. The proposed model was a supervised multitask learning network for road segmentation and road centerline extraction and obtained 81.74% for the completeness metric.

In the above literature review, although numerous approaches have been applied for road class identification and road centerline extraction, they have some shortcomings. Specifically, the roads in complex areas are covered by obstructions, such as cars, shadows, and trees, and the existing approaches cannot efficiently detect the road part in heterogeneous areas. The existing approaches for road centerline extraction could not achieve accurate information about road width and location. In this study, we present a new deep learning model called RoadVecNet to simultaneously extract the road surface and then vectorize the road network. In the extraction part, we want to deal with the road segmentation issues and detect consistent road parts. We also want to vectorize the road network by determining and extracting the road vector rather than the road centerline to obtain accurate information about the road network's width and location.
The proposed approach comprises two convolutional UNet networks that are interlinked into one architecture. The initial framework is used to identify road surfaces, while the second framework is utilized to vectorize roads to obtain road location and width information. In the proposed model, we used two encoders, two decoders, and two novel modules, namely, dense dilated spatial pyramid pooling (DDSPP) (Yang et al. 2018) and squeeze-and-excite (SE) (Hu, Shen, and Sun 2018). The DDSPP module is used to achieve a bigger receptive field and create feature pyramids with denser scale variability. The SE module is employed to consider the interdependencies between feature channels and extract more valuable information. We also used a loss function named focal loss weighted by median frequency balancing (MFB_FL) to overcome highly unbalanced datasets where positive cases are rare. MFB_FL lessens the burden of simple samples, allowing more time to be spent on difficult samples, and improves the road extraction and road vectorization results. Accordingly, we can achieve consistent road surface identification outcomes and complete, smooth road vectorization results with accurate information on road width and location, even under obstructions of shadows, trees, and complicated environments, compared with other deep learning-based techniques. The significant contributions of the suggested technique are as follows: 1) A new RoadVecNet that contains interlinked UNet networks is introduced to bridge the two subtasks of road surface segmentation and road vectorization. To the best of the authors' knowledge, this work is the first to apply the proposed cascaded model for the given task. 2) Road vectorization is formulated as a binary classification issue (i.e. non-edge and edge) by using the convolutional network. Next, the Sobel approach is used to achieve a smooth and complete vectorized road. 3) Two challenging large-size road datasets, namely, Ottawa and Massachusetts, are used to test the proposed method. 4) More consistent road surface segmentation and smooth road vectorization results can be achieved by the proposed model, even under complex backgrounds, compared with the other existing methods when modules such as DDSPP, SE, and the MFB_FL loss, together with the encoder and decoder layers, are used in the framework. The experimental results prove the overall geometric quality of the road segmentation and vectorization with accurate road location and width information.

The rest of the manuscript is organized as follows. The details of the suggested RoadVecNet framework for road surface segmentation and vectorization are presented in Section 2. The detailed explanations of the datasets are depicted in Section 3, and the evaluation metrics and experimental results are highlighted in Section 4. The detailed quantitative comparisons of the suggested network with the other comparative models are presented in Section 5. Finally, the conclusion and main findings are explained in Section 6.

2. Methodology

This study implemented interlinked UNet networks called RoadVecNet for simultaneous road surface segmentation and vectorization from HRSI. The main steps for applying the suggested method are as follows: (i) dataset preparation was performed to produce the testing, training, and validation imagery for road surface segmentation and vectorization; (ii) the presented framework was then trained and validated based on the training and validation images; (iii) the trained framework was then applied to the test images to produce road surface and vectorized road maps; (iv) the performance of the presented framework was evaluated on the basis of the evaluation metrics, and the results were compared with some preexisting deep learning methods.

2.1. RoadVecNet architecture

An overview of the suggested RoadVecNet framework is shown in Figure 1. The proposed network comprises the road surface segmentation and road vectorization networks (Figure 1). Each UNet model includes a contracting encoder arm, where the resolution decreases and the feature depth increases, and an expanding decoder arm, where the resolution increases and the feature depth decreases. We utilized filter sizes of 32, 64, 128, and 256 for the number of feature maps in the encoder-decoder. The skip connections characteristic of the U-Net framework (Ronneberger, Fischer, and Brox 2015) connect each upsampled feature map at the decoder arm to the encoder feature map with an identical spatial resolution. Accordingly, the probability map that indicates the likelihood of every road and non-road pixel is obtained with the sigmoid classifier.
Figure 1. Flowchart of the RoadVecNet framework containing (a) road surface segmentation and (b) road vectorization UNet networks.
Figure 2. DDSPP structure. Each dilated convolutional layer’s output is concatenated (C) with the input feature map and then fed to
the subsequent dilated layer.
1) Road surface segmentation architecture: The detailed configuration of this network is shown in Figure 2(a). This network was first applied to detect the road surface, which is categorized into two classes: road and background. In this network, pre-trained VGG-19 (Simonyan and Zisserman 2014) was used as an encoder because VGG-19 can be easily transferred to another task, given that it has formerly learned features from ImageNet. The key advantages of adopting the VGG-19 network are as follows: (1) its design is similar to UNet, making it easier to combine with UNet, and (2) it allows much deeper networks to produce superior segmentation and vectorization results. We also used the DDSPP module to extract high-resolution feature maps and capture contextual information within the architecture, and the SE module to pass more relevant data and reduce redundant data. Every block in the decoder part implements a 2 × 2 bilinear upsampling on the input features to double the dimension of the input feature maps. This avoids artifacts and the use of slow deconvolution layers and hence decreases the number of learning parameters, which also contributes to a faster total training and inference time. Then, the proper skip connections of the encoder feature maps are concatenated to the output feature maps. Thereafter, two 3 × 3 convolutional layers were applied, followed by batch normalization (BN) and the Rectified Linear Unit (ReLU) function. The distribution of activations varies in the intermediate layers during the training step, which is a problem: it slows down the training phase because every layer in every training phase must learn to adjust to a new distribution. Thus, BN (Ioffe and Szegedy 2015), which standardizes the inputs to a layer in the network by subtracting the batch mean and dividing by the batch standard deviation, is used to improve the stability of a neural network. The speed of a neural network's training process can be accelerated by BN (Ioffe and Szegedy 2015). Furthermore, the model's performance is improved in some cases due to the modest regularization influence. Subsequently, the SE module was used, and the mask was generated by applying a convolutional layer with the sigmoid function.
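For concreteness, the pretrained VGG-19 backbone can be exposed as an encoder with skip outputs roughly as follows. This is a hedged sketch assuming the layer names of tf.keras.applications.VGG19; the specific skip layers chosen here are illustrative, not necessarily the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import Model

def vgg19_encoder(input_shape=(512, 512, 3)):
    vgg = tf.keras.applications.VGG19(
        include_top=False, weights="imagenet", input_shape=input_shape)
    # Feature maps taken before each pooling step act as skip connections;
    # this particular selection of layers is illustrative.
    skip_names = ("block1_conv2", "block2_conv2",
                  "block3_conv4", "block4_conv4")
    skips = [vgg.get_layer(name).output for name in skip_names]
    bottom = vgg.get_layer("block5_conv4").output  # deepest feature map
    return Model(vgg.input, [bottom, *skips])
```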
In remote sensing imagery, the road samples face the class imbalance issue because of the skewed dispensation of ground objects (Abdollahi, Pradhan, and Alamri 2020). The cross-entropy loss does not adequately account for the imbalanced classes because it is calculated by summing over all of the pixels. A typical approach for handling the imbalanced classes is to use a weighting factor (Eigen and Fergus 2012). The class loss is weighted using median frequency balancing, by the ratio of the training set's median class frequency to the real class frequency (Eigen and Fergus 2012). Such a weighting factor treats the simple and the hard samples the same; however, it balances the value of positive and negative samples. Therefore, the focal loss function was introduced by Lin et al. (2017) to lessen the burden of simple samples, allowing the network to focus more on the hard samples. We used the focal loss weighted by median frequency balancing (MFB_FL) to address the imbalance issue of the training data and train the road surface segmentation network; it is denoted as follows:

$\mathrm{MFB\_FL_{seg}}(g, f(o); \delta_1) = \alpha (1 - l_c(I_i^j))^{\gamma} \cdot \mathrm{BCE_{seg}}$   (1)

where

$\mathrm{BCE_{seg}} = -\sum_{i=1}^{S} \sum_{j=1}^{P} \sum_{c=1}^{C} w_c \cdot (g_i^j = C) \log l_c(I_i^j)$   (2)

$w_c = \dfrac{\mathrm{median}(m_c \mid c \in C)}{m_c}$   (3)

where median(m_c) is the median value over every m_c, m_c is the modulation of pixels in class c, w_c is the class weight, and f(I_i^j) is the output of the final convolutional layer.

2) Road vectorization architecture: This network has the same components as the road surface segmentation network, namely, encoder, decoder, skip connections, and sigmoid layer; however, it is much smaller than the road surface segmentation model. A relatively small architecture was chosen for this part for the following reasons. First, the training network has fewer positive pixels (vectorized road pixels) compared with the road segmentation framework; thus, applying a relatively deep network may cause overfitting. In addition, the feature maps generated by the final convolutional layer in the decoder arm of the road segmentation framework have less complex backgrounds compared with the original image, so a relatively small architecture is sufficient to deal with the vectorization task. In Figure 2, the inputs of the vectorization model are the feature maps generated by the final convolutional layer of the decoder arm in the road segmentation model. In every encoder block, two 3 × 3 convolutional layers were implemented, followed by batch normalization and ReLU. Thereafter, the SE block is used to enhance the feature map's quality. Then, a 2 × 2 max-pooling layer with stride 2 was applied to decrease the spatial dimension of the feature maps. All the components in the decoder arm are comparable to those of the decoder arm of the road segmentation network. To train the road vectorization model, its MFB_FL is denoted as follows:

$\mathrm{MFB\_FL_{vec}}(y, h(I); \delta_2) = \alpha (1 - l_c(f(I_i^j)))^{\gamma} \cdot \mathrm{BCE_{vec}}$   (4)

where

$\mathrm{BCE_{vec}} = -\sum_{i=1}^{S} \sum_{j=1}^{P} \sum_{c=1}^{C} w_c \cdot (y_i^j = C) \log l_c(f(I_i^j))$   (5)
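A small NumPy sketch of the median frequency balancing weights of Eq. (3), assuming integer-coded ground truth masks, may help make the weighting concrete:

```python
import numpy as np

def mfb_weights(masks):
    # masks: integer-coded label maps of shape (N, H, W), values 0..C-1.
    classes, counts = np.unique(masks, return_counts=True)
    freq = counts / counts.sum()      # per-class pixel frequency m_c
    weights = np.median(freq) / freq  # Eq. (3): w_c = median(m_c) / m_c
    return dict(zip(classes.tolist(), weights.tolist()))
```

For a binary road dataset in which road pixels are rare, the road class thus receives a weight greater than 1 and the background class a weight smaller than 1.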
… training dataset for every subtask. Moreover, we used the main RGB (red, green, and blue) images and the corresponding ground truth surface images for the road surface segmentation task, and the main images and their corresponding ground truth vectorized images for the road vectorization task. Finally, the overall loss function in RoadVecNet, which is a combination of losses (1) and (4), can be expressed as follows:

$\mathrm{MFB\_FL}(\delta_1 + \delta_2) = \mathrm{MFB\_FL_{seg}}(g, f(o); \delta_1) + \mathrm{MFB\_FL_{vec}}(y, h(I); \delta_2) = \alpha (1 - l_c(I_i^j))^{\gamma} \cdot \mathrm{BCE_{seg}} + \alpha (1 - l_c(f(I_i^j)))^{\gamma} \cdot \mathrm{BCE_{vec}}$   (7)

where the last convolutional layer's output in the road vectorization network is $h(f(\cdot))$, and the last convolutional layer's output in the road segmentation model is $f(\cdot)$. The focal loss is parameterized by γ and α, which control the degree of down-weighting of easy examples and the class weights, respectively. The focal loss simplifies to BCE when γ = 0. In this work, we set γ = 2 and α = 0.25 because the degree of concentration on hard and easy samples can be increased by higher values of γ and lower values of α.
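Under these settings, the MFB-weighted focal loss can be sketched in Keras as below. This is an interpretation of Eqs. (1)-(7) for a binary sigmoid output, with γ = 2 and α = 0.25 as stated; the w_pos default is only a placeholder for the median-frequency weight of the road class from the earlier sketch.

```python
import tensorflow as tf

def mfb_focal_loss(w_pos=10.0, gamma=2.0, alpha=0.25):
    # w_pos: median-frequency weight of the (rare) road class; placeholder.
    def loss(y_true, y_pred):
        eps = tf.keras.backend.epsilon()
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        # Class-weighted binary cross entropy (the BCE term of Eq. (2)).
        bce = -(w_pos * y_true * tf.math.log(y_pred)
                + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
        # p_t is the probability assigned to the true class; the factor
        # (1 - p_t)^gamma down-weights easy samples (p_t close to 1).
        p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
        return tf.reduce_mean(alpha * tf.pow(1.0 - p_t, gamma) * bce)
    return loss

# Overall objective of Eq. (7): the sum of the two task losses, e.g.
# total = mfb_focal_loss()(g, f_o) + mfb_focal_loss()(y, h_i)
```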
2.2. SE module

The SE module (Hu, Shen, and Sun 2018) was used to improve the model's representation power by a context gating mechanism and attain a clear relationship between the convolutional layer channels. The module encodes feature maps by allocating a weight to every channel in the feature map. The SE module includes two major parts, called squeeze and excitation. The first operation is squeeze: the input feature maps to the SE block, $X^{up} = [X_1^{up}, X_2^{up}, \ldots, X_F^{up}]$ with $X_f^{up} \in \mathbb{R}^{W \times H}$, are accumulated to generate a channel descriptor by applying global average pooling (GAP) over the entire context of the channels. The spatial squeeze is calculated as follows:

$z_f = F_{sq}(X_f^{up}) = \dfrac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_f^{up}(i, j)$   (8)

where $H \times W$ is the size of this channel, $X_f^{up}(i, j)$ is a spatial location of the $f$th channel, and $F_{sq}$ is the spatial squeeze module. The second operation is excitation, which takes the global information produced in the squeeze stage. This operation includes two fully connected (FC) layers. The pooled vector is first encoded and then decoded to shapes $1 \times 1 \times \frac{F}{r}$ and $1 \times 1 \times F$, respectively, to generate an excitation vector as $s = F_{ex}(z, W) = \sigma(W_2 \mathcal{R}(W_1 z))$, where $W_1 \in \mathbb{R}^{(F/r) \times F}$ denotes the parameters of the initial FC layer, $W_2 \in \mathbb{R}^{F \times (F/r)}$ those of the second, $r$ is the reduction ratio, $\mathcal{R}$ is ReLU, and $\sigma$ denotes the sigmoid function. The output of the SE block is generated as $\tilde{X}_f^{up} = F_{scale}(X_f^{up}, s_c) = s_c \cdot X_f^{up}$, where $\tilde{X}^{up} = [\tilde{X}_1^{up}, \tilde{X}_2^{up}, \ldots, \tilde{X}_F^{up}]$ and $F_{scale}$ denotes the channel-wise multiplication between the channel attention (scale factor) $s_c$ and the input feature map $X_f^{up}$.
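A compact Keras sketch of such an SE block, assuming a channels-last feature map and a reduction ratio r = 16, is given below; it mirrors Eq. (8) and the excitation step described above.

```python
from tensorflow.keras import layers

def se_block(x, reduction=16):
    channels = x.shape[-1]
    z = layers.GlobalAveragePooling2D()(x)  # squeeze (Eq. (8)): GAP per channel
    s = layers.Dense(channels // reduction, activation="relu")(z)  # W1 + ReLU
    s = layers.Dense(channels, activation="sigmoid")(s)            # W2 + sigmoid
    s = layers.Reshape((1, 1, channels))(s)
    # Excitation output: rescale every channel of x by its weight s_c.
    return layers.Multiply()([x, s])
```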
2.3. DDSPP module

In this work, the DDSPP module was applied to the feature maps generated by the encoder arms to elicit further multi-scale contextual information and produce a greater number of scale features over a broader range. Atrous spatial pyramid pooling (ASPP) was first utilized in DeepLab (Chen et al. 2017) to enhance the suggested networks' performance. ASPP is a combination of spatial pyramid pooling and atrous convolution with various atrous rates. This tool is effective in adjusting the receptive field to catch multi-scale information and in controlling the resolution of the features computed by deep learning networks. In particular, ASPP includes (a) an image-level feature that is generated by global average pooling and (b) one convolution with a 1 × 1 filter size and four parallel convolutions of a 3 × 3 filter size with different rates of 2, 4, 8, and 12, as illustrated in Figure 2. Then, bilinear upsampling was applied to upsample the resulting features from all the branches to the input size; they were concatenated and underwent another 1 × 1 convolution. However, we used a new module named DDSPP (Yang et al. 2018), which combines the benefit of cascaded modules with atrous convolution and ASPP to produce more scale features over a broader range and exploit further multi-scale contextual features. The receptive field of an atrous convolution can be defined as follows:

$F = [(K - 1)(R - 1) + K] \times [(K - 1)(R - 1) + K]$   (9)

where $R$ is the rate and $K$ is the convolution kernel size. For example, when $R = 2$ and $K = 3$, $F$ equals 5 × 5. However, we can obtain a bigger receptive field and create feature pyramids with denser scale variability by using dense connections between stacked dilated layers. Assuming that we have two convolutional operations with kernel sizes $K_1$ and $K_2$, the combined receptive field can be defined as follows:

$F = (K_1 + K_2 - 1) \times (K_1 + K_2 - 1)$   (10)

The new receptive field size results in 13 × 13 when the rates are 2 and 4.
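The densely connected dilation pattern can be sketched as follows. This is a simplified, hedged rendering in the spirit of DDSPP/DenseASPP (Yang et al. 2018) with the rates of Figure 2; the filter count and the final 1 × 1 projection are illustrative choices.

```python
from tensorflow.keras import layers

def ddspp_block(x, filters=64, rates=(2, 4, 8, 12)):
    # Each dilated 3x3 convolution receives the concatenation of the input
    # and all previous branch outputs (dense connections), so receptive
    # fields compose as in Eq. (10): e.g. rates 2 and 4 give 5 + 9 - 1 = 13.
    features = [x]
    for rate in rates:
        inp = layers.Concatenate()(features) if len(features) > 1 else x
        y = layers.Conv2D(filters, 3, padding="same",
                          dilation_rate=rate, activation="relu")(inp)
        features.append(y)
    merged = layers.Concatenate()(features)
    return layers.Conv2D(filters, 1, padding="same", activation="relu")(merged)
```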
2.4. Inference stage

The road surface segmentation and road vectorization can be concurrently implemented through the proposed RoadVecNet in the inference stage (Figure 2). A probability road map was achieved by using the road segmentation network. Then, the road vectorization network transformed the feature maps of the final convolutional layer generated by the road segmentation model into vector-based probability maps. Finally, the Sobel algorithm was applied to achieve a complete and smooth vectorized road network with precise road width information (Vincent and Folorunso 2009). The Sobel algorithm is an instance of the gradient approach, in which edges are detected by looking for the minima and maxima of the image's first derivative. The Sobel method computes an estimation of the image intensity gradient function and is a discrete differentiation method (Vincent and Folorunso 2009).
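As an illustration of this final step, a Sobel gradient pass over the vector-probability map might look as follows; this OpenCV-based sketch is an assumption of ours, and the threshold and kernel size are illustrative rather than the paper's values.

```python
import cv2
import numpy as np

def sobel_edges(prob_map, threshold=0.5):
    # prob_map: vector-probability map in [0, 1] from the second network.
    img = (prob_map * 255).astype(np.uint8)
    gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)  # horizontal gradient
    gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)  # vertical gradient
    magnitude = np.sqrt(gx ** 2 + gy ** 2)          # gradient magnitude
    # Keep the strongest gradients as the final binary edge (vector) map.
    return magnitude > threshold * magnitude.max()
```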
3. Experiments and Assessment

The experimental settings in the suggested approach are first introduced in this section. Subsequently, we describe the Massachusetts and Ottawa datasets used for road segmentation and vectorization. Next, the evaluation metrics and quantitative results achieved by the proposed approach and other comparative techniques for the road surface segmentation and road vectorization tasks are described.

3.1. Experimental setting

We utilized some data augmentation strategies, such as flipping the images vertically and horizontally as well as rotating them 90°, 180°, and 270°, to expand the size of our training and validation sets and train a proper model. Moreover, to control the overfitting difficulty, we appended a dropout of 0.5 (Srivastava et al. 2014) to the deeper convolutional layers of the road segmentation network and the road vectorization network. A computationally affordable yet strong regularization of the model can be provided using this strategy. An adaptive moment estimation (Adam) optimizer with a 0.001 learning rate was also utilized in this work to learn the model parameters, such as weights and biases, via optimizing the loss function. The presented RoadVecNet was trained with batch size 2 from scratch, except for the backbone network, which we used as the pretrained one. The trained network was then applied to the test data for road surface segmentation and road vectorization. We ran the optimization of the networks for 100 epochs through the datasets until no more performance improvements were seen. We trained and tested the suggested network for road surface segmentation and road vectorization on an Nvidia Quadro RTX 6000 GPU with 24 GB of memory and a compute capability of 7.5, under the Keras framework with a Tensorflow backend.
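A hedged sketch of this training setup is shown below, assuming tf.data pipelines of (image, mask) pairs with matching dtypes; mfb_focal_loss refers to the loss sketch in Section 2.1, and the metric choice is illustrative.

```python
import tensorflow as tf

def augment(image, mask):
    # Identical right-angle rotation for image and mask (0/90/180/270 deg).
    k = tf.random.uniform([], 0, 4, dtype=tf.int32)
    image, mask = tf.image.rot90(image, k), tf.image.rot90(mask, k)
    # Joint flips: stack along channels so both receive the same flip.
    c = tf.shape(image)[-1]
    stacked = tf.concat([image, mask], axis=-1)  # assumes matching dtypes
    stacked = tf.image.random_flip_left_right(stacked)
    stacked = tf.image.random_flip_up_down(stacked)
    return stacked[..., :c], stacked[..., c:]

def train(model, train_ds, val_ds):
    # Adam with a 0.001 learning rate, batch size 2, 100 epochs, as stated.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss=mfb_focal_loss(), metrics=["accuracy"])
    return model.fit(train_ds.map(augment).batch(2),
                     validation_data=val_ds.batch(2), epochs=100)
```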
3.2. Dataset descriptions

Two types of remote sensing datasets, namely, the Massachusetts road imagery (Mnih 2013), containing aerial images with 0.5 m spatial resolution, and the Ottawa road imagery (Liu et al. 2018), containing Google Earth images with 0.21 m spatial resolution, were used to test the proposed network on road segmentation and vectorization. We selected these two different datasets, which contain various road width pixels, to show the proposed architecture's superiority in road segmentation and vectorization. Each dataset includes two sub-datasets, namely, road surface segmentation and road vectorization. The detailed information of each dataset is highlighted as follows:
Figure 3. Demonstration of three representative imagery, their segmentation ground truth, and vectorized ground truth maps for the
Massachusetts road imagery. (a), (b), and (c) illustrate the original RGB imagery, corresponding segmentation ground truth maps, and
superposition between vectorized and segmentation ground truth maps, respectively.
Figure 4. Demonstration of three representative imagery and their segmentation ground truth and vectorized ground truth maps for
the Ottawa road imagery. (a), (b), and (c) demonstrate the main RGB images, corresponding segmentation ground truth maps, and
superposition between vectorized and segmentation ground truth maps, respectively.
Figure 5. Visual performance attained by Ours-S against the other comparative networks for road surface segmentation from the Massachusetts imagery. The cyan, green, and blue colors denote the TPs, FPs, and FNs, respectively.
Figure 6. Visual performance attained by the comparative networks for road surface segmentation from the Ottawa imagery. The cyan, green, and blue colors denote the TPs, FPs, and FNs, respectively.
Figures 5 and 6 illustrate that the SegNet-S, ResUNet-S, and DeepLabV3-S networks were sensitive to the barriers of trees and shadows and predicted more FN pixels (depicted in blue) and FP pixels (depicted in green), thereby producing low-quality road segmentation maps for both datasets. Meanwhile, the FCN-S, UNet-S, and VNet-S architectures could improve the results and generate more coherent and satisfactory road segmentation maps. However, none of the abovementioned models achieved better qualitative results than Ours-S. Ours-S could generate high-resolution road segmentation maps for both datasets by alleviating the effect of obstacles, predicting fewer FP pixels, and preserving the road border information. The reason is that we used the DDSPP module to create feature pyramids with denser scale variability and a bigger receptive field. We also utilized the SE module to extract more valuable information by considering the interdependencies between feature channels.
Figure 7. Visual performance attained by Ours-S against VNet-S network for road surface segmentation from the Ottawa and
Massachusetts imagery. The cyan, green, and blue colors denote the TPs, FPs, and FNs, respectively.
In addition, we applied the MFB_FL loss function to overcome highly unbalanced datasets and give more attention to the hard samples. Therefore, we could obtain more consistent and smoother road segmentation and vectorization results.

3.5. Qualitative comparison of road vectorization

Here, we compared the results attained by the presented RoadVecNet architecture for road vectorization from the Massachusetts and Ottawa datasets with the same comparative deep learning methods applied in the road surface segmentation part, such as the UNet architecture (Ronneberger, Fischer, and Brox 2015), the DeepLabV3 framework (Chen et al. 2017), the SegNet network (Badrinarayanan, Kendall, and Cipolla 2017), VNet applied by Abdollahi, Pradhan, and Alamri (2020), ResUNet provided by Diakogiannis et al. (2019), and FCN (Long, Shelhamer, and Darrell 2015). We utilized the suffix "-V" after every approach's name to denote road vectorization.
Figure 8. Comparison outcomes of various approaches for road vectorization in visual performance for Ottawa imagery. The first and
second columns demonstrate the original RGB and corresponding reference imagery, respectively. The third, fourth, fifth, sixth, and
last columns demonstrate the results of FCN-V, SegNet-V, UNet-V, DeepLabV3-V, and ResUNet-V. More details can be seen in the
zoomed-in view.
Figure 9. Comparison of the outcomes of the VNet-V approach and Ours-V for road vectorization in terms of visual performance for
Ottawa imagery. The first and second columns demonstrate the original RGB and corresponding reference imagery, respectively. The
third and fourth columns demonstrate the results of VNet-V and Ours-V. More details can be seen in the zoomed-in view.
Figures 8 and 9 demonstrate the comparison outcomes of the various approaches and the presented RoadVecNet for road vectorization in visual performance for the Ottawa imagery. The vectorized road ground truth map is also included in the second column of each figure to better display the contrast. We also used blue rectangular boxes in the figures to mark the FP and FN pixels to facilitate comparison. Figures 8 and 9 illustrate that although the FCN-V, SegNet-V, ResUNet-V, and DeepLabV3-V architectures could generate a relatively complete road vectorization network, they brought in spurs and produced some FPs in the homogenous regions where the road was covered by occlusions and around the intersections, reducing the correctness and smoothness of the road vectorization network. The UNet-V and VNet-V methods could improve the results and generate a complete road vectorization network; however, they failed to vectorize the road in the intersection parts and brought in some discontinuity and FPs. Figures 10 and 11 demonstrate the visual performance of the comparative models for the Massachusetts imagery. In this dataset, the complexity of obstacles and backgrounds is higher, and the road width is smaller than in the Ottawa dataset. Accordingly, all the above-mentioned comparative models, including VNet-V, could not accurately vectorize the road, resulting in a non-complete and non-smooth vectorized road network, especially for complex backgrounds and intersection areas, where they brought in more discontinuity and FPs.
Figure 10. Comparison of the outcomes of various approaches for road vectorization in terms of visual performance for Massachusetts
imagery. The first and second columns demonstrate the original RGB and corresponding reference imagery, respectively. The third,
fourth, and fifth columns demonstrate the results of FCN-V, SegNet-V, and DeepLabV3-V, respectively. More details can be seen in the
zoomed-in view.
By contrast, Ours-V could detect a complete and non-spur vectorized road network even from the Massachusetts dataset, with its narrow road widths and complex backgrounds. Our vectorized road map is more similar to the actual ground truth vectorized road than those of the other comparative models.

4. Discussion

We obtained the quantitative results for the presented technique and the other comparative networks applied to the Massachusetts and Ottawa datasets for road segmentation, which are summarized in Tables 1 and 2, respectively. The first four columns in both tables give the performance on four test sample images, and the final column is the average accuracy over the whole test imagery. The bold value is the best in the F1 score metric, while the underlined values are the second-best. Tables 1 and 2 illustrate that Ours-S, along with the other comparative convolutional networks, could attain satisfactory outcomes for road segmentation from both datasets. However, the DeepLabV3-S, ResUNet-S, and SegNet-S architectures achieved the lowest F1 score accuracy, with 85.83%, 86.97%, and 87% for Massachusetts and 90.54%, 90.72%, and 91.48% for Ottawa. The SegNet-S model could slightly improve the accuracy because it utilizes the max-pooling indices at the encoder and corresponding decoder paths to upsample the layers in the decoding process. The model does not need to learn the upsampling weights again, which makes the training process more straightforward.

Tables 1 and 2 also show that the VNet-S framework was the second-best approach in road surface segmentation, with 91.45% for Massachusetts and 92.02% for Ottawa. By contrast, the F1 score accuracy for Ours-S was higher than all the comparative approaches. In fact, the presented model could improve the F1 score accuracy by 1.06% for Massachusetts and 1.38% for Ottawa compared with the VNet-S network, which was the second-best model.
Figure 11. Comparison outcomes of our approach and the other comparative models for road vectorization in visual performance for
Massachusetts imagery. The first column demonstrates the original RGB imagery. The second, third, fourth, and last columns
demonstrate the results of ResUNet-V, UNet-V, VNet-V, and Ours-V, respectively. More details can be seen in the zoomed-in view.
Table 1. Percentage of F1 score, MCC, and IOU attained by Ours-S and other comparative networks for road segmentation from Massachusetts imagery. The bold and underlined F1 scores demonstrate the best and second-best, respectively.
Image1 Image2 Image3 Image4 Average
FCN-S F1 score 0.9104 0.9117 0.9028 0.9007 0.9064
MCC 0.9008 0.9037 0.8910 0.8901 0.8964
IOU 0.8338 0.8360 0.8212 0.8176 0.8272
SegNet-S F1 score 0.8680 0.8909 0.8701 0.8511 0.8700
MCC 0.8554 0.8838 0.8573 0.8324 0.8572
IOU 0.7654 0.8017 0.7686 0.7394 0.7688
UNet-S F1 score 0.9128 0.9141 0.9075 0.9073 0.9104
MCC 0.9017 0.9057 0.8984 0.8942 0.9000
IOU 0.8378 0.8570 0.8289 0.8286 0.8381
VNet-S F1 score 0.9122 0.9192 0.9084 0.9173 0.9145
MCC 0.9023 0.9108 0.8965 0.9067 0.9040
IOU 0.8385 0.8504 0.8322 0.8473 0.8421
ResUNet-S F1 score 0.8632 0.8882 0.8668 0.8609 0.8697
MCC 0.8493 0.8806 0.8539 0.8453 0.8572
IOU 0.7593 0.7988 0.7649 0.7557 0.7696
DeeplabV3-S F1 score 0.8564 0.8798 0.8468 0.8503 0.8583
MCC 0.8383 0.8693 0.8294 0.8303 0.8418
IOU 0.7475 0.7839 0.7330 0.7382 0.7507
Ours-S F1 score 0.9243 0.9239 0.9282 0.9240 0.9251
MCC 0.9143 0.9168 0.9190 0.9128 0.9157
IOU 0.8574 0.8740 0.8641 0.8568 0.8631
Furthermore, we compared the quantitative results achieved by the proposed model with more deep learning-based models, such as the CNN-based segmentation method (Wei, Zhang, and Ji 2020), the road structure-refined CNN (RSRCNN) technique (Wei, Wang, and Xu 2017), and the FCNs approach (Zhong et al. 2016), applied for road segmentation from Massachusetts imagery.
Table 2. Percentage of F1 score, MCC, and IOU attained by Ours-S and other comparative networks for road segmentation from Ottawa imagery. The bold and underlined values demonstrate the best and second-best, respectively.
Image1 Image2 Image3 Image4 Average
FCN-S F1 score 0.8829 0.9150 0.9375 0.9282 0.9159
MCC 0.8453 0.8887 0.9103 0.8796 0.8810
IOU 0.7888 0.8415 0.8803 0.8641 0.8437
SegNet-S F1 score 0.8816 0.9302 0.9371 0.9103 0.9148
MCC 0.8432 0.9053 0.9102 0.8572 0.8790
IOU 0.7867 0.8676 0.8797 0.8336 0.8419
UNet-S F1 score 0.8849 0.9329 0.9321 0.9231 0.9183
MCC 0.8477 0.9108 0.9028 0.8711 0.8831
IOU 0.7921 0.8722 0.8709 0.8554 0.8477
VNet-S F1 score 0.8933 0.9294 0.9390 0.9191 0.9202
MCC 0.8597 0.9070 0.9137 0.8678 0.8870
IOU 0.8072 0.8681 0.8850 0.8502 0.8526
ResUNet-S F1 score 0.8761 0.9160 0.9372 0.8995 0.9072
MCC 0.8137 0.8887 0.9110 0.8159 0.8573
IOU 0.7484 0.8450 0.8818 0.7949 0.8175
DeeplabV3-S F1 score 0.8731 0.9274 0.9330 0.8884 0.9054
MCC 0.8101 0.9016 0.9027 0.8229 0.8593
IOU 0.7427 0.8627 0.8725 0.7759 0.8135
Ours-S F1 score 0.8992 0.9412 0.9434 0.9520 0.9340
MCC 0.8666 0.9202 0.9186 0.9190 0.9061
IOU 0.8152 0.8869 0.8909 0.9062 0.8748
Table 3. Percentage of F1 score and MCC attained by Ours-V and other comparative networks for road vectorization from the Ottawa imagery. The bold and underlined values denote the best and second-best, respectively.
Image1 Image2 Image3 Image4 Average
FCN-V F1 score 0.8643 0.9017 0.8893 0.8893 0.8862
MCC 0.8551 0.8953 0.8825 0.8821 0.8788
SegNet-V F1 score 0.8622 0.8702 0.8820 0.8776 0.8730
MCC 0.8563 0.8658 0.8782 0.8722 0.8681
UNet-V F1 score 0.8999 0.9134 0.9072 0.9288 0.9123
MCC 0.8931 0.9076 0.9015 0.9241 0.9066
DeeplabV3-V F1 score 0.8566 0.8699 0.8804 0.8742 0.8703
MCC 0.8513 0.8650 0.8763 0.8695 0.8655
VNet-V F1 score 0.9045 0.9129 0.9038 0.9297 0.9127
MCC 0.8985 0.9071 0.8973 0.9254 0.9070
ResUNet-V F1 score 0.8614 0.8734 0.8839 0.8874 0.8765
MCC 0.8529 0.8659 0.8771 0.8807 0.8691
Ours-V F1 score 0.9203 0.9237 0.9164 0.9358 0.9241
MCC 0.9149 0.9187 0.9110 0.9315 0.9190
Table 4. Percentage of F1 score and MCC attained by Ours-V and other comparative networks for road vectorization from Massachusetts imagery. The bold and underlined values denote the best and second-best, respectively.
Image1 Image2 Image3 Image4 Average
FCN-V F1 score 0.7982 0.8350 0.8204 0.8503 0.8260
MCC 0.8004 0.8273 0.8095 0.8510 0.8221
SegNet-V F1 score 0.7917 0.8326 0.8047 0.8424 0.8179
MCC 0.7763 0.8232 0.7911 0.8333 0.8060
UNet-V F1 score 0.8129 0.8458 0.8237 0.8514 0.8335
MCC 0.7994 0.8478 0.8244 0.8421 0.8284
DeeplabV3-V F1 score 0.7539 0.8263 0.7749 0.8237 0.7947
MCC 0.7350 0.8176 0.7586 0.8114 0.7807
VNet-V F1 score 0.8206 0.8470 0.8208 0.8619 0.8373
MCC 0.8069 0.8388 0.8083 0.8535 0.8268
ResUNet-V F1 score 0.7771 0.8307 0.7814 0.8298 0.8047
MCC 0.7596 0.8218 0.7653 0.8180 0.7911
Ours-V F1 score 0.8854 0.8834 0.8878 0.9129 0.8924
MCC 0.8762 0.8754 0.8794 0.9066 0.8844
The presented method was built and evaluated on an experimental dataset, while the outcomes for the other three works were taken from previously published studies. The F1 score accuracies achieved by the CNN-based approach, RSRCNN, and FCNs were 82%, 66.2%, and 68%, respectively, while that of the Ours-S approach is 92.51%. The results confirmed that the richer supervised information in the presented model obtains better outcomes than the other preexisting deep learning approaches in road surface segmentation from the HRSI.

We calculated the F1 score and MCC metrics to better probe the capability of Ours-V and the other comparative models in road vectorization. The quantitative outcomes for the Ottawa (Google Earth) and Massachusetts (aerial) imagery are demonstrated in Tables 3 and 4, respectively. Tables 3 and 4 show that VNet-V could achieve satisfactory results for road vectorization from the Ottawa imagery, with 91.27% F1 score accuracy, improving on the results of the other comparative models, such as FCN-V, DeepLabV3-V, ResUNet-V, UNet-V, and SegNet-V, and it was ranked as the second-best model. Nevertheless, this method could not perform well in road vectorization using the Massachusetts imagery (Table 4) and predicted more FPs and fewer FNs, resulting in a lower F1 score accuracy of 83.73%. This phenomenon is attributed to the aerial images, which have more complex backgrounds and occlusions and narrower road widths. The other methods that could not achieve a higher F1 score accuracy than VNet-V for the Ottawa images could not obtain a higher accuracy for the Massachusetts images either. By contrast, Ours-V was able to achieve better results than the others for both datasets. Ours-V achieved F1 score accuracy rates of 92.41% and 89.24% for the Ottawa and Massachusetts imagery, respectively. Ours-V could improve the results of VNet-V (the second-best method) by 1.14% for Ottawa and 5.51% for Massachusetts, which confirmed its validity for road vectorization from Google Earth and aerial imagery.
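For reference, the reported metrics can be computed from binary prediction and reference masks with the standard definitions; the following NumPy sketch is a generic implementation, not the authors' evaluation code.

```python
import numpy as np

def evaluate(pred, ref):
    pred, ref = pred.astype(bool), ref.astype(bool)
    tp = np.sum(pred & ref)
    fp = np.sum(pred & ~ref)
    fn = np.sum(~pred & ref)
    tn = np.sum(~pred & ~ref)
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)  # intersection over union
    mcc = (tp * tn - fp * fn) / np.sqrt(
        float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"F1": f1, "IOU": iou, "MCC": mcc}
```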
Figure 12. Average percentage of the F1 score metric of our method and other methods for road surface segmentation (a) and road
vectorization (b) from Ottawa and Massachusetts imagery.
Figure 13. Performance of the proposed model for road segmentation and vectorization through training epochs: training and
validation losses for the (a) Ottawa and (b) Massachusetts datasets.
The average F1 score accuracy attained by our approach and the other comparative approaches in road surface segmentation and road vectorization from both datasets is plotted in Figure 12(a,b), respectively. The approaches and the average percentage of the F1 score metric are shown on the horizontal and vertical axes, respectively. Figure 12 depicts that the Ours-S and Ours-V methods achieved the highest F1 scores, affirming the superiority of the proposed technique for road vectorization from Google Earth and aerial imagery. Figure 13(a,b) displays the training and validation losses of the presented approach over 100 epochs for the Ottawa and Massachusetts imagery, respectively. Based on the decrease in model loss, the method has learned efficient features for road surface segmentation and vectorization. The training and validation losses are close together in the learning curve for both datasets; the model reduced overfitting, and the variance of the method is negligible.

4.1. Ablation study

We conducted some tests to see how different settings affected the model's performance in road surface segmentation and vectorization. In this case, we used a PSPNet backbone (Zhao, Shi et al. 2017), stochastic gradient descent with a 0.01 learning rate, and a batch size of 4. The quantitative results for both tasks are shown in Table 5. Meanwhile, the visualization results for the road segmentation and vectorization tasks are depicted in Figures 14 and 15, respectively. Table 5 illustrates that the F1 score accuracy decreased to 91.77% and 87.75% for road segmentation and to 90.02% and 85.32% for road vectorization for the Ottawa and Massachusetts images, respectively, after changing these settings. Figures 14 and 15 also show that the proposed model brought in spurs and produced some FPs in the homogenous regions, thereby considerably decreasing the smoothness of the road vectorization network.

Table 5. Percentage of the F1 score, IOU, and MCC attained by Ours-V network for road segmentation and vectorization from the Massachusetts and Ottawa imagery after changing several settings.
Road segmentation   Ottawa          F1 score 0.9177   MCC 0.8984   IOU 0.8684
                    Massachusetts   F1 score 0.8775   MCC 0.8662   IOU 0.7818
Road vectorization  Ottawa          F1 score 0.9002   MCC 0.9048
                    Massachusetts   F1 score 0.8532   MCC 0.8460

4.2. Failure case analysis

In this case, we conducted a failure case analysis by reducing the size of the images to 256 × 256 to check the model's performance on road segmentation and vectorization.
Figure 14. Visual performance attained by Ours-S network for road surface segmentation from the Ottawa and Massachusetts imagery
after changing several settings. The cyan, green, and blue colors denote the TPs, FPs, and FNs, respectively.
Figure 15. Visual performance attained by Ours-V for road vectorization from the Massachusetts and Ottawa imagery after changing
several settings. The blue rectangle shows the predicted FPs and FNs. More details can be seen in the zoomed-in view.
Table 6. Percentage of the F1 score, IOU, and MCC attained by Ours-V network for road segmentation and vectorization from the Massachusetts and Ottawa imagery after analyzing a failure case.
Road segmentation   Ottawa          F1 score 0.8887   MCC 0.8291   IOU 0.7997
                    Massachusetts   F1 score 0.8444   MCC 0.8226   IOU 0.7308
Road vectorization  Ottawa          F1 score 0.8702   MCC 0.8637
                    Massachusetts   F1 score 0.8161   MCC 0.8045

… for road vectorization, especially for complicated and intersection areas, wherein the model brought in more FPs when we decreased the image size. The model could learn considerably less, and the images were distinguished as failures due to overfitting when the image size was reduced. Accordingly, the detection accuracy was greatly diminished. Therefore, reducing the image input size was ineffective for producing high-quality road segmentation and vectorization maps.
Figure 16. Visual performance attained by Ours-S network for road surface segmentation from the Ottawa and Massachusetts imagery
after analyzing a failure case. The cyan, green, and blue colors denote the TPs, FPs, and FNs, respectively.
Figure 17. Visual performance attained by Ours-V for road vectorization from the Massachusetts and Ottawa imagery after analyzing a
failure case. The blue rectangle shows the predicted FPs and FNs. More details can be seen in the zoomed-in view.
Figure 18. The vectorized road is superimposed with the original Aerial (Massachusetts) and Google Earth (Ottawa) imagery to show
the overall geometric quality of vectorized outcomes. The first and second rows demonstrate the Aerial images, and the third and last
rows illustrate the Google Earth images. The last column also demonstrates the superimposed vectorized road. More details can be
seen in the zoomed-in view.
The advantage of the proposed model was verified with rigorous experiments: 1) Two different road imagery datasets, namely the Ottawa (Google Earth) and Massachusetts (aerial) datasets, which comprise the original RGB images, the corresponding ground truth segmentation maps, and the corresponding ground truth vector maps, were employed to test the model for road segmentation and vectorization. 2) In the road surface segmentation task, the proposed RoadVecNet could achieve more consistent and smooth road segmentation outcomes than all the comparative models in terms of visual and quantitative performance. 3) In the road vectorization task, RoadVecNet also showed better performance than the other comparative state-of-the-art deep convolutional architectures. Figure 18 demonstrates the vectorized road results overlaid on the original Google Earth and aerial imagery to prove the overall geometric quality of the road segmentation and vectorization by the model. We calculated the root-mean-square (RMS) error of the road widths based on the quadratic mean distance between the matched references and the extracted widths. The vectorization of the classified outcomes achieved width RMS values of 1.47 and 0.63 m for the Massachusetts and Ottawa images, respectively, proving that the proposed model could achieve precise information about road width. Moreover, the proposed network could extract the precise location of the road network, because the vectorized road maps are well superimposed on the original imagery. The proposed RoadVecNet model showed robustness against obstacles to a certain extent; however, it could not segment and vectorize roads well under large and continuous areas of obstacles, resulting in discontinuity there. These issues are the primary drawbacks of the suggested technique for road surface segmentation and vectorization. Future study can address these constraints by incorporating topological criteria and gap-filling methods into our proposed road extraction and vectorization method to improve its accuracy.

Highlights

● A new RoadVecNet is applied for road segmentation and vectorization simultaneously.
● SE and DDSPP modules are used to improve the accuracy.
● The Sobel edge detection method is used to obtain complete and smooth road edge networks.
● Two different datasets, Massachusetts and Ottawa, are used for road vectorization.
● The MFB_FL loss function is utilized to overcome highly unbalanced training datasets.

Data availability

The Massachusetts and Ottawa datasets and the developed code, which is uploaded to GitHub, can be downloaded from the online versions at https://www.cs.toronto.edu/~vmnih/data/, https://github.com/gismodelling/RoadVecNet, and https://github.com/yhlleo/RoadNet.

Disclosure statement

No potential conflict of interest was reported by the author(s).
Lin, T., P. Goyal, R. Girshick, K. He, and P. Dollár. 2017. "Focal Loss for Dense Object Detection." Proceedings of the IEEE International Conference on Computer Vision: 2980–2988. Venice, Italy.
Liu, R., Q. Miao, J. Song, Y. Quan, Y. Li, P. I. Xu, and J. Dai. 2019. "Multiscale Road Centerlines Extraction from High-resolution Aerial Imagery." Neurocomputing 329: 384–396. doi:10.1016/j.neucom.2018.10.036.
Liu, Y., J. Yao, X. Lu, M. Xia, X. Wang, and Y. Liu. 2018. "RoadNet: Learning to Comprehensively Analyze Road Networks in Complex Urban Scenes from High-resolution Remotely Sensed Images." IEEE Transactions on Geoscience and Remote Sensing 57 (4): 2043–2056. doi:10.1109/TGRS.2018.2870871.
Long, J., E. Shelhamer, and T. Darrell. 2015. "Fully Convolutional Networks for Semantic Segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 6810–6818. Boston, MA, USA.
Luo, Y., J. Li, C. Yu, B. Xu, Y. Li, L. Hsu, and N. El-Sheimy. 2019. "Research on Time-correlated Errors Using Allan Variance in a Kalman Filter Applicable to Vector-tracking-based GNSS Software-defined Receiver for Autonomous Ground Vehicle Navigation." Remote Sensing 11 (9): 1026. doi:10.3390/rs11091026.
Maboudi, M., J. Amini, M. Hahn, and M. Saati. 2017. "Object-based Road Extraction from Satellite Images Using Ant Colony Optimization." International Journal of Remote Sensing 38 (1): 179–198. doi:10.1080/01431161.2016.1264026.
Miao, Z., W. Shi, H. Zhang, and X. Wang. 2012. "Road Centerline Extraction from High-resolution Imagery Based on Shape Features and Multivariate Adaptive Regression Splines." IEEE Geoscience and Remote Sensing Letters 10 (3): 583–587. doi:10.1109/LGRS.2012.2214761.
Mnih, V. 2013. "Machine Learning for Aerial Image Labeling." Ph.D. dissertation, Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada.
Mnih, V., and G. E. Hinton. 2010. "Learning to Detect Roads in High-Resolution Aerial Images." Berlin, Heidelberg: 210–223. doi:10.1007/978-3-642-15567-3_16.
Movaghati, S., A. Moghaddamjoo, and A. Tavakoli. 2010. "Road Extraction from Satellite Images Using Particle Filtering and Extended Kalman Filtering." IEEE Transactions on Geoscience and Remote Sensing 48 (7): 2807–2817. doi:10.1109/TGRS.2010.2041783.
Qiaoping, Z., and I. Couloigner. 2004. "Automatic Road Change Detection and GIS Updating from High Spatial Remotely-sensed Imagery." Geo-Spatial Information Science 7 (2): 89–95. doi:10.1007/BF02826642.
Ronneberger, O., P. Fischer, and T. Brox. 2015. "U-net: Convolutional Networks for Biomedical Image Segmentation." International Conference on Medical Image Computing and Computer-Assisted Intervention: 234–241. Munich, Germany.
Saito, S., and Y. Aoki. 2015. "Building and Road Detection from Large Aerial Imagery." Image Processing: Machine Vision Applications 9405: 94050.
Sarhan, E., E. Khalifa, and A. M. Nabil. 2011. "Road Extraction Framework by Using Cellular Neural Network from Remote Sensing Images." In 2011 International Conference on Image Information Processing: 1–5. Shimla, India.
Ševo, I., and A. Avramović. 2016. "Convolutional Neural Network Based Automatic Object Detection on Aerial Images." IEEE Geoscience and Remote Sensing Letters 13 (5): 740–744. doi:10.1109/LGRS.2016.2542358.
Shao, Z., Z. Zhou, X. Huang, and Y. Zhang. 2021. "MRENet: Simultaneous Extraction of Road Surface and Road Centerline in Complex Urban Scenes from Very High-resolution Images." Remote Sensing 13 (2): 239. doi:10.3390/rs13020239.
Shen, Z., J. Luo, and L. Gao. 2010. "Road Extraction from High-resolution Remotely Sensed Panchromatic Image in Different Research Scales." In 2010 IEEE International Geoscience and Remote Sensing Symposium: 453–456. Honolulu, HI, USA.
Simonyan, K., and A. Zisserman. 2014. "Very Deep Convolutional Networks for Large-scale Image Recognition." Available from: https://arxiv.org/abs/1409.1556
Srivastava, N., G. Hinton, A. Krizhevsky, and R. Salakhutdinov. 2014. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." Journal of Machine Learning Research 15: 1929–1958.
Unsalan, C., and B. Sirmacek. 2012. "Road Network Detection Using Probabilistic and Graph Theoretical Methods." IEEE Transactions on Geoscience and Remote Sensing 50 (11): 4441–4453. doi:10.1109/TGRS.2012.2190078.
Vincent, O., and O. Folorunso. 2009. "A Descriptive Algorithm for Sobel Image Edge Detection." In Proceedings of the Informing Science & IT Education Conference (InSITE) 40: 97–107. Macon, United States.
Wei, Y., K. Zhang, and S. Ji. 2020. "Simultaneous Road Surface and Centerline Extraction from Large-scale Remote Sensing Images Using CNN-based Segmentation and Tracing." IEEE Transactions on Geoscience and Remote Sensing 58 (12): 8919–8931. doi:10.1109/TGRS.2020.2991733.
Wei, Y., Z. Wang, and M. Xu. 2017. "Road Structure Refined CNN for Road Extraction in Aerial Image." IEEE Geoscience and Remote Sensing Letters 14 (5): 709–713. doi:10.1109/LGRS.2017.2672734.
Xie, Y., F. Miao, K. Zhou, and J. Peng. 2019. "HsgNet: A Road Extraction Network Based on Global Perception of High-order Spatial Information." ISPRS International Journal of Geo-Information 8 (12): 571. doi:10.3390/ijgi8120571.
Xu, Y., Y. Feng, Z. Xie, A. Hu, and X. Zhang. 2018. "A Research on Extracting Road Network from High Resolution Remote Sensing Imagery." 26th International Conference on Geoinformatics: 1–4. Kunming, China. doi:10.1109/GEOINFORMATICS.2018.8557042.
Yang, M., K. Yu, C. Zhang, Z. Li, and K. Yang. 2018. "DenseASPP for Semantic Segmentation in Street Scenes." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 3684–3692. Salt Lake City, UT, USA.
Yang, X., X. Li, Y. Ye, R. Y. K. Lau, X. Zhang, and X. Huang. 2019. "Road Detection and Centerline Extraction via Deep Recurrent Convolutional Neural Network U-net." IEEE Transactions on Geoscience and Remote Sensing 57 (9): 7209–7220. doi:10.1109/TGRS.2019.2912301.
Yi, W., Y. Chen, H. Tang, and L. Deng. 2010. "Experimental Research on Urban Road Extraction from High-resolution RS Images Using Probabilistic Topic Models." IEEE International Geoscience and Remote Sensing Symposium: 445–448. Honolulu, HI, USA.
Zhang, Z., L. Qingjie, and W. Yunhong. 2018. "Road Extraction by Deep Residual U-net." IEEE Geoscience and Remote Sensing Letters 15 (5): 749–753. doi:10.1109/LGRS.2018.2802944.
Zhao, H., J. Shi, X. Qi, X. Wang, and J. Jia. 2017. "Pyramid Scene Parsing Network." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 2881–2890. Honolulu, HI, USA.
Zhao, W., S. Du, and W. J. Emery. 2017. "Object-based Convolutional Neural Network for High-resolution Imagery Classification." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 10 (7): 3386–3396. doi:10.1109/JSTARS.2017.2680324.
Zhong, Y., F. Fei, Y. Liu, B. Zhao, H. Jiao, and L. Zhang. 2017. "SatCNN: Satellite Image Dataset Classification Using Agile Convolutional Neural Networks." Remote Sensing Letters 8 (2): 136–145. doi:10.1080/2150704X.2016.1235299.
Zhong, Z., J. Li, W. Cui, and H. Jiang. 2016. "Fully Convolutional Networks for Building and Road Extraction: Preliminary Results." IEEE International Geoscience and Remote Sensing Symposium (IGARSS): 1591–1594. Beijing, China.
Zhou, W., S. Newsam, C. Li, and Z. Shao. 2017. "Learning Low Dimensional Convolutional Neural Networks for High-resolution Remote Sensing Image Retrieval." Remote Sensing 9 (5): 489. doi:10.3390/rs9050489.