Automatic Segmentation of River and Land in SAR Images: A Deep Learning Approach
Radhika M. Pai
Department of Information and Communication Technology
Manipal Institute of Technology
Manipal Academy of Higher Education
Manipal, India
[email protected]
Abstract—The ubiquitousness of satellite imagery and powerful, computationally efficient deep learning frameworks has found profound use in the field of remote sensing. Augmented by easy access to the abundant image data made available by different satellites, such as LANDSAT and the European Space Agency's Copernicus missions, deep learning has opened various avenues of research in monitoring the world's oceans, land, rivers, etc. One significant problem in this direction is the accurate identification and subsequent segmentation of surface water in images in the microwave spectrum. Typically, standard image processing tools, which are time-inefficient, are used to segment the images. In recent years, however, deep learning methods for semantic segmentation have become the preferred choice given their high accuracy and ease of use. This paper proposes the use of deep learning approaches such as U-Net to perform an efficient segmentation of river and land. Experimental results show that our approach achieves vastly superior performance on SAR images, with a pixel accuracy of 0.98 and an F1 score of 0.99.

Index Terms—Semantic Segmentation, SAR image, U-Net, Deep Learning.

I. INTRODUCTION

Observation of surface water is an essential component in understanding and deriving insights about local ecological and hydrological processes. Surface water, in contrast to atmospheric water or groundwater, largely refers to water on the surface of the Earth, such as rivers, lakes and wetlands. Surface water is often subject to external forces that result in expansions, contractions and changes in appearance, lending a dynamic component to the flow of the water. Extreme changes to surface water can have serious repercussions such as floods, currently the most common natural disaster to affect the world. Thus, it is imperative to develop approaches to detect, constantly monitor and predict future water levels.

The recent large-scale proliferation of remote-sensing satellites such as Sentinel-1, Landsat and Radarsat has resulted in regular monitoring of the Earth at high-frequency periodic intervals. Further, these satellites are equipped with high-resolution microwave sensors capable of imaging in all terrain conditions and invariant to day and night cycles. One of their significant advantages is the ability to penetrate thick cloud cover.

Detection of surface water in Synthetic Aperture Radar (SAR) images has until now largely been addressed by elaborate image processing algorithms. Commonly applied approaches include the watershed algorithm [1], thresholding [2] and morphological profiling followed by traditional machine learning algorithms such as Support Vector Machines (SVMs). Although these algorithms have been shown to perform effectively for a specific polarization, the resulting models do not generalize across polarizations. In addition, the presence of foreign objects such as bridges results in gaps in the output [1]. This, together with the significant time investment required for hand-tuning, paves the way for more robust approaches such as neural networks.
Transfer learning consists of reusing the weights of a pre-trained model for a similar task. Such an initialisation has been shown to work better than random weight initialisation [14]. In this paper, it is proposed to compare the results obtained by training the U-Net model from scratch with those obtained using the pre-trained weights learnt by the U-Net model on the ISBI 2015 Cell Tracking dataset.
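For illustration, the two initialisations compared here could be set up as follows in PyTorch; the `UNet` class, its constructor arguments, and the checkpoint filename are hypothetical placeholders, not the authors' code.

```python
# Sketch of the two weight initialisations compared in this paper.
# `UNet` and the checkpoint path are illustrative assumptions.
import torch
from unet import UNet  # hypothetical U-Net implementation

# Vanilla U-Net: random initialisation, trained from scratch on the SAR data.
vanilla = UNet(in_channels=1, n_classes=1)

# Transfer U-Net: initialised with weights learnt on the ISBI 2015
# Cell Tracking dataset before fine-tuning on the SAR data.
transfer = UNet(in_channels=1, n_classes=1)
transfer.load_state_dict(torch.load("unet_isbi2015_weights.pth"))
```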
Fig. 2. Examples from the training set; column (a) corresponds to SAR images and column (b) to the ground truth.

Loss Function: For the U-Net model, the soft-dice loss function [15] is a typically preferred loss function to reduce bias in the predictions. It is especially useful when the training data suffers from a class imbalance problem. It is formulated as follows:

\[
\mathrm{Dice\ Loss} = 1 - \frac{2\,|A \cap B|}{|A| + |B|}
\]

The numerator, |A ∩ B|, is the number of elements in the intersection of the sets A and B, and the denominator is the total number of elements of A and B. In this study, the soft-dice loss function is used because of the class imbalance problem discussed earlier.
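As a concrete illustration, a minimal soft-dice loss can be written as below; this sketch assumes the network outputs per-pixel water probabilities in [0, 1] and adds a small smoothing constant for numerical stability, a detail not specified in the paper.

```python
import torch

def soft_dice_loss(probs: torch.Tensor, target: torch.Tensor,
                   eps: float = 1e-6) -> torch.Tensor:
    """1 - 2|A ∩ B| / (|A| + |B|), computed on soft probabilities.

    probs:  predicted water probabilities, shape (batch, H, W), values in [0, 1]
    target: binary ground-truth masks of the same shape
    """
    probs = probs.reshape(probs.size(0), -1)      # flatten each image in the batch
    target = target.reshape(target.size(0), -1)
    intersection = (probs * target).sum(dim=1)            # soft |A ∩ B|
    cardinality = probs.sum(dim=1) + target.sum(dim=1)    # |A| + |B|
    dice = (2.0 * intersection + eps) / (cardinality + eps)
    return 1.0 - dice.mean()                      # average over the batch
```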
B. Evaluation Metrics

To evaluate the algorithms, several metrics are used [18], namely Precision, Recall, Mean Intersection over Union (MIoU) and Pixel Accuracy (PA). Precision is the fraction of the pixels predicted as water-body that are labelled water-body in the ground truth, and Recall is the fraction of all labelled water-body pixels that are correctly predicted. They are formulated as follows:

\[
\mathrm{Precision} = \frac{\mathrm{True\ Positive}}{\mathrm{True\ Positive} + \mathrm{False\ Positive}}
\]
\[
\mathrm{Recall} = \frac{\mathrm{True\ Positive}}{\mathrm{True\ Positive} + \mathrm{False\ Negative}}
\]

Since neither Precision nor Recall alone is sufficient to completely describe the performance of a model, the F1 score, the harmonic mean of the two metrics, is also employed. It is given as follows:

\[
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]

The models are also evaluated with respect to the Mean Intersection over Union (MIoU) as well as the Pixel Accuracy (PA). The MIoU measures the similarity between the predicted segmentation and the ground truth, while the Pixel Accuracy reports the fraction of pixels correctly classified by the model. With n_{ij} denoting the number of pixels of class i predicted as class j, t_i = \sum_j n_{ij} the total number of pixels of class i, and C the number of classes, the two metrics are as follows:

\[
\mathrm{MIoU} = \frac{1}{C} \sum_i \frac{n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}}
\]
\[
\mathrm{PA} = \frac{\sum_i n_{ii}}{\sum_i t_i}
\]
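For the binary water/land case (C = 2), all five metrics reduce to simple counts of true/false positives and negatives. The following NumPy sketch, with illustrative names, shows one way to compute them from 0/1 masks.

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Precision, Recall, F1, MIoU and PA for binary 0/1 masks of equal shape."""
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    pa = (tp + tn) / pred.size            # fraction of correctly classified pixels

    iou_water = tp / (tp + fp + fn)       # n_ii / (t_i + sum_j n_ji - n_ii)
    iou_land = tn / (tn + fp + fn)
    miou = (iou_water + iou_land) / 2.0   # mean over the C = 2 classes

    return {"Precision": precision, "Recall": recall,
            "F1": f1, "MIoU": miou, "PA": pa}
```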
C. Results

In this work, two models are used: Vanilla U-Net and Transfer U-Net. The Vanilla U-Net architecture is trained from scratch on our SAR image dataset, while Transfer U-Net is initialised with the weights learnt on the 2015 ISBI Cell Tracking challenge and fine-tuned for our transfer learning task. The transfer learning results reported here are the best obtained, i.e. those from retraining the entire decoder of the U-Net model. The network architectures are evaluated on 10 test images with respect to the metrics defined above. The quantitative results of the models are shown in Table I. Qualitatively, the visual outputs of the segmentation models can be seen in Figure 3.

TABLE I
PERFORMANCE COMPARISON

Architecture      Precision   Recall   F1       MIoU     PA
Vanilla U-Net     0.9927      0.9919   0.9923   0.9551   0.9876
Transfer U-Net    0.9943      0.9881   0.9912   0.9512   0.9859

Both architectures perform very well on SAR images and obtain a good F1 score of 0.99. Unlike traditional image processing approaches such as watershed segmentation, the U-Net models can identify fine details. Moreover, the Transfer U-Net architecture is able to segment even rivers of small width, whereas the Vanilla U-Net fails to identify such small-width rivers.
One more interesting result is the surprising effectiveness of the transfer learning approach. Extensive experimentation with retraining selected decoder layers was carried out. Retraining only the last 1x1 convolution layer results in predictions that are significantly noisy; additionally retraining the second-last convolution layer produces a model with less noise, and so forth. It is observed that increasing the number of trainable layers in the decoder results in successively better performance. Performance peaks on retraining the entire decoder, which yields a model that performs comparably to the U-Net model that learned its weights from scratch. It is seen that the spatial features map very well from the biomedical image segmentation problem over to surface-water segmentation in SAR data.
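One plausible way to run this layer-wise experiment is sketched below. It assumes a PyTorch U-Net whose decoder blocks are held in an nn.ModuleList named `model.decoder`, ordered up to the final 1x1 convolution; this is an illustrative assumption, not the authors' implementation.

```python
import torch

def retrain_last_decoder_blocks(model, n_trainable: int):
    """Freeze every parameter, then unfreeze only the last n_trainable decoder blocks."""
    for p in model.parameters():
        p.requires_grad = False
    for block in model.decoder[-n_trainable:]:   # assumes model.decoder is an nn.ModuleList
        for p in block.parameters():
            p.requires_grad = True

# n_trainable = 1 retrains only the final 1x1 convolution (noisy predictions);
# larger values retrain more of the decoder, giving successively better results.
retrain_last_decoder_blocks(model, n_trainable=1)  # `model` is the pre-trained U-Net
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```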
V. CONCLUSION

In this paper, a robust methodology is proposed for an efficient and highly precise segmentation of surface river water and land. Two different implementations of the U-Net architecture are studied on SAR images: one in which the U-Net is trained from scratch (Vanilla U-Net) and another in which pre-trained weights are used (Transfer U-Net). Experimental results show that both architectures give similar performance in terms of F1 score, pixel accuracy and mean IoU. However, Transfer U-Net is able to identify very minute details in the image, such as small rivers. One limitation of this approach, however, is the possibility of false positives; that is, the model may identify water in regions of relatively low intensity. For future work, we would extend this model to multi-class classification and introduce information from panchromatic satellite imagery for verification.

REFERENCES

[7] Z. Cui, S. Dang, Z. Cao, S. Wang, and N. Liu, "SAR target recognition in large scene images via region-based convolutional neural networks," Remote Sensing, vol. 10, no. 5, 2018. [Online]. Available: http://www.mdpi.com/2072-4292/10/5/776
[8] W. Yang, L. Chen, D. Dai, and G.-S. Xia, "Semantic labelling of SAR images with conditional random fields on region adjacency graph," IET Radar, Sonar & Navigation, vol. 5, pp. 835–841, Nov. 2011.
[9] C. Henry, S. M. Azimi, and N. Merkle, "Road segmentation in SAR satellite images with deep fully convolutional neural networks," IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 12, pp. 1867–1871, Dec. 2018.
[10] Z. Zhang, Q. Liu, and Y. Wang, "Road extraction by deep residual U-Net," IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 5, pp. 749–753, May 2018.
[11] "European Copernicus Open Access Hub." [Online]. Available: https://scihub.copernicus.eu
[12] "Sentinel Application Platform." [Online]. Available: http://step.esa.int/main/toolboxes/snap/snap-faq/
[13] A. S. Yommy, R. Liu, and S. Wu, "SAR image despeckling using refined Lee filter," in 2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics, vol. 2, 2015, pp. 260–265.
[14] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" CoRR, vol. abs/1411.1792, 2014. [Online]. Available: http://arxiv.org/abs/1411.1792
[15] F. Milletari, N. Navab, and S. Ahmadi, "V-Net: Fully convolutional neural networks for volumetric medical image segmentation," in 2016 Fourth International Conference on 3D Vision (3DV), Oct. 2016, pp. 565–571.
[16] A. Dosovitskiy, J. T. Springenberg, M. Riedmiller, and T. Brox, "Discriminative unsupervised feature learning with convolutional neural networks," in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 766–774. [Online]. Available: http://papers.nips.cc/paper/5548-discriminative-unsupervised-feature-learning-with-convolutional-neural-networks.pdf
[17] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in 2015 IEEE International Conference on Computer Vision (ICCV), Dec. 2015, pp. 1026–1034.
[18] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
Fig. 3. Experimental results of the proposed architecture. Column (a) is the SAR image captured by the Sentinel-1 satellite. Column (b) is the segmentation map produced by the Vanilla U-Net model. Column (c) is the map produced by the Transfer U-Net model.