Automatic Segmentation of River and Land in SAR Images: A Deep Learning Approach
Radhika M. Pai
Department of Information and Communication Technology
Manipal Institute of Technology
Manipal Academy of Higher Education
Manipal, India
[email protected]
Abstract—The ubiquitousness of satellite imagery and powerful, computationally efficient deep learning frameworks has found profound use in the field of remote sensing. Augmented by easy access to the abundant image data made available by different satellites, such as LANDSAT and the European Space Agency's Copernicus missions, deep learning has opened various avenues of research in monitoring the world's oceans, land, rivers, etc. One significant problem in this direction is the accurate identification and subsequent segmentation of surface water in images in the microwave spectrum. Typically, standard image processing tools, which are time-inefficient, are used to segment the images. In recent years, however, deep learning methods for semantic segmentation have become the preferred choice given their high accuracy and ease of use. This paper proposes the use of deep learning approaches such as U-Net to perform an efficient segmentation of river and land. Experimental results show that our approach achieves vastly superior performance on SAR images, with a pixel accuracy of 0.98 and an F1 score of 0.99.

Index Terms—Semantic Segmentation, SAR image, U-Net, Deep Learning.

I. INTRODUCTION

Observation of surface water is an essential component in understanding and deriving insights about local ecological and hydrological processes. Surface water, in contrast to atmospheric water or groundwater, largely refers to water on the surface of the Earth, such as rivers, lakes and wetlands. Surface water is often subject to external forces that result in expansions, contractions and changes in appearance, lending a dynamic component to the flow of the water. Extreme changes to surface water can have serious repercussions such as floods, currently the most common natural disaster to affect the world. Thus, it is imperative to develop approaches to detect, constantly monitor and predict future water levels.

The recent large-scale proliferation of remote-sensing satellites such as Sentinel-1, Landsat and Radarsat has resulted in regular monitoring of the Earth at high-frequency periodic intervals. Further, these satellites are equipped with high-resolution microwave sensors capable of imaging in all terrain conditions and invariant to day and night cycles. One of their significant advantages is the ability to penetrate thick cloud cover.

Detection of surface water in Synthetic Aperture Radar (SAR) images has until now largely been addressed by elaborate image processing algorithms. Commonly applied approaches include the watershed algorithm [1], thresholding [2] and morphological profiling followed by traditional machine learning algorithms such as Support Vector Machines (SVMs). Although these algorithms have been shown to perform effectively for a specific polarization, the resulting models do not generalize across polarizations. In addition, the presence of foreign objects such as bridges results in gaps in the output [1]. This, together with the significant time investment required for hand-tuning, paves the way for more robust approaches such as neural networks.
Transfer learning consists of reusing the weights of a pre-trained model for a similar task. Such an initialisation has been shown to work better than random weight initialisation [14]. In this paper, it is proposed to compare the results obtained by training the U-Net model from scratch with those obtained using the pre-trained weights learnt by the U-Net model on the ISBI 2015 Cell Tracking dataset.
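For illustration, the two initialisations compared here could be set up as follows in PyTorch; the `UNet` class, its constructor arguments, and the checkpoint filename are hypothetical placeholders, not the authors' code.

```python
# Sketch of the two weight initialisations compared in this paper.
# `UNet` and the checkpoint path are illustrative assumptions.
import torch
from unet import UNet  # hypothetical U-Net implementation

# Vanilla U-Net: random initialisation, trained from scratch on the SAR data.
vanilla = UNet(in_channels=1, n_classes=1)

# Transfer U-Net: initialised with weights learnt on the ISBI 2015
# Cell Tracking dataset before fine-tuning on the SAR data.
transfer = UNet(in_channels=1, n_classes=1)
transfer.load_state_dict(torch.load("unet_isbi2015_weights.pth"))
```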
Fig. 2. Examples from the training set; column (a) corresponds to SAR images and column (b) to the ground truth.

Loss Function: For the U-Net model, the soft-dice loss function [15] is a typically preferred loss function to reduce bias in the predictions. It is especially useful when the training data suffers from a class imbalance problem. It is formulated as follows:

\[
\mathrm{Dice\ Loss} = 1 - \frac{2\,|A \cap B|}{|A| + |B|}
\]

The numerator, |A ∩ B|, is the number of elements in the intersection of the sets A and B, and the denominator is the total number of elements of A and B. In this study, the soft-dice loss function is used because of the class imbalance problem discussed earlier.
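As a concrete illustration, a minimal soft-dice loss can be written as below; this sketch assumes the network outputs per-pixel water probabilities in [0, 1] and adds a small smoothing constant for numerical stability, a detail not specified in the paper.

```python
import torch

def soft_dice_loss(probs: torch.Tensor, target: torch.Tensor,
                   eps: float = 1e-6) -> torch.Tensor:
    """1 - 2|A ∩ B| / (|A| + |B|), computed on soft probabilities.

    probs:  predicted water probabilities, shape (batch, H, W), values in [0, 1]
    target: binary ground-truth masks of the same shape
    """
    probs = probs.reshape(probs.size(0), -1)      # flatten each image in the batch
    target = target.reshape(target.size(0), -1)
    intersection = (probs * target).sum(dim=1)            # soft |A ∩ B|
    cardinality = probs.sum(dim=1) + target.sum(dim=1)    # |A| + |B|
    dice = (2.0 * intersection + eps) / (cardinality + eps)
    return 1.0 - dice.mean()                      # average over the batch
```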
B. Evaluation Metrics

To evaluate the algorithms, several metrics are used [18], namely Precision, Recall, Mean Intersection over Union (MIoU) and Pixel Accuracy (PA). Precision is the fraction of the pixels predicted as water-body that are labelled water-body in the ground truth, and Recall is the fraction of all labelled water-body pixels that are correctly predicted. They are formulated as follows:

\[
\mathrm{Precision} = \frac{\mathrm{True\ Positive}}{\mathrm{True\ Positive} + \mathrm{False\ Positive}}
\]
\[
\mathrm{Recall} = \frac{\mathrm{True\ Positive}}{\mathrm{True\ Positive} + \mathrm{False\ Negative}}
\]

Since neither Precision nor Recall alone is sufficient to completely describe the performance of a model, the F1 score, the harmonic mean of the two metrics, is also employed. It is given as follows:

\[
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]

The models are also evaluated with respect to the Mean Intersection over Union (MIoU) as well as the Pixel Accuracy (PA). The MIoU measures the similarity between the predicted segmentation and the ground truth, while the Pixel Accuracy reports the fraction of pixels correctly classified by the model. With n_{ij} denoting the number of pixels of class i predicted as class j, t_i = \sum_j n_{ij} the total number of pixels of class i, and C the number of classes, the two metrics are as follows:

\[
\mathrm{MIoU} = \frac{1}{C} \sum_i \frac{n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}}
\]
\[
\mathrm{PA} = \frac{\sum_i n_{ii}}{\sum_i t_i}
\]
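For the binary water/land case (C = 2), all five metrics reduce to simple counts of true/false positives and negatives. The following NumPy sketch, with illustrative names, shows one way to compute them from 0/1 masks.

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Precision, Recall, F1, MIoU and PA for binary 0/1 masks of equal shape."""
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    pa = (tp + tn) / pred.size            # fraction of correctly classified pixels

    iou_water = tp / (tp + fp + fn)       # n_ii / (t_i + sum_j n_ji - n_ii)
    iou_land = tn / (tn + fp + fn)
    miou = (iou_water + iou_land) / 2.0   # mean over the C = 2 classes

    return {"Precision": precision, "Recall": recall,
            "F1": f1, "MIoU": miou, "PA": pa}
```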
C. Results

In this work, two models are used: Vanilla U-Net and Transfer U-Net. The Vanilla U-Net architecture is trained from scratch on our SAR image dataset, while Transfer U-Net is initialised with the weights learnt on the 2015 ISBI Cell Tracking challenge and fine-tuned for our transfer learning task. The transfer learning results reported here are the best obtained, i.e. those from retraining the entire decoder of the U-Net model. The network architectures are evaluated on 10 test images with respect to the metrics defined above. The quantitative results of the models are shown in Table I. Qualitatively, the visual outputs of the segmentation models can be seen in Figure 3.

TABLE I
PERFORMANCE COMPARISON

Architecture      Precision   Recall   F1       MIoU     PA
Vanilla U-Net     0.9927      0.9919   0.9923   0.9551   0.9876
Transfer U-Net    0.9943      0.9881   0.9912   0.9512   0.9859

Both architectures perform very well on SAR images and obtain a good F1 score of 0.99. Unlike traditional image processing approaches such as watershed segmentation, the U-Net models can identify fine details. Moreover, the Transfer U-Net architecture is able to segment even rivers of small width, whereas the Vanilla U-Net fails to identify such small-width rivers.
One more interesting result is the surprising effectiveness of the transfer learning approach. Extensive experimentation with retraining selected decoder layers was carried out. Retraining only the last 1x1 convolution layer results in predictions that are significantly noisy; additionally retraining the second-last convolution layer produces a model with less noise, and so forth. It is observed that increasing the number of trainable layers in the decoder results in successively better performance. Performance peaks on retraining the entire decoder, which yields a model that performs comparably to the U-Net model that learned its weights from scratch. It is seen that the spatial features map very well from the biomedical image segmentation problem over to surface-water segmentation in SAR data.
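One plausible way to run this layer-wise experiment is sketched below. It assumes a PyTorch U-Net whose decoder blocks are held in an nn.ModuleList named `model.decoder`, ordered up to the final 1x1 convolution; this is an illustrative assumption, not the authors' implementation.

```python
import torch

def retrain_last_decoder_blocks(model, n_trainable: int):
    """Freeze every parameter, then unfreeze only the last n_trainable decoder blocks."""
    for p in model.parameters():
        p.requires_grad = False
    for block in model.decoder[-n_trainable:]:   # assumes model.decoder is an nn.ModuleList
        for p in block.parameters():
            p.requires_grad = True

# n_trainable = 1 retrains only the final 1x1 convolution (noisy predictions);
# larger values retrain more of the decoder, giving successively better results.
retrain_last_decoder_blocks(model, n_trainable=1)  # `model` is the pre-trained U-Net
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```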
V. CONCLUSION

In this paper, a robust methodology is proposed for an efficient and highly precise segmentation of surface river water and land. Two different implementations of the U-Net architecture are studied on SAR images: one in which the U-Net is trained from scratch (Vanilla U-Net) and another in which pre-trained weights are used (Transfer U-Net). Experimental results show that both architectures give similar performance in terms of F1 score, pixel accuracy and mean IoU. However, Transfer U-Net is able to identify very minute details in the image, such as small rivers. One limitation of this approach, however, is the possibility of false positives; that is, the model may identify water in regions of relatively low intensity. For future work, we would extend this model to multi-class classification and introduce information from panchromatic satellite imagery for verification.

REFERENCES

[7] Z. Cui, S. Dang, Z. Cao, S. Wang, and N. Liu, "SAR target recognition in large scene images via region-based convolutional neural networks," Remote Sensing, vol. 10, no. 5, 2018. [Online]. Available: http://www.mdpi.com/2072-4292/10/5/776
[8] W. Yang, L. Chen, D. Dai, and G.-S. Xia, "Semantic labelling of SAR images with conditional random fields on region adjacency graph," IET Radar, Sonar & Navigation, vol. 5, pp. 835–841, Nov. 2011.
[9] C. Henry, S. M. Azimi, and N. Merkle, "Road segmentation in SAR satellite images with deep fully convolutional neural networks," IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 12, pp. 1867–1871, Dec. 2018.
[10] Z. Zhang, Q. Liu, and Y. Wang, "Road extraction by deep residual U-Net," IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 5, pp. 749–753, May 2018.
[11] "European Copernicus Open Access Hub." [Online]. Available: https://scihub.copernicus.eu
[12] "Sentinel Application Platform." [Online]. Available: http://step.esa.int/main/toolboxes/snap/snap-faq/
[13] A. S. Yommy, R. Liu, and S. Wu, "SAR image despeckling using refined Lee filter," in 2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics, vol. 2, 2015, pp. 260–265.
[14] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" CoRR, vol. abs/1411.1792, 2014. [Online]. Available: http://arxiv.org/abs/1411.1792
[15] F. Milletari, N. Navab, and S. Ahmadi, "V-Net: Fully convolutional neural networks for volumetric medical image segmentation," in 2016 Fourth International Conference on 3D Vision (3DV), Oct. 2016, pp. 565–571.
[16] A. Dosovitskiy, J. T. Springenberg, M. Riedmiller, and T. Brox, "Discriminative unsupervised feature learning with convolutional neural networks," in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 766–774. [Online]. Available: http://papers.nips.cc/paper/5548-discriminative-unsupervised-feature-learning-with-convolutional-neural-networks.pdf
[17] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in 2015 IEEE International Conference on Computer Vision (ICCV), Dec. 2015, pp. 1026–1034.
[18] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
Fig. 3. Experimental results of the proposed architecture. Column (a) is the SAR image captured by the Sentinel-1 satellite. Column (b) is the segmentation map produced by the Vanilla U-Net model. Column (c) is the map produced by the Transfer U-Net model.