
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3036914, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

Snow avalanche segmentation in SAR images with Fully Convolutional Neural Networks

Filippo Maria Bianchi*, Jakob Grahn, Markus Eckerstorfer, Eirik Malnes, Hannah Vickers
*[email protected]; the authors are with NORCE, The Norwegian Research Centre AS.

Abstract—Knowledge about the frequency and location of snow avalanche activity is essential for forecasting and mapping of snow avalanche hazard. Traditional field monitoring of avalanche activity has limitations, especially when surveying large and remote areas. In recent years, avalanche detection in Sentinel-1 radar satellite imagery has been developed to improve monitoring. However, the current state-of-the-art detection algorithms, based on radar signal processing techniques, are still much less accurate than human experts. To reduce this gap, we propose a deep learning architecture for detecting avalanches in Sentinel-1 radar images. We trained a neural network on 6,345 manually labelled avalanches from 117 Sentinel-1 images, each one consisting of six channels that include backscatter and topographical information. Then, we tested our trained model on a new SAR image. Compared to the manual labelling (the gold standard), we achieved an F1 score above 66%, while the state-of-the-art detection algorithm sits at an F1 score of only 38%. A visual inspection of the results generated by our deep learning model shows that only small avalanches go undetected, while some avalanches that were originally not labelled by the human expert are discovered.

Index Terms—Deep Learning; Saliency Segmentation; Convolutional Neural Networks; Snow Avalanches; SAR; Sentinel-1.

I. INTRODUCTION

Knowledge about the spatio-temporal distribution of snow avalanche (hereafter referred to as avalanche) activity in a given region is critical for avalanche forecasting and hazard mapping. An increase in avalanche activity or in the magnitude of releasing avalanches leads to an increase in avalanche risk. Conventionally, avalanche activity is primarily monitored through field measurements, which are time-consuming, expensive, and can only be done for very few accessible areas. Monitoring avalanche activity using satellite-borne synthetic aperture radar (SAR) has, therefore, gained considerable interest in recent years. SAR products enable continuous coverage of very large areas, regardless of light and weather conditions [1].

An experienced operator can identify avalanche debris (the depositional part of an avalanche) in SAR change detection composites (showing temporal radar backscatter change) with high accuracy. On the other hand, automatic signal processing methods based on radar backscatter thresholding and segmentation often fail and produce a large number of false alarms due to the highly dynamic nature of snow in the SAR images [2]. A key limitation of classical segmentation methods is that they mainly focus on pixel-wise information in the radar backscatter, without accounting for the contextual information around the pixel and for high-level features, such as the shape and the texture of avalanche debris. Also, the local topography in which the avalanches occur is largely disregarded, since it has only been used to mask out areas where avalanches are unlikely to occur. However, the occurrence of avalanches is strongly correlated with topographical conditions, and avalanche debris exhibits characteristic shapes; both should be taken into account when performing the detection.

Convolutional neural networks (CNNs) have attracted considerable interest for their ability to model complex contextual information in images [3]. Prominent examples in remote sensing are terrain surface classification [4], [5], categorization of aerial scenes [6], detection of changes in the terrain over time from SAR and optical satellite sensors [7], [8], and segmentation of objects from airborne images [9], [10]. Nevertheless, few research efforts have been devoted so far towards detecting avalanche activity from SAR data, which remains an open and challenging endeavour. In our previous work [11], we proposed a deep learning architecture to perform binary classification of avalanches in Northern Norway. In particular, we used a CNN to classify fixed-size patches of SAR images into two classes: 1 if the patch contains at least one avalanche, and 0 otherwise. Our approach was later adopted for SAR-borne avalanche detection in the Alps [12] and in other locations in Norway [13]. As a major limitation, patch-wise classification cannot determine the presence of multiple avalanches within the same patch. Also, the results are heavily influenced by the patch size, which makes it difficult to evaluate the detection performance. In particular, for large windows it is easier to correctly predict the presence of at least one avalanche, but the resolution of the detection becomes too coarse to be useful.

In this work, we approach avalanche detection as a saliency segmentation task, where the classification is not done at the patch level, but rather at the individual pixel level. We adopt a Fully Convolutional Network (FCN) architecture, which generates a segmentation mask for each input image. This removes the dependency on the window size and makes it possible to determine the exact location of the avalanches. Our work provides important contributions to the fields of Earth science, remote sensing, and avalanche risk assessment.

• We explore, for the first time, the capability of deep learning models in detecting the presence of avalanches in SAR products at a pixel granularity, and surpass the current state-of-the-art avalanche detection algorithm [2]. Our work was possible thanks to a large dataset of SAR products manually annotated by an avalanche expert.


• We advance our knowledge on topographical features to identify areas where avalanches are highly likely to occur. Notably, we introduce a new topographical feature, called potential angle of reach (PAR), which indicates how likely it is for an avalanche to reach a specific location. We do not use the PAR to filter input images or detection results, but rather provide the PAR as an exogenous input feature to the FCN. We first estimate how informative the PAR is in the discrimination of avalanche and non-avalanche pixels. Then, in the experimental section, we evaluate how much the detection performance of the deep learning model improves when providing the FCN with the PAR feature map.

II. SAR DATASET

The dataset consists of data from the Sentinel-1 (S1) satellites. In particular, data acquired in the interferometric wideswath (IW) mode was considered in terms of the ground range detected (GRD) product. In total, 118 SAR scenes cover two mountainous regions in Northern Norway over the period Oct. 2014-Apr. 2017.

A. Preprocessing

Each SAR product was (i) radiometrically calibrated to radar backscatter (sigma nought) values, (ii) spatially downsampled from 10 to 20 meters resolution, (iii) geocoded onto a 20 meter resolution UTM grid (EPSG:32633) using a 10 meter resolution digital elevation model (DEM) [14], and (iv) radiometrically transformed to decibel (dB) values and clipped to the range from -25 to -5 dB, to remove noise and restrict the backscatter to the interval where avalanches are visible. The preprocessed products were then grouped by their satellite geometry, such that the scenes within a group have the same viewing geometry. For each group, scenes were paired chronologically into reference and activity image pairs. For the two S1 satellites, the reference image is acquired 6 days before the activity image (12 days before the launch of S1B in 2015). The resulting products have an approximate size of 11,500 × 5,500 pixels, and each pixel covers 20 × 20 meters.
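A minimal NumPy sketch of step (iv), assuming the calibrated sigma nought values are already loaded as an array (the function name and the epsilon guard are illustrative):

```python
import numpy as np

def to_clipped_db(sigma0, db_min=-25.0, db_max=-5.0):
    """Convert calibrated backscatter (sigma nought) to decibels and
    clip to the interval where avalanches are visible."""
    sigma0_db = 10.0 * np.log10(np.maximum(sigma0, 1e-10))  # guard against log of zero
    return np.clip(sigma0_db, db_min, db_max)
```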
B. Generation of SAR features

We considered three SAR features to generate the images to be processed by the deep learning model. The first two are the differences of the vertical and horizontal polarization between the reference and the activity image: VV = VVactiv - VVref and VH = VHactiv - VHref. The difference values are re-scaled to [0,1] (see Fig. 1(a,b)). The third feature is the point-wise product of the squared difference images: VVVH = VV² · VH² (see Fig. 1(c)). We did not consider radar shadow, layover masks, or land masks depicting avalanche runout zones, which are not available for all areas.
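Assuming co-registered reference and activity arrays in dB, the three channels can be sketched as follows (the helper rescale01 and the channel stacking are illustrative, and whether squaring happens before or after rescaling follows our reading of the formulas above):

```python
import numpy as np

def rescale01(x):
    """Map an array linearly to [0, 1]."""
    return (x - x.min()) / (x.max() - x.min() + 1e-12)

def sar_channels(vv_ref, vv_act, vh_ref, vh_act):
    vv = rescale01(vv_act - vv_ref)            # VV difference
    vh = rescale01(vh_act - vh_ref)            # VH difference
    vvvh = vv**2 * vh**2                       # point-wise product of squared differences
    return np.stack([vv, vh, vvvh], axis=-1)   # H x W x 3 feature stack
```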
C. Labeling

For each product, a human expert generated a binary segmentation mask that indicates whether a pixel in the product is an avalanche or not. To create the segmentation mask, the human expert looked for changes in a difference image obtained from the following three channels: R[VVreference], G[VVactivity], B[VVreference]. We considered visual detection as the gold standard and used it as ground truth to train and evaluate our deep learning model. The whole dataset contains a total of 6,345 avalanches; 3,667,355,474 pixels are classified as "non-avalanche" and 712,945 (0.019% of the total) as "avalanche".


Fig. 1. (a, b) The SAR features obtained from the difference in the VV and VH channels. (c) The product VVVH of the squared differences. (d, e) The slope and PAR feature maps. Only a small area (1k × 1k pixels) of the actual scene is depicted here.

III. TOPOGRAPHICAL FEATURES

Since avalanches are caused by steep terrain, the topography is an important factor to determine where avalanches can appear. In particular, the local slope needs to be steep enough for an avalanche to release, and the slope typically needs to flatten out for the avalanche to stop. Therefore, it is reasonable to consider such information when performing the detection task, and we generated two feature maps from the digital elevation model (DEM), which is available for the entire Norway at 10 m pixel resolution. The first is the local slope angle of the terrain; the second is a new topographical feature introduced in this work, called potential angle of reach (PAR).

A. Slope angle

The slope angle feature map is directly computed by taking the gradient of the DEM (see Fig. 1(d)). The terrain slope is often considered when detecting avalanches, as they typically start in terrain between 35-45 degrees steepness and deposit on less steep slope angles. In previous work, the slope was used to derive a runout mask that indicated where avalanches are most likely to deposit [2]. Since the mask is applied to filter out areas in a pre-processing operation, the slope feature did not contribute to the actual detection. Most importantly, since run-out masks are obtained by manually thresholding the slope, if a wrong threshold is chosen some avalanches will not be detected. To address this issue, we provide the slope as an additional layer of the input image and let our neural network learn how to optimally exploit it to solve the segmentation task, without applying manually chosen thresholds.

Fig. 2 shows that the distribution of the slope angle is different for the avalanche and non-avalanche pixels in our dataset. In particular, avalanche pixels are mostly concentrated around [20, 35] degrees. The difference in the two distributions indicates that the slope angle can be exploited to discriminate between the "avalanche" and "non-avalanche" classes.

Fig. 2. Distribution of the slope angle for avalanche and non-avalanche pixels.
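One way to realize this feature map, assuming the DEM is a NumPy array with 10 m pixels (the function name is illustrative):

```python
import numpy as np

def slope_angle(dem, pixel_size=10.0):
    """Slope angle in degrees, from the terrain gradient of the DEM."""
    dz_dy, dz_dx = np.gradient(dem, pixel_size)            # elevation change per meter
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))   # steepest-descent angle
```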

B. Potential angle of reach (PAR)

The angle of reach of an avalanche, sometimes denoted α and referred to as the alpha-angle, indicates how far an avalanche travels from its triggering point in relation to the descent it makes. Specifically, it is defined as the elevation angle of the line between the point of furthest avalanche runout and the point of highest release. For most avalanches, this angle ranges between 20 and 40 degrees [15], [16], [17].

While the angle of reach is defined only for an existing avalanche, we here introduce the potential angle of reach (denoted as α̃), which is defined for a hypothetical avalanche located at any given point in the DEM. Ideally, this feature ranges between 20 and 40 degrees in terrain where avalanches can accumulate. Assuming that avalanches normally release in steep terrain, e.g., in slopes of 30-50 degrees, the PAR angle is obtained by (i) computing the elevation angle to all neighbouring release points x (within a 4 km radius), and (ii) taking the maximum of all such angles, as illustrated in Fig. 3. By computing the PAR for each point in the DEM, a PAR feature map can be obtained and used as an additional channel of the input images.

Fig. 3. Definition of the potential angle of reach α̃, where θ(x) denotes the angle between the horizontal and the line drawn from a point in a release zone, denoted x, to the point of interest.
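A deliberately naive sketch of the construction just described, with the release-zone slopes, radius, and angles taken from the text; the array layout and function name are assumptions, and a practical implementation would need tiling or vectorization:

```python
import numpy as np

def potential_angle_of_reach(dem, slope, pixel_size=10.0,
                             release_range=(30.0, 50.0), radius_m=4000.0):
    """Brute-force PAR feature map: for every DEM cell, the maximum elevation
    angle towards any potential release cell within the given radius."""
    h, w = dem.shape
    rel_rows, rel_cols = np.where((slope >= release_range[0]) &
                                  (slope <= release_range[1]))
    rel_z = dem[rel_rows, rel_cols]
    par = np.zeros_like(dem)
    for i in range(h):
        for j in range(w):
            d = np.hypot(rel_rows - i, rel_cols - j) * pixel_size  # horizontal distance
            near = (d > 0) & (d <= radius_m)
            if near.any():
                theta = np.degrees(np.arctan((rel_z[near] - dem[i, j]) / d[near]))
                par[i, j] = theta.max()   # (ii) maximum over all release points
    return par
```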


Fig. 4 depicts the distribution of the PAR angles for avalanche and non-avalanche pixels using the training data. It is possible to see that for avalanche pixels the distribution is more regular and has a single peak centred around 40 degrees. While the true angle of reach is expected to range between 20 and 40 degrees, the PAR is consequently biased towards higher values. We concluded that the PAR is informative, since the two distributions are different for the two classes. Contrary to the slope, the PAR is not simply concatenated to the other layers of the input image but is rather used to encourage the deep learning model to focus on specific areas (see Sect. IV-C).

Fig. 4. Distribution of the PAR for avalanche and non-avalanche pixels.

IV. DEEP LEARNING MODEL

The FCN used for segmentation is based on the U-Net architecture [18], which consists of an encoder and a decoder, respectively depicted in blue and red in Fig. 5. The encoder hierarchically extracts feature maps that indicate the presence of the patterns of interest in the image. By reducing the spatial dimensions and increasing the number of filters, the deeper layers in the encoder capture patterns of increasing complexity and with a larger spatial extent in the input image. The decoder gradually transforms the high-level features and, in the end, maps them into the output. The output is a binary segmentation mask, which has the same height/width as the input and indicates which pixels belong to the avalanche class. The skip connections link the feature maps from the encoding to the decoding layers, such that some information can bypass the bottleneck located at the bottom of the "U" shape. In this way, the network still learns to generalize from the high-level latent representation but also recovers the spatial information through a pixel-wise semantic alignment with the intermediate representations.
Fig. 5 shows the architecture details: the number n in each Enc/Dec Block indicates the quantity of 3 × 3 filters in the Conv(n) layers. The encoder reduces the spatial dimension with max pooling, while the decoder restores it through bilinear upsampling. Each block contains 2 Batch Normalization [19] and one Dropout layer [20], which are respectively used to facilitate the training convergence and to improve the model generalization capability. We note that Batch Norm layers are not present in the original U-Net architecture but, as also verified in preliminary experiments, their presence improves the segmentation performance. The last encoder block (Enc Block 512 in Fig. 5) does not have Dropout, while the last decoder block (Dec Block 32) is followed by a Conv layer with one 1 × 1 filter and a sigmoid activation. Since the network is fully convolutional (there are no dense layers), it can process inputs of variable size.

We note that it would be possible to use more powerful FCN architectures such as DeepLabV3+ [21], which achieves state-of-the-art results in segmenting natural images. However, models with a larger capacity, such as DeepLabV3+, require very large datasets to be trained on. In remote sensing applications, a smaller network such as U-Net is often preferred, given the limited amount of training data. Moreover, U-Net outperforms other architectures in detecting small objects [22], such as the snow avalanches in our work.

Fig. 5. The FCN architecture used for segmentation. Conv(n) stands for a convolutional layer with n filters; for example, n = 32 in the first Encoder Block, 64 in the second, and so on. Each Encoder Block stacks Conv(n)-BatchNorm-ReLU layers with one Dropout, followed by MaxPool; each Decoder Block applies bilinear upsampling, concatenation with the skip connection, and Conv(n)-BatchNorm-ReLU layers.
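The following PyTorch sketch renders our reading of Fig. 5 and of the description above; the exact ordering of Batch Norm, Dropout, and ReLU inside each block is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncBlock(nn.Module):
    """Encoder block: two 3x3 convolutions with Batch Norm and one Dropout,
    followed by 2x2 max pooling (ordering inside the block is assumed)."""
    def __init__(self, c_in, c_out, p_drop=0.4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out),
            nn.Dropout2d(p_drop), nn.ReLU())
    def forward(self, x):
        skip = self.body(x)                  # kept for the skip connection
        return skip, F.max_pool2d(skip, 2)   # halve the spatial dimension

class DecBlock(nn.Module):
    """Decoder block: bilinear upsampling, concatenation with the skip,
    then two 3x3 convolutions with Batch Norm and one Dropout."""
    def __init__(self, c_in, c_out, p_drop=0.4):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.body = nn.Sequential(
            nn.Conv2d(c_in + c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out),
            nn.Dropout2d(p_drop), nn.ReLU())
    def forward(self, x, skip):
        return self.body(torch.cat([self.up(x), skip], dim=1))

class UNet(nn.Module):
    """Compact rendering of the Fig. 5 layout: encoder blocks with n = 32, 64,
    128, 256, an Enc Block 512 without Dropout, mirrored decoder blocks, and a
    final 1x1 convolution with sigmoid."""
    def __init__(self, in_ch=6):
        super().__init__()
        self.encs = nn.ModuleList()
        c = in_ch
        for n in (32, 64, 128, 256):
            self.encs.append(EncBlock(c, n))
            c = n
        self.bottleneck = EncBlock(256, 512, p_drop=0.0).body  # no Dropout here
        self.decs = nn.ModuleList(DecBlock(ci, co) for ci, co in
                                  [(512, 256), (256, 128), (128, 64), (64, 32)])
        self.head = nn.Conv2d(32, 1, 1)
    def forward(self, x):
        skips = []
        for enc in self.encs:
            s, x = enc(x)
            skips.append(s)
        x = self.bottleneck(x)
        for dec, s in zip(self.decs, reversed(skips)):
            x = dec(x, s)
        return torch.sigmoid(self.head(x))   # per-pixel avalanche probability
```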
A. Class balance

Avalanches are small objects and the avalanche class is highly under-represented in the dataset (avalanche pixels are only 0.019% of the total). Therefore, a trivial model that classifies each pixel as "non-avalanche" would reach a classification accuracy of 99.98%. A solution to handle class unbalance is to weight the loss differently for the pixels of the different classes, so that the model is penalized more when it misclassifies the under-represented class [9]. Specifically, we configured the loss to give twice as much importance to the classification errors on the avalanche pixels. We also experimented with loss functions specifically designed to handle class unbalance, such as the Jaccard-distance loss [23] and the Lovász-Softmax loss [24], but we obtained worse results than optimizing the FCN with the binary cross-entropy loss and class balancing.
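A sketch of this weighting with binary cross-entropy, assuming the network outputs probabilities (names are illustrative):

```python
import torch
import torch.nn.functional as F

def weighted_bce(pred, target, avalanche_weight=2.0):
    """Binary cross-entropy where errors on avalanche pixels (target == 1)
    count twice as much as errors on non-avalanche pixels."""
    weights = torch.where(target > 0.5,
                          torch.full_like(target, avalanche_weight),
                          torch.ones_like(target))
    return F.binary_cross_entropy(pred, target, weight=weights)
```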
B. Data augmentation

To avoid overfitting during training and to enhance the model generalization to new data, we perform data augmentation by randomly applying (on the fly) horizontal and vertical flips, horizontal and vertical shifts, rotations, zooming, and shearing to the training images. To ensure consistency, the same transformations applied to the input images are also applied to their labels (avalanche masks).

To compute the prediction of a whole SAR product at inference time, we could slide the FCN on the large image and compute predictions for one window at a time. However, this approach usually generates checkerboard artefacts and border effects close to the window edges. To obtain smoother and more accurate predictions, we consider overlapping windows by sliding the FCN with a stride equal to half the window size. Furthermore, we apply to each window all the possible 90° rotations and flips; then, we compute the predictions and, finally, revert the transformations on the predicted outputs.


To obtain the final segmentation, we first merge the multiple predictions available at each pixel location (stemming from the geometric transformations and the overlapping windows) and then join them by using a 2nd order spline interpolation.
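A simplified sketch of this inference scheme; for brevity it averages the overlapping predictions instead of applying the spline-based merging, and border remainders are not handled:

```python
import numpy as np
import torch

def predict_scene(model, scene, win=160):
    """Sliding-window inference with half-window stride and eight-fold
    test-time augmentation (all 90-degree rotations, with and without a flip)."""
    H, W, _ = scene.shape
    acc = np.zeros((H, W))
    cnt = np.zeros((H, W))
    model.eval()
    for r in range(0, H - win + 1, win // 2):
        for c in range(0, W - win + 1, win // 2):
            patch = scene[r:r + win, c:c + win]
            pred = np.zeros((win, win))
            for k in range(4):
                for flip in (False, True):
                    p = np.rot90(patch, k, axes=(0, 1))
                    if flip:
                        p = p[:, ::-1]
                    x = torch.from_numpy(np.ascontiguousarray(p)).float()
                    x = x.permute(2, 0, 1).unsqueeze(0)   # 1 x C x win x win
                    with torch.no_grad():
                        y = model(x)[0, 0].numpy()
                    if flip:                              # revert the transformations
                        y = y[:, ::-1]
                    pred += np.rot90(y, -k, axes=(0, 1))
            acc[r:r + win, c:c + win] += pred / 8.0
            cnt[r:r + win, c:c + win] += 1.0
    return acc / np.maximum(cnt, 1.0)
```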
C. Attention mask

Following our hypothesis that the PAR feature map can highlight areas where it is more likely to find an avalanche, we propose a neural attention mechanism [25] that generates an attention mask conditioned on the PAR. The intention is to learn an attention mask that encourages the segmentation procedure to put more focus on specific regions of the input image. Specifically, we use a small network that takes the PAR as input and generates the attention mask, which is subsequently applied pixel-wise to the SAR channels (VV, VH, and VVVH) before they are fed into the segmentation network (see Fig. 6). We note that the attention mask is not applied to the input channel containing the slope feature map.

The attention network consists of three stacked Conv layers with 32 3 × 3 filters and ReLU activations, followed by a Conv layer with one 3 × 3 filter and sigmoid activation. The attention network has a small receptive field (7 pixels), meaning that each attention value only depends on the local PAR. This is acceptable since the PAR already yields highly non-localized features from the DEM and captures long-range relationships in the scene.

The attention network is also fully convolutional and is jointly trained with the segmentation network. Our solution allows learning, end-to-end, how to generate and apply the attention mask in a way that is optimal for the downstream segmentation task. This is a more flexible approach than masking out parts of the input (e.g., by applying pre-computed runout masks) or directly pre-multiplying the SAR channels with the PAR feature map.
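A PyTorch sketch of the attention network and of the pixel-wise masking, under the assumption that the PAR enters as a single normalized channel:

```python
import torch
import torch.nn as nn

class AttentionNet(nn.Module):
    """Three 3x3 Conv layers with 32 filters and ReLU, plus a 3x3 Conv with one
    filter and sigmoid, as described above."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid())
    def forward(self, par):                 # par: B x 1 x H x W
        return self.net(par)

def apply_attention(sar, slope, par, attn):
    """Mask only the SAR channels (VV, VH, VVVH); the slope passes through."""
    mask = attn(par)                        # values in (0, 1)
    return torch.cat([sar * mask, slope], dim=1)
```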
D. Model training and evaluation

We trained the FCN by feeding it with small square patches, rather than processing entire scenes at once, which would also be unfeasible due to the memory limitations of the GPU¹. By using small patches, it is also possible to inject stochasticity in the learning phase by randomly shuffling and augmenting the data at each epoch. This limits overfitting and decreases the chances of getting stuck in local minima. We experimented with patches of 160 × 160 or 256 × 256 pixels, sizes compatible with the receptive field of the filters in the innermost network layer (Enc Block 512), which is 140 pixels. After preliminary experimentation, we obtained the best performance with the 160 × 160 patches. The training and validation sets are generated by randomly partitioning these patches, in order to prevent biasing either the training or validation set towards any particular imaging parameters, such as the incidence angle. It should, moreover, be noted that image pairs are only constructed from the same satellite orbit number, such that the viewing geometries of the activity and reference images are nearly identical. To build the training/validation set, we considered only the patches containing at least 1 pixel classified as "avalanche" by the human expert. We ended up with ≈ 35,000 patches, of which 10% were used as a validation set for model selection and early stopping. Finally, out of the 118 available S1 scenes, one scene with date 17-Apr-2018, which contains 99 avalanches, was isolated from the rest and used as the test set.

¹Two Nvidia GTX2080 GPUs were used to train and evaluate the model.

V. RESULTS AND DISCUSSION

The network is trained with the Adam optimizer [26] with default parameters; we used mini-batches of size 16 and a dropout rate of 0.4. Examples of FCN predictions are depicted in Fig. 7. Since the network predicts real values in [0,1], a binary segmentation mask (last column) is obtained by thresholding the soft output (3rd column) at 0.5.

Since the avalanche class is highly under-represented, accuracy is not a good measure to quantify the performance and, therefore, we evaluated the quality of the segmentation result using different metrics. The first is the F1 score, which is computed at the pixel level and is defined as

F1 = 2 · (precision · recall) / (precision + recall),

where precision is defined as TP / (TP + FP) and recall as TP / (TP + FN) (TP = True Positives, FP = False Positives, FN = False Negatives). The F1 score is also evaluated during training on the validation set and used for early stopping and for saving the best model.

To evaluate the segmentation results at a coarser resolution level, we considered the bounding boxes containing the avalanches in the ground truth and in the predicted mask. To quantify how much the bounding boxes overlap in the ground truth and the predicted segmentation mask, we computed the intersection over union (IoU):

IoU = (Area of bounding boxes intersection) / (Area of bounding boxes union).
pairs are only constructed from the same satellite orbit number,
such that the viewing geometries of the activity and reference Method F1 (%) IoU (%) TP (#) FN (#) FP (#)
images are nearly identical. To build the training/validation
set, we considered only the patches containing at least 1 pixel Baseline 38.13 33.11 44 45 11
FCN 66.6 54.3 72 17 32
1 Two Nvidia GTX2080 were used to train and evaluate the model


Fig. 6. For each patch, the Attention Net (three Conv 32 + ReLU layers followed by a Conv 1 + Sigmoid layer) generates an attention mask from the PAR features and applies it to the VV, VH, and VVVH SAR features. The masked SAR features and the slope (not masked) are then fed into the U-Net. Attention Net and U-Net are jointly trained by minimizing the segmentation error. Note that the VVVH feature is not shown in the figure for conciseness.

Fig. 7. Examples of prediction on individual patches of the validation set. From the left: i) VVVH input channel fed to FCN; ii) Slope feature fed to FCN; iii)
PAR feature fed to Attention Net; iv) ground truth labels manually annotated by the expert; v) raw output of the FCN; vi) FCN output thresholded at 0.5.

Tab. I reports the results obtained on the test image. Compared to the baseline, the FCN achieved a much higher agreement with the manual labels, as indicated by the higher F1 and IoU values. Out of the 99 avalanches in the test image, the FCN correctly identified 72 of them and missed 17. However, most of the FN are small avalanches that are difficult to detect. The FCN also identified 32 FP: most of them are due to particular terrain structures, which cause high backscatter that resembles avalanches (see Fig. 8). Interestingly, some of those FPs are actual avalanches that were overlooked during the manual annotation.

A. Ablation study

The ablation study consists of removing some features from the model or from the input data to evaluate how they affect the performance. In particular, we study how much each SAR channel and the topographical feature maps contribute to the segmentation results. We also evaluate the difference between concatenating the PAR to the other input channels (VV, VH, VVVH, and slope) and using it to compute the attention mask that is applied pixel-wise to the SAR channels (see the details in Sect. IV-C).


Fig. 8. Comparison between manual labeling and FCN output overlain onto an RGB change detection image. From the left: i) agreement between FCN detections and manual annotations; ii) avalanches missed by the FCN; iii) false detections from the FCN algorithm; iv) avalanches correctly detected by the FCN but overlooked during the manual annotation.

TABLE II
Ablation experiment results. A check mark indicates that the feature was used as input; in the last row the PAR drives the attention mask instead of being concatenated.

VV | VH | VVVH | Slope | PAR | PAR (attn.) | F1
✓  |    |      |       |     |             | 55.4
✓  | ✓  |      |       |     |             | 63.0
✓  | ✓  | ✓    |       |     |             | 64.9
✓  | ✓  | ✓    | ✓     |     |             | 65.2
✓  | ✓  | ✓    | ✓     | ✓   |             | 65.4
✓  | ✓  | ✓    | ✓     |     | ✓           | 66.6
The results reported in Tab. II indicate that the most important improvement comes from including the difference image obtained from the VH channels, compared to using the VV channel alone. By adding the slope and PAR features it is possible to further increase the segmentation performance. Finally, the results show that the proposed attention mechanism better exploits the information yielded by the PAR, compared to just concatenating the PAR feature map to the other input channels.

VI. CONCLUSIONS
In this work, we proposed the first deep learning approach for saliency segmentation of avalanches in Sentinel-1 SAR images. As channels of the images provided as input to the segmentation network, we used the time difference of the radar backscatter information, as well as topographical information. The latter consists of the terrain slope and the newly introduced potential angle of reach, which indicates the likelihood of finding avalanches at different locations. The topographical feature maps were provided along with the SAR features to a Fully Convolutional Network, which was trained to perform avalanche segmentation. The ground truth segmentation masks used to train the deep learning model came from the manual labelling of avalanche pixels performed by a human expert. A total of 118 Sentinel-1 SAR products were labelled, of which 117 were used for training and one single product was used for testing the segmentation performance on unseen data.

The Fully Convolutional Network was extended with an additional attention block, jointly trained with the rest of the segmentation network, which computes an attention mask conditioned on the potential angle of reach. The mask was applied to the input SAR features to let the segmentation network focus more on the critical areas.

The results show the effectiveness of the proposed method, improving the F1 score from the 38.1% achieved by a baseline signal processing algorithm to 66.6%. The F1 score was computed based on the manual labelling of the human expert. The proposed deep learning model only fails to detect some of the smaller avalanches, while it detects additional avalanches that had been missed by the expert.

By being the first of its kind, we believe that our work will pave the way for pixel-level classification of snow avalanches in SAR data with deep learning and will serve as a future reference in the fields of Earth science and remote sensing. Our analysis and the obtained results suggest that the potential angle of reach is well correlated with the presence of avalanches. Therefore, we believe that the proposed potential angle of reach feature will be useful for future work in this field. As a next step, we aim to extend our dataset to evaluate the FCN's performance on SAR images with different snow conditions (wet or dry).

REFERENCES

[1] M. Eckerstorfer, E. Malnes, and K. Müller, "A complete snow avalanche activity record from a Norwegian forecasting region using Sentinel-1 satellite-radar data," Cold Regions Science and Technology, vol. 144, pp. 39–51, 2017.
[2] H. Vickers, M. Eckerstorfer, E. Malnes, and A. Doulgeris, "Synthetic aperture radar (SAR) monitoring of avalanche activity: An automated detection scheme," SCIA, 2017.
[3] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, and F. Fraundorfer, "Deep learning in remote sensing: A comprehensive review and list of resources," IEEE Geoscience and Remote Sensing Magazine, 2017.
[4] Y. Zhou, H. Wang, F. Xu, and Y.-Q. Jin, "Polarimetric SAR image classification using deep convolutional neural networks," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 12, pp. 1935–1939, 2016.
[5] M. Kampffmeyer, A.-B. Salberg, and R. Jenssen, "Urban land cover classification with missing data modalities using deep convolutional neural networks," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 6, pp. 1758–1768, 2018.
[6] O. A. Penatti, K. Nogueira, and J. A. Dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?" in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015, pp. 44–51.


[7] L. T. Luppino, F. M. Bianchi, G. Moser, and S. N. Anfinsen, "Unsupervised image regression for heterogeneous change detection," IEEE Transactions on Geoscience and Remote Sensing, pp. 1–16, 2019.
[8] ——, "Remote sensing image regression for heterogeneous change detection," in 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP), Sep. 2018, pp. 1–6.
[9] M. Kampffmeyer, A.-B. Salberg, and R. Jenssen, "Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks," in CVPR, 2016.
[10] F. M. Bianchi, M. M. Espeseth, and N. Borch, "Large-scale detection and categorization of oil spills from SAR images with deep learning," Remote Sensing, vol. 12, no. 14, p. 2260, 2020.
[11] P. E. Kummervold, E. Malnes, M. Eckerstorfer, I. Arntzen, and F. M. Bianchi, "Avalanche detection in Sentinel-1 radar images using convolutional neural networks," in International Snow Science Workshop, 2018.
[12] S. Sinha, S. Giffard-Roisin, F. Karbou, M. Deschatres, A. Karas, N. Eckert, C. Coléou, and C. Monteleoni, "Can avalanche deposits be effectively detected by deep learning on Sentinel-1 satellite SAR images?" 2019.
[13] A. U. Waldeland, J. H. Reksten, and A.-B. Salberg, "Avalanche detection in SAR images using deep learning," in IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2018, pp. 2386–2389.
[14] T. Grydeland and Y. Larsen, "Beyond plane sailing: Solving the range-doppler equations in a reduced geometry," in EUSAR 2018; 12th European Conference on Synthetic Aperture Radar, 2018.
[15] S. Bakkehoi, U. Domaas, and K. Lied, "Calculation of snow avalanche runout distance," Annals of Glaciology, vol. 4, pp. 24–29, 1983.
[16] D. Delparte, B. Jamieson, and N. Waters, "Statistical runout modeling of snow avalanches using GIS in Glacier National Park, Canada," Cold Regions Science and Technology, vol. 54, no. 3, pp. 183–192, 2008.
[17] K. Johnston, B. Jamieson, and A. Jones, "Estimating extreme snow avalanche runout for the Columbia Mountains, British Columbia, Canada," in Proceedings of the 5th Canadian Conference on Geotechnique and Natural Hazards, Kelowna, BC, 2011.
[18] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in MICCAI, 2015.
[19] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[20] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[21] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 801–818.
[22] M. Krestenitis, G. Orfanidis, K. Ioannidis, K. Avgerinakis, S. Vrochidis, and I. Kompatsiaris, "Oil spill identification from satellite images using deep neural networks," Remote Sensing, vol. 11, no. 15, 2019. [Online]. Available: https://www.mdpi.com/2072-4292/11/15/1762
[23] G. Csurka, D. Larlus, F. Perronnin, and F. Meylan, "What is a good evaluation measure for semantic segmentation?" in BMVC, vol. 27. Citeseer, 2013, p. 2013.
[24] M. Berman, A. Rannen Triki, and M. B. Blaschko, "The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4413–4421.
[25] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio, "Show, attend and tell: Neural image caption generation with visual attention," in International Conference on Machine Learning, 2015, pp. 2048–2057.
[26] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," International Conference on Learning Representations (ICLR), 2015.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
