Avalanches
Abstract—Knowledge about the frequency and location of snow avalanche activity is essential for forecasting and mapping of snow avalanche hazard. Traditional field monitoring of avalanche activity has limitations, especially when surveying large and remote areas. In recent years, avalanche detection in Sentinel-1 radar satellite imagery has been developed to improve monitoring. However, the current state-of-the-art detection algorithms, based on radar signal processing techniques, are still much less accurate than human experts. To reduce this gap, we propose a deep learning architecture for detecting avalanches in Sentinel-1 radar images. We trained a neural network on 6,345 manually labelled avalanches from 117 Sentinel-1 images, each one consisting of six channels that include backscatter and topographical information. Then, we tested our trained model on a new SAR image. Compared to the manual labelling (the gold standard), we achieved an F1 score above 66%, while the state-of-the-art detection algorithm reaches an F1 score of only 38%. A visual inspection of the results generated by our deep learning model shows that only small avalanches remain undetected, while some avalanches that were originally not labelled by the human expert are discovered.

Index Terms—Deep Learning; Saliency Segmentation; Convolutional Neural Networks; Snow Avalanches; SAR; Sentinel-1.

*[email protected]
The authors are with NORCE, The Norwegian Research Centre AS.

I. INTRODUCTION

Knowledge about the spatio-temporal distribution of snow avalanche (hereafter referred to as avalanche) activity in a given region is critical for avalanche forecasting and hazard mapping. An increase in avalanche activity or in the magnitude of releasing avalanches leads to an increase in avalanche risk. Conventionally, avalanche activity is primarily monitored through field measurements, which are time-consuming, expensive, and can only be done for very few accessible areas. Monitoring avalanche activity using satellite-borne synthetic aperture radar (SAR) has, therefore, gained considerable interest in recent years. SAR products enable continuous coverage of very large areas, regardless of light and weather conditions [1].

An experienced operator can identify avalanche debris (the depositional part of an avalanche) in SAR change detection composites (showing temporal radar backscatter change) with high accuracy. On the other hand, automatic signal processing methods based on radar backscatter thresholding and segmentation often fail and produce a large number of false alarms, due to the highly dynamic nature of snow in the SAR images [2]. A key limitation of classical segmentation methods is that they focus mainly on pixel-wise information in the radar backscatter, without accounting for the contextual information around the pixel and for high-level features, such as the shape and the texture of avalanche debris. Also, the local topography in which the avalanches occur is largely disregarded, since it has only been used to mask out areas where avalanches are unlikely to occur. However, the occurrence of avalanches is strongly correlated to topographical conditions, and avalanche debris exhibits characteristic shapes; both should be taken into account when performing the detection.

Convolutional neural networks (CNNs) have attracted considerable interest for their ability to model complex contextual information in images [3]. Prominent examples in remote sensing are terrain surface classification [4], [5], categorization of aerial scenes [6], detection of changes in the terrain over time from SAR and optical satellite sensors [7], [8], and segmentation of objects from airborne images [9], [10]. Nevertheless, few research efforts have been devoted so far to detecting avalanche activity from SAR data, which remains an open and challenging endeavour. In our previous work [11], we proposed a deep learning architecture to perform binary classification of avalanches in Northern Norway. In particular, we used a CNN to classify fixed-size patches of SAR images into two classes: 1 if the patch contains at least one avalanche, 0 otherwise. Our approach was later adopted for SAR-borne avalanche detection in the Alps [12] and in other locations in Norway [13]. As a major limitation, patch-wise classification cannot determine the presence of multiple avalanches within the same patch. Also, the results are heavily influenced by the patch size, which makes it difficult to evaluate the detection performance. In particular, with large windows it is easier to correctly predict the presence of at least one avalanche, but the resolution of the detection is too coarse to be very useful.

In this work, we approach avalanche detection as a saliency segmentation task, where the classification is not done at the patch level, but rather at the individual pixel level. We adopt a Fully Convolutional Network (FCN) architecture, which generates a segmentation mask for each input image. This removes the dependency on the window size and makes it possible to determine the exact location of the avalanches. Our work provides important contributions to the fields of Earth science, remote sensing, and avalanche risk assessment.
• We explore, for the first time, the capability of deep learning models to detect the presence of avalanches in SAR products at a pixel granularity, and surpass the current state-of-the-art avalanche detection algorithm [2]. Our work was possible thanks to a large dataset of SAR products manually annotated by an avalanche expert.
• We advance our knowledge on topographical features that identify areas where avalanches are highly likely to occur. Notably, we introduce a new topographical feature, called the potential angle of reach (PAR), which indicates how likely it is for an avalanche to reach a specific location. We do not use the PAR to filter input images or detection results, but rather provide the PAR as an exogenous input feature to the FCN. We first estimate how informative the PAR is in discriminating avalanche and non-avalanche pixels. Then, in the experimental section, we evaluate how much the detection performance of the deep learning model improves when providing the FCN with the PAR feature map.

II. SAR DATASET

The dataset consists of data from the Sentinel-1 (S1) satellites. In particular, data acquired in the interferometric wideswath (IW) mode was considered in terms of the ground range detected (GRD) product. In total, 118 SAR scenes were acquired, covering two mountainous regions in Northern Norway in the period Oct. 2014–Apr. 2017.

Avalanches were manually annotated by visual interpretation of RGB change detection composites constructed from the following three channels: R[VVreference], G[VVactivity], B[VVreference]. We considered this visual detection as the gold standard and used it as ground truth to train and evaluate our deep learning model. The whole dataset contains a total of 6,345 avalanches; 3,667,355,474 pixels are classified as “non-avalanche” and 712,945 (≈0.02% of the total) as “avalanche”.

III. TOPOGRAPHICAL FEATURES

Since avalanches occur in steep terrain, topography is an important factor in determining where avalanches can appear. In particular, the local slope needs to be steep enough for an avalanche to release, and the slope typically needs to flatten out for the avalanche to stop. Therefore, it is reasonable to take such information into account when performing the detection task, and we generated two feature maps from the digital elevation model (DEM), which is available for all of Norway at 10 m pixel resolution. The first is the local slope angle of the terrain; the second is a new topographical feature introduced in this work, called the potential angle of reach (PAR).
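As an illustration of the first feature map, the local slope angle can be derived from the DEM with finite differences. A minimal NumPy sketch, assuming a 10 m grid spacing and metre units for elevation (function and variable names are ours, not the authors' implementation):

```python
import numpy as np

def slope_angle(dem, cell_size=10.0):
    """Local terrain slope angle (degrees) from a DEM given in metres."""
    # Elevation gradients along the row and column axes (metres per metre).
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    # The magnitude of the gradient is the tangent of the steepest slope.
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
```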
Fig. 1. (a, b) The SAR features obtained from the difference in the VV and VH channels. (c) The VVVH product of the squared differences. (d, e) The slope and PAR feature maps. Only a small area (1k × 1k pixels) of the actual scene is depicted here.
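The exact pre-processing chain for the SAR channels is not reproduced in this excerpt, but a minimal sketch of how such change features could be formed from co-registered reference and activity backscatter images may help fix ideas (the dB scaling and the function name are our assumptions):

```python
import numpy as np

def change_features(vv_activity, vv_reference, vh_activity, vh_reference):
    """Per-pixel change features from co-registered backscatter images (dB)."""
    d_vv = vv_activity - vv_reference      # VV backscatter change
    d_vh = vh_activity - vh_reference      # VH backscatter change
    vvvh = (d_vv ** 2) * (d_vh ** 2)       # product of the squared differences
    return d_vv, d_vh, vvvh
```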
Fig. 2. Distribution of the slope angle for avalanche and non-avalanche pixels.

For the PAR (Fig. 4), it is possible to see that for avalanche pixels the distribution is more regular and has a single peak centred around 40 degrees. While the true angle of reach is expected to range from 20 to 40 degrees, the PAR is consequently biased towards higher values. We concluded that the PAR is informative, since the two distributions are different for the two classes. Contrary to the slope, the PAR is not simply concatenated to the other layers of the input image, but is rather used to encourage the deep learning model to focus on specific areas (see Sect. IV-C).
Fig. 4. Distribution of the PAR for avalanche and non-avalanche pixels.

The encoder reduces the spatial resolution with max pooling, while the decoder restores it through bilinear upsampling. Each block contains two Batch Normalization [19] and one Dropout layer [20], which are respectively used to facilitate the training convergence and to improve the model generalization capability. We note that Batch Norm layers are not present in the original U-net architecture but, as also verified in preliminary experiments, their presence improves the segmentation performance. The last encoder block (Enc Block 512 in Fig. 5) does not have Dropout, while the last decoder block (Dec Block 32) is followed by a Conv layer with one 1 × 1 filter and a sigmoid activation. Since the network is fully convolutional (there are no dense layers), it can process inputs of variable size.

We note that it would be possible to use more powerful FCN architectures such as DeepLabV3+ [21], which achieves state-of-the-art results in segmenting natural images. However, models with a larger capacity, such as DeepLabV3+, require very large datasets to be trained on. In remote sensing applications, a smaller network such as U-net is often preferred, given the limited amount of training data. Moreover, the U-net

[Fig. 5 (architecture diagram): stacked encoder blocks (Conv(n), BatchNorm, Dropout, ReLU, MaxPool; Enc Block 32 to Enc Block 512) and decoder blocks (bilinear upsampling, concatenation with the skip connection, Conv(n), BatchNorm, Dropout, ReLU; Dec Block 64 to Dec Block 32), followed by a 1 × 1 Conv with sigmoid activation.]
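To make the block structure concrete, here is a minimal PyTorch sketch of encoder and decoder blocks consistent with the description above; the exact ordering of the layers inside each block and the dropout placement are our assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Two Conv-BatchNorm-ReLU stages (one with Dropout), then max pooling."""
    def __init__(self, in_ch, out_ch, p_drop=0.4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
            nn.Dropout2d(p_drop), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skip = self.features(x)            # kept for the skip connection
        return self.pool(skip), skip

class DecoderBlock(nn.Module):
    """Bilinear upsampling, concatenation with the skip, two Conv-BN-ReLU stages."""
    def __init__(self, in_ch, skip_ch, out_ch, p_drop=0.4):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.features = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
            nn.Dropout2d(p_drop), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = torch.cat([self.up(x), skip], dim=1)
        return self.features(x)

# After the last decoder block: a 1x1 convolution and a sigmoid produce
# one avalanche-probability value per pixel.
head = nn.Sequential(nn.Conv2d(32, 1, kernel_size=1), nn.Sigmoid())
```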
A. Class balance
Avalanches are small objects and the avalanche class is highly
To obtain the final segmentation, we first merge the multiple predictions available at each pixel location (stemming from the geometric transformations and the overlapping windows) and then we join them by using a 2nd order spline interpolation.
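As a simplified illustration of the merging step, the sketch below averages overlapping per-patch probability maps into a scene-sized map; the merging of geometrically transformed predictions and the 2nd order spline interpolation used for joining are omitted, and all names are ours:

```python
import numpy as np

def merge_patch_predictions(patch_probs, top_left_coords, scene_shape, patch=160):
    """Average overlapping per-patch probability maps into one scene-sized map."""
    acc = np.zeros(scene_shape, dtype=np.float64)   # sum of probabilities
    cnt = np.zeros(scene_shape, dtype=np.float64)   # number of overlapping windows
    for prob, (row, col) in zip(patch_probs, top_left_coords):
        acc[row:row + patch, col:col + patch] += prob
        cnt[row:row + patch, col:col + patch] += 1.0
    return acc / np.maximum(cnt, 1.0)               # avoid division by zero
```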
C. Attention mask

Following our hypothesis that the PAR feature map can highlight areas where it is more likely to find an avalanche, we propose a neural attention mechanism [25] that generates an attention mask conditioned on the PAR. The intention is to learn an attention mask that encourages the segmentation procedure to put more focus on specific regions of the input image. Specifically, we use a small network that takes the PAR as input and generates the attention mask, which is subsequently applied pixel-wise to the SAR channels (VV, VH, and VVVH) before they are fed into the segmentation network (see Fig. 6). We note that the attention mask is not applied to the input channel containing the slope feature map.

The attention network consists of three stacked Conv layers with 32 3 × 3 filters and ReLU activations, followed by a Conv layer with one 3 × 3 filter and a sigmoid activation. The attention network has a small receptive field (7 pixels), meaning that each attention value only depends on the local PAR. This is acceptable since the PAR already yields highly non-localized features from the DEM and captures long-range relationships in the scene.

The attention network is also fully convolutional and is jointly trained with the segmentation network. Our solution allows learning end-to-end how to generate and apply the attention mask in a way that is optimal for the downstream segmentation task. This is a more flexible approach than masking out parts of the input (e.g., by applying pre-computed runout masks) or directly pre-multiplying the SAR channels with the PAR feature map.
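A minimal PyTorch sketch of an attention network matching this description, and of how the mask could be applied to the SAR channels, is given below; class and function names, padding, and other details are our assumptions:

```python
import torch
import torch.nn as nn

class AttentionNet(nn.Module):
    """Small fully convolutional network mapping the PAR channel to a [0, 1] mask."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=3, padding=1), nn.Sigmoid(),
        )

    def forward(self, par):
        # par: (batch, 1, H, W) potential angle of reach feature map.
        return self.net(par)

def mask_inputs(sar, slope, par, attention_net):
    """Multiply the SAR channels (VV, VH, VVVH) by the mask; slope stays unmasked."""
    mask = attention_net(par)                  # (batch, 1, H, W), broadcast over channels
    return torch.cat([sar * mask, slope], dim=1)
```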
D. Model training and evaluation

We trained the FCN by feeding it with small square patches, rather than processing entire scenes at once, which would also be unfeasible due to the memory limitations of the GPU¹. By using small patches it is also possible to inject stochasticity in the learning phase by randomly shuffling and augmenting the data at each epoch. This limits overfitting and decreases the chances of getting stuck in local minima. We experimented with patches of 160 × 160 or 256 × 256 pixels, which are sizes compatible with the receptive field of the filters in the innermost network layer (Enc Block 512), which is 140 pixels. After preliminary experimentation, we obtained the best performance with the 160 × 160 patches. The training and validation sets are generated by randomly partitioning these patches, in order to prevent biasing either the training or the validation set towards any particular imaging parameters, such as the incidence angle. It should, moreover, be noted that image pairs are only constructed from the same satellite orbit number, such that the viewing geometries of the activity and reference images are nearly identical. To build the training/validation set, we considered only the patches containing at least 1 pixel classified as “avalanche” by the human expert. We ended up with ≈35,000 patches, of which 10% were used as a validation set for model selection and early stopping. Finally, out of the 118 available S1 scenes, one scene with date 17-Apr-2018, which contains 99 avalanches, was isolated from the rest and used as the test set.
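A minimal sketch of how such training patches could be extracted and augmented follows; the choice of simple flips as augmentation is our assumption, since the text above only mentions geometric transformations in general:

```python
import numpy as np

def avalanche_patches(scene, labels, patch=160):
    """Yield (image patch, label patch) pairs containing at least one avalanche pixel."""
    n_rows, n_cols = labels.shape
    for r in range(0, n_rows - patch + 1, patch):
        for c in range(0, n_cols - patch + 1, patch):
            lab = labels[r:r + patch, c:c + patch]
            if lab.any():                                  # keep only positive patches
                yield scene[:, r:r + patch, c:c + patch], lab

def random_flip(image, label, rng):
    """Apply the same random horizontal/vertical flip to image (C, H, W) and label (H, W)."""
    if rng.random() < 0.5:
        image, label = image[..., ::-1], label[..., ::-1]
    if rng.random() < 0.5:
        image, label = image[..., ::-1, :], label[..., ::-1, :]
    return image.copy(), label.copy()
```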
V. RESULTS AND DISCUSSION

The network is trained with the Adam optimizer [26] with default parameters; we used mini-batches of size 16 and a dropout rate of 0.4. Examples of FCN predictions are depicted in Fig. 7. Since the network predicts real values in [0, 1], a binary segmentation mask (last column) is obtained by thresholding the soft output (3rd column) at 0.5.

Since the avalanche class is highly under-represented, accuracy is not a good measure to quantify the performance and, therefore, we evaluated the quality of the segmentation result using different metrics. The first is the F1 score, which is computed at the pixel level and is defined as

F1 = 2 · (precision · recall) / (precision + recall),

where precision = TP / (TP + FP) and recall = TP / (TP + FN) (TP = True Positives, FP = False Positives, FN = False Negatives). The F1 score is also evaluated during training on the validation set and used for early stopping and for saving the best model.

To evaluate the segmentation results at a coarser resolution level, we considered the bounding boxes containing the avalanches in the ground truth and in the predicted mask. To quantify how much the bounding boxes overlap in the ground truth and the predicted segmentation mask, we computed the intersection over union (IoU):

IoU = (area of the bounding box intersection) / (area of the bounding box union).
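A minimal NumPy sketch of these two metrics (thresholding at 0.5, pixel-level F1, and IoU between two axis-aligned bounding boxes); function names and the box convention are ours:

```python
import numpy as np

def pixel_f1(prob, truth, threshold=0.5):
    """Pixel-level F1 score between a soft prediction and a boolean ground truth mask."""
    pred = prob > threshold
    truth = truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    return 2.0 * precision * recall / denom if denom > 0 else 0.0

def bbox_iou(a, b):
    """IoU of two axis-aligned boxes given as (row_min, col_min, row_max, col_max)."""
    r0, c0 = max(a[0], b[0]), max(a[1], b[1])
    r1, c1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, r1 - r0) * max(0, c1 - c0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```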
We compared the proposed deep learning method with the state-of-the-art algorithm for automatic avalanche detection, which is currently used in production pipelines [2]. Such a segmentation algorithm is primarily driven by change detection and filtering methods to enhance potential avalanche features; dynamic thresholding based on the statistics of image pairs controls the final delineated features. The baseline algorithm is, to a large extent, dependent on additional input layers, such as slope, vegetation maps, and runout zone information, that restrict the areas where features are allowed to be detected, thereby reducing the number of false alarms as much as possible.

TABLE I
Segmentation results from the test image with 99 avalanches. We report the F1 score (in percent), intersection over union of the bounding boxes (in percent), true positives (correct hits), false negatives (missed avalanche detections), and false positives (false avalanche detections).

Method     F1 (%)   IoU (%)   TP (#)   FN (#)   FP (#)
Baseline   38.13    33.11     44       45       11
FCN        66.6     54.3      72       17       32
¹ Two Nvidia GTX2080 GPUs were used to train and evaluate the model.
Fig. 6. For each patch, the Attention Net generates an attention mask from the PAR features and applies it to the VV, VH, and VVVH SAR features. The masked SAR features and the slope (not masked) are then fed into the U-net. Attention Net and U-net are jointly trained by minimizing the segmentation error. Note that the VVVH feature is not shown in the figure for conciseness.
Fig. 7. Examples of prediction on individual patches of the validation set. From the left: i) VVVH input channel fed to FCN; ii) Slope feature fed to FCN; iii)
PAR feature fed to Attention Net; iv) ground truth labels manually annotated by the expert; v) raw output of the FCN; vi) FCN output thresholded at 0.5.
Tab. I reports the results obtained on the test image. Compared to the baseline, the FCN achieved a much higher agreement with the manual labels, as indicated by the higher F1 and IoU values. Out of the 99 avalanches in the test image, the FCN correctly identified 72 of them and missed 17. However, most of the FN are small avalanches that are difficult to detect. The FCN also identified 32 FP: most of them are due to particular terrain structures, which cause high backscatter that resembles avalanches (see Fig. 8). Interestingly, some of those FPs are actual avalanches that were overlooked during the manual annotation.

A. Ablation study

The ablation study consists of removing some features from the model or from the input data to evaluate how these affect the performance. In particular, we study how much each SAR channel and the topographical feature maps contribute to the segmentation results. We also evaluate the difference between concatenating the PAR to the other input channels (VV, VH, VVVH, and slope) and using it to compute the attention mask that is applied pixel-wise to the SAR channels (see the details in Sect. IV-C).

The results reported in Tab. II indicate that the most important improvement comes from including the difference
Fig. 8. Comparison between the manual labelling and the FCN output, overlain onto an RGB change detection image. From the left: i) agreement between FCN detections and manual annotations; ii) avalanches missed by the FCN; iii) false detections from the FCN; iv) avalanches correctly detected by the FCN but overlooked during the manual annotation.