Semisupervised Change Detection Using Graph Convolutional Network
Abstract— Most change detection (CD) methods are unsupervised because collecting substantial multitemporal training data is challenging. Unsupervised CD methods are driven by heuristics and lack the capability to learn from data. However, in many real-world applications, it is possible to collect a small amount of labeled data scattered across the analyzed scene. Such a small set of scattered labeled samples within a pool of unlabeled samples can be handled effectively by a graph convolutional network (GCN), which has recently shown good performance in semisupervised single-date analysis, to improve CD performance. Based on this, we propose a semisupervised CD method that encodes multitemporal images as a graph via multiscale parcel segmentation that effectively captures the spatial and spectral aspects of the multitemporal images. The graph is further processed through GCN to learn a multitemporal model. Information from the labeled parcels is propagated to the unlabeled ones over training iterations. By exploiting the homogeneity of the parcels, the model is used to infer the label at a pixel level. To show the effectiveness of the proposed method, we tested it on a multitemporal very high spatial resolution (VHR) data set acquired by the Pleiades sensor over Trento, Italy.

Index Terms— Change detection (CD), deep learning, graph convolutional network (GCN), high resolution, semisupervised.

I. INTRODUCTION

CHANGE detection methods can be either supervised or unsupervised. Supervised methods require an ample amount of training data. While they potentially provide better results by exploiting the prediction model learned using training samples, the collection of a substantial amount of multitemporal training data is difficult. Though collecting a small amount of such data may be possible in most applications, it may not be sufficient for training a supervised model, and hence unsupervised CD methods are preferred in the literature [1]. Most of the unsupervised methods generate a difference image that is further analyzed to distinguish changed pixels from the unchanged ones. Such methods have proved effective in various applications, and a number of object-based unsupervised variants [1] have been proposed for CD in very high spatial resolution (VHR) images. However, in the absence of external supervision in the form of labels, unsupervised methods depend on the distribution of values in the difference image and on the underlying principles injected in the algorithm. They lack the ability to build a model from data. Recently, convolutional neural network (CNN)-based methods have outperformed other methods in most image analysis tasks. Some attempts exist in the literature that try to use CNNs for unsupervised CD [2], [3]. Saha et al. [2] proposed a deep-change-vector-analysis (DCVA) framework exploiting a pretrained CNN as a bitemporal feature extractor. It lacks the ability to infer beyond the scope of what the pretrained network has learned and, like other difference-based methods, depends on the distribution of deep-change vectors.

In the machine-learning literature, semisupervised classification [4] performs training using abundant unlabeled data and a few labeled data. In many applications of CD, it can be possible to collect a few labeled pixels scattered across the analyzed scene. Knowledge of the labeled pixels is completely unused if an unsupervised approach is used. Inspired by this, there are a few works for semisupervised CD in low-/medium-resolution images that exploit semisupervised clustering [5], [6] or neural networks [7]. However, no attention has been given to semisupervised methods for CD in VHR images.

A recently introduced variant in the family of CNNs, the graph convolutional network (GCN), is effective in capturing spatial context information and complex details in VHR optical images [8]. Moreover, Kipf and Welling [9] showed that GCN can work effectively under semisupervised settings, i.e., with a small amount of labeled data and ample unlabeled data. By treating target data as a collection of nodes in a graph, GCN has the ability to propagate information from the labeled nodes to the unlabeled ones.

Motivated by the abovementioned analysis, we propose to extend GCN [9] to semisupervised CD and to the processing of VHR images. The method assumes that a small number of multitemporal labels in the form of change and no change is available and defines CD as a semisupervised classification problem. The proposed method employs a novel graph construction to effectively capture multiscale spatial and spectral information. Input images are seen as a composition of multitemporal parcels [1]. Parcels are treated as nodes to represent them in the form of a graph. Each node is represented by features, and the spatial relationship between the nodes is captured in the form of an adjacency matrix. GCN learns

Manuscript received October 16, 2019; revised January 28, 2020 and February 25, 2020; accepted March 30, 2020. Date of publication April 17, 2020; date of current version March 25, 2021. (Corresponding author: Francesca Bovolo.)
Sudipan Saha is with Fondazione Bruno Kessler, 38123 Trento, Italy, and also with the Department of Information Engineering and Computer Science, University of Trento, 38123 Trento, Italy.
Lichao Mou and Xiao Xiang Zhu are with the Remote Sensing Technology Institute (IMF), German Aerospace Center, 82234 Weßling, Germany, and also with the Signal Processing in Earth Observation, Technical University of Munich, 80333 Munich, Germany.
Francesca Bovolo is with Fondazione Bruno Kessler, 38123 Trento, Italy.
Lorenzo Bruzzone is with the Department of Information Engineering and Computer Science, University of Trento, 38123 Trento, Italy.
Color versions of one or more of the figures in this letter are available online at https://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LGRS.2020.2985340
1545-598X © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Universidad del Valle. Downloaded on June 21,2024 at 03:57:01 UTC from IEEE Xplore. Restrictions apply.
608 IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 18, NO. 4, APRIL 2021
SAHA et al.: SEMISUPERVISED CD USING GCN 609
C. Graph Construction
The $N_p$ parcels obtained at resolution level $k = 1$ are treated as nodes, and we generate the graph-based representation by computing features for each node and representing the graph using an adjacency matrix. The parcel segmentations at levels $k = 2, \ldots, K$ are used only to compute the multiscale features of the nodes.
To extract the feature vector $F_i$ representing each node $i$, we exploit multilevel ad hoc features that capture both the spatial and the spectral information characterizing the node and its neighborhood. For a node $i$, four features (mean, maximum, and minimum spectral values and area) are computed bandwise from the corresponding parcel at resolution level $k = 1$. To capture multilevel information, the same four features are calculated at resolution levels $k = 2, \ldots, K$ from the parcels that contain the considered node at those levels. Aggregating the four features from all $K$ levels and $B$ bands of $X_1$ and $X_2$, a node $i$ is represented by a feature vector $F_i$ of size $D = 4 \times K \times B \times 2$.
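The feature aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `parcel_stats` and `node_features` are hypothetical, each image is assumed to be a list of $B$ bands stored as flat lists of pixel values, each segmentation level is assumed to be a flat list of parcel ids per pixel, and level-1 parcels are assumed to be fully nested inside the parcels of the coarser levels.

```python
def parcel_stats(band, labels):
    """Return {parcel_id: (mean, max, min, area)} for one band at one level."""
    groups = {}
    for value, pid in zip(band, labels):
        groups.setdefault(pid, []).append(value)
    return {pid: (sum(v) / len(v), max(v), min(v), len(v))
            for pid, v in groups.items()}


def node_features(images, levels):
    """Build one feature vector per level-1 parcel (node).

    images: [X1, X2]; each image is a list of B bands (flat pixel lists).
    levels: K flat label maps, level 1 first; levels[k][p] is the parcel
            id of pixel p at that level.
    """
    base = levels[0]
    # One member pixel per node, used to find the enclosing coarser parcels
    # (assumes level-1 parcels are nested inside coarser parcels).
    member = {}
    for p, pid in enumerate(base):
        member.setdefault(pid, p)
    feats = {pid: [] for pid in member}
    for image in images:              # two acquisition dates
        for band in image:            # B spectral bands
            for labels in levels:     # K resolution levels
                stats = parcel_stats(band, labels)
                for pid, p in member.items():
                    feats[pid].extend(stats[labels[p]])
    return feats


# Toy scene: 4 pixels, B = 1 band, K = 2 levels, two dates.
x1 = [[1.0, 3.0, 5.0, 7.0]]
x2 = [[2.0, 2.0, 6.0, 8.0]]
levels = [[0, 0, 1, 1], [0, 0, 0, 0]]
features = node_features([x1, x2], levels)
```

In the toy scene each node receives $4 \times 2 \times 1 \times 2 = 16$ values. With four levels and four bands, the same count gives $4 \times 4 \times 4 \times 2 = 128$.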
Given the parcel segmentation, we aim to obtain an adjacency matrix $A$ whose elements $A_{ij}$ capture two aspects of the interaction between parcels $i$ and $j$: 1) spatial distance and 2) spectral closeness. First, if two parcels $i$ and $j$ do not share any adjacent pixel, their value in the adjacency matrix $A_{ij}$ is set to 0. The diagonal elements $A_{ii}$ are also set to 0. For each pair of parcels $i$ and $j$ that are adjacent, we measure the feature closeness by considering the distance between the feature components as

$$A_{ij} = 1 - \frac{1}{D} \sum_{d=1}^{D} \frac{\left|F_i^d - F_j^d\right|}{F_i^d + F_j^d}. \tag{1}$$

This step ensures that a high value of $A_{ij}$ ($i \neq j$) is assigned only to parcels that lie in spatially adjacent regions and share similar feature values.
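The construction of the adjacency matrix can be sketched as below. This is an illustrative sketch under stated assumptions: the function name `adjacency` is hypothetical, the list of spatially adjacent parcel pairs is assumed to be available from the segmentation, all feature values are assumed to be strictly positive so that the denominator in (1) never vanishes, and the difference in the numerator is taken in absolute value so that the measure is symmetric (an assumption about how (1) is meant to be read).

```python
def adjacency(features, neighbors):
    """Build the weighted adjacency matrix of (1).

    features:  list of N feature vectors with strictly positive entries.
    neighbors: iterable of (i, j) index pairs of parcels that share a border.
    """
    n = len(features)
    a = [[0.0] * n for _ in range(n)]   # non-adjacent pairs and diagonal stay 0
    for i, j in neighbors:
        if i == j:
            continue
        fi, fj = features[i], features[j]
        # Mean normalized distance between feature components.
        dist = sum(abs(x - y) / (x + y) for x, y in zip(fi, fj)) / len(fi)
        a[i][j] = a[j][i] = 1.0 - dist
    return a


# Three parcels in a row: 0 touches 1, 1 touches 2.
toy_features = [[2.0, 4.0], [2.0, 4.0], [6.0, 12.0]]
a = adjacency(toy_features, [(0, 1), (1, 2)])
```

Parcels 0 and 1 have identical features and receive the maximum weight, whereas parcels 0 and 2 are not adjacent and keep a weight of 0 regardless of their features.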
The abovementioned procedure is a novel graph construction mechanism that combines the spatial, spectral, and temporal information of VHR image pairs in the $N_p$ parcels. $F$ and $A$ represent a compact and effective way to provide the spatial, spectral, and temporal information as input to the GCN model.

Fig. 2. CD results for the proposed method by varying (a) $I$, (b) dropout, (c) neurons, and (d) training samples.

D. Semisupervised GCN Model

The GCN model is inspired by [9]. The GCN model training takes advantage of the novel structure of the spatial, spectral, and temporal information proposed in Sections II-A–II-C: 1) unlabeled information from the whole scene, which is coded in terms of $F_i$ ($i = 1, \ldots, N_p$) and $A$ and 2) spatial-context information in the form of $M_p$ labeled parcels/nodes. GCN exploits the relation between nodes and enables the labels to propagate among neighboring nodes iteratively. GCNs extend the concept of convolution from traditional grid-based data to graph data by learning a function $f(\cdot)$ that effectively generates the representation of a node $i$ by taking into account its own feature vector $F_i$ along with the feature vectors $F_j$ of the neighbors $j$. A striking difference from grid-based convolution is that in graph-based convolution, the number of neighbors of a node is not fixed.

GCNs can be categorized into two types: spectral based and spatial based. In this letter, we employ a spectral-based approach that defines graph convolution by exploiting filters from graph signal processing [11]. We consider a multilayer GCN consisting of $L$ convolutional layers comprising $\beta$ neurons each. Rectified linear unit (ReLU) activation functions are used after every convolutional layer. The matrix of activations in the layer $l = 1, \ldots, L$ of the GCN is $H^l$, where $H^0$ denotes $F$, i.e., the set of features of the input graph. We define an adjusted adjacency matrix $\tilde{A}$ that is the adjacency matrix with added self-connections. It is obtained by adding an identity matrix to $A$:

$$\tilde{A} = A + I_{N_p}. \tag{2}$$

We further define a matrix $\tilde{D}$ that is a diagonal matrix obtained by summing each column of $\tilde{A}$

$$\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}. \tag{3}$$
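As a small worked instance of (2) and (3), the sketch below adds self-connections to a toy adjacency matrix and computes the diagonal of the degree matrix. The helper names `add_self_loops` and `degree` are hypothetical and the matrix values are made up for illustration. Because the adjacency matrix is symmetric, summing over rows or over columns gives the same degrees.

```python
def add_self_loops(a):
    """Return A + I, i.e., the adjacency matrix with self-connections (2)."""
    n = len(a)
    return [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def degree(a_tilde):
    """Return the diagonal of the degree matrix (3) as a list."""
    return [sum(row) for row in a_tilde]


# Symmetric toy adjacency for three parcels with zero diagonal.
a = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
a_tilde = add_self_loops(a)
d_tilde = degree(a_tilde)
column_sums = [sum(col) for col in zip(*a_tilde)]
```

Every node obtains a degree of at least 1 from its self-connection, so the degrees can later be inverted safely even for isolated parcels.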
Fig. 3. CD for the Trento data set. (a) Prechange image (RGB). (b) Postchange image (RGB). (c) Reference change map. CD maps obtained by (d) the proposed method, (e) semisupervised perceptron [7], and (f) DCVA [2].

Motivated by the first-order approximation of localized spectral filters in graphs, the layerwise propagation rule from layer $l$ to layer $l + 1$ can be stated as [9]

$$H^{l+1} = \sigma\left(\tilde{D}^{-0.5} \tilde{A} \tilde{D}^{-0.5} H^{l} W^{l}\right) \tag{4}$$

where $\sigma$ denotes the ReLU activation function and $W^l$ denotes the learnable weight matrix corresponding to layer $l$.

Since $H^0 = F$, we can use it to compute $H^1$ and similarly all values up to $H^L$. Since $\hat{A} = \tilde{D}^{-0.5} \tilde{A} \tilde{D}^{-0.5}$ depends on the adjacency matrix only, it is calculated as a preprocessing step and reused during weight propagation across convolutional layers. Considering an $L$-layer GCN, the proposed forward model takes the following form [9]:

$$f(F, A) = \mathrm{softmax}\left(\hat{A} \cdots \sigma\left(\hat{A}\, \sigma\left(\hat{A} F W^{0}\right) W^{1}\right) \cdots W^{L-1}\right) \tag{5}$$

where $W^0$ denotes the weight matrix from the input to the first hidden layer and $W^{L-1}$ denotes the weight matrix from the last hidden layer to the output. The training process is made further stochastic and less prone to overfitting by introducing dropout [12] that randomly drops some of the $\beta$ neurons during each iteration.

The proposed novel mechanism uses all $N_p$ nodes, including the multitemporal parcels without label information. Considering our semisupervised learning setting, the cross-entropy loss function [9] is used to calculate $\mathcal{L}_m$ ($\forall m \in M_p$), and the overall loss $\mathcal{L}$ is computed by summing over the $M_p$ labeled parcels, that is

$$\mathcal{L} = \sum_{m=1}^{M_p} \mathcal{L}_m. \tag{6}$$

Given $\mathcal{L}$, the network weights $W^0, W^1, \ldots, W^{L-1}$ are adjusted using gradient descent over $I$ iterations. In this way, the gradient information from the labeled nodes is spread across other unlabeled adjacent nodes. After $I$ iterations, the trained model is used to predict labels for all $N_p$ nodes. Considering that parcels are homogeneous, the predicted label for a parcel is used to infer labels for the pixels in that parcel and thus obtain labels for all $N$ pixels in the $X_1$, $X_2$ pair.

III. EXPERIMENTAL VALIDATION

For experimental validation (Fig. 2), we chose a bitemporal image pair from Trento, Italy. The images show a complex quasi-urban area containing patches of vegetation, river, bushes, and Alpine mountains. They contain urban vertical structures of different heights. The image pair was acquired in August 2012 [see Fig. 3(a)] and September 2013 [see Fig. 3(b)]. The images have 0.5-m/pixel resolution and a size of 2100 × 4200 pixels. The Pleiades sensor has four spectral bands in the spectral range 450–900 nm. A reference change map is shown in Fig. 3(c).

The number of nodes $N_p$ obtained from the images is 35 847. To capture multilevel features, we use $K = 4$ and thus obtain a feature vector of dimension $D = 128$ corresponding to each node. We performed the experiments by keeping the number of labeled parcels $M_p$ at 200 ($200 \ll 35\,847$), including 100 parcels from changed areas and 100 from unchanged areas. We chose the labeled pixels scattered in such a way that each of the $M_p$ labeled parcels contains only one labeled pixel. We set the number of hidden layers of the GCN to $L = 2$. We present the results in terms of sensitivity (accuracy of detecting changed pixels), specificity (accuracy of detecting unchanged pixels), and overall accuracy [2].

A. Robustness Analysis

We examined the effect of varying: 1) training iterations $I$; 2) dropout fraction; 3) hidden layer neurons $\beta$; and 4) training samples $M_p$.
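To make the roles of these quantities concrete, the forward model in (5) and the labeled-node loss in (6) can be sketched as follows. This is a dependency-free illustration rather than the authors' code: the function names are hypothetical, the toy graph, features, and weights are made up, the last layer is assumed to feed the softmax directly, and dropout is applied element-wise to the hidden activations (one common reading of "dropping neurons"). The gradient-descent update of the weights is omitted.

```python
import math
import random


def matmul(x, y):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def normalize(a):
    """Return D~^-0.5 (A + I) D~^-0.5 for a symmetric adjacency matrix."""
    n = len(a)
    a_tilde = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d = [sum(row) ** -0.5 for row in a_tilde]
    return [[d[i] * a_tilde[i][j] * d[j] for j in range(n)] for i in range(n)]


def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def forward(a_hat, feats, weights, dropout=0.0, rng=None):
    """Forward pass of (5): ReLU after hidden layers, softmax at the output."""
    h = feats
    for layer, w in enumerate(weights):
        h = matmul(a_hat, matmul(h, w))
        if layer < len(weights) - 1:
            h = [[max(v, 0.0) for v in row] for row in h]
            if rng is not None and dropout > 0.0:
                # Inverted dropout, used during training only (assumption).
                h = [[0.0 if rng.random() < dropout else v / (1.0 - dropout)
                      for v in row] for row in h]
    return [softmax(row) for row in h]


def labeled_loss(probs, labels):
    """Loss (6): cross-entropy summed over labeled nodes; labels maps node -> class."""
    return sum(-math.log(probs[m][c]) for m, c in labels.items())


# Toy graph: three parcels in a row, two features each, beta = 2 hidden neurons.
a = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
feats = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
weights = [[[1.0, -1.0], [-1.0, 1.0]], [[2.0, -2.0], [-2.0, 2.0]]]
a_hat = normalize(a)                      # computed once and reused

probs = forward(a_hat, feats, weights)    # inference: no dropout
loss = labeled_loss(probs, {0: 0, 2: 1})  # only nodes 0 and 2 are labeled
probs_train = forward(a_hat, feats, weights, dropout=0.5, rng=random.Random(0))

# Parcel labels are copied to the pixels of each parcel.
predicted = [row.index(max(row)) for row in probs]
parcel_of_pixel = [0, 0, 1, 2, 2]         # hypothetical level-1 label map
pixel_labels = [predicted[pid] for pid in parcel_of_pixel]
```

In this sketch, the number of hidden neurons $\beta$ is the column count of the first weight matrix, the dropout fraction is the `dropout` argument, and the labeled set $M_p$ is the dictionary passed to `labeled_loss`. The unlabeled node 1 still takes part in every forward pass through `a_hat`, which is how information from labeled parcels reaches unlabeled ones.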