
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 18, NO. 4, APRIL 2021, p. 607

Semisupervised Change Detection Using Graph Convolutional Network

Sudipan Saha, Student Member, IEEE, Lichao Mou, Xiao Xiang Zhu, Senior Member, IEEE, Francesca Bovolo, Member, IEEE, and Lorenzo Bruzzone, Fellow, IEEE

Abstract— Most change detection (CD) methods are unsupervised, as collecting substantial multitemporal training data is challenging. Unsupervised CD methods are driven by heuristics and lack the capability to learn from data. However, in many real-world applications, it is possible to collect a small amount of labeled data scattered across the analyzed scene. Such few scattered labeled samples in the pool of unlabeled samples can be effectively handled by a graph convolutional network (GCN), which has recently shown good performance in semisupervised single-date analysis, to improve CD performance. Based on this, we propose a semisupervised CD method that encodes the multitemporal images as a graph via multiscale parcel segmentation, which effectively captures the spatial and spectral aspects of the multitemporal images. The graph is further processed through a GCN to learn a multitemporal model. Information from the labeled parcels is propagated to the unlabeled ones over training iterations. By exploiting the homogeneity of the parcels, the model is used to infer the label at the pixel level. To show the effectiveness of the proposed method, we tested it on a multitemporal very high spatial resolution (VHR) data set acquired by the Pleiades sensor over Trento, Italy.

Index Terms— Change detection (CD), deep learning, graph convolutional network (GCN), high resolution, semisupervised.

Manuscript received October 16, 2019; revised January 28, 2020 and February 25, 2020; accepted March 30, 2020. Date of publication April 17, 2020; date of current version March 25, 2021. (Corresponding author: Francesca Bovolo.)
Sudipan Saha is with Fondazione Bruno Kessler, 38123 Trento, Italy, and also with the Department of Information Engineering and Computer Science, University of Trento, 38123 Trento, Italy (e-mail: [email protected]).
Lichao Mou and Xiao Xiang Zhu are with the Remote Sensing Technology Institute (IMF), German Aerospace Center, 82234 Weßling, Germany, and also with the Signal Processing in Earth Observation, Technical University of Munich, 80333 Munich, Germany.
Francesca Bovolo is with Fondazione Bruno Kessler, 38123 Trento, Italy (e-mail: [email protected]).
Lorenzo Bruzzone is with the Department of Information Engineering and Computer Science, University of Trento, 38123 Trento, Italy.
Color versions of one or more of the figures in this letter are available online at https://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LGRS.2020.2985340
1545-598X © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

CHANGE detection methods can be both supervised and unsupervised. Supervised methods require an ample amount of training data. While they potentially provide better results by exploiting the prediction model learned from training samples, the collection of a substantial amount of multitemporal training data is difficult. Though collecting a small amount of them may be possible in most applications, it may not be sufficient for training a supervised model, and hence unsupervised CD methods are preferred in the literature [1]. Most of the unsupervised methods generate a difference image that is further analyzed to distinguish changed pixels from the unchanged ones. Such methods have proven effective in various applications, and a number of object-based unsupervised variants [1] have been proposed for CD in very high spatial resolution (VHR) images. However, in the absence of external supervision in the form of labels, unsupervised methods depend on the distribution of values in the difference image and on the underlying principles injected into the algorithm. They lack the ability to model based on data. Recently, convolutional neural network (CNN)-based methods have outperformed other methods in most image analysis tasks. Some attempts exist in the literature that try to use CNNs for unsupervised CD [2], [3]. Saha et al. [2] proposed a deep-change-vector-analysis (DCVA) framework exploiting a pretrained CNN as a bitemporal feature extractor. It lacks the ability to infer beyond the scope of what the pretrained network has learned and depends on the distribution of deep-change vectors, as other difference-based methods do.

In the machine-learning literature, semisupervised classification [4] performs training using abundant unlabeled data and a few labeled data. In many applications of CD, it can be possible to collect a few labeled pixels scattered across the analyzed scene. Knowledge of the labeled pixels is completely unused if an unsupervised approach is adopted. Inspired by this, there are a few works on semisupervised CD in low-/medium-resolution images that exploit semisupervised clustering [5], [6] or neural networks [7]. However, no attention has been given to semisupervised methods for CD in VHR images.

A recently introduced variant in the family of CNNs, the graph convolutional network (GCN), is effective in capturing spatial context information and complex details in VHR optical images [8]. Moreover, Kipf and Welling [9] showed that GCN can effectively work under semisupervised settings, i.e., with a small amount of labeled data and ample unlabeled data. By treating the target data as a collection of nodes in a graph, GCN has the ability to propagate information from the labeled nodes to the unlabeled ones.

Motivated by the abovementioned analysis, we propose to extend GCN [9] to semisupervised CD and to the processing of VHR images. The method assumes that a small amount of multitemporal labels in the form of change and no change is available and defines CD as a semisupervised classification problem. The proposed method employs a novel graph construction to effectively capture multiscale spatial and spectral information. The input images are seen as a composition of multitemporal parcels [1]. Parcels are treated as nodes to represent them in the form of a graph. Each node is represented by features, and the spatial relationship between the nodes is captured in the form of an adjacency matrix. GCN learns

Authorized licensed use limited to: Universidad del Valle. Downloaded on June 21,2024 at 03:57:01 UTC from IEEE Xplore. Restrictions apply.

a model characterizing the multitemporal scene from both the labeled and the abundant unlabeled data. The GCN model is used to determine change/no-change labels for all the pixels in the image. We tested the proposed method on a complex quasi-urban VHR data set from Trento, Italy, which demonstrates the effectiveness of the proposed method.

The rest of this letter is organized as follows. We detail the proposed method in Section II. Results are presented in Section III. We conclude this letter in Section IV.

II. PROPOSED APPROACH


Let us assume that a pair of bitemporal images X_t (t = 1, 2) acquired by a VHR optical sensor is available. The image X_t consists of B spectral bands and a set of N pixels x_{n,t}. If there is a class transition for the pixel x_n between time t = 1 and t = 2, x_n is said to be changed, and unchanged otherwise. Let us assume that change/no-change reference information is available for a small number of M pixels scattered across the scene. The problem statement is to classify all the N pixels into two classes: change (ω_c) and no change (ω_nc). The proposed method first segments the bitemporal images into multilevel parcels for an effective representation of the hierarchical multiscale spatial context. The N_p parcels captured at the lowest resolution level are used as nodes to form a region adjacency graph. For each node, multiscale spatial features are computed from both images and are used to form a feature vector F_i (i = 1, . . . , N_p). A representative description of the graph structure in adjacency matrix form (A_ij ∀ i, j = 1, . . . , N_p) is created by considering the spatial arrangement of the parcels. The relationship between the graph structure and the labels is encoded using a neural network model f(F, A) trained with a supervised loss L computed from the M_p nodes that have labels. We condition the function f(·) on the adjacency matrix A, which enables us to distribute gradient information from the supervised loss and to learn representations of nodes (i.e., parcels) both with and without labels. The proposed CD framework is shown in Fig. 1.

Fig. 1. Proposed GCN semisupervised training mechanism.

A. Parcel Segmentation

The richness of information contained at the pixel level for VHR data is not adequate given their high spatial resolution [8], and thus region-/object-level processing is required. To capture spatial information, the proposed method employs multitemporal parcels [1] as the minimal spatiotemporal processing unit instead of pixels. To characterize the hierarchical multiscale representation, multitemporal parcels are employed at different resolution levels k (k = 1, . . . , K). To obtain the multitemporal parcels at level k = 1, superpixel segmentation is used to capture the high spatial correlation of pixels in a single-time image. Superpixels are small connected groups of pixels that are perceptually similar and provide rich regional geometry information. For superpixel segmentation, we use the algorithm proposed in [10], an effective graph-based image superpixel segmentation algorithm. Given X_t, we segment it into N_t^sp superpixel segments. Following that, the superpixel segments from X_1 and X_2 are combined through a logical join operation to obtain the multitemporal parcel map consisting of N_p parcels. Thus, the multitemporal parcels have the property of being homogeneous simultaneously in X_1 and X_2. This enables us to account for homogeneity both in space and time and thus for the possible changes between the two acquisitions.

Multitemporal parcels for the higher levels (k = 2, . . . , K) are obtained by merging adjacent parcels as described in [1]. The multilevel parcels follow the maximality property [1]: the higher the value of k, the larger the size of the resulting parcels, and a parcel at level k is entirely contained within a parcel at level k' (k' > k) [1]. The parcel segmentation captures the multiscale spatial features of the bitemporal images. The parcel segmentation process is shown in the blue dashed box in Fig. 1.

B. Retrieving Label Information for Parcels

Considering that the reference change information is available for M pixels, this information is propagated along the hierarchical multiscale representation to obtain reference change information for M_p parcels (at resolution level k = 1), where M_p ≤ M. Out of the N_p parcels, we assume that M_p parcels have some labeled pixels inside them. The reference label for each such parcel is estimated by a majority vote among the available labeled pixels in it.
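As an illustration, the logical join of the two single-date superpixel maps and the majority-vote label retrieval can be sketched as follows. This is a minimal NumPy sketch under our own naming, not the authors' code: `seg1`/`seg2` are assumed to be integer label maps produced by any superpixel algorithm such as [10], and the label convention (-1 = unlabeled, 0 = no change, 1 = change) is ours.

```python
import numpy as np

def multitemporal_parcels(seg1, seg2):
    """Logical join of two superpixel label maps: two pixels belong to the
    same parcel only if they share a segment in BOTH dates."""
    # Encode each (segment-at-t1, segment-at-t2) pair as a unique integer id.
    pair_id = seg1.astype(np.int64) * (seg2.max() + 1) + seg2
    # Relabel the pair ids to consecutive parcel ids 0..Np-1.
    _, parcels = np.unique(pair_id, return_inverse=True)
    return parcels.reshape(seg1.shape)

def parcel_labels(parcels, pixel_labels):
    """Majority vote of the scattered labeled pixels inside each parcel.
    pixel_labels: -1 = unlabeled, 0 = no change, 1 = change (our convention)."""
    n_parcels = parcels.max() + 1
    votes = np.full(n_parcels, -1)
    for p in range(n_parcels):
        labs = pixel_labels[(parcels == p) & (pixel_labels >= 0)]
        if labs.size:  # parcel contains at least one labeled pixel
            votes[p] = np.bincount(labs).argmax()
    return votes
```

Parcels with no labeled pixel keep the sentinel -1 and contribute only as unlabeled nodes during training.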


C. Graph Construction
The N_p parcels obtained at resolution level k = 1 are treated as nodes, and we generate the graph-based representation by computing features for each node and representing the graph using an adjacency matrix. The parcel segmentations at levels k = 2, . . . , K are used only to compute the multiscale features of the nodes.
To extract a feature vector F_i representing each node i, we exploit multilevel ad hoc features that capture both the spatial and the spectral information characterizing the node and its neighborhood. To compute the feature values for a node i, four features (mean, maximum, and minimum spectral values and area) are computed bandwise from the corresponding parcel at resolution level k = 1. To capture multilevel information, the abovementioned four features are also calculated for resolution levels k = 2, . . . , K from the parcels in which the considered node is contained at those levels. Aggregating the four features from all K levels and B bands of X_1 and X_2, a node i is represented by a feature vector F_i of size D = 4 × K × B × 2.
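The bandwise node features for one image at one resolution level might be computed as below. This is an illustrative sketch, not the authors' code; repeating the parcel area once per band is our reading of "computed bandwise", chosen so that the 4 × B layout implied by D = 4 × K × B × 2 comes out.

```python
import numpy as np

def node_features_level(parcels, image):
    """Per-parcel mean, max, min spectral value (bandwise) plus area,
    for one image at one resolution level. image: array of shape (H, W, B)."""
    n_parcels = parcels.max() + 1
    flat = parcels.ravel()
    bands = image.reshape(-1, image.shape[-1])      # (H*W, B)
    feats = []
    for p in range(n_parcels):
        px = bands[flat == p]                       # pixels of parcel p, (n, B)
        area = np.full(px.shape[1], px.shape[0])    # area repeated per band
        feats.append(np.concatenate([px.mean(0), px.max(0), px.min(0), area]))
    return np.stack(feats)                          # (Np, 4 * B)
```

Concatenating the outputs for all K levels and both images would yield the full D-dimensional vector F_i.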
Given the parcel segmentation, we aim to obtain an adjacency matrix A whose elements A_ij capture the interaction between two parcels in terms of: 1) spatial distance and 2) spectral closeness. First, if two parcels i and j do not share any adjacent pixel, their value in the adjacency matrix A_ij is set to 0. The diagonal elements A_ii are also set to 0. For each pair of parcels i and j that are adjacent, we measure the feature closeness between the two parcels by considering the distance between their feature components as

    A_ij = 1 − (1/D) Σ_{d=1}^{D} |F_i^d − F_j^d| / (|F_i^d| + |F_j^d|).    (1)
This step ensures that A_ij is large only when the parcels i and j (i ≠ j) lie in spatially adjacent regions and share similar feature values.
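Eq. (1) can be sketched in NumPy as follows. This is our own sketch: `neighbors` is assumed to be the set of spatially adjacent parcel pairs extracted from the parcel map, and the small epsilon guarding against a zero denominator is our addition.

```python
import numpy as np

def adjacency(features, neighbors):
    """Adjacency matrix of Eq. (1): zero unless parcels i and j are spatially
    adjacent; otherwise 1 minus the mean normalized feature distance.
    features: (Np, D) array; neighbors: iterable of (i, j) adjacent pairs."""
    n, d = features.shape
    A = np.zeros((n, n))
    for i, j in neighbors:
        fi, fj = features[i], features[j]
        # Normalized per-component distance; epsilon (ours) avoids 0/0.
        dist = np.abs(fi - fj) / (np.abs(fi) + np.abs(fj) + 1e-12)
        A[i, j] = A[j, i] = 1.0 - dist.mean()
    return A  # diagonal entries stay 0, as required
```

Non-adjacent pairs and the diagonal are simply never written, so they keep the required value 0.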
The abovementioned procedure is a novel graph construction mechanism that combines the spatial, spectral, and temporal information of the VHR image pair in the N_p parcels. F and A represent a compact and effective way to provide the spatial, spectral, and temporal information as input to the GCN model.

Fig. 2. CD results for the proposed method by varying (a) I, (b) dropout, (c) neurons, and (d) training samples.

D. Semisupervised GCN Model

The GCN model is inspired by [9]. The GCN model training takes advantage of the novel structure of the spatial, spectral, and temporal information proposed in Sections II-A–II-C: 1) unlabeled information from the whole scene, which is coded in terms of F_i (i = 1, . . . , N_p) and A, and 2) spatial-context information in the form of M_p labeled parcels/nodes. GCN exploits the relation between nodes and enables the labels to propagate among neighboring nodes iteratively. GCNs extend the concept of convolution from traditional grid-based data to graph data by learning a function f(·) that effectively generates the representation of a node i by taking into account its own feature vector F_i along with the feature vectors F_j of its neighbors j. A striking difference from grid-based convolution is that in graph-based convolution, the number of neighbors of a node is not fixed.

GCNs can be categorized into two types: spectral based and spatial based. In this letter, we employ a spectral-based approach that defines graph convolution by exploiting filters from graph signal processing [11]. We consider a multilayer GCN consisting of L convolutional layers comprising β neurons each. Rectified linear unit (ReLU) activation functions are used after every convolutional layer. The matrix of activations in layer l = 1, . . . , L of the GCN is H^l, where H^0 denotes F, i.e., the set of features of the input graph. We define an adjusted adjacency matrix Ã, that is, the adjacency matrix with added self-connections. It is obtained by adding an identity matrix to A:

    Ã = A + I_{N_p}.    (2)

We further define a diagonal matrix D̃ obtained by summing each column of Ã:

    D̃_ii = Σ_j Ã_ij.    (3)
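The adjusted adjacency matrix of Eq. (2), its degree matrix of Eq. (3), and the symmetric normalization used in the propagation rule can be sketched as below (a minimal NumPy sketch; in practice this product is precomputed once before training).

```python
import numpy as np

def normalized_adjacency(A):
    """Eqs. (2)-(3) plus symmetric normalization:
    A_hat = D~^{-1/2} (A + I) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])   # Eq. (2): add self-connections
    d = A_tilde.sum(axis=1)            # Eq. (3): diagonal of D~
    d_inv_sqrt = 1.0 / np.sqrt(d)      # D~^{-1/2} as a vector
    # Row- and column-scale A_tilde by d_inv_sqrt via broadcasting.
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
```

Since every node gains a self-connection, each degree D̃_ii ≥ 1 and the inverse square root is always well defined.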


Fig. 3. CD for the Trento data set. (a) Prechange image (RGB). (b) Postchange image (RGB). (c) Reference change map. CD maps: (d) proposed method, (e) semisupervised perceptron [7], and (f) DCVA [2].

Motivated by the first-order approximation of localized spectral filters on graphs, the layerwise propagation rule from layer l to layer l + 1 can be stated as [9]

    H^{l+1} = σ(D̃^{−0.5} Ã D̃^{−0.5} H^l W^l)    (4)

where σ denotes the ReLU activation function and W^l denotes the learnable weight matrix corresponding to layer l. Since H^0 = F, we can use it to compute H^1 and, similarly, all values up to H^L. Since Â = D̃^{−0.5} Ã D̃^{−0.5} depends on the adjacency matrix only, it is calculated as a preprocessing step and reused during weight propagation across the convolutional layers. Considering an L-layer GCN, the proposed forward model takes the following form [9]:

    f(F, A) = softmax(Â . . . σ(Â σ(Â F W^0) W^1) . . . W^{L−1})    (5)

where W^0 denotes the weight matrix from the input to the first hidden layer and W^{L−1} denotes the weight matrix from the last hidden layer to the output. The training process is made further stochastic and less prone to overfitting by introducing dropout [12], which randomly drops some of the β neurons during each iteration.

The proposed novel mechanism uses all N_p nodes, including the multitemporal parcels without label information. Considering our semisupervised learning setting, the cross-entropy loss function [9] is used to calculate L_m (∀ m ∈ M_p), and the overall loss L is computed by summing over the M_p labeled parcels, that is,

    L = Σ_{m=1}^{M_p} L_m.    (6)

Given L, the network weights W^0, W^1, . . . , W^{L−1} are adjusted using gradient descent over I iterations. In this way, the gradient information from the labeled nodes is spread across the other unlabeled adjacent nodes. After I iterations, the trained model is used to predict labels for all N_p nodes. Considering that parcels are homogeneous, the predicted label for a parcel is used to infer labels for the pixels in that parcel and thus obtain labels for all N pixels in the X_1, X_2 pair.

III. EXPERIMENTAL VALIDATION

For experimental validation (Fig. 2), we chose a bitemporal image pair from Trento, Italy. The images show a complex quasi-urban area containing patches of vegetation, a river, bushes, and Alpine mountains. They contain urban vertical structures of different heights. The image pair was acquired in August 2012 [see Fig. 3(a)] and September 2013 [see Fig. 3(b)]. The images have 0.5-m/pixel resolution and a size of 2100 × 4200 pixels. The Pleiades sensor has four spectral bands in the spectral range 450–900 nm. A reference change map is shown in Fig. 3(c).

The number of nodes N_p obtained from the images is 35 847. To capture multilevel features, we use K = 4 and thus obtain a feature vector of dimension D = 128 corresponding to each node. We performed the experiments by keeping the number of labeled parcels M_p at 200 (200 ≪ 35 847), including 100 parcels from changed areas and the same number from unchanged areas. We chose the labeled pixels scattered in such a way that each labeled parcel contains only one labeled pixel. We set the number of hidden layers of the GCN to L = 2. We present the results in terms of sensitivity (accuracy in detecting changed pixels), specificity (accuracy in detecting unchanged pixels), and overall accuracy [2].

A. Robustness Analysis

We examined the effect of varying: 1) the training iterations I; 2) the dropout fraction; 3) the hidden layer neurons β; and 4) the training samples M_p.
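The two-layer forward model of Eq. (5) and the masked loss of Eq. (6) can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: weight initialization, dropout, and the gradient-descent update are omitted, and the -1 sentinel for unlabeled nodes is our convention.

```python
import numpy as np

def gcn_forward(A_hat, F, W0, W1):
    """Two-layer case of Eq. (5): softmax(A_hat ReLU(A_hat F W0) W1)."""
    H1 = np.maximum(A_hat @ F @ W0, 0.0)        # Eq. (4) with ReLU
    logits = A_hat @ H1 @ W1
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)     # row-wise softmax

def semisup_loss(probs, labels):
    """Eq. (6): cross-entropy summed over the Mp labeled nodes only.
    labels: -1 for unlabeled nodes, 0/1 for no-change/change."""
    mask = labels >= 0
    return -np.log(probs[mask, labels[mask]] + 1e-12).sum()
```

Because the loss touches only the labeled rows while the forward pass mixes features over the whole graph via Â, gradients flow from the M_p labeled nodes into their unlabeled neighbors, which is the propagation mechanism the text describes.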


We varied the number of training iterations from 100 to 700 in steps of 100 while fixing dropout = 0.4, β = 16, and M_p = 200. The improvement in performance is steep in the beginning, and then it tends to become stable [see Fig. 2(a)].

We varied the dropout fraction from 0 to 0.8 in steps of 0.2 while fixing I = 500, β = 16, and M_p = 200. A dropout of 0.2 provides the best sensitivity and a dropout of 0.6 provides the best specificity. This shows that dropout helps to learn a more robust classifier. However, sensitivity declines rapidly when increasing dropout beyond 0.4 [see Fig. 2(b)].

We varied the number of neurons (β) from 8 to 32 in multiplicative steps of 2 while fixing I = 500, dropout at 0.4, and M_p = 200. The results show an improvement for β = 16 over β = 8, with no significant difference for β = 32 [see Fig. 2(c)].

We varied the number of labeled samples (M_p) from 40 to 280 in steps of 80. The increment in specificity is steep up to 200, and the increment rate diminishes after that [see Fig. 2(d)].

TABLE I
CD RESULTS COMPARED WITH THE STATE-OF-THE-ART METHODS

B. Comparison With State-of-the-Art Methods

For M_p = 200, β = 16, and a dropout of 0.4, the proposed method obtains a sensitivity of 76.41%, a specificity of 93.32%, and an accuracy of 92.76% using I = 600. The GCN requires 93 s for this setting on a machine with an NVidia GeForce GTX 1080 Ti GPU and an Intel i7 CPU (3.2 GHz). The proposed method is able to demarcate the object geometries, as shown in Fig. 3(d). We compare our results with the state-of-the-art semisupervised perceptron-based [7], fuzzy C-means (FCM) clustering [6], S3VM [4], and autoencoder [13] methods and with the unsupervised transfer-learning-based DCVA method [2]. For a fair comparison, the same number of labeled samples and the same context-based multiscale features as in the proposed method are used in the state-of-the-art semisupervised perceptron-based method [7] [see Fig. 3(e)], FCM clustering [6], and S3VM [4]. The proposed method clearly outperforms the semisupervised methods, as shown in Table I. Benefiting from a pretrained network trained on a large-scale segmentation data set, DCVA [see Fig. 3(f)] slightly outperforms the proposed method in terms of specificity, whereas the proposed method largely outperforms DCVA in sensitivity (by 28.05%) and accuracy (by 0.39%).

IV. CONCLUSION

In this letter, we presented a GCN-based semisupervised method for CD in VHR multispectral images. The proposed method effectively encodes the multiscale spatial information using multilevel parcel maps. A novel graph construction strategy is used to process the parcels to form a graph representation that is processed through the GCN. The GCN optimizes its loss function based on the labeled parcels only. The iterative training process helps to propagate the label information from the labeled nodes to the unlabeled ones, which allows us to detect change in the unlabeled data. The proposed method relies only on the analyzed bitemporal scene and does not need other data sets and/or pretrained networks. The method clearly outperforms the state-of-the-art semisupervised methods. The result is promising even when compared to DCVA [2], which is based on transfer learning. The method works under an interesting semisupervised scenario that draws a practical borderline between the supervised setting (which is difficult in multitemporal analysis due to the lack of training data) and the unsupervised setting (which completely ignores the possibility of acquiring a minimal number of training samples). In the future, we plan to extend the method to other sensors [e.g., VHR synthetic aperture radar (SAR)] and to explore the possibility of distinguishing different kinds of change.

REFERENCES

[1] F. Bovolo, "A multilevel parcel-based approach to change detection in very high resolution multitemporal images," IEEE Geosci. Remote Sens. Lett., vol. 6, no. 1, pp. 33–37, Jan. 2009.
[2] S. Saha, F. Bovolo, and L. Bruzzone, "Unsupervised deep change vector analysis for multiple-change detection in VHR images," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 6, pp. 3677–3693, Jun. 2019.
[3] F. Liu, L. Jiao, X. Tang, S. Yang, W. Ma, and B. Hou, "Local restricted convolutional neural network for change detection in polarimetric SAR images," IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 3, pp. 818–833, Mar. 2019.
[4] F. Bovolo, L. Bruzzone, and M. Marconcini, "A novel approach to unsupervised change detection based on a semisupervised SVM and a similarity measure," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 7, pp. 2070–2082, Jul. 2008.
[5] M. Roy, S. Ghosh, and A. Ghosh, "Change detection in remotely sensed images using semi-supervised clustering algorithms," Int. J. Knowl. Eng. Soft Data Paradigms, vol. 4, no. 2, pp. 118–137, 2013.
[6] D. Sinh Mai and L. Thanh Ngo, "Semi-supervised fuzzy C-means clustering for change detection from multispectral satellite image," in Proc. IEEE Int. Conf. Fuzzy Syst. (FUZZ-IEEE), Aug. 2015, pp. 1–8.
[7] S. Patra, S. Ghosh, and A. Ghosh, "Semi-supervised learning with multilayer perceptron for detecting changes of remote sensing images," in Proc. Int. Conf. Pattern Recognit. Mach. Intell. Berlin, Germany: Springer, 2007, pp. 161–168.
[8] U. Chaudhuri, B. Banerjee, and A. Bhattacharya, "Siamese graph convolutional network for content based remote sensing image retrieval," Comput. Vis. Image Understand., vol. 184, pp. 22–30, Jul. 2019.
[9] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," 2016, arXiv:1609.02907.
[10] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient graph-based image segmentation," Int. J. Comput. Vis., vol. 59, no. 2, pp. 167–181, Sep. 2004.
[11] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, May 2013.
[12] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[13] D. Ienco and R. G. Pensa, "Semi-supervised clustering with multiresolution autoencoders," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2018, pp. 1–8.

