Change Detection in Hyperdimensional Images Using Untrained Models
Abstract: Deep transfer-learning-based change detection methods are dependent on the availability of sensor-specific pretrained feature extractors. Such feature extractors are not always available due to lack of training data, especially for hyperspectral sensors and other hyperdimensional images. Moreover, models trained on easily available multispectral (RGB/RGB-NIR) images cannot be reused on such hyperdimensional images due to their irregular number of bands. While hyperdimensional images have a large number of spectral bands, they generally show much less spatial complexity, thus reducing the requirement of large receptive fields of convolution filters. Recent works in computer vision have shown that even untrained deep models can yield remarkable results in some tasks like super-resolution and surface reconstruction. This motivates us to make a bold proposition that an untrained lightweight deep model, initialized with some weight initialization strategy, can be used to extract useful semantic features from bitemporal hyperdimensional images. Based on this proposition, we design a novel change detection framework for hyperdimensional images by extracting bitemporal features using an untrained model and further comparing the extracted features using deep change vector analysis to distinguish changed pixels from the unchanged ones. We further use the deep change hypervectors to cluster the changed pixels into different semantic groups. We conduct experiments on four change detection datasets: three hyperspectral datasets and a hyperdimensional polarimetric synthetic aperture radar dataset. The results clearly demonstrate that the proposed method is suitable for change detection in hyperdimensional remote sensing data.

Index Terms: Change detection (CD), deep image prior, deep learning, hyperdimensional images, hyperspectral images.

Manuscript received July 11, 2021; revised August 31, 2021 and September 23, 2021; accepted October 9, 2021. Date of publication October 20, 2021; date of current version November 10, 2021. The work was supported in part by the German Federal Ministry of Education and Research (BMBF) in the framework of the International Future AI lab “AI4EO – Artificial Intelligence for Earth Observation: Reasoning, Uncertainties, Ethics, and Beyond” under Grant number: 01DD20001, in part by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under Grant ERC-2016-StG-714087, Acronym: So2Sat, in part by the Helmholtz Association through the Framework of Helmholtz AI under Grant ZT-I-PF-5-01 - Local Unit “Munich Unit @Aeronautics, Space, and Transport (MASTr),” and Helmholtz Excellent Professorship “Data Science in Earth Observation - Big Data Fusion for Urban Research” under Grant W2-W3-100. (Corresponding author: Xiao Xiang Zhu.)
Sudipan Saha is with the Department of Aerospace and Geodesy, Data Science in Earth Observation, Technical University of Munich, 85521 Ottobrunn, Germany (e-mail: [email protected]).
Lukas Kondmann and Qian Song are with the Remote Sensing Technology Institute, German Aerospace Center (DLR), 82234 Weßling, Germany (e-mail: [email protected]; [email protected]).
Xiao Xiang Zhu is with the Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), 82234 Weßling, Germany and also with the Department of Aerospace and Geodesy, Data Science in Earth Observation (SiPEO, former: Signal Processing in Earth Observation), Technical University of Munich, 85521 Ottobrunn, Germany (e-mail: [email protected]).
Digital Object Identifier 10.1109/JSTARS.2021.3121556
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 14, 2021

I. INTRODUCTION
Recently, deep learning has attracted significant attention in earth observation [1]. Following this trend, deep-learning-based methods have been developed for change detection (CD) [2], an important topic in earth observation. CD plays a pivotal role in several applications, including disaster management [3], [4], urban monitoring [5], and precision agriculture [6]. While CD methods can be supervised [7]–[9] or semisupervised [5], unsupervised methods are preferred in the literature [2], [10] as collecting labeled multitemporal data is significantly challenging. Before the emergence of deep learning, change vector analysis (CVA) and its object-based variants [10], [11] were popularly used for unsupervised CD. Deep CVA (DCVA) and other transfer-learning-based methods [2], [3], [12] have embedded the concept of CVA in a transfer learning framework. While the transfer-learning-based methods do not use any training or fine-tuning of the deep model, they depend on the availability of a pretrained feature extractor that can be used to capture the semantics of the input images. In more detail, such transfer-learning-based methods project the bitemporal images into a deep feature space by using a pretrained deep feature extractor and subsequently compare the images in the projected domain. Thus, they perform CD by reusing a deep model that was previously trained for some unrelated task, e.g., image classification. Most deep-transfer-learning-based CD methods are designed for synthetic aperture radar (SAR) amplitude images and multispectral images with few bands.

Remote sensing deals with a plethora of sensors showing different spatial, spectral, and temporal characteristics. In many cases, a large number of bands is required to efficiently represent the information in remote sensing images. The most well-known example of this is hyperspectral images, which sample a broad range of the electromagnetic spectrum in hundreds of spectral bands [13]–[17]. Some CD applications require rich spectral information and hyperspectral images can be very useful for such cases, e.g., monitoring of mining activity [18]. In spite of this, less attention has been paid to developing deep-transfer-learning-based CD methods for hyperspectral images [19], [20]. This can be attributed to the lack of labeled hyperspectral data, which impedes the availability of any pretrained network. In more detail, a transfer-learning-based hyperspectral CD method can be developed only if a pretrained model is available for the same data, which is often unavailable for hyperspectral images. Remarkably, due to the lack of training data, some of the supervised hyperspectral image classification models are trained and tested on pixels from the same image [21]. Even if sufficient training data is collected for a particular hyperspectral sensor and geography, this model will not be straightforwardly applicable to another hyperspectral sensor. Currently, there are a large number of hyperspectral sensors with differences in spectral coverage and number of bands, e.g., the DLR earth sensing imaging spectrometer (DESIS) has 180 bands while precursore iperspettrale della missione applicativa (PRISMA) has 237 bands [13]. Please see Table I for a comparison of the number of bands of different spaceborne hyperspectral sensors. Due to such differences, a model trained for one hyperspectral sensor cannot be used for transfer-learning-based CD on another hyperspectral sensor. Additionally, unmanned-aerial-vehicle (UAV) based hyperspectral imaging has become increasingly popular in various applications, such as agricultural monitoring [13]. Such UAV-based hyperspectral sensors may exhibit spectral coverage entirely different from the satellite-based ones.

TABLE I
NUMBER OF BANDS AND GROUND SAMPLING DISTANCE (GSD) FOR SOME SPACEBORNE HYPERSPECTRAL SENSORS [13]

In addition to hyperspectral data, another example of hyperdimensional data in remote sensing is the polarimetric synthetic aperture radar (PolSAR) image. Compared with single-polarimetric SAR data, PolSAR images contain more polarimetric information about the targets and are useful to discriminate double-bounce scatterers (such as buildings) from volume scatterers (such as forest) and surfaces using target decomposition methods [22]. Thus, PolSAR data are beneficial for applications such as land classification and building extraction [23]. In practical PolSAR applications, usually the decomposed results [23]–[25] instead of the raw PolSAR data are used for further analysis, which constitutes a hyperdimensional (tens to over one hundred channels) data cuboid.

Models trained for multispectral (RGB/RGB-NIR) or SAR amplitude images cannot be effectively reused for feature extraction from hyperdimensional images due to their irregular number of bands. To transfer RGB-trained models to hyperdimensional images, we need to choose only three bands from the hyperdimensional images, thus losing a significant amount of information. Another possible solution is to somehow modify the first layer of the pretrained model.

Ulyanov et al. [26] showed that the structure of a network is often sufficient to capture important low-level features from the images without any training. This is highly relevant for hyperdimensional images: while it is challenging to transfer a model trained on RGB images to hyperdimensional images, it is trivial to initialize a model to ingest as many image channels as desired. This strategy is certainly not as good as learning complex spatial features with abundant labeled images, but it is good enough for CD in hyperdimensional images. Arguably, the spatial complexity of hyperdimensional images is not high in most cases, as can be seen in Table I. This is also evident from the fact that some works in hyperspectral image classification just use 1-D convolution [27]. While spatial complexity still has an important role to play for hyperspectral multitemporal analysis, we argue that it is not as critical as in high-resolution multispectral images. This raises the question of whether the complexity in low-spatial and high-spectral resolution multitemporal hyperdimensional images can be captured by an untrained deep model, merely initialized with a deep model initialization strategy [28], [29]. The likelihood of such a possibility is supported by the fact that untrained models have recently shown remarkable performance in some computer vision tasks where the spatial complexity is much more critical than in hyperspectral images, e.g., deep image prior [26].

We propose an unsupervised CD method for hyperdimensional images using an untrained deep model as deep feature extractor. The proposed method does not need any prior knowledge about the input or the arrangement of the spectral bands. In addition to distinguishing the changed pixels from the unchanged ones (binary CD), we also extend the method for multiple CD. The key contributions of this article are as follows.
1) This article shows that even an untrained model, merely initialized with a weight initialization technique [28], can be used to capture the spatio-temporal semantics, especially for hyperdimensional data where pretrained models are generally not available. Based on this, this article proposes a CD method, which can effectively segregate changed pixels from the unchanged ones in the hyperdimensional images.
2) This article further extends the method for multiple/multiclass CD, using the deep change vector obtained from the untrained model to cluster the changed pixels into different groups.
3) This article experimentally validates the proposed approach on three bitemporal hyperspectral scenes, as well as on bitemporal hyperdimensional PolSAR data, showing the versatility of the approach.

The rest of this article is organized as follows. Some relevant works are discussed in Section II. Section III discusses the proposed method. Section IV presents the datasets and results related to hyperspectral images. Results related to PolSAR data are presented in Section V. Finally, Section VI concludes this article.

II. RELATED WORK
Following the relevance to our work, we briefly discuss in this section:
1) unsupervised CD;
SAHA et al.: CHANGE DETECTION IN HYPERDIMENSIONAL IMAGES USING UNTRAINED MODELS 11031
2) hyperdimensional CD methods; and
3) deep image prior.

A. Unsupervised CD
Unsupervised CD methods are generally based on the concept of pixelwise difference operation, i.e., CVA [30] or clustering [31]. With the emergence of high-resolution imaging, object-based variants of CVA, e.g., parcel change vector analysis (PCVA) [11], incorporated the notion of spatial context in CVA. Morphological filters have also been employed to capture the object information [32]. Deep-learning-based unsupervised CD methods, e.g., DCVA [2], are based on transfer learning. DCVA combines CVA with feature extraction by a pretrained deep network, under the assumption that a pretrained model is available for the target geography and sensor. In addition to optical images, transfer-learning-based frameworks have also shown success in SAR amplitude image analysis [3].

B. CD in Hyperdimensional Images
Very few deep-learning-based CD methods have been proposed for hyperdimensional (hyperspectral or other hyperdimensional) images [33]–[35]. In [33], the authors identified high dimension and limited datasets as unique challenges for hyperspectral CD. Toward alleviating these challenges, they devised a preclassification-based end-to-end CD framework. Another supervised framework, the recurrent 3-D fully convolutional network (Re3FCN), was introduced by Song et al. [35]. Re3FCN merges a 3-D fully convolutional network (FCN) and a convolutional long short-term memory. Chen and Zhou [36] proposed a supervised CD method consisting of the following three steps: reduction of spectral dimension, joint affinity tensor construction, and binary (changed or unchanged) classification by CNN. While these works successfully introduce deep learning to hyperspectral CD, they do not present any unique solution toward circumventing the limited availability of datasets in hyperspectral multitemporal analysis. Their works use pixels from the same image for training and evaluation. Using such large supervised networks when training and test pixels belong to the same scene may lead to overoptimistic accuracy assessment, as shown by Molinier and Kilpi [37]. Thus, it is crucial to design unsupervised/transfer-learning-based approaches, like the ones proposed for multispectral and SAR images [2], [3]. In addition to hyperspectral images, hyperdimensional CD has also been studied in the context of PolSAR images [24]. To the best of the authors’ knowledge, all deep-learning-based hyperdimensional CD methods are proposed in the context of binary CD, without delving into multiple/multiclass CD.

C. Deep Image Prior
Deep models are generally trained on large labeled datasets. This leads us to believe that the excellent performance of CNNs is due to their capability to learn realistic features or data priors from the data. However, several recent works have shown that this explanation is not entirely correct. In one of the first such works, Zhang et al. [38] showed that an image classification network can overfit on the training images even when the labels are randomized. This hints that the success of a deep network is possibly not always due to a large amount of labeled data, but sometimes due to the structure of the network. Further delving into this topic, Ulyanov et al. [26] investigated this phenomenon in the context of image generation. They showed that a large amount of the image statistics is captured by the structure of the generator CNN itself. Instead of following the usual paradigm of training CNNs on a large dataset, they fitted CNNs on a single image for image restoration problems. The network weights were randomly initialized. Their simple setup could provide remarkable results for various image restoration problems, e.g., denoising and super-resolution. This phenomenon is remarkable as it demonstrates the power of untrained networks. Following this work, several other works have followed a similar approach, demonstrating the success of untrained networks for different computer vision problems, including surface reconstruction [39] and photo manipulation [40]. Another similar line of research is the random projection network [41], which is proposed in the context of high-dimensional data. Such data implies a network architecture with an input layer that has a huge number of weights, making training infeasible. The random projection network [41] tackles this challenge by prepending the network with an input layer whose weights are initialized with a random projection matrix.

III. PROPOSED METHOD
Let us assume that we have a pair of coregistered hyperdimensional images X1 and X2 having B0 bands, where B0 is much larger than the usual number of bands in a multispectral image. No training label or suitable pretrained network is available to us. Our goal is twofold.
1) Binary CD: Distinguish the changed pixels (Ωc) from the unchanged ones (ωnc).
2) Multiple CD: Further cluster the changed pixels into a set of semantically meaningful groups.
To accomplish the abovementioned goals, we initialize a deep model with the number of input channels and the number of kernels in the intermediate layers modulated according to the dimension of X1 and X2. This deep model, while untrained, is initialized with an appropriate weight initialization technique [28]. Following this, we use this network to extract a set of features from the bitemporal images. The pixelwise difference of these features is obtained as the deep change vector, which is thresholded to identify the changed pixels. Once changed pixels are segregated, they are further clustered based on the deep change vectors for multiple CD. The proposed hyperdimensional CD framework is called untrained hyperdimensional multiple DCVA (UHM-DCVA) and is shown in Fig. 1.

A. Feature Extraction
Deep models trained for multispectral images can ingest input images of few channels/bands, in the order of three to ten [42], [43]. In contrast, hyperdimensional remote sensing images have B0 channels, where B0 is generally larger than 200. Thus, deep models trained on multispectral images are not suitable to ingest the hyperdimensional X1 and X2. To overcome this challenge, we use an untrained model for deep feature extraction from X1 and X2.
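The idea of an untrained feature extractor whose input width simply follows the number of bands can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: it assumes 1×1 kernels (so each pixel spectrum is processed independently), ReLU activations, hypothetical layer widths, He-style Gaussian initialization with standard deviation sqrt(2 / fan_in), and the Euclidean norm as the magnitude of the deep change vector. All function names are illustrative.

```python
import math
import random


def he_normal(fan_in, fan_out, rng):
    """Weight matrix drawn from N(0, 2 / fan_in); no training is performed."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def build_untrained_extractor(num_bands, widths=(32, 16), seed=0):
    """Stack of 1x1 layers; the first layer width follows the number of bands."""
    rng = random.Random(seed)
    layers, fan_in = [], num_bands
    for width in widths:  # widths are an assumption, not taken from the paper
        layers.append(he_normal(fan_in, width, rng))
        fan_in = width
    return layers


def extract_features(pixel, layers):
    """Forward pass of one pixel spectrum, with a ReLU after every layer."""
    x = list(pixel)
    for weights in layers:
        x = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]
    return x


def change_magnitude(pixel_t1, pixel_t2, layers):
    """Norm of the deep change vector (difference of the two feature vectors)."""
    f1 = extract_features(pixel_t1, layers)
    f2 = extract_features(pixel_t2, layers)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


# The same untrained network is applied to both acquisition dates.
layers = build_untrained_extractor(num_bands=224)
unchanged = [0.1] * 224
changed = [0.1] * 112 + [0.9] * 112
print(change_magnitude(unchanged, unchanged, layers))
print(change_magnitude(unchanged, changed, layers))
```

Because the weights are shared between the two dates, identical spectra always yield a zero change magnitude, whatever the random initialization.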
TABLE II
KEY STRUCTURE OF FIVE-LAYER UNTRAINED FEATURE EXTRACTOR NETWORK ASSUMING NUMBER OF CHANNELS IN INPUT IMAGE IS 224

ρ maps the D-dimensional G into a 1-D index, while preserving the main properties of the changes. Unchanged pixels tend to generate smaller ρ in comparison to the changed pixels. This is used to segregate Ωc and ωnc by using a threshold τ. While any suitable thresholding method [44] can be used, we use Otsu’s thresholding [45] to compute τ. Any pixel having ρ > τ is assigned to Ωc, and to ωnc otherwise.

C. Multiple CD
Changed pixels (Ωc) are further analyzed in an unsupervised way based on G to segregate different kinds of change without any a priori knowledge about the different kinds of change [2]. However, we assume a priori knowledge about the number of kinds of change (K). G is a high-dimensional vector and clustering is challenging in such a high-dimensional space [46]. To overcome this, we first binarize/discretize the components of G [2], [47]. Components of G are likely to be either positive or negative, and different kinds of change are likely to show different patterns on the components g_d (d = 1, ..., D) of G. Binarization simplifies the information in G, while preserving information descriptive of clusters. G is binarized to Gbin with components greater than 0 set to 1 and components smaller than 0 set to 0. Gbin is also D-dimensional like G.

Denoting the number of changed pixels (pixels in Ωc) as Nc, we have Nc binary vectors of D dimensions each. Conversely, representing each feature as a vector, we have D vectors of Nc dimensions each. We expect pixels belonging to the same kind of change to exhibit similar binary signatures, while pixels belonging to different kinds of change exhibit dissimilar binary signatures. Furthermore, many features exhibit similar binary signatures and are thus redundant for discriminating different types of change. Out of the D features, the feature that shows the most similarity to the other D − 1 features can be defined as the most informative feature. Toward this, R(i, j) measures the correlation distance [48] between two Nc-dimensional features i and j, scaled to the range 0–1 [2], where 1 represents the farthest features. Rd (d = 1, ..., D) measures the informativeness of an individual feature

\[ R_d = -\sum_{j=1}^{D} R(d, j). \tag{2} \]

In the abovementioned equation, the term within the summation computes the distance of a feature from the other features; coupled with the negation, Rd measures how similar the feature d is to the other D − 1 features. The most informative feature d∗ is selected by choosing the feature that maximizes Rd

\[ d^{*} = \arg\max_{d} R_d. \tag{3} \]

The chosen d∗ can be used to group the pixels in Ωc into two classes. The next most informative feature can be selected by following the abovementioned process, but first discarding the most informative feature d∗ and the features made redundant by it. This hierarchical process allows us to select a set of informative features that are further used to cluster Ωc into the desired number of classes ωc1, ωc2, ..., ωcK.

IV. VALIDATION ON HYPERSPECTRAL DATA
A. Datasets
We validate the proposed method on the following three publicly available bitemporal hyperspectral scenes [49], [50].1
1) The Santa Barbara bitemporal scene was acquired in 2013 [see Fig. 3(a)] and 2014 [see Fig. 3(b)] with the AVIRIS sensor (224 spectral bands) over the Santa Barbara region in California, United States. The spatial dimension of the images is 984 × 740 pixels. Reference information is known for only 132 552 pixels, out of which 80 418 pixels are unchanged and 52 134 pixels are changed [see Fig. 3(c)].
2) The Bay Area bitemporal scene was acquired in 2013 [see Fig. 5(a)] and 2015 [see Fig. 5(b)] with the AVIRIS sensor (224 spectral bands) over the area surrounding the city of Patterson (California). The spatial dimension of the images is 500 × 500 pixels. Reference information is known for only 60 610 pixels, out of which 29 393 pixels are unchanged and 31 217 pixels are changed [see Fig. 5(c)].
3) The Hermiston scene [see Figs. 6(a) and (b)] was acquired in the years 2004 and 2007 with the Hyperion sensor (242 spectral bands) over the Hermiston City area in Oregon, United States. Bands B001–B007, B058–B076, and B225–B242 are not calibrated; hence, we exclude them from our processing. The spatial dimension of the images is 390 × 200 pixels. A total of 68 014 pixels are labeled as unchanged. The remaining pixels are changed [see Fig. 6(c)]. The changed pixels are further grouped into 5 change types: type 1 (5558 pixels), type 2 (1331 pixels), type 3 (79 pixels), type 4 (1557 pixels), and type 5 (1461 pixels), shown in Fig. 7(a).
Please note the following.
1) For the Santa Barbara and Bay Area scenes, reference information is not known for a fraction of pixels. However, these datasets were not prepared by us and are publicly available datasets used in previous research works [49], [50]. Hence, we follow the reference maps available with those datasets.
2) We evaluate the binary CD method on all three scenes, but the multiple/multiclass CD method on only the Hermiston scene, as a multiple change reference map is available for only this scene.
1 [Online]. Available: https://citius.usc.es/investigacion/datasets/hyperspectral-change-detection-dataset

B. Compared Methods
We compared the proposed method to the following unsupervised methods.
1) CVA using the hyperdimensional pixel values. The comparison to CVA is crucial to understand whether the proposed method provides any additional benefit over mere pixel difference.
2) PCVA [11], which captures the spatial information as superpixels. This comparison helps us to understand whether spatio-temporal context in hyperdimensional images can be simply captured by a superpixel-based analysis.
Fig. 3. Santa Barbara scene, false color composition (R: band 50, G: band 20, B: band 10) images: (a) prechange and (b) postchange, (c) reference image
(white—unchanged, black—changed, gray—unknown), and CD maps: (d) CVA, (e) DCVA3Channels-1, (f) Proposed.
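The binary map of the proposed method is obtained by thresholding the per-pixel change magnitude ρ with a threshold τ computed by Otsu's method [45]. The following is a minimal sketch of that step in plain Python, under stated assumptions: ρ is given as a flat list of numbers, a 256-bin histogram is used, and the names `otsu_threshold` and `binary_change_map` are illustrative rather than taken from the authors' code.

```python
def otsu_threshold(values, num_bins=256):
    """Threshold that maximizes the between-class variance of a histogram."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # all values identical: nothing to separate
    width = (hi - lo) / num_bins
    hist = [0] * num_bins
    for v in values:
        hist[min(int((v - lo) / width), num_bins - 1)] += 1

    centers = [lo + (i + 0.5) * width for i in range(num_bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_tau, best_score = lo, -1.0
    count_low, sum_low = 0, 0.0
    for i in range(num_bins - 1):
        count_low += hist[i]
        sum_low += centers[i] * hist[i]
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        # Between-class variance up to the constant factor 1 / total**2.
        score = count_low * count_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score = score
            best_tau = lo + (i + 1) * width  # upper edge of bin i
    return best_tau


def binary_change_map(rho, tau):
    """1 for pixels assigned to the changed class (rho > tau), 0 otherwise."""
    return [1 if r > tau else 0 for r in rho]


rho = [0.10, 0.20, 0.15, 0.12, 5.0, 5.2, 4.9]  # toy change magnitudes
tau = otsu_threshold(rho)
print(binary_change_map(rho, tau))
```

Any other histogram-based threshold could be substituted for `otsu_threshold` without changing `binary_change_map`.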
Fig. 4. Visualization of two randomly selected features, as generated by the proposed model, on the Santa Barbara scene. It is evident that the features capture
the change information.
3) Spectral angle mapper Z-score image differencing (SAMZID) [19], which is designed specifically for hyperspectral CD based on spectral angle mapper and image difference. The method, as originally proposed in [19], consists of an unsupervised predictor phase and a supervised learning phase. We exclude the supervised phase and apply thresholding [45] on the change map obtained after the unsupervised predictor phase. As proposed in [19], two variants are compared: SAMZIDSin and SAMZIDTan.
4) Autoencoding of bitemporal Hyperspectral Images for Change Vector Analysis (AICA) [34], a deep-learning-based unsupervised CD method proposed for hyperspectral images that combines CVA with autoencoder-based training.
5) DCVA [2] with a feature extractor pretrained on a large-scale computer vision dataset using the VGG16/VGG19 architecture [42]. This comparison is important to understand whether a simple transfer learning approach can be used instead of the proposed method. The pretrained VGG architecture can ingest only three channels, so we select three optimum (RGB) channels from the hyperspectral image to feed to the network. We use three different configurations: using the first convolutional layer of VGG16 (DCVA3Channels-1), the second convolutional layer of VGG16 (DCVA3Channels-2), and the fifth convolutional layer of VGG16 (DCVA3Channels-3).
6) DCVA as mentioned above; however, in this case, we modulate the first layer of the network by replicating the weights to match the number of channels of the hyperspectral images. In this way, we can feed the entire unmodified hyperspectral images to the network. We use two different configurations: using the first convolutional layer of VGG16 (DCVAAllChannels-1) and the second convolutional layer of VGG16 (DCVAAllChannels-2).
7) A variant of the proposed method using dilated convolutional layers (dilation set to 3) to understand whether the proposed method can benefit from a larger receptive field.
8) A 1-D variant of the proposed method using 1×1 kernels instead of 3×3 kernels. This helps us to understand whether both the spatial context and the spectral information contribute to the CD result.
The first two compared methods are from the classical CD literature. The third and fourth methods are from the hyperspectral CD literature and specifically exploit properties unique to hyperspectral images. The following two methods are based on deep transfer learning. The proposed method is unsupervised and does not require any training or even any pretrained network; thus, it is not compared to any supervised [36] or preclassification-based [33] hyperspectral CD method. The last two methods are variants of the proposed method and are shown on the Santa Barbara scene.

C. Settings and Other Details
The results are reported as the average of five runs. Comparison is performed in terms of sensitivity (accuracy in percentage computed over reference changed pixels), specificity (accuracy in percentage computed over reference unchanged pixels), and overall accuracy. In more detail, given true positive (TP), true negative (TN), false positive (FP), and false negative (FN), sensitivity is TP/(TP+FN), specificity is TN/(TN+FP), and accuracy is given by (TP+TN)/(TP+TN+FP+FN), all scaled by 100. For multiple CD, the kappa score is provided.
We perform a number of additional experiments on the Santa Barbara scene.
1) For the proposed method, we use a five-layer network; however, we provide a comparison of performance as the number of layers is changed.
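The sensitivity, specificity, and overall accuracy used for the binary CD comparison, and the kappa score used for multiple CD, can be computed along the following lines. This is a sketch under stated assumptions rather than the evaluation code of the article: maps are flat lists of labels, changed pixels are encoded as 1 and unchanged as 0, reference pixels without known label are encoded as -1 and skipped, and both classes are assumed to be present in the reference. The function names are illustrative.

```python
def confusion_counts(predicted, reference, unknown=-1):
    """Count TP, TN, FP, FN over pixels whose reference label is known."""
    tp = tn = fp = fn = 0
    for p, r in zip(predicted, reference):
        if r == unknown:  # assumed encoding of unlabeled reference pixels
            continue
        if r == 1 and p == 1:
            tp += 1
        elif r == 0 and p == 0:
            tn += 1
        elif r == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def binary_scores(tp, tn, fp, fn):
    """Sensitivity, specificity, and overall accuracy, all scaled by 100."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fp + fn),
    }


def cohen_kappa(predicted, reference):
    """Cohen's kappa: agreement between two label lists, corrected for chance."""
    n = len(reference)
    observed = sum(1 for p, r in zip(predicted, reference) if p == r) / n
    labels = set(predicted) | set(reference)
    expected = sum(
        (predicted.count(c) / n) * (reference.count(c) / n) for c in labels
    )
    if expected == 1.0:  # both lists contain one identical label only
        return 1.0
    return (observed - expected) / (1.0 - expected)


predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, -1]  # last pixel has no reference label
print(binary_scores(*confusion_counts(predicted, reference)))
```

Restricting the counts to labeled pixels matters for the Santa Barbara and Bay Area scenes, where reference information is known only for part of the image.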
Fig. 5. Bay Area scene: (a) prechange and (b) postchange, (c) reference image (white—unchanged, black—changed, gray—unknown), and CD maps: (d) CVA,
(e) DCVA3Channels-1, (f) Proposed.
Fig. 6. Hermiston scene, false color composition (R: band 52, G: band 31, B: band 22) images: (a) prechange and (b) postchange, Reference images: (c) binary
(white—unchanged, black—changed, gray—unknown), Binary CD maps: (d) CVA, (e) DCVA3Channels-1, (f) Proposed.
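For multiple CD, which is evaluated only on the Hermiston scene, the changed pixels are grouped using binarized deep change vectors and the most informative feature of (2) and (3). The sketch below illustrates one selection step in plain Python. It rests on several assumptions: the correlation distance is taken as (1 − Pearson correlation) / 2 so that it lies in the range 0–1, a constant feature is treated as maximally distant, components equal to 0 are mapped to 0, and the later hierarchical steps (discarding redundant features and selecting further ones) are left out because their exact criterion is not reproduced here. All names are illustrative.

```python
import math


def binarize(change_vectors):
    """G -> Gbin: components greater than 0 become 1, all others 0."""
    return [[1 if g > 0 else 0 for g in vec] for vec in change_vectors]


def correlation_distance(a, b):
    """(1 - Pearson correlation) / 2, a distance scaled to [0, 1]."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # assumption: a constant feature counts as farthest
    return (1.0 - cov / math.sqrt(var_a * var_b)) / 2.0


def most_informative_feature(g_bin):
    """Return d* = argmax_d R_d with R_d = -sum_j R(d, j), plus all R_d."""
    num_features = len(g_bin[0])
    # Each feature is viewed as an Nc-dimensional vector over changed pixels.
    columns = [[row[d] for row in g_bin] for d in range(num_features)]
    scores = [
        -sum(correlation_distance(columns[d], columns[j])
             for j in range(num_features))
        for d in range(num_features)
    ]
    best = max(range(num_features), key=scores.__getitem__)
    return best, scores


# Toy deep change vectors of four changed pixels with D = 3 features.
G = [[0.5, 0.4, -0.2], [0.7, 0.1, 0.3], [-0.3, -0.6, 0.4], [-0.1, -0.2, -0.5]]
G_bin = binarize(G)
d_star, R = most_informative_feature(G_bin)
two_groups = [row[d_star] for row in G_bin]  # split of the pixels by d*
print(d_star, R, two_groups)
```

In this toy case the first two features carry the same binary signature, so either of them is the most similar to the rest and the split induced by it separates the first two pixels from the last two.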
TABLE III
PERFORMANCE VARIATION OF THE PROPOSED METHOD ON THE SANTA
BARBARA SCENE AS NUMBER OF LAYERS ARE VARIED
Fig. 8. Decomposed POLSAR dataset (details in [24]). CD maps: (a) reference, (b) CVA, (c) Proposed.
TABLE IV
CD RESULTS FOR THE SANTA BARBARA SCENE
The proposed method’s result is reported as average of five runs.

TABLE V
VARIATION OF THE RESULT FOR SANTA BARBARA SCENE AS THRESHOLD DETERMINATION SCHEME IS VARIED

We observe that both sensitivity and specificity gradually increase up to four layers. Sensitivity increases while specificity slightly decreases when five layers are used. No performance gain is observed for six layers; rather, the performance decreases. While adding more convolution layers enlarges the spatial receptive field of the filters and increases the complexity of the filters, this benefit saturates soon considering the coarse resolution of the hyperspectral images. Henceforth, we use five layers for all experiments related to the proposed method.

CVA obtains a sensitivity of 76.92 and specificity of 96.69 [see Fig. 3(d)]. Remarkably, PCVA performs worse than CVA, showing that the spectral and temporal complexity of hyperspectral bitemporal images cannot be captured by a mere superpixel-based representation. Being designed for hyperspectral CD, SAMZIDSin, SAMZIDTan, and AICA outperform CVA and PCVA. DCVAAllChannels-1 and DCVAAllChannels-2 are outperformed by DCVA3Channels-1 [see Fig. 3(e)] and DCVA3Channels-2. This clearly shows that the structure of the network is important. The VGGNet architecture, originally proposed for 3-channel input, can work satisfactorily while ingesting only 3 out of the 224 spectral bands of the AVIRIS sensor. However, attempting to forcefully feed the network with all bands results in a decrease in performance.

The proposed method [see Fig. 3(f) and Table IV] clearly outperforms all the compared methods (including its dilated and 1-D variants), obtaining a sensitivity of 87.98, specificity of 98.57, and accuracy of 94.40. This shows the advantage of the proposed method's ability to ingest input bitemporal images of arbitrary dimension, which cannot be achieved with transfer learning settings (DCVAAllChannels or DCVA3Channels). The proposed model can capture the change information, which is evident from the visualization of two randomly selected features (in the deep-difference domain) in Fig. 4. Remarkably, the proposed method’s 1-D variant, which only captures spectral context, outperforms the dilated-convolution-based variant. This indicates that the spectral information plays a more important role in CD than the spatial context information for the considered hyperspectral data. This also partly explains why the proposed unsupervised method outperforms transfer learning from models trained on computer vision data.

The performance of the proposed method may vary if another weight initialization strategy is used instead of the He initialization method [28], e.g., if Xavier weight initialization [29] is used, the proposed method obtains a sensitivity of 80.12% and specificity of 94.27%, which is still superior to most compared methods in Table IV.

For thresholding, Otsu’s method [45] is used, as it is popular in unsupervised CD methods [2], [51]. However, any other suitable method [52]–[55] can be used with similar results, as shown in Table V for the ISODATA method [52], [53] and Li’s method [54].

In Section III-A, we chose β0 as 4. In Table VI, we show the variation of the result with different values of β0, which supports the choice of the abovementioned value.

2) Bay Area: The Bay Area scene shows a complex urban area along with vegetation patches. As in Santa Barbara, PCVA, DCVAAllChannels-1, and DCVAAllChannels-2 do not
SAHA et al.: CHANGE DETECTION IN HYPERDIMENSIONAL IMAGES USING UNTRAINED MODELS 11039
TABLE VI
VARIATION OF THE RESULT FOR SANTA BARBARA SCENE AS β0 IS VARIED

TABLE VII
CD RESULTS FOR THE BAY AREA SCENE
The proposed method’s result is reported as the average of five runs.

TABLE VIII
CD RESULTS FOR THE HERMISTON SCENE

TABLE IX
CD RESULTS FOR SAN FRANCISCO POLSAR SCENE
The proposed method’s result is reported as the average of five runs.

E. Multiple CD Results
A multiple CD reference map is only available for the Hermiston scene. The reference map is shown in Fig. 7(a). The result obtained by the proposed method, using deep features extracted by the untrained model, is shown in Fig. 7(c). It is evident that the proposed method is able to detect the important semantic changes. There is certainly overlap between the classes shown in blue and red. However, it is clear from Figs. 6(a) and (b) that the blue and red classes represent a similar semantic notion, making it difficult for the unsupervised multiple CD method to differentiate them.
To understand whether the proposed multiple/multiclass CD scheme benefits from using the untrained model as a feature extractor, we compare it to the result obtained by using the original hyperspectral data [see Fig. 7(b)]. The proposed method is visually superior to this baseline. It obtains a kappa of 0.80, compared with 0.72 obtained using the original hyperspectral data.
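As an illustration of how a kappa value such as those quoted above can be computed, the following is a minimal Python sketch of Cohen's kappa between a reference map and a predicted multiclass change map. The function name `cohen_kappa`, the toy label maps, and the assumption that the predicted cluster labels have already been matched to the reference class indices are ours for illustration; they are not taken from the original experiments.

```python
import numpy as np


def cohen_kappa(reference, predicted, num_classes):
    """Cohen's kappa between two integer label maps of equal shape.

    Assumes labels are integers in [0, num_classes) and that predicted
    cluster indices were already aligned with the reference classes.
    """
    ref = np.asarray(reference).ravel()
    pred = np.asarray(predicted).ravel()

    # Confusion matrix: rows are reference classes, columns are predictions.
    confusion = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(confusion, (ref, pred), 1)

    total = confusion.sum()
    # Observed agreement: fraction of pixels on the diagonal.
    observed = np.trace(confusion) / total
    # Chance agreement: product of the marginals, summed over classes.
    expected = (confusion.sum(axis=1) @ confusion.sum(axis=0)) / total**2
    # Undefined when expected == 1 (all pixels in a single class in both maps).
    return (observed - expected) / (1.0 - expected)


# Hypothetical 2 x 3 label maps with three change classes.
reference = np.array([[0, 0, 1], [1, 2, 2]])
predicted = np.array([[0, 0, 1], [1, 2, 1]])
kappa = cohen_kappa(reference, predicted, num_classes=3)
print(f"kappa = {kappa:.2f}")  # kappa = 0.75
```

Unlike overall accuracy, kappa discounts the agreement expected by chance from the class marginals, which is useful when the change classes are unevenly populated.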
bitemporal hyperdimensional images. As the feature extractor model is untrained, it can be initialized with as many input channels as desired using an appropriate weight initialization technique. Moreover, the number of filters in the subsequent layers can also be chosen flexibly, as there is no training involved. Extensive experiments on four hyperdimensional datasets show the superiority of the proposed approach. The proposed approach is also capable of clustering the changed pixels into semantically meaningful groups, as shown for the Hermiston dataset. While the idea seems bold and new in the context of remote sensing, a similar idea has been verified before in the computer vision and machine learning literature, e.g., deep image prior. The proposed approach benefits from the fact that hyperdimensional images generally exhibit less spatial complexity due to the cost of generating higher resolution in both the spectral and spatial domains. Thus, the applicability of the method to very high spatial resolution hyperdimensional sensors may not be straightforward and will be investigated in future work. Our future work will also investigate untrained models in the context of hyperspectral image classification. As a final note, the proposed approach should not be seen as a competitor to the supervised methods, but rather as complementary to them.

REFERENCES
[1] X. X. Zhu et al., “Deep learning in remote sensing: A comprehensive review and list of resources,” IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 8–36, Dec. 2017.
[2] S. Saha, F. Bovolo, and L. Bruzzone, “Unsupervised deep change vector analysis for multiple-change detection in VHR images,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 6, pp. 3677–3693, Jun. 2019.
[3] S. Saha, F. Bovolo, and L. Bruzzone, “Building change detection in VHR SAR images via unsupervised deep transcoding,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 3, pp. 1917–1929, Mar. 2021.
[4] L. Bergamasco, S. Saha, F. Bovolo, and L. Bruzzone, “Unsupervised change-detection based on convolutional-autoencoder feature extraction,” Proc. SPIE, vol. 11155, 2019, Art. no. 1115510.
[5] S. Saha, L. Mou, X. X. Zhu, F. Bovolo, and L. Bruzzone, “Semisupervised change detection using graph convolutional network,” IEEE Geosci. Remote Sens. Lett., vol. 18, no. 4, pp. 607–611, Apr. 2021.
[6] S. Saha, Y. T. Solano-Correa, F. Bovolo, and L. Bruzzone, “Unsupervised deep transfer learning-based change detection for HR multispectral images,” IEEE Geosci. Remote Sens. Lett., vol. 18, no. 5, pp. 856–860, May 2021.
[7] Z. Zhang, G. Vosselman, M. Gerke, D. Tuia, and M. Y. Yang, “Change detection between multimodal remote sensing data using siamese CNN,” 2018, arXiv:1807.09562.
[8] H. Chen, C. Wu, B. Du, L. Zhang, and L. Wang, “Change detection in multisource VHR images via deep siamese convolutional multiple-layers recurrent neural network,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 4, pp. 2848–2864, Apr. 2020.
[9] F. Rahman, B. Vasu, J. Van Cor, J. Kerekes, and A. Savakis, “Siamese network with multi-level features for patch-based change detection in satellite imagery,” in Proc. IEEE Glob. Conf. Signal Inf. Process., 2018, pp. 958–962.
[10] L. Bruzzone and D. F. Prieto, “Automatic analysis of the difference image for unsupervised change detection,” IEEE Trans. Geosci. Remote Sens., vol. 38, no. 3, pp. 1171–1182, May 2000.
[11] F. Bovolo, “A multilevel parcel-based approach to change detection in very high resolution multitemporal images,” IEEE Geosci. Remote Sens. Lett., vol. 6, no. 1, pp. 33–37, Jan. 2009.
[12] A. Pomente, M. Picchiani, and F. Del Frate, “Sentinel-2 change detection based on deep features,” in Proc. IGARSS IEEE Int. Geosci. Remote Sens. Symp., 2018, pp. 6859–6862.
[13] P. Ghamisi et al., “Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art,” IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 37–78, Dec. 2017.
[14] Z. Huang, L. Fang, and S. Li, “Subpixel-pixel-superpixel guided fusion for hyperspectral anomaly detection,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 9, pp. 5998–6007, Sep. 2020.
[15] L. Fang, W. Zhao, N. He, and J. Zhu, “Multiscale CNNs ensemble based self-learning for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 17, no. 9, pp. 1593–1597, Sep. 2020.
[16] S. Liu, Q. Du, X. Tong, A. Samat, H. Pan, and X. Ma, “Band selection-based dimensionality reduction for change detection in multi-temporal hyperspectral images,” Remote Sens., vol. 9, no. 10, 2017, Art. no. 1008.
[17] S. Mohla, S. Pande, B. Banerjee, and S. Chaudhuri, “FusatNet: Dual attention based spectrospatial multimodal fusion network for hyperspectral and lidar classification,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops, 2020, pp. 92–93.
[18] C. Ehrler, C. Fischer, and M. Bachmann, “Hyperspectral remote sensing applications in mining impact analysis,” in Proc. 34th Int. Symp. Remote Sens. Environ., 2011, pp. 1–4.
[19] S. T. Seydi and M. Hasanlou, “A new land-cover match-based change detection for hyperspectral imagery,” Eur. J. Remote Sens., vol. 50, no. 1, pp. 517–533, 2017.
[20] X. Tong et al., “A novel approach for hyperspectral change detection based on uncertain area analysis and improved transfer learning,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 13, pp. 2056–2069, Apr. 2020.
[21] L. Mou, P. Ghamisi, and X. X. Zhu, “Deep recurrent neural networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 7, pp. 3639–3655, Jul. 2017.
[22] S. R. Cloude and E. Pottier, “A review of target decomposition theorems in radar polarimetry,” IEEE Trans. Geosci. Remote Sens., vol. 34, no. 2, pp. 498–518, Mar. 1996.
[23] Q. Song, F. Xu, and Y.-Q. Jin, “Radar image colorization: Converting single-polarization to fully polarimetric using deep neural networks,” IEEE Access, vol. 6, pp. 1647–1661, 2017.
[24] A. Najafi, M. Hasanlou, and V. Akbari, “Land cover changes detection in polarimetric SAR data using algebra, similarity and distance based methods,” Int. Arch. Photogrammetry, Remote Sens. Spatial Inf. Sci., vol. 42, pp. 195–200, 2017.
[25] H. Bi, J. Sun, and Z. Xu, “Unsupervised PoLSAR image classification using discriminative clustering,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 6, pp. 3531–3544, Jun. 2017.
[26] D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 9446–9454.
[27] N. Audebert, B. Le Saux, and S. Lefèvre, “Deep learning for classification of hyperspectral data: A comparative review,” IEEE Geosci. Remote Sens. Mag., vol. 7, no. 2, pp. 159–173, Jun. 2019.
[28] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 1026–1034.
[29] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proc. 13th Int. Conf. Artif. Intell. Statist., 2010, pp. 249–256.
[30] W. A. Malila, “Change vector analysis: An approach for detecting forest changes with Landsat,” in Proc. LARS Symposia, 1980, pp. 326–331.
[31] T. Celik, “Unsupervised change detection in satellite images using principal component analysis and k-means clustering,” IEEE Geosci. Remote Sens. Lett., vol. 6, no. 4, pp. 772–776, Oct. 2009.
[32] N. Falco, G. Cavallaro, P. R. Marpu, and J. A. Benediktsson, “Unsupervised change detection analysis to multi-channel scenario based on morphological contextual analysis,” in Proc. IEEE Int. Geosci. Remote Sens. Symp., 2016, pp. 3374–3377.
[33] Q. Wang, Z. Yuan, Q. Du, and X. Li, “GETNET: A general end-to-end 2-D CNN framework for hyperspectral image change detection,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 1, pp. 3–13, Jan. 2019.
[34] A. Appice, N. Di Mauro, F. Lomuscio, and D. Malerba, “Empowering change vector analysis with autoencoding in bi-temporal hyperspectral images,” in Proc. CEUR Workshop Proc., vol. 2466, 2019, pp. 1–10.
[35] A. Song, J. Choi, Y. Han, and Y. Kim, “Change detection in hyperspectral images using recurrent 3D fully convolutional networks,” Remote Sens., vol. 10, no. 11, 2018, Art. no. 1827.
[36] Z. Chen and F. Zhou, “Multitemporal hyperspectral image change detection by joint affinity and convolutional neural networks,” in Proc. 10th Int. Workshop Anal. Multitemporal Remote Sens. Images, 2019, pp. 1–4.
[37] M. Molinier and J. Kilpi, “Avoiding overfitting when applying spectral-spatial deep learning methods on hyperspectral images with limited labels,” in Proc. IGARSS IEEE Int. Geosci. Remote Sens. Symp., 2019, pp. 5049–5052.
[38] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, “Understanding deep learning requires rethinking generalization,” 2016, arXiv:1611.03530.
[39] F. Williams, T. Schneider, C. Silva, D. Zorin, J. Bruna, and D. Panozzo, “Deep geometric prior for surface reconstruction,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 10130–10139.
[40] D. Bau, H. Strobelt, W. Peebles, J. Wulff, B. Zhou, J.-Y. Zhu, and A. Torralba, “Semantic photo manipulation with a generative image prior,” ACM Trans. Graph. (TOG), vol. 38, no. 4, pp. 1–11, 2019.
[41] P. I. Wójcik, “Random projection in deep neural networks,” 2018, arXiv:1812.09489.
[42] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014, arXiv:1409.1556.
[43] M. Volpi and D. Tuia, “Dense semantic labeling of subdecimeter resolution images with convolutional neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 2, pp. 881–893, Feb. 2017.
[44] M. Hasanlou and S. T. Seydi, “Sensitivity analysis on performance of different unsupervised threshold selection methods in hyperspectral change detection,” in Proc. 10th IAPR Workshop Pattern Recognit. Remote Sens., 2018, pp. 1–4.
[45] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. Syst., Man, Cybern., vol. 9, no. 1, pp. 62–66, Jan. 1979.
[46] J. A. Lee and M. Verleysen, Nonlinear Dimensionality Reduction. Berlin, Germany: Springer, 2007.
[47] D. Marinelli, F. Bovolo, and L. Bruzzone, “A novel method for unsupervised multiple change detection in hyperspectral images based on binary spectral change vectors,” in Proc. 9th Int. Workshop Anal. Multitemporal Remote Sens. Images, 2017, pp. 1–4.
[48] P. Virtanen et al., “SciPy 1.0: Fundamental algorithms for scientific computing in Python,” Nature Methods, vol. 17, no. 3, pp. 261–272, 2020.
[49] J. López-Fandiño, A. S. Garea, D. B. Heras, and F. Argüello, “Stacked autoencoders for multiclass change detection in hyperspectral images,” in Proc. IGARSS IEEE Int. Geosci. Remote Sens. Symp., 2018, pp. 1906–1909.
[50] J. López-Fandiño, D. B. Heras, F. Argüello, and M. Dalla Mura, “GPU framework for change detection in multitemporal hyperspectral images,” Int. J. Parallel Program., vol. 47, no. 2, pp. 272–292, 2019.
[51] F. Thonfeld, H. Feilhauer, M. Braun, and G. Menz, “Robust change vector analysis (RCVA) for multi-sensor very high resolution optical satellite data,” Int. J. Appl. Earth Observ. Geoinformation, vol. 50, pp. 131–140, 2016.
[52] T. Ridler et al., “Picture thresholding using an iterative selection method,” IEEE Trans. Syst. Man Cybern., vol. SMC-8, no. 8, pp. 630–632, Aug. 1978.
[53] M. Sezgin and B. Sankur, “Survey over image thresholding techniques and quantitative performance evaluation,” J. Electron. Imag., vol. 13, no. 1, pp. 146–165, 2004.
[54] C. H. Li and C. Lee, “Minimum cross entropy thresholding,” Pattern Recognit., vol. 26, no. 4, pp. 617–625, 1993.
[55] M. Zanetti, F. Bovolo, and L. Bruzzone, “Rayleigh-rice mixture parameter estimation via EM algorithm for change detection in multispectral images,” IEEE Trans. Image Process., vol. 24, no. 12, pp. 5004–5016, Dec. 2015.

Sudipan Saha (Member, IEEE) received the M.Tech. degree in electrical engineering from the Indian Institute of Technology Bombay, Mumbai, India, in 2014, and the Ph.D. degree in information and communication technologies from the University of Trento, Trento, Italy, and Fondazione Bruno Kessler, Trento, Italy, in 2020.
He is currently a Postdoctoral Researcher with Technical University of Munich (TUM), Munich, Germany. Previously, he was an Engineer with TSMC Limited, Hsinchu, Taiwan, from 2015 to 2016. In 2019, he was a Guest Researcher with the TUM. His research interests are related to multitemporal remote sensing image analysis, domain adaptation, time-series analysis, image segmentation, deep learning, image processing, and pattern recognition.
Dr. Saha is the recipient of Fondazione Bruno Kessler Best Student Award 2020. He is a Reviewer for several international journals and was a Guest Editor at Remote Sensing (MDPI) special issue on “Advanced Artificial Intelligence for Remote Sensing: Methodology and Application.”

Lukas Kondmann received the bachelor’s degree in economics from the Ludwig-Maximilians-University Munich, Munich, Germany, in 2016, the honors degree in technology management from the Center for Digital Technology Management in Munich, Munich, Germany, in 2017, and the master’s degree in social data science from the University of Oxford, Oxford, U.K., in 2019. He is currently working toward the Ph.D. degree in engineering with the Technical University of Munich, Munich, Germany, and the German Aerospace Center in Munich, Munich, Germany.
He was a Visiting Researcher working on Big Data for social good with the UC Berkeley School of Information in spring 2017. His research is centered around time-series analysis of multispectral remote sensing imagery with a focus on monitoring the Sustainable Development Goals (SDGs).

Qian Song (Member, IEEE) received the B.E. degree (Hons.) in communication engineering from the School of Information Science and Technology, East China Normal University, Shanghai, China, in 2015, and the Ph.D. degree (Hons.) in electromagnetic field and microwave technology from Fudan University, Shanghai, China, in 2020.
She is currently a Postdoctoral Fellow with the Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Wessling, Germany. Her research interests include advanced deep learning technologies and their applications in synthetic aperture radar image interpretation.
Dr. Song was the recipient of the URSI (International Union of Radio Science) Young Scientist Award in 2020.

Xiao Xiang Zhu (Fellow, IEEE) received the M.Sc. degree, the Doctor of Engineering (Dr.-Ing.) degree, and the “Habilitation” degree in the field of signal processing from Technical University of Munich (TUM), Munich, Germany, in 2008, 2011, and 2013, respectively.
She is currently the Professor for Data Science in Earth Observation (former: Signal Processing in Earth Observation) with TUM and the Head of the Department “EO Data Science,” Remote Sensing Technology Institute, German Aerospace Center (DLR), Weßling, Germany. Since 2019, she has been a Cocoordinator of the Munich Data Science Research School. Since 2019, she also heads the Helmholtz Artificial Intelligence – Research Field “Aeronautics, Space and Transport.” Since May 2020, she has been the Director of the international future AI lab “AI4EO – Artificial Intelligence for Earth Observation: Reasoning, Uncertainties, Ethics and Beyond,” Munich, Germany. Since October 2020, she has also been a Co-Director of the Munich Data Science Institute (MDSI), TUM. She was a Guest Scientist or Visiting Professor with the Italian National Research Council (CNR-IREA), Naples, Italy, Fudan University, Shanghai, China, The University of Tokyo, Tokyo, Japan, and University of California, Los Angeles, CA, USA, in 2009, 2014, 2015, and 2016, respectively. She is currently a visiting AI Professor with ESA’s Phi-lab. Her main research interests are remote sensing and earth observation, signal processing, machine learning and data science, with a special application focus on global urban mapping.
Dr. Zhu is a Member of young academy (Junge Akademie/Junges Kolleg) with the Berlin-Brandenburg Academy of Sciences and Humanities and the German National Academy of Sciences Leopoldina and the Bavarian Academy of Sciences and Humanities. She serves in the scientific advisory board in several research organizations, among others the German Research Center for Geosciences (GFZ) and Potsdam Institute for Climate Impact Research (PIK). She is an Associate Editor for the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING and the Area Editor responsible for special issues of IEEE Signal Processing Magazine.