BCDNet
https://doi.org/10.1007/s13042-023-01880-z
ORIGINAL ARTICLE
Received: 20 October 2022 / Accepted: 23 May 2023 / Published online: 25 June 2023
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023
Abstract
Change detection is an increasingly popular technique for the analysis of remote sensing data and is very important for an accurate understanding of changes occurring on the Earth's surface. The deep learning methods proposed so far are mainly simple networks, which results in poor detection of small changed areas because they cannot differentiate between the characteristics of the bi-temporal images. To solve this problem, this article proposes a novel Building Change Detection Network (BCDetNet) for building object change detection and its analysis from bi-temporal high-resolution satellite images. The proposed BCDetNet model can detect small change areas with the help of a multiple feature extraction block. The proposed BCDetNet model performs building change detection using bi-temporal high-resolution satellite images and is trained on two publicly available datasets, namely the LEVIR and WHU change detection (CD) datasets. These datasets contain RGB images with dimensions of 1024 × 1024 and 512 × 512, respectively. The BCDetNet model can learn from scratch during training and performs better than the benchmark change detection models with fewer trainable parameters. The BCDetNet model gives Recall 94.06%, Precision 93.00%, Jaccard score 88.40%, Accuracy 98.73%, F1 score 93.52% and Kappa coefficient 87.05% on the LEVIR CD dataset, and Recall 89.51%, Precision 92.78%, Jaccard score 84.38%, Accuracy 96.78%, F1 score 91.06% and Kappa coefficient 82.12% on the WHU CD dataset. This work is a step toward achieving the best results in building change detection from high-resolution satellite images.
Keywords Deep learning · Change detection · Siamese difference · Multiple feature extraction · Remote sensing
* Shyam Lal, [email protected]
K. S. Basavaraju, [email protected]
N. Solanki Hiren, [email protected]
N. Sravya, [email protected]
J. Nalini, [email protected]
Chintala Sudhakar Reddy, [email protected]

1 Department of Electronics and Communication Engineering, National Institute of Technology Karnataka, Mangalore, Karnataka 575025, India
2 Aerial Services and Digital Mapping, National Remote Sensing Centre, Indian Space Research Organisation, Balanagar, Hyderabad, Telangana 500037, India
3 Forest Biodiversity and Ecology Division, National Remote Sensing Centre, Indian Space Research Organisation, Balanagar, Hyderabad, Telangana 500037, India

1 Introduction
Given two co-registered images taken at different times, change detection (CD) identifies the changed area, e.g., the evolution of plants or urban mutations.

There are two types of CD: binary and semantic change detection. Binary CD assigns a binary label to each pixel of a pair of images taken at different times: a positive label means the area related to that pixel has changed, and a negative label means there is no change at that pixel. Semantic CD, in contrast, identifies what change has happened at each location.

The Copernicus and Landsat programs have made an enormous amount of Earth observation imagery available, which can now be used in advanced supervised machine learning algorithms that have become very popular in the past decade, mainly in image processing. It is crucial to find the best possible ways to use the available data. A lot of data is available in this field, but annotated datasets are scarce. As a result, the complexity of the models that may be employed is limited. Nevertheless, many datasets, like the Onera Satellite Change Detection dataset published in [1] and the Air Change dataset presented in [2], may be utilized to train supervised machine learning algorithms that can identify the change in image pairs.

The major contributions of this research paper are as follows:

1. Introduced a new Multiple Feature Extraction (MFE) block, which extracts multi-scale features to detect small change areas, which are crucial for accurate change detection from satellite images.
2. Developed a novel BCDetNet model by integrating the newly introduced Multiple Feature Extraction block with the Siam-diff model and an attention mechanism [3].
3. The proposed BCDetNet model performs better on two widely used CD datasets than the Siam-diff model and existing deep learning change detection models, as is evident from the experimental results.

The rest of the paper is organized as follows: Sect. 2 describes the related work. Section 3 gives a detailed description of the proposed work. Section 4 explains the CD datasets used and the implementation details. Section 5 presents the ablation study. Section 6 depicts the experimental results and the computational complexity of BCDetNet and other CD architectures. Section 7 concludes the work.

2 Related work

This section gives a summary of the background study and prerequisites for work on CD technology. Many researchers have worked on approaches that deal with weakly supervised learning based on robust statistics and dedicated mathematical modelling. The popular solution in this field is the formulation of robust loss functions and the improvement of deep learning models. In this section, the literature explored during the study of land cover CD is given. In the discipline of remote sensing, CD is a critical task, and various CD approaches have been developed for remote sensing using satellite images. To discover differences across remotely sensed images, manual approaches were initially used, but their main disadvantage was that they were time-consuming. There are now several supervised and unsupervised CD techniques in the literature, like graphical models [4, 5], principal component analysis [6], Markov Random Fields [7], and kernels [8]. Due to advancements in machine learning, many neural network-based strategies have been developed in recent years [9–12], which are capable of successfully addressing CD issues. Most image analysis issues have lately been dominated by more complex machine learning approaches (deep learning), and this progress is steadily reaching the challenge of change detection [13–15].

Since the development of AlexNet [16] and its first-place win in the 2012 ImageNet Large Scale Visual Recognition Challenge, Convolutional Neural Network (CNN) architectures for CD applications have been investigated by a number of researchers. The most current techniques can be divided into patch-based approaches, which use different types of CNN architectures to classify an image patch as changed or unchanged, and semantic segmentation approaches, which conduct semantic segmentation across the full image. Because a single training image can yield numerous training patches, patch-based techniques overcome the paucity of training data. Patch-based techniques, on the other hand, operate in a sliding-window mode, which is both sluggish and wasteful because the same locations are visited several times (patches that correspond to neighbouring centre pixels have a lot of overlap).

Because of the minimal quantity of data available, most of these methods rely on transfer learning techniques. Almost all such networks, for example, were trained on RGB images and therefore cannot be applied to SAR or multispectral images, as in the case of the dataset reported in [1]. These techniques also prevent end-to-end training, which has been shown to produce superior results for properly trained systems. As a result, the focus of this study is on algorithms that can learn purely from accessible change detection data and can therefore be applied to any dataset.

It is not a novel concept to use machine learning to compare images. CNNs are a class of image-processing algorithms that have been used to compare images in a variety of scenarios [17–19]. For issues involving dense prediction, such as pixel-level prediction, fully convolutional architectures (FCNNs) have been proposed [20–22] in Earth observation situations [23]. The use of such concepts in Earth observation, as well as their dominance over superpixel-based,
patch-based, and other techniques, has already been investigated [24]. Siamese models have also been proposed in several situations for the purpose of image comparison [18, 25]. With the advent of deep learning research, many novel models have been applied to change detection tasks in recent years. Fully convolutional networks (FCNs) are among the commonly used structures [26]. Deep learning approaches seek to learn or transform abstract features from bi-temporal images into a common feature space where their information is consistent and comparable. A symmetric UNet network [27] was proposed for landslide mapping, in which a pyramid pooling module is used to obtain multiscale change information. Peng et al. [28] proposed a UNet++ network with multiple side-output fusion. In [29], three UNet-based fully convolutional (FC) networks are presented: FC-Early Fusion (FC-EF), FC-Siamese-Concatenation (FC-Siam-conc), and FC-Siamese-Difference (FC-Siam-diff). FC-EF used the early fusion strategy, whereas FC-Siam-conc and FC-Siam-diff used the late fusion strategy, with FC-Siam-conc fusing the features through concatenation and FC-Siam-diff fusing the features through difference. Although these methods have proven effective in CD, they lack global feature extraction. They place little emphasis on spatial context information and the internal relationship between high- and low-level features. The obtained features are typically sensitive to noise, angle, shadow, and context, making them less robust to pseudo-changes. As a result, many improved algorithms have been proposed to better encode image context in the time-space dimension and improve feature discrimination ability, such as stacking more convolution layers [30] and using the attention mechanism [31–34].

The attention mechanism, which includes spatial attention (SA), channel attention, positional attention, and self-attention, can automatically weight the feature map, enhancing the changed features and weakening the unchanged features [35, 36]. Peng et al. [37] developed a model that captures object change features by introducing spatial and channel attention. Currently, attention-based methods have high computational complexity. This article presents a simple yet effective deep learning architecture, BCDetNet, for building change detection from high-resolution satellite images. The proposed model extends the FC-Siam-diff model with an attention module and an MFE block that can detect small change areas. The proposed model performs better than the benchmark models with fewer parameters, thus reducing computational complexity.

3 Proposed architecture

This section describes the proposed BCDetNet deep learning model for building object change detection from bi-temporal high-resolution satellite images. The proposed BCDetNet model is an extension of the Fully Convolutional Siamese difference model (FC-Siam-diff) [29]. It consists of the MFE block, the encoder-decoder block, and the attention mechanism block. The main contribution of this work is the introduction of the new MFE (Multi Feature Extraction) block, which is used as a feature extractor. Instead of feeding the input image directly to the modified FC-Siam-diff architecture (a combination of the FC-Siam-diff architecture and an attention module), the MFE block extracts features from different fields of view, enabling attention to both small and large land-cover changes. The schematic of the overall network architecture of BCDetNet is shown in Fig. 1.
Fig. 1 Schematic of the overall network architecture of BCDetNet. Block color legend: green is Convolution, purple is Transpose Convolution, yellow is the Final layer, and dotted lines show shared weights. E1 to E4 are encoder layers 1 to 4; D1 to D4 are decoder layers 1 to 4 (color figure online)
For instance, in Fig. 1 the notation '20 → 16 → 16' in E1 indicates that 20 feature maps are provided as input to the layer, and after the convolution operations there are 16 feature maps as output. This convention applies to all layers, where the first number represents the number of input feature maps and the last number represents the number of output feature maps after the convolution operation. In D1, the notation 'up 2, → 128 → 128' means that the feature maps are upsampled by a factor of 2, resulting in 128 input feature maps; after the convolution operation, there are 128 output feature maps.

The research aims to develop FCNN architectures that can learn to identify changes solely from change detection datasets, with no pretraining or transfer learning from other datasets. Unlike most recent work on change detection, these designs can be trained end to end. They evolved from the patch-based technique provided by Daudt et al. [29]; moving from patch-based architectures to a fully convolutional method improves speed and prediction accuracy without affecting training time, and these fully convolutional networks can process inputs of any size.

Two MFE blocks in the proposed architecture aim to extract features from different areas of the bi-temporal image, and an attention mechanism [3] is used to improve the performance. The goal is to extract important features using the MFE block and then combine the encoded information's more abstract and less localised information with the spatial details available in the network's earlier layers to create accurate class predictions with precise bounds in the output image.

3.1 Multi feature extracting (MFE) block

The input 1 and input 2 images are applied to the two MFE blocks. The MFE block is illustrated in Fig. 2; it uses two different filters of size 3 × 3 and 5 × 5 with ten channels each. After the convolution operation with filters of various sizes, the resulting multi-scale features are concatenated. The concatenated features are passed through a ReLU activation layer to speed up learning and produce more accurate results. The primary purpose of this block is to extract multi-scale features from smaller and larger change areas of the input image. The multi-scale features extracted by this block are given to the encoder unit. Let I be the input image with size H × W × 3. After convolution with the two filters of size 3 × 3 and 5 × 5 with ten channels each, and concatenation, the size of the extracted multi-scale feature f is H × W × 20. The MFE block operation is shown in Eq. 1, where φ is the ReLU activation function, w_{3×3} and w_{5×5} are kernels, and ∗ is the convolution operation:

f = φ{(I ∗ w_{3×3}) + (I ∗ w_{5×5})}    (1)
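For concreteness, a minimal TensorFlow/Keras sketch of such a block is given below. This is an illustrative reconstruction from the description above rather than the authors' released implementation, and the helper name mfe_block is ours:

```python
import tensorflow as tf
from tensorflow.keras import layers

def mfe_block(x):
    """Multiple Feature Extraction: parallel 3x3 and 5x5 convolutions
    (10 channels each), concatenated and passed through ReLU, so an
    H x W x 3 input yields an H x W x 20 multi-scale feature map."""
    f3 = layers.Conv2D(10, 3, padding="same")(x)   # 3x3 branch: local detail
    f5 = layers.Conv2D(10, 5, padding="same")(x)   # 5x5 branch: wider context
    f = layers.Concatenate(axis=-1)([f3, f5])      # multi-scale stack (20 maps)
    return layers.ReLU()(f)

inp = layers.Input(shape=(None, None, 3))          # any H x W, per the paper
out = mfe_block(inp)
print(tf.keras.Model(inp, out).output_shape)       # (None, None, None, 20)
```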
3.2 Encoder unit

As in [29], the encoder unit is divided into two streams with identical structures that share weights. Let f1 and f2 be the multi-scale features extracted from the two MFE blocks with input 1 (I1) and input 2 (I2), respectively; each is assigned to one of these streams. Like the FC-Siam-diff network, the encoder is composed of convolutional and pooling layers. Each stream of the encoder includes four blocks of convolutions, each followed by a 2 × 2 maxpool. The first and second layers of the encoder contain two 3 × 3 convolutions with 16 and 32 channels, respectively, followed by maxpool. The third and fourth layers of the encoder contain three 3 × 3 convolutions with 64 and 128 channels, respectively, followed by maxpool.
Four max pooling layers are used in stream 1, and three max pooling layers are used in stream 2 of the encoder unit. Because the primary goal of CD techniques is to identify differences between two images, the absolute value of the difference between the features learned at each encoder layer is concatenated at the corresponding decoder layer. Equations 2 to 5 denote the operations performed in the encoder part, where enc_{s1} and enc_{s2} are the features of encoder stream 1 and encoder stream 2, respectively, and η denotes the convolution and max pooling operations performed in each stream:

f1 = φ{(I1 ∗ w_{3×3}) + (I1 ∗ w_{5×5})}    (2)

f2 = φ{(I2 ∗ w_{3×3}) + (I2 ∗ w_{5×5})}    (3)

enc_{s1} = η{f1}    (4)

enc_{s2} = η{f2}    (5)

3.3 Decoder unit

The decoder part is responsible for projecting the learned features onto the pixel space. The decoder receives the features from the encoder unit. The channel-wise concatenation of the features from the corresponding stages in the encoder part is used here, as in the FC-Siam-diff model [29]. The decoder includes four upsampling layers. The input signal is upsampled by a factor of two before going through a 2 × 2 convolution step. The absolute value of the difference between the features from the corresponding stages of the encoder streams is then concatenated with this signal. The feature map is passed through a series of 3 × 3 convolutions and an attention module [3]. These operations are repeated after each upsampling layer, with five, five, and four 3 × 3 convolutions used after the first, second, and third upsampling layers, respectively. The resulting feature from the fourth decoder stage is fed into the fourth attention module. The decoder unit operation is shown in Eq. 6, where Ω represents the upsampling and convolution operations performed in the decoder unit and dec is the output obtained at each decoding stage:

dec = Ω{|enc_{s1} − enc_{s2}|}    (6)
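One decoder stage with this Siamese-difference skip connection can be sketched in Keras as follows. This is our own illustrative reconstruction under the layer counts stated above; the helper name up_block and the ReLU activations are assumptions, not details taken from the paper:

```python
import tensorflow as tf
from tensorflow.keras import layers

def up_block(x, skip_a, skip_b, n_filters, n_convs):
    """One decoder stage: upsample by 2, apply a 2x2 convolution,
    concatenate the absolute difference of the two encoder streams'
    features at this scale, then run a series of 3x3 convolutions."""
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(n_filters, 2, padding="same", activation="relu")(x)
    diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([skip_a, skip_b])
    x = layers.Concatenate(axis=-1)([x, diff])     # difference skip (Eq. 6)
    for _ in range(n_convs):                       # 5, 5, 4 convs per stage
        x = layers.Conv2D(n_filters, 3, padding="same", activation="relu")(x)
    return x
```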
3.4 Adaptive attention fusion module

Nowadays the attention mechanism is becoming increasingly popular in the field of deep learning, and in recent years many researchers have added different types of attention modules and compared spatial and channel attention modules [38]. This article adds an attention mechanism to the proposed CD model. The network architecture of the adaptive attention fusion module [3] used in this article is shown in Fig. 3.

Spatial attention contributes to increasing the distance between changed and unchanged pixels. Channel attention's job is to boost channels connected with changes in ground features and to block channels that are not relevant. Not all high-dimensional characteristics are useful for difference discrimination in the change detection phase [39, 40], and irrelevant features may make change detection more difficult. The attention mechanism presented in [3] is an adaptive attention fusion module, which helps to enhance useful information and suppress irrelevant information using a dual-stream attention mechanism.

Channel attention module (CAM) As shown in Fig. 3, first average pooling is applied on the concatenated input features
(F) of dimension H × W × C. To construct a C × 1 × 1 vector, the elements of each channel are averaged, where C is the number of channels. Then a one-dimensional convolution is performed on the vector with a kernel of size k1. As in Fig. 3, the result of the convolution is normalised to a weight coefficient as in [41]. Each element of the obtained result is then multiplied with each spatial element of the original feature to obtain the globally enhanced channel attention feature matrix (M_C), which has the expression in Eq. 7:

M_C = σ(conv1d(Avgpool(F))) ⊗ F    (7)

Here F is the input to the CAM (the merged feature matrix), and σ(.) is the sigmoid activation function as in Eq. 8:

σ(X_in) = 1 / (1 + e^{−X_in})    (8)

An adaptive technique is used to determine the size of the one-dimensional convolution kernel k1 based on the number of channels C; the relationship between k1 and C is described in Eq. 9.

Here F is the input to the SAM module, and σ(.) is the sigmoid activation function as defined in Eq. 8. The 2D convolution kernel size k2 is determined in the same way as k1: an adaptive value determination technique depending on the size of the input feature matrix (W and H) is used. Because the images in the datasets we have used are square (1024 × 1024 × 3 in LEVIR-CD and 512 × 512 × 3 in the WHU building dataset), W and H are equal, so we can take W = H. The functional link between W and k2 is constructed as in Eq. 13:

W = g(k2)    (13)

Because the size of the input bi-temporal images used in our study is 1024 × 1024 in one dataset and 512 × 512 in the other, the size of the feature matrix at the various stages is always an exponential power of 2. k2 is calculated in the same way as k1, as in Eq. 14:

k2 = Mod((log(W) + b) / a)_odd    (14)
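A minimal sketch of such a dual-stream attention fusion is given below. It is an illustrative reconstruction, not the exact module of [3]: the adaptive odd kernel sizes k1 and k2 are assumed to follow an ECA-style log-law in the spirit of [42] (the constants a and b of Eqs. 9 and 14 are left as parameters), statically known feature shapes are assumed, and the function names are ours:

```python
import math
import tensorflow as tf
from tensorflow.keras import layers

def odd_kernel(n, a=2, b=1):
    """Adaptive odd kernel size from a log-law (assumed ECA-style mapping,
    standing in for the unspecified constants of Eqs. 9 and 14)."""
    k = int(abs((math.log2(n) + b) / a))
    return k if k % 2 == 1 else k + 1

def attention_fusion(f):
    """Dual-stream attention: channel attention (Eq. 7) followed by spatial
    attention, each a sigmoid gate multiplied back onto the features."""
    c, w = int(f.shape[-1]), int(f.shape[1])
    # Channel stream: average pool -> 1D conv (kernel k1) -> sigmoid -> scale
    ca = layers.GlobalAveragePooling2D()(f)                  # (B, C)
    ca = layers.Conv1D(1, odd_kernel(c), padding="same")(
        layers.Reshape((c, 1))(ca))                          # (B, C, 1)
    ca = tf.sigmoid(layers.Reshape((1, 1, c))(ca))
    f = f * ca                                               # Eq. 7 gating
    # Spatial stream: channel-mean map -> 2D conv (kernel k2) -> sigmoid -> scale
    sa = tf.reduce_mean(f, axis=-1, keepdims=True)           # (B, H, W, 1)
    sa = layers.Conv2D(1, odd_kernel(w), padding="same",
                       activation="sigmoid")(sa)
    return f * sa
```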
images because of the time span of 5–14 years. LEVIR-CD covers many structures, including villas, high-rise flats, tiny garages, and extensive warehouses. The dataset contains 637 images, divided into: (1) training set, 445 images; (2) validation set, 64 images; (3) test set, 128 images.

WHU-CD [44] This dataset is a sub-dataset of an aerial image dataset. There are a total of two aerial images, which are sub-divided. The aerial dataset contains almost 220,000 individual buildings extracted from aerial photographs of Christchurch, New Zealand, with a spatial resolution of 0.075 m and a coverage area of 450 km². The WHU-CD dataset consists of a single image of size 15,354 × 32,507. We created patches of size 512 × 512, which resulted in a total of 828 images, divided into: (1) training set, 580 images; (2) validation set, 82 images; (3) test set, 166 images.
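As an aside, the 512 × 512 patching step described above can be sketched as follows. This is our own illustration with a small stand-in scene; non-overlapping tiling that drops partial border tiles is assumed, since the paper does not state the tiling strategy:

```python
import numpy as np

def extract_patches(image: np.ndarray, size: int = 512):
    """Tile a large (H, W, 3) scene into non-overlapping size x size
    patches, dropping any partial tiles at the borders."""
    h, w = image.shape[:2]
    return [image[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

scene = np.zeros((1024, 1536, 3), dtype=np.uint8)   # small stand-in scene
print(len(extract_patches(scene)))                   # 2 * 3 = 6 tiles
```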
Table 1 gives the summary of the datasets.

4.2 Implementation details

The proposed model is implemented on an NVIDIA Quadro RTX4000 GPU with 8 GB of onboard memory. The model is built with TensorFlow 2.7 and the Keras API framework. The ADAM optimizer (with momentum parameters β1 = 0.9, β2 = 0.9999, epsilon = 10⁻⁷) is used with an initial learning rate of 0.0001. During training, if the validation accuracy does not improve for five epochs, the learning rate is automatically divided by 2. If the validation accuracy still does not improve after this reduction, training is stopped automatically after 20 epochs without improvement. BCDetNet and the other benchmark models are trained for 30 epochs. The parameters used to set up the neural network are given in Table 2.
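In Keras terms, this training schedule corresponds roughly to the following sketch. It is a reconstruction of the stated settings, not the authors' code; model, train_ds, and val_ds are assumed to exist:

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-4, beta_1=0.9, beta_2=0.9999, epsilon=1e-7)

callbacks = [
    # halve the learning rate when validation accuracy stalls for 5 epochs
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_accuracy", factor=0.5, patience=5),
    # stop if validation accuracy still fails to improve for 20 epochs
    tf.keras.callbacks.EarlyStopping(
        monitor="val_accuracy", patience=20),
]

# loss placeholder; replaced by the weighted loss of Sect. 4.3
model.compile(optimizer=optimizer, loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=30, callbacks=callbacks)
```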
4.3 Loss function

Weighted class categorical cross entropy (wcce) is the loss function used here. For an image with b1 × b2 pixels and C classes, wcce is defined as in Eq. 16:

l_wcce(y′, y) = −(1/C) Σ_{i=1..b1, j=1..b2, c=1..2} w_c · Y_{ijc} · log(P_{ijc})    (16)

where P, Y ∈ (0, 1)^{b1,b2,C}, P_{ijc} is the predicted probability that pixel (i, j) belongs to class c, and w_c is the weight of each class. Y_{ijc} = 1 if pixel (i, j) belongs to class c, and Y_{ijc} = 0 otherwise.
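A minimal TensorFlow sketch of such a weighted categorical cross-entropy is given below. It is our own reading of Eq. 16 (averaging over pixels rather than applying the printed 1/C factor), and the class weights are assumed to be user-chosen:

```python
import tensorflow as tf

def weighted_cce(class_weights):
    """Weighted categorical cross-entropy over one-hot pixel labels
    (y_true, y_pred of shape (B, H, W, C)), following Eq. 16 up to the
    normalisation constant."""
    w = tf.constant(class_weights, dtype=tf.float32)      # shape (C,)

    def loss(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0)      # avoid log(0)
        per_pixel = -tf.reduce_sum(w * y_true * tf.math.log(y_pred), axis=-1)
        return tf.reduce_mean(per_pixel)                  # mean over pixels
    return loss

# e.g. up-weight the (rarer) "changed" class relative to "unchanged"
loss_fn = weighted_cce([0.5, 2.0])
```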
4.4 Evaluation metrics

The following metrics are used to assess the performance of BCDetNet and the benchmark models used for comparison:

1. Jaccard coefficient [45] This is a widely used approach for determining the overlap between two sets and a measure of how similar or different binary data are. In the instance of binary change detection from satellite photos in a deep learning framework, the Jaccard coefficient
η = [(TP + FP) × (TP + FN) + (TN + FP) × (TN + FN)] / (TP + TN + FP + FN)²    (27)
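For reference, the Kappa coefficient follows from these confusion-matrix counts as sketched below. This is our own illustration, with η the chance agreement of Eq. 27:

```python
def kappa(tp, tn, fp, fn):
    """Cohen's kappa from confusion-matrix counts: observed agreement
    versus the chance agreement eta of Eq. 27."""
    total = tp + tn + fp + fn
    p_o = (tp + tn) / total                                  # observed accuracy
    eta = ((tp + fp) * (tp + fn) + (tn + fp) * (tn + fn)) / total**2
    return (p_o - eta) / (1 - eta)

print(round(kappa(tp=90, tn=880, fp=10, fn=20), 3))  # 0.84 on these toy counts
```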
5 Ablation study

First, the base model alone is tested, and the test results are tabulated in the second column of Tables 3 and 4. Secondly, the base model with the attention module is tested, and the results are tabulated in the third column of Tables 3 and 4. Finally, the base model with the attention unit and MFE is tested, and the results are tabulated in the fourth column of Tables 3 and 4. In BCDetNet, all performance metrics are improved compared with the base model alone or with the base model plus the attention module. For a better understanding of the model, the features extracted at the different stages are shown in Fig. 5.

Table 3 Ablation study of BCDetNet with simulated data experiments on the LEVIR-CD dataset (the BCDetNet column gives the best value for every quality metric)

                      Base model   Base model with attention   BCDetNet
MFE                   ×            ×                           ✓
Attention             ×            ✓                           ✓
Accuracy              0.9828       0.9848                      0.9873
Recall                0.9165       0.8375                      0.9406
Precision             0.7829       0.8799                      0.9300
F1-Score              0.8444       0.86165                     0.9352
Kappa coefficient     0.8385       0.8502                      0.8705
Jaccard Score         0.7349       0.8023                      0.8840
Trainable parameters  1,238,914    1,238,914                   1,508,578

Table 4 Ablation study of BCDetNet with simulated data experiments on the WHU-CD dataset (the BCDetNet column gives the best value for every quality metric)

                      Base model   Base model with attention   BCDetNet
MFE                   ×            ×                           ✓
Attention             ×            ✓                           ✓
Accuracy              0.9501       0.9606                      0.9678
Recall                0.7116       0.7023                      0.8951
Precision             0.7808       0.8436                      0.9278
F1-Score              0.7396       0.8722                      0.9106
Kappa coefficient     0.4805       0.8121                      0.8212
Jaccard Score         0.6336       0.7834                      0.8438
Trainable parameters  1,238,914    1,238,914                   1,508,578

Tables 3 and 4 show the ablation study of BCDetNet; from the tables it is evident that BCDetNet performs better than the base model alone and the base model plus the attention module. By incorporating the MFE block, the performance on the LEVIR-CD dataset improved significantly, as evidenced by a 7.36% increase in F1-score, a 2.03% improvement in Kappa coefficient, and an 8.17% increase in Jaccard score compared to the modified FC-Siam-diff architecture. On the WHU-CD dataset, the MFE block led to a 3.84% increase in F1-score, a 0.9% increase in Kappa coefficient, and a 6.04% improvement in Jaccard score compared to the modified FC-Siam-diff architecture. The 5 × 5 convolution in the MFE block captures high-level context information, while the 3 × 3 convolution focuses on local details to predict small changes.

As illustrated in the feature map visualization of Fig. 5, we can see the features extracted at the different stages of the proposed model. Figure 5d shows the output of the MFE block, which is used to extract multi-scale features from smaller and larger change areas of the input image. When the output of the MFE is given as input to the model with the encoder, decoder, and attention unit, the results are excellent, as is evident from the predicted output shown in Fig. 5h and the results tabulated in Tables 3 and 4.
Fig. 5 Feature maps visualization. a Input 1. b Input 2. c Label. d After MFE block. e Before final layer of encoder. f After encoder. g Before final layer of decoder. h Final output of BCDetNet

6 Experimental results
Table 5 compares the proposed BCDetNet with existing state-of-the-art models on the LEVIR-CD dataset. The accuracy of the proposed model is 98.73%, whereas for AGCDetNet and ADS-Net it is 98.38% and 98.45%, respectively. The recall metric for BCDetNet is much greater than for any other benchmark model, and it is higher by 2.5% than the FC-Siam-diff model, which has the highest value among the benchmark models. U-Net++ has the highest precision among the benchmark models, and BCDetNet has a slightly higher value than U-Net++. AGCDetNet has the highest Jaccard score among the existing state-of-the-art models, but BCDetNet has a 5% higher value than AGCDetNet. The F1-score is best for BCDetNet, which is 9% higher than the FC-Siam-diff model, the highest among the benchmark models. FC-Siam-diff has the best Kappa coefficient value among the benchmark models, whereas BCDetNet has a 3.20% higher value than FC-Siam-diff. FC-Siam-diff has fewer parameters than the proposed model, but BCDetNet outperforms FC-Siam-diff in all quality metrics with only slightly more parameters. The effectiveness of BCDetNet is evident in Table 5 for the LEVIR-CD dataset. AGCDetNet and ADS-Net have good accuracy among the benchmark models with more than 60 and 2.5 million parameters, respectively. BCDetNet gives higher accuracy than AGCDetNet and ADS-Net with around 1.5 million parameters, which shows the effectiveness of BCDetNet in terms of computational complexity. Figure 8 shows the benchmark and BCDetNet prediction results on the LEVIR-CD dataset. From Fig. 8 it is evident that the BCDetNet prediction results are much better than the benchmark model prediction results.

Table 6 shows the benchmark and BCDetNet quality metrics on the WHU-CD dataset. All quality metrics are increased in the proposed model compared to any benchmark model. From Table 6, it is evident that the number of parameters in BCDetNet is around 1.5 million, which is the second lowest; at the same time, BCDetNet gives the best result in every quality metric compared to any other benchmark model. Among all the benchmark models, AGCDet-Net gives the best results but contains the highest number of parameters, more than 60 million, compared to BCDetNet, which has only 1.5 million parameters and gives better results than AGCDet-Net. The percentage improvements in the quality metrics over the best benchmark model are Recall 12.36%, Precision 1.13%, F1-Score 8.74%, and Kappa 12.65%.

Figure 9 shows the benchmark and BCDetNet models' prediction results on the WHU-CD dataset. From Fig. 9 it is evident that the BCDetNet model prediction results are much better than the benchmark model prediction results.
Table 6 Benchmark and BCDetNet models' quality metrics on the WHU-CD dataset (columns: U-Net (2015), FC-Siam-diff (2018), U-Net++ (2019), AGCDet-Net (2021), ADS-Net (2021), Proposed BCDetNet)
6.2 Computation complexity study

The number of floating-point operations (FLOPs) and the total number of trainable parameters required to run the model are used to determine a model's complexity [47]. Tables 7 and 8 show the computation complexity study of BCDetNet and the other benchmark models used for comparison on the LEVIR-CD and WHU-CD datasets, respectively. Except for FC-Siam-diff, BCDetNet has the fewest parameters of the existing benchmark models used here. BCDetNet requires 38.74 billion FLOPs and just above 1.5 million parameters. The highest numbers of FLOPs and parameters are found in AGCDet-Net. ADS-Net utilizes the fewest FLOPs, and FC-Siam-diff uses the fewest parameters. The training time and prediction time per image are also measures of a model's complexity [47]. Except for FC-Siam-diff, BCDetNet takes the least training time and prediction time per image of the existing benchmark models used here on the LEVIR-CD dataset. AGCDet-Net takes the highest training time of 2.85 h and 0.30 s of prediction time per image on the LEVIR-CD dataset. Even in the
Fig. 8 Benchmark and BCDetNet prediction results on the LEVIR-CD dataset (rows: Input Image-2, Ground Truth, ADS-Net, AGCDet-Net, FC-Siam-diff, U-Net, U-Net++, Proposed BCDetNet)
Fig. 9 Benchmark and BCDetNet prediction results on the WHU-CD dataset (rows: Input Image-2, Ground Truth, ADS-Net, AGCDet-Net, FC-Siam-diff, U-Net, U-Net++, Proposed BCDetNet)
7 Conclusion

The proposed BCDetNet model extracts multi-scale features from the input image's smaller and larger change areas and gains contextual intelligence. During training, the proposed model can learn from scratch, and it outperforms benchmark models with fewer parameters. The proposed BCDetNet architecture performs better than the existing change detection deep learning models. The proposed model boosts the quality metrics Recall (2.41%), Jaccard score (5.15%), F1-score (9.08%), and Kappa coefficient (3.2%) compared to any benchmark model on the LEVIR-CD dataset. There is an improvement of Recall (12.36%), Precision (1.13%), Jaccard score (11.75%), F1-score (8.74%), and Kappa coefficient (12.36%) compared to any benchmark model on the WHU-CD dataset. The proposed BCDetNet model, with around 1.5 million parameters, performs better in all quality metrics than U-Net, U-Net++, AGCDet-Net, and ADS-Net with 31.04, 48.82, 60.20, and 2.57 million parameters, respectively. This work's limitation is that the number of parameters is slightly higher than that of the FC-Siam-diff model. Future work will include expanding this work to semantic change detection and developing a mechanism to reduce the number of parameters while retaining or further improving the quality metrics.
Funding This research work was supported by the RESPOND scheme of the Indian Space Research Organisation (ISRO), Govt. of India under Grant No. ISRO/RES/4/683/19-20, December 30, 2019.

Data availability The implementation code and datasets used during the current study are available from the corresponding author on reasonable request.

Declarations

Conflict of interest No conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication.

References

1. Daudt RC, Le Saux B, Boulch A, Gousseau Y (2018) Urban change detection for multispectral earth observation using convolutional neural networks. In: IGARSS 2018 - 2018 IEEE international geoscience and remote sensing symposium, pp 2115–2118. https://doi.org/10.1109/IGARSS.2018.8518015
2. Benedek C, Szirányi T (2009) Change detection in optical aerial images by a multilayer conditional mixed Markov model. IEEE Trans Geosci Remote Sens 47(10):3416–3430
3. Wang D, Chen X, Jiang M, Du S, Xu B, Wang J (2021) ADS-Net: an attention-based deeply supervised network for remote sensing image change detection. Int J Appl Earth Obs Geoinf 101:102348
4. Vakalopoulou M, Karatzalos K, Komodakis N, Paragios N (2015) Simultaneous registration and change detection in multitemporal, very high resolution remote sensing data. In: 2015 IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp 61–69. https://doi.org/10.1109/CVPRW.2015.7301384
5. Vakalopoulou M, Platias C, Papadomanolaki M, Paragios N, Karantzalos K (2016) Simultaneous registration, segmentation and change detection from multisensor, multitemporal satellite image pairs. In: 2016 IEEE international geoscience and remote sensing symposium (IGARSS), pp 1827–1830. https://doi.org/10.1109/IGARSS.2016.7729469
6. Deng J, Wang K, Deng Y, Qi G (2008) PCA-based land-use change detection and analysis using multitemporal and multisensor satellite data. Int J Remote Sens 29(16):4823–4838
7. Singh P, Kato Z, Zerubia J (2014) A multilayer Markovian model for change detection in aerial image pairs with large time differences. In: 2014 22nd international conference on pattern recognition, pp 924–929. https://doi.org/10.1109/ICPR.2014.169
8. Volpi M, Tuia D, Camps-Valls G, Kanevski M (2011) Unsupervised change detection in the feature space using kernels. In: 2011 IEEE international geoscience and remote sensing symposium, pp 106–109. https://doi.org/10.1109/IGARSS.2011.6048909
9. Liu J, Gong M, Qin K, Zhang P (2018) A deep convolutional coupling network for change detection based on heterogeneous optical and radar images. IEEE Trans Neural Netw Learn Syst 29(3):545–559. https://doi.org/10.1109/TNNLS.2016.2636227
10. Gong M, Zhao J, Liu J, Miao Q, Jiao L (2016) Change detection in synthetic aperture radar images based on deep neural networks. IEEE Trans Neural Netw Learn Syst 27(1):125–138. https://doi.org/10.1109/TNNLS.2015.2435783
11. El Amin AM, Liu Q, Wang Y (2017) Zoom out CNNs features for optical remote sensing change detection. In: 2017 2nd international conference on image, vision and computing (ICIVC), pp 812–817. https://doi.org/10.1109/ICIVC.2017.7984667
12. Zhan Y, Fu K, Yan M, Sun X, Wang H, Qiu X (2017) Change detection based on deep Siamese convolutional network for optical aerial images. IEEE Geosci Remote Sens Lett 14(10):1845–1849. https://doi.org/10.1109/LGRS.2017.2738149
13. Stent S, Gherardi R, Stenger B, Cipolla R (2015) Detecting change for multi-view, long-term surface inspection. In: BMVC, pp 127-1
14. Liu J, Gong M, Qin K, Zhang P (2016) A deep convolutional coupling network for change detection based on heterogeneous optical and radar images. IEEE Trans Neural Netw Learn Syst 29(3):545–559
15. Gong M, Zhao J, Liu J, Miao Q, Jiao L (2015) Change detection in synthetic aperture radar images based on deep neural networks. IEEE Trans Neural Netw Learn Syst 27(1):125–138
16. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:84–90
17. Chopra S, Hadsell R, LeCun Y (2005) Learning a similarity metric discriminatively, with application to face verification. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05), vol 1. IEEE, pp 539–546
18. Zagoruyko S, Komodakis N (2015) Learning to compare image patches via convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4353–4361
19. Stent S, Gherardi R, Stenger B, Cipolla R (2015) Detecting change for multi-view, long-term surface inspection. In: BMVC, pp 127-1
20. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
21. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 234–241
22. Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PH (2016) Fully-convolutional Siamese networks for object tracking. In: European conference on computer vision. Springer, pp 850–865
23. Audebert N, Le Saux B, Lefèvre S (2018) Beyond RGB: very high resolution urban remote sensing with multimodal deep networks. ISPRS J Photogramm Remote Sens 140:20–32
24. Audebert N, Le Saux B, Lefèvre S (2017) Segment-before-detect: vehicle detection and classification through semantic segmentation of aerial images. Remote Sens 9(4):368
25. Bromley J, Guyon I, LeCun Y, Säckinger E, Shah R (1993) Signature verification using a "Siamese" time delay neural network. Adv Neural Inf Process Syst 6
26. Papadomanolaki M, Vakalopoulou M, Karantzalos K (2021) A deep multitask learning framework coupling semantic segmentation and fully convolutional LSTM networks for urban change detection. IEEE Trans Geosci Remote Sens 59(9):7651–7668. https://doi.org/10.1109/TGRS.2021.3055584
27. Lei T, Zhang Y, Lv Z, Li S, Liu S, Nandi AK (2019) Landslide inventory mapping from bitemporal images using deep convolutional neural networks. IEEE Geosci Remote Sens Lett 16(6):982–986. https://doi.org/10.1109/LGRS.2018.2889307
28. Peng D, Zhang Y, Guan H (2019) End-to-end change detection for high resolution satellite images using improved UNet++. Remote Sens 11(11):1382. https://doi.org/10.3390/rs11111382
29. Caye Daudt R, Le Saux B, Boulch A (2018) Fully convolutional Siamese networks for change detection. In: 2018 25th IEEE international conference on image processing (ICIP), pp 4063–4067. https://doi.org/10.1109/ICIP.2018.8451652
30. Zhang M, Shi W (2020) A feature difference convolutional neural network-based change detection method. IEEE Trans Geosci Remote Sens 58(10):7232–7246. https://doi.org/10.1109/TGRS.2020.2981051
31. Zhang C, Yue P, Tapete D, Jiang L, Shangguan B, Huang L, Liu G (2020) A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS J Photogramm Remote Sens 166:183–200
32. Ding Q, Shao Z, Huang X, Altan O (2021) DSA-Net: a novel deeply supervised attention-guided network for building change detection in high-resolution remote sensing images. Int J Appl Earth Obs Geoinf 105:102591
33. Chen J, Yuan Z, Peng J, Chen L, Huang H, Zhu J, Liu Y, Li H (2021) DASNet: dual attentive fully convolutional Siamese networks for change detection in high-resolution satellite images. IEEE J Sel Top Appl Earth Obs Remote Sens 14:1194–1206. https://doi.org/10.1109/JSTARS.2020.3037893
34. Shi Q, Liu M, Li S, Liu X, Wang F, Zhang L (2022) A deeply supervised attention metric-based network and an open aerial image dataset for remote sensing change detection. IEEE Trans Geosci Remote Sens 60:1–16. https://doi.org/10.1109/TGRS.2021.3085870
35. Alimjan G, Jiaermuhamaiti Y, Jumahong H, Zhu S, Nurmamat P (2021) An image change detection algorithm based on multi-feature self-attention fusion mechanism UNet network. Int J Pattern Recognit Artif Intell 35(14):2159049
36. Diakogiannis FI, Waldner F, Caccetta P (2021) Looking for change? Roll the dice and demand attention. Remote Sens 13(18):3707
37. Peng X, Zhong R, Li Z, Li Q (2020) Optical remote sensing image change detection based on attention mechanism and image difference. IEEE Trans Geosci Remote Sens 59(9):7296–7307
38. Zhang C, Yue P, Tapete D, Jiang L, Shangguan B, Huang L, Liu G (2020) A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS J Photogramm Remote Sens 166:183–200. https://doi.org/10.1016/j.isprsjprs.2020.06.003
39. Saha S, Bovolo F, Bruzzone L (2019) Unsupervised deep change vector analysis for multiple-change detection in VHR images. IEEE Trans Geosci Remote Sens 57(6):3677–3693. https://doi.org/10.1109/TGRS.2018.2886643
40. Hu J, Shen L, Albanie S, Sun G, Wu E (2020) Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell 42(8):2011–2023. https://doi.org/10.1109/TPAMI.2019.2913372
41. Bruzzone L, Bovolo F (2013) A novel framework for the design of change-detection systems for very-high-resolution remote sensing images. Proc IEEE 101(3):609–630. https://doi.org/10.1109/JPROC.2012.2197169
42. Wang Q, Wu B, Zhu P, Li P, Zuo W, Hu Q (2020) ECA-Net: efficient channel attention for deep convolutional neural networks. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 11531–11539. https://doi.org/10.1109/CVPR42600.2020.01155
43. Chen H, Shi Z (2020) A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sens 12(10):1662
44. Ji S, Wei S, Lu M (2019) Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans Geosci Remote Sens 57(1):574–586. https://doi.org/10.1109/TGRS.2018.2858817
45. Song K, Jiang J (2021) AGCDetNet: an attention-guided network for building change detection in high-resolution remote sensing images. IEEE J Sel Top Appl Earth Obs Remote Sens 14:4816–4831
46. Singh R, Rani R (2020) Semantic segmentation using deep convolutional neural network: a review. In: Proceedings of the international conference on innovative computing & communications (ICICC)
47. Basavaraju KS, Sravya N, Lal S, Nalini J, Reddy CS, Dell'Acqua F (2022) UCDNet: a deep learning model for urban change detection from bi-temporal multispectral Sentinel-2 satellite images. IEEE Trans Geosci Remote Sens 60:1–10. https://doi.org/10.1109/TGRS.2022.3161337

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.