Multi-Scale Feature Based Land Cover Change Detection in Mountainous Terrain Using Multi-Temporal and Multi-Sensor Remote Sensing Images
ABSTRACT Land use and land cover (LULC) change is frequent in the mountainous terrain of southern China.
Although remote sensing technology has become an important tool for gathering and monitoring LULC
dynamics, image pairs can exhibit scale changes, noise, geometric distortions, and illumination variations when
they are acquired by different types of sensors (e.g., satellites). Meanwhile, how to design an efficient land
cover change detection algorithm that ensures a high detection rate remains a critical and challenging problem.
To address these problems, we propose a robust multi-temporal change detection framework for land cover
change in mountainous terrain, which makes the following contributions. i) To transform multi-temporal
remote sensing image pairs acquired by different types of sensors into the same coordinate system by
image registration, a multi-scale feature description is generated using layers of a pretrained VGG
network. ii) A gradually increasing selection of inliers is defined to improve the robustness of feature
point registration, and an L2-minimizing estimate (L2E)-based energy optimization is formulated to calculate a
reasonable position in a reproducing kernel Hilbert space. iii) A fuzzy C-Means classifier is adopted to generate
a similarity matrix between the geometrically corrected image pair, and a robust change map
is built through feature similarity analysis. Extensive experiments on multi-temporal image pairs taken by
different types of satellites (e.g., Chinese GF and Landsat) or small unmanned aerial vehicles are conducted.
Experimental results show that our method provides better performance in most cases compared with
five state-of-the-art image registration methods and four state-of-the-art change detection methods.
INDEX TERMS LULC change, multi-scale feature description, inliers, L2E, fuzzy C-Means classifier.
on the remote sensing images acquired by satellite sensors (e.g., Landsat, MODIS and SPOT-VGT), and their relatively low spatial resolution limits the identification of land cover changes because of the small size and scattered distribution of land cover in mountainous terrain. Compared with the above-mentioned methods, Wei et al. [9] and Milas et al. [11] capture more land distribution details than satellite remote sensing images by using a small unmanned aerial vehicle (UAV) with a small digital camera. However, visual differences in camera viewpoint inevitably exist, even when images are captured from the same location and matched using GPS data. Deep networks are robust (i.e., invariant) to differences in viewpoint and illumination conditions, yet remain sensitive to highly abstract, semantic differences between images. In particular, the recently popular convolutional neural networks (CNNs) are well suited to this task; Li and Yu [12] showed that a high-quality visual saliency model can be learned from multi-scale features extracted using CNNs.

Moreover, several factors prevent such image pairs from being applied directly to identify regions of change, since scale changes, noise, geometric distortions, and discontinuously rotated images with illumination variations may also be produced in multi-temporal imagery. These factors are as follows: (i) as a satellite moves along its orbit, the acquired images can contain geometric distortions due to inaccuracies in the sensor geometry model and jitter of the instrument platform during image acquisition; (ii) when collecting multi-temporal images of the same location, the imaging perspective of small UAVs is easily affected by wind speed and direction, complex terrain, aircraft posture (pitch, roll, yaw), flying height, and other human factors. In order to effectively improve the matching degree between the image and the actual terrain, preprocessing of these image pairs is an essential step; that is, an image registration method can align image pairs of the same scene taken from different viewpoints, at different times, or with different sensors. However, most current registration methods are only suitable for one type of sensor and are not sensitive enough to multi-temporal image pairs. Therefore, our goal focuses on multi-temporal remote sensing image pairs acquired by different types of sensors, and on transforming them into the same coordinate system.

Numerous algorithms [21], [24], [25], [27]–[31] for different registration scenarios have been presented in the last few decades. The coherent point drift (CPD) algorithm for both rigid and non-rigid point set registration [21] treated one point set as the centroids of a Gaussian mixture model and then fitted it to the other. It applied the fast Gauss transform [22] and low-rank matrix approximation [23] techniques to reduce the large computational burden. Recently, in order to estimate the correspondence relationship between two images, GLMDTPS [24] proposed a global and local mixture distance. PRGLS [25] formulated point registration as the estimation of a mixture of densities that preserves both global and local structures during matching. More recently, Zhang et al. [28], [29] introduced an effective method that maintains a high matching ratio on inliers while taking advantage of outliers for varying the warping grids.

In this paper, we present a robust change detection framework for monitoring land cover change in mountainous terrain with multi-temporal remote sensing images. In the preprocessing stage of change detection, a multi-scale feature based image registration method is proposed to align image pairs acquired by different types of sensors. Compared with current methods, the major contributions of our work include: (i) a multi-scale feature descriptor (MFD) constructed from a CNN-based feature descriptor (CFD) and shape context (SC), where the CFD is generated from layers of a pretrained VGG network; (ii) a gradually increasing selection of inliers to estimate correspondences and transformations, instead of a stationary distinction between inliers and outliers. At the early stage of registration, a rough transformation is quickly determined by the most reliable feature points, after which the registration details are refined by increasing the number of feature points. Then, an L2-minimizing estimate (L2E) based energy optimization is formulated to calculate a reasonable position in a reproducing kernel Hilbert space; (iii) a fuzzy C-Means classifier adopted to generate a similarity matrix between the transformed image pairs.

The rest of the paper is organized as follows. Section II introduces a novel deep learning based framework, which fuses CNN features and a deep neural network (DNN), to detect land cover change in mountainous terrain. Section III demonstrates our experiments, and Section IV draws conclusions.

II. METHODOLOGY
In this section, we first give the details of the three contributions:
• multi-scale feature description;
• dynamic inlier selection;
• fuzzy C-means classifier based pre-classification.
Second, we give the details of the proposed land cover change detection framework; Figure 1 shows the framework of the proposed method. Finally, our algorithm and parameter settings are discussed in the latter part of this section.

Let us consider an image pair It1 and It2, acquired over the same geographical area at two different times t1 and t2. The feature point sets A and B are first extracted from It1 and It2, respectively. Next, the transformed image It is obtained by our registration algorithm. Note that I't1 and I't2 are obtained by splitting It and It2 equally according to a certain ratio. Finally, we input I't1 and I't2 into the change detection model, and a change detection map Smap is generated.

Throughout the paper we use the following notations:
• A_{N×D} = {a1, ..., aN}^T, B_{M×D} = {b1, ..., bM}^T - feature point sets extracted from the image pair It1 and It2, respectively; D denotes the dimension of the feature points, and D = 2.
• τ - the transformation function.
• B* - the transformed locations of the source point set B.
• It - the transformed image.
• I't1, I't2 - obtained by equal split of It and It2.
• Smap - the change detection map.
FIGURE 1. Flowchart of the proposed land cover change detection framework, consisting of three main phases:
(1) multi-scale feature description, (2) the registration process, and (3) the change detection process.
Note that correct feature point matches are denoted by yellow lines and incorrect ones by red lines.
A. MULTI-SCALE FEATURE DESCRIPTOR
Mountainous terrain is the major geomorphic structure in the south of China; it has special natural conditions (e.g., overcast and foggy weather) and a fragile ecological environment. Therefore, it is sometimes hard to perform precise image registration, since images acquired by different types of sensors can aggravate the non-rigid geometric distortions of the images. We therefore use features extracted by convolutional neural networks (CNNs) to improve the feature representation.

1) CNN-BASED FEATURE DESCRIPTOR (CFD)
The CFD is constructed from a state-of-the-art CNN, the VGG-16 architecture, pre-trained on the ImageNet dataset for image classification [32]. VGG-16 has sixteen layers (as shown in Figure 2), including 5 blocks of convolution computation, each with 2-3 convolution layers and a max-pooling layer at the end of the block, from which we select the pool3, pool4 and pool5_1 layers. We lay a 28 × 28 grid over the input image to divide it into patches, each corresponding to a 256-d vector in the pool3 output, so that a descriptor is generated for every 8 × 8 square. The center of each patch is regarded as a feature point. The 256-d vector is defined as the pool3 feature descriptor. The pool3 layer output directly forms our pool3 feature map f1, which is of size 28 × 28 × 256. The pool4 layer output, which is of size 14 × 14 × 512, is handled slightly differently: in every 16 × 16 area we obtain a pool4 descriptor, and these descriptors form the pool4 feature map f2.
FIGURE 2. Architecture of the modified VGG-16 network. h and w denote the height and width of the input image, respectively. Since we only use
convolution layers to extract features, the input image does not need to be resized, which preserves the features of the original image, as long as h and w are
multiples of 32.
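For readers who want to reproduce the CFD, the following is a minimal sketch (our own illustration, not the authors' released code). It assumes torchvision's VGG-16 layer indexing and, for simplicity, a fixed 224 × 224 input, whereas the paper keeps the original image size whenever h and w are multiples of 32.

```python
# Sketch: extract pool3/pool4 activations of an ImageNet-pretrained VGG-16
# and reshape them into per-patch descriptors (256-d per 8x8 patch for pool3,
# 512-d per 16x16 patch for pool4). Layer indices follow torchvision's
# vgg16().features module; a 224x224 input gives 28x28 and 14x14 grids.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

vgg = models.vgg16(pretrained=True).features.eval()
POOL3, POOL4 = 16, 23   # indices of the 3rd and 4th max-pool layers

def cfd_descriptors(image_path):
    img = Image.open(image_path).convert("RGB")
    x = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                   T.Normalize([0.485, 0.456, 0.406],
                               [0.229, 0.224, 0.225])])(img).unsqueeze(0)
    feats = {}
    with torch.no_grad():
        for i, layer in enumerate(vgg):
            x = layer(x)
            if i == POOL3:
                feats["pool3"] = x.squeeze(0)   # 256 x 28 x 28 feature map f1
            elif i == POOL4:
                feats["pool4"] = x.squeeze(0)   # 512 x 14 x 14 feature map f2
                break
    f1 = feats["pool3"].permute(1, 2, 0).reshape(-1, 256)   # 784 descriptors
    f2 = feats["pool4"].permute(1, 2, 0).reshape(-1, 512)   # 196 descriptors
    return f1, f2
```

The center of each grid cell is taken as the feature point associated with the corresponding row of f1 or f2.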
FIGURE 4. Shape context (SC) computation and matching. Left of (a) and (b): diagrams of the log-polar histogram bins
centered at a_n and b_m used in computing the shape contexts. We use 5 bins for log(r) and 12 bins for θ. Right
of (a) and (b): each shape context, e.g., h^b_m or h^a_n, is a log-polar histogram of the coordinates of the rest of the point set,
measured using the centered point as the origin.
the shape, respectively. Csc(m, n) denotes the cost of matching these two points and is measured using the Chi-square distance:

C_{sc}(m, n) = \frac{1}{2}\sum_{x=1}^{X}\frac{[h^b_m(x) - h^a_n(x)]^2}{h^b_m(x) + h^a_n(x)}   (7)

where h^b_m(x) and h^a_n(x) are two 1 × X histograms that count the number of points within each bin surrounding b_m and a_n, respectively.

3) MIXTURE FEATURE DESCRIPTOR (MFD)
We first compute an integrated cost matrix C_mfd using an element-wise Hadamard product (denoted by ⊙), written as:

C_{mfd} = C^{\theta}_{cfd} \odot C^{\theta}_{sc}   (8)

where the entries of C^θ_cfd and C^θ_sc are values in [0, 1]. Then, we apply the Jonker-Volgenant algorithm [37] to solve the linear assignment on the cost matrix C_mfd. Assigned point pairs are regarded as putatively corresponding.

B. DYNAMIC INLIER SELECTION
Our feature points are acquired at the centers of square image patches. Because of large rotation angles and deformations, the image patches of corresponding feature points may overlap partly or completely. Thus, to improve the registration, feature points with large overlapping ratios should have a better degree of alignment, whereas partly overlapping patches should have a small distance between their centers. Therefore, the degree of alignment is determined using our dynamic inlier selection.

In point set registration, there are several ways to estimate the parameters of the mixture model, such as the EM algorithm, gradient descent and variational inference. Our point set registration mainly contains the following two steps: (i) correspondence estimation, in which the corresponding target point set Aψ is estimated between B and A; (ii) transformation updating, in which the transformation function τ is established to update the position of τ(B) repeatedly, until τ(B) and Aψ overlap as much as possible. Note that τ(B) (initially τ(B) = B) denotes the transformed set B in each iteration.
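As a concrete illustration of the matching costs in Eqs. (7) and (8), the sketch below (ours, with hist_a, hist_b and C_cfd assumed to be pre-computed and normalized to [0, 1]) builds the χ² shape-context cost, combines it with the CNN-descriptor cost by a Hadamard product, and solves the linear assignment with SciPy's Jonker-Volgenant-style solver.

```python
# Shape-context chi-square cost (Eq. 7) and mixed-cost linear assignment (Eq. 8).
import numpy as np
from scipy.optimize import linear_sum_assignment  # modified Jonker-Volgenant LAP

def shape_context_cost(hist_b, hist_a):
    """C_sc(m, n) between every pair (b_m, a_n); histograms are M x X and N x X."""
    M = hist_b.shape[0]
    C = np.zeros((M, hist_a.shape[0]))
    for m in range(M):
        num = (hist_b[m] - hist_a) ** 2           # N x X
        den = hist_b[m] + hist_a + 1e-12          # avoid division by zero
        C[m] = 0.5 * (num / den).sum(axis=1)
    return C

def putative_matches(C_cfd, C_sc):
    """Combine the two costs element-wise and assign point pairs."""
    C_mfd = C_cfd * C_sc                          # Hadamard product (Eq. 8)
    rows, cols = linear_sum_assignment(C_mfd)     # minimal-cost assignment
    return list(zip(rows, cols))                  # putative correspondences
```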
Therefore, the selected inliers are reassigned every k iterations to iteratively register B. Note that these inliers guide the adjustment of point locations, whereas outliers are moved coherently. At the feature pre-matching stage, a low threshold θ0 is applied to filter out irrelevant points and coarsely select a large number of feature points. Then, a large starting threshold θ̃ is adopted to select confident inliers. In the rest of the registration process, the threshold θ is decreased by a step length ι every k iterations, allowing a few more feature points with high similarity to affect the estimated correspondence and transformation. This strategy enables the feature points with the highest similarity to determine the overall transformation while the remaining feature points refine the registration accuracy.

C. PRE-CLASSIFICATION
The pre-classification step chooses the pixels that are best suited to train the deep neural network. Fuzzy C-Means (FCM) is a popular image segmentation technique that segments an image by discovering cluster centers. Suppose a'_ij and b'_ij denote the gray levels of the image pixels at the corresponding positions (i, j) in I't1 and I't2, respectively. We use the FCM classifier to jointly classify the two input images, and a similarity matrix s'_ij is established:

s'_{ij} = \frac{|a'_{ij} - b'_{ij}|}{a'_{ij} + b'_{ij}}   (9)

where 0 ≤ s'_ij ≤ 1. Then, a global similarity threshold T is applied to s'_ij using the iterative threshold method. Iterating over all a'_ij and b'_ij, if s'_ij > T, then a'_ij and b'_ij are jointly labeled by FCM based on the principle of minimum variance δ²_ij; otherwise a_ij and b_ij are labeled separately. δ²_ij is written as:

\delta^2_{ij} = \frac{a'_{ij}\, b'_{ij}}{a'_{ij} + b'_{ij}}\,[s'_{ij}]^2   (10)

The gray levels of the pixels at the same position in the two original images are compared to label the pixels. The label of a pixel and its surrounding neighborhood can be used to determine whether a pixel is part of an edge or noise. The results are then passed to the neural network for training.

D. MAIN PROCESS
1) IMAGE REGISTRATION
To effectively eliminate the geometric error and improve the matching degree between the image and the actual terrain, image registration is an essential preprocessing step for remote sensing images; it includes two processes: feature point set registration and image transformation. Firstly, we carry out feature point set registration.

• Correspondence Estimation. The Gaussian mixture model (GMM) has proven to be a popular model in computer vision and pattern recognition. Thus, the set B is used as the GMM centroids, and the set A as the data points generated by the GMM. The GMM probability density function is

p(a_n | b_m) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{\|a_n - b_m\|^2}{2\sigma^2}\right)   (11)

Then, the outlier and noise distribution is assumed to be an additional uniform distribution p(a | M + 1) = 1/N, which is added to the mixture model. Thus, the mixture model takes the form

p(a_n) = (1 - \varepsilon)\sum_{m=1}^{M} p(m)\,p(a_n | b_m) + \varepsilon\,\frac{1}{N}   (12)

where p(m) = 1/M denotes the mixing weights, which are nonnegative and sum to one. We use equal isotropic covariances σ² and equal membership probabilities p(m) for all GMM components (m = 1, ..., M). ε denotes the weight of the uniform distribution, with 0 ≤ ε ≤ 1. We compute the revised parameter as:

\varepsilon = 1 - \frac{\sum_{n=1}^{N}\sum_{m=1}^{M} p(m | a_n)}{N}   (13)

Subsequently, the inlier selection calculates an M × N prior probability matrix p_mn, which is then taken by our Gaussian mixture model (GMM) based transformation solver:

p_{mn} = \begin{cases} 1 - \upsilon, & \text{if } b_m \text{ and } a_n \text{ are corresponding,} \\ \upsilon / N, & \text{otherwise} \end{cases}   (14)

where υ ∈ (0, 1) should be designated according to our confidence that the inlier selection is accurate. The prior probability matrix requires normalization:

p_{mn} := \frac{p_{mn}}{\sum_{k=1}^{N} p_{mk}}   (15)

From equation (15), the M × N posterior probability matrix is obtained, which is used as the fuzzy correspondence matrix P between the sensed image Is and the reference image Ir. Then, the corresponding target point set is obtained by

A^{\psi} = PA   (16)

Though the target coordinates Aψ are estimated by the GMM, the method will inescapably produce mismatches.

• Transformation Updating. Firstly, a positive definite kernel (e.g., a Gaussian kernel) is chosen, and a reproducing kernel Hilbert space (RKHS) [38], [39] H is defined. Then, we employ the Gaussian radial basis function (GRBF), which has the form G(b_i, b_j) = exp(−|b_i − b_j|²/β²), where β is a constant that controls the spatial smoothness and G is of size M × M. According to the representer theorem, the displacement function τ(B) takes the form

\tau(b) = \sum_{m=1}^{M} G(b, b_m)\,\psi_m   (17)
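Looking back at the pre-classification step of Eqs. (9) and (10), the following is a minimal sketch of the similarity matrix and the iterative global threshold; it is our illustration rather than the authors' code, and the FCM labeling itself is represented only by a hypothetical fcm_labels helper that is not shown.

```python
# Pre-classification sketch: pixel-wise similarity (Eq. 9), an iterative
# (ISODATA-style) global threshold T, and the mask of pixel pairs that will
# be labeled jointly by FCM (s' > T) versus separately (s' <= T).
import numpy as np

def similarity_matrix(a, b, eps=1e-8):
    """Eq. (9): s'_ij = |a'_ij - b'_ij| / (a'_ij + b'_ij), values in [0, 1]."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    return np.abs(a - b) / (a + b + eps)

def iterative_threshold(s, tol=1e-4, max_iter=100):
    """Global threshold found by repeatedly averaging the two class means."""
    t = s.mean()
    for _ in range(max_iter):
        low, high = s[s <= t], s[s > t]
        if low.size == 0 or high.size == 0:
            break
        t_new = 0.5 * (low.mean() + high.mean())
        if abs(t_new - t) < tol:
            break
        t = t_new
    return t

def joint_label_mask(a, b):
    """True where the pixel pair should be labeled jointly (handed to FCM)."""
    s = similarity_matrix(a, b)
    return s > iterative_threshold(s)
```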
FIGURE 5. Left: Vectorization of neighborhood features to be fed into the network. Right: The structure of an RBM, consisting of two layers, one
visible (v) and one hidden (h), with no connections within a layer. Hidden nodes are indicated by blue filled circles and the visible nodes indicated by
unfilled circles.
where ψ = (ψ1, ψ2, ..., ψM)^T is the coefficient matrix. Therefore, the minimization of the energy equation in H boils down to finding a finite coefficient matrix ψ. The transformation function ν(B) is equivalent to the initial position plus the displacement function τ(B), i.e., ν(B) = B + τ(B).

Though the reliable target coordinates Aψ are estimated by the GMM, the method will inescapably produce mismatches. Therefore, our next concern is to formulate a function by which a reasonable position τ(b_m) of b_m is determined. This position in turn improves the accuracy of the correspondence estimation as subsequent iterations interlock. Since the error of the L2-minimizing estimator is less than the error of maximum likelihood estimation (MLE), the L2 Euclidean distance is widely used in many applications and registration methods; in particular, the point set registration problem can be well formulated by minimizing the L2 Euclidean distance between two point sets. Therefore, we employ the L2E [39] based energy function to estimate the transformation function τ, which is written as

E(\psi, \sigma^2) = \frac{1}{2^D(\pi\sigma^2)^{D/2}} - \bar{p} + \lambda\|\tau\|^2_G   (18)

where \bar{p} = \frac{2}{M}\sum_{m=1}^{M}\frac{1}{(2\pi\sigma^2)^{D/2}}\exp\left(-\frac{\|A^{\psi}_{m,\cdot} - U_{m,\cdot}\psi\|^2}{2\sigma^2}\right), U_{ij} = G(b_i, b_j), U_{m,·} denotes the m-th row of the matrix U, and ψ_i denotes the i-th row of the coefficient matrix ψ. Next, we directly take the partial derivatives of equation (18) with respect to the coefficient matrix ψ, set them to zero, and solve the resulting linear system of equations:

\frac{\partial E}{\partial \psi} = U^T\left(\frac{2\Phi \circ (H \otimes \mathbf{1})}{n\sigma^2(2\pi\sigma^2)^{D/2}}\right) + 2\lambda G\psi   (19)

where Φ = Uψ − Aψ, H = exp{diag(ΦΦ^T)/2σ²} is an M × 1 vector, diag(·) denotes the diagonal of a matrix, and 1 is a 1 × D row vector of all ones. The symbols ∘ and ⊗ denote the Hadamard product and the Kronecker product, respectively.

After updating the coordinates of the source point set by B = B + Uψ, we anneal the covariance of the GMM by σ² = ρσ², then return to correspondence estimation and continue the feature point set registration process until the maximum iteration number is reached. Note that the transformed source point set B* is obtained in the final iteration. Next, we employ the backward approach [40] to establish a thin-plate spline (TPS) [41] transformation model, and the transformed image It can then be calculated using this model. I't1 and I't2 are obtained by splitting It and It2 equally according to a certain ratio (see Figure 1).

2) ESTABLISHING AND TRAINING THE DEEP NEURAL NETWORKS FOR CHANGE DETECTION
Although the difference image method is well researched, change detection is a comprehensive procedure that requires careful consideration of many factors, such as the nature of the change detection problem, image preprocessing, and the selection of suitable variables and algorithms. DNNs have brought profound and revolutionary changes to the field of artificial intelligence and have achieved great improvements in many domains such as computer vision, speech recognition and natural language processing. Therefore, we employ a DNN to train on the pre-classification results and create a change detection map directly from the pre-processed image pair, without generating difference images. After pre-classification, the neighborhood features of each pixel and of its corresponding pixel in the other image are converted into a vector used as input to the neural network.

The Restricted Boltzmann Machine (RBM). The RBM is a stochastic neural network that consists of two layers of binary units: a visible layer v with n visible units and a hidden layer h with m hidden units. An example of this structure is shown in Figure 5, with the hidden nodes indicated by blue circles and the visible nodes indicated by white circles. A common use of RBMs is to create features for classification. The energy function of the RBM model for the visible and hidden units can be represented as follows:

E(v, h) = -\eta^T v - \varsigma^T h - h^T W v   (20)

where η and ς are the biases of the visible units and the hidden units, respectively. The matrix W denotes the weights of the connections between visible and hidden layer units, where each matrix element parameterizes the conditional probability of a unit in one layer given the units in the other layer. The joint probability distribution of the visible units v and hidden units h of the RBM is given by

P(v, h) = \frac{1}{Z}\,e^{-E(v, h)}   (21)

where Z = \sum_{v'}\sum_{h'} e^{-E(v', h')} is the partition function.

Algorithm 1: Land Cover Change Detection Using Multi-Temporal and Multi-Sensor Remote Sensing Images in Mountainous Terrain
Input: The source point set A and the target point set B
Output: The transformed image It
1: Initialize θ0, θ̃, ι, k, β, ω, δ², W and λ;
2: Image Registration.
3: while the maximum iteration number is not reached do
4:   Correspondence Estimation:
5:     Compute C^θ_cfd, C_sc and C_mfd by equations (6), (7) ...
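To summarize the registration stage of Algorithm 1 in executable form, here is a condensed sketch (our simplification, not the authors' implementation): it alternates GMM-based correspondence estimation (Eqs. 11 and 14-16, with the uniform outlier term of Eq. 12 omitted) with a regularized kernel solve standing in for Eq. (19), while annealing σ² and relaxing the inlier threshold θ every k iterations.

```python
# Simplified registration loop: dynamic inlier prior + GMM posterior +
# RKHS displacement update with a Gaussian (GRBF) kernel.
import numpy as np

def gaussian_kernel(B, beta):
    d2 = ((B[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / beta ** 2)                      # M x M GRBF matrix G

def register(A, B, beta=2.0, lam=3.0, rho=0.93, ups=0.1,
             theta0=0.9, step=0.05, k=5, n_iter=50):
    M, D = B.shape
    N = A.shape[0]
    G = gaussian_kernel(B, beta)
    sigma2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum() / (D * M * N)
    theta, TB = theta0, B.copy()
    for it in range(n_iter):
        # correspondence estimation (Eqs. 11, 14-16)
        d2 = ((A[None, :, :] - TB[:, None, :]) ** 2).sum(-1)        # M x N
        sim = np.exp(-d2 / (2.0 * sigma2))
        gauss = sim / (2.0 * np.pi * sigma2)                        # Eq. 11
        prior = np.where(sim > theta, 1.0 - ups, ups / N)           # Eq. 14
        P = prior * gauss
        P /= P.sum(axis=1, keepdims=True) + 1e-12                   # Eq. 15
        A_psi = P @ A                                               # Eq. 16
        # transformation update: regularized linear solve in place of Eq. 19
        psi = np.linalg.solve(G + lam * sigma2 * np.eye(M), A_psi - B)
        TB = B + G @ psi                                            # Eq. 17
        # annealing and dynamic inlier selection
        sigma2 *= rho
        if (it + 1) % k == 0:
            theta = max(theta - step, 0.0)   # admit more feature points over time
    return TB
```

The returned TB corresponds to the transformed source point set B*, from which the backward TPS model and the transformed image It are derived in the full method.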
FIGURE 6. Location of the study area in the mountainous terrain of southern China. Red dots represent ten key land conservation regions
of Sichuan, Guizhou and Hunan, China. Note that Sichuan Province, China (longitude range: 97°21′E to 108°33′E; latitude range: 26°03′N
to 34°19′N); Guizhou Province, China (longitude range: 103°36′E to 109°35′E; latitude range: 24°37′N to 29°13′N); Hunan Province, China
(longitude range: 111°53′E to 114°15′E; latitude range: 27°51′N to 28°41′N).
and have four distinct seasons because of the continental monsoon. These areas have a variety of land cover types, including cropland, built-up land, forest, etc. Among these land cover types, the most dominant one is cropland, which can easily be affected by pseudo-changes caused by phenological differences. In addition, we also obtained some satellite remote sensing data of mountainous terrain in other countries to verify the applicability of the method.

We evaluate the performance of the proposed framework on an available data set. The data set contains a total of 6000 image pairs. To facilitate a fair comparison with other methods, we divided this dataset into three parts: 3000 image pairs for training, 1000 for validation and the remaining 2000 for testing. In order to achieve a better training effect, the data set is formed from two categories of remote sensing image pairs: (1) 4000 image pairs acquired by different types of multi-sensor and multi-temporal satellites, including Chinese GF and Landsat. The details of datasets (I) and (II) are summarized in Table 1. In this dataset, the same satellite generally follows the same orbital path with the same viewing angles and passes over a certain spot on Earth at the same local time due to orbital mechanics. Therefore, image pairs acquired by the same satellite cannot contain large viewpoint changes. However, image pairs acquired by different sensors suffer from serious scale changes.

The evaluation formulations are as follows:

RMSE = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\|b_{t_i} - a_{t_i}\|^2}   (27)

TABLE 2. The experimental dataset (III).

The RMSE can well reflect the spatial deviation of corresponding landmarks in the sensed image and the reference image, where M is the total number of selected landmarks, and b_ti is the landmark that corresponds to a_ti.
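Eq. (27) amounts to the usual root-mean-square distance between corresponding landmarks; a minimal helper:

```python
# Landmark-based RMSE of Eq. (27): a and b are M x 2 arrays of corresponding
# landmark coordinates in the reference image and the registered image.
import numpy as np

def landmark_rmse(a, b):
    diff = np.asarray(b, dtype=float) - np.asarray(a, dtype=float)
    return np.sqrt((diff ** 2).sum(axis=1).mean())
```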
TABLE 3. Experimental results on image registration. Quantitative comparisons on image registration measured using the mean RMSE are carried out.
FIGURE 7. Registration examples on two typical image pairs from dataset (I). (i) Lake Oroumeih; (ii) Bastrop. Left: image pair It1 and It2 acquired over
the same geographical area at two different times t1 and t2 by Landsat 8. Right: from the first column to the end, the registration results of SIFT, SURF,
CPD, GLMDTPS, ZGL_CATE and Ours. For each method, the first row shows the 5 × 5 checkerboard and the second row shows the transformed image It.
TABLE 4. Experimental results on change detection. Quantitative comparisons on change detection measured using the PRC are carried out.
The comparison results are depicted in Figures 10, 11 and 12 and Table 4. As shown in Table 4, the average precision of our method on datasets (I), (II) and (III) reaches (98.3%, 97.5%), (97.9%, 96.3%) and (98.4%, 96.8%), respectively. However, the average precision of PCA_Kmeans only reaches (78.3%, 77.4%), (73.2%, 72.9%) and (76.8%, 77.9%). This is mainly because our method adopts DNNs to directly create a change detection map from the pre-processed image pair, bypassing the steps of filtering or generating a difference image (DI). In contrast, PCA_Kmeans performs unsatisfactorily in some cases, since it often produces noisy results by not considering the spatial relationship among image pixels.
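The precision and recall values quoted in the captions of Figures 10-12 follow the standard definitions from the TP/FP/FN counts of detected change regions; for reference:

```python
# Precision/recall from true-positive, false-positive and false-negative
# counts, as reported in the captions of Figures 10-12.
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# e.g. precision_recall(25, 5, 5) -> (0.833..., 0.833...)
```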
FIGURE 8. Registration examples on two typical image pairs from dataset (II). (iii) Guizhou; (iv) Hunan. Left: image pair It1 and It2 acquired over the
same geographical area at two different times t1 and t2 by Chinese GF1 and Chinese GF2, respectively. Right: from the first column to the end, the
registration results of SIFT, SURF, CPD, GLMDTPS, ZGL_CATE and Ours. For each method, the first row shows the 5 × 5 checkerboard and the second
row shows the transformed image It.
FIGURE 9. Registration examples on two typical image pairs from dataset (III). (v) Sichuan; (vi) Guizhou. Left: image pair It1 and It2 acquired over the
same geographical area at two different times t1 and t2 by small UAV. Right: from the first column to the end, the registration results of SIFT, SURF,
CPD, GLMDTPS, ZGL_CATE and Ours. For each method, the first row shows the 5 × 5 checkerboard and the second row shows the transformed image It.
Moreover, SSFA and Semi_FCM achieve better performance. Since SSFA employs the slow feature analysis (SFA) algorithm to extract the most temporally invariant components from the multi-temporal images and transforms the data into a new feature space, the DI can be better generated. Compared with PCA_Kmeans and SSFA, Semi_FCM
FIGURE 10. Change detection examples on two typical image pairs from dataset (I). (i) Lake Oroumeih; (ii) Bastrop. (i) Yanan, Sichuan Province;
(ii) Ansun, Guizhou Province. Left: I't1 and I't2 are the division of It and It2 according to a certain ratio. Right: from the first column to the end, the
change detection results of PCA_Kmeans, SSFA, Ground Truth and Ours. (i) PCA_Kmeans (TP:23; FP:8; FN:7; Precision: 76.7%; Recall: 75.2%), SSFA
(TP:25; FP:5; FN:5; Precision: 83.3%; Recall: 83.3%), LEGS (TP:25; FP:3; FN:5; Precision: 89.2%; Recall: 83.3%), Semi_FCM (TP:24; FP:5; FN:6;
Precision: 82.7%; Recall: 80.0%), Ours (TP:28; FP:0; FN:2; Precision: 93.3%; Recall: 100%). (ii) PCA_Kmeans (TP:22; FP:7; FN:8; Precision: 73.3%;
Recall: 75.9%), SSFA (TP:27; FP:4; FN:3; Precision: 90.0%; Recall: 87.1%), LEGS (TP:26; FP:3; FN:4; Precision: 89.6%; Recall: 86.7%),
Semi_FCM (TP:28; FP:4; FN:2; Precision: 87.5%; Recall: 93.3%), Ours (TP:29; FP:2; FN:1; Precision: 96.7%; Recall: 93.4%).
FIGURE 11. Change detection examples on two typical image pairs from dataset (II). (iii) Guizhou; (iv) Hunan. Left: I't1 and I't2 are the division of It and It2
according to a certain ratio. Right: from the first column to the end, the change detection results of PCA_Kmeans, SSFA, Ground Truth and Ours.
(iii) PCA_Kmeans (TP:21; FP:5; FN:9; Precision: 70.0%; Recall: 80.8%), SSFA (TP:24; FP:2; FN:6; Precision: 80.0%; Recall: 92.3%), LEGS (TP:24; FP:2; FN:6;
Precision: 80.0%; Recall: 92.3%), Semi_FCM (TP:26; FP:4; FN:4; Precision: 86.7%; Recall: 86.7%), Ours (TP:29; FP:1; FN:1; Precision: 96.7%; Recall: 96.7%).
(iv) PCA_Kmeans (TP:23; FP:3; FN:7; Precision: 76.7%; Recall: 88.4%), SSFA (TP:25; FP:6; FN:5; Precision: 83.3%; Recall: 80.6%), LEGS (TP:26; FP:3; FN:4;
Precision: 89.6%; Recall: 86.7%), Semi_FCM (TP:26; FP:4; FN:4; Precision: 86.7%; Recall: 86.7%), Ours (TP:29; FP:2; FN:1; Precision: 96.7%; Recall: 93.3%).
FIGURE 12. Change detection examples on two typical image pairs from dataset (III). (v) Sichuan; (vi) Guizhou. Left: I't1 and I't2 are the division of It and
It2 according to a certain ratio. Right: from the first column to the end, the change detection results of PCA_Kmeans, SSFA, Ground Truth and Ours.
(v) PCA_Kmeans (TP:19; FP:5; FN:11; Precision: 63.3%; Recall: 79.1%), SSFA (TP:23; FP:2; FN:7; Precision: 76.7%; Recall: 92.0%), LEGS (TP:27; FP:6; FN:3;
Precision: 81.8%; Recall: 90.0%), Semi_FCM (TP:27; FP:6; FN:3; Precision: 81.8%; Recall: 90.0%), Ours (TP:29; FP:1; FN:1; Precision: 96.7%; Recall: 96.7%).
(vi) PCA_Kmeans (TP:20; FP:3; FN:10; Precision: 66.7%; Recall: 86.9%), SSFA (TP:24; FP:5; FN:6; Precision: 80.0%; Recall: 77.4%), LEGS (TP:27; FP:6; FN:3;
Precision: 81.8%; Recall: 90.0%), Semi_FCM (TP:28; FP:6; FN:2; Precision: 82.3%; Recall: 93.3%), Ours (TP:28; FP:2; FN:2; Precision: 93.3%; Recall: 93.3%).
uses semi-supervised fuzzy C-means to filter the pseudo-labels from the difference image. Since LEGS can effectively capture local contrast, texture and shape information for saliency detection, and models the complex relationship between different global saliency cues by local estimation and global search, LEGS also achieves better performance.
IV. CONCLUSION
In this paper, a robust change detection framework for land cover change in mountainous terrain is proposed, which can handle multi-temporal remote sensing image pairs acquired by different types of sensors. The superiority of our framework can be summarized through three main contributions: 1) a multi-scale feature description is generated using layers of a pretrained VGG network; 2) a gradually increasing selection of inliers is realized to estimate correspondences and transformations; 3) a fuzzy C-means classifier is adopted to generate a similarity matrix between the geometrically corrected image pair, and deep neural networks (DNNs) are applied to directly create a change detection map from the pre-processed image pair, bypassing the steps of filtering or generating a difference image (DI). The proposed framework can provide a stable change rule for monitoring land cover change from multi-temporal data. For the purpose of experimental evaluation, the dataset was mainly collected in the ten key land conservation regions of Sichuan, Guizhou and Hunan, China. Compared with five state-of-the-art registration methods and four state-of-the-art change detection methods, our method shows better performance in most cases.

Future studies will be conducted in two directions: (i) thematic applications of land cover changes, such as cultivated land changes; (ii) images from different sources, such as image pairs combining UAV images and satellite remote sensing images. Indeed, combining images from different sources will identify more regions of change in many other typical regions with various land cover types.

ACKNOWLEDGMENT
We are grateful to David G. Lowe, Herbert Bay, Andriy Myronenko, Turgay Celik, Chen Wu, Lijun Wang and Pan Shao for providing their implementation source codes and test data sets. This greatly facilitated the comparison experiments. (Fei Song and Zhuoqian Yang contributed equally to this work.)

REFERENCES
[1] Y. Wang, F. Zhao, L. Cheng, and K. Yang, "Framework for monitoring the conversion of cultivated land to construction land using SAR image time series," Remote Sens. Lett., vol. 6, no. 10, pp. 794–803, 2015.
[2] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Sep. 2014. [Online]. Available: https://arxiv.org/abs/1409.1556
[3] Y. Wu, S. Li, and S. Yu, "Monitoring urban expansion and its effects on land use and land cover changes in Guangzhou city, China," Environ. Monitor. Assessment, vol. 188, no. 1, p. 54, 2016.
[4] L. Wang, H. Lu, R. Xiang, and M.-H. Yang, "Deep networks for saliency detection via local estimation and global search," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 3183–3192.
[5] P. Shao, W. Shi, P. He, M. Hao, and X. Zhang, "Novel approach to unsupervised change detection based on a robust semi-supervised FCM clustering algorithm," Remote Sens., vol. 8, no. 3, p. 264, 2016.
[6] S. A. Azzouzi, A. Vidal-Pantaleoni, and H. A. Bentounes, "Desertification monitoring in Biskra, Algeria, with Landsat imagery by means of supervised classification and change detection methods," IEEE Access, vol. 5, pp. 9065–9072, 2017.
[7] K. Yang, Z. Yu, Y. Luo, Y. Yang, L. Zhao, and X. Zhou, "Spatial and temporal variations in the relationship between lake water surface temperatures and water quality—A case study of Dianchi Lake," Sci. Total Environ., vol. 624, pp. 859–871, May 2018.
[8] J. D. T. De Alban, G. M. Connette, P. Oswald, and E. L. Webb, "Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes," Remote Sens., vol. 10, no. 2, p. 306, 2018.
[9] Z. Wei et al., "A small UAV based multi-temporal image registration for dynamic agricultural terrace monitoring," Remote Sens., vol. 9, no. 9, p. 904, 2017.
[10] K. Yang, A. Pan, Y. Yang, S. Zhang, S. H. Ong, and H. Tang, "Remote sensing image registration using multiple image features," Remote Sens., vol. 9, no. 6, p. 581, 2017.
[11] A. S. Milas, K. Arend, C. Mayer, M. A. Simonson, and S. Mackey, "Different colours of shadows: Classification of UAV images," Int. J. Remote Sens., vol. 38, nos. 8–10, pp. 3084–3100, 2017.
[12] G. Li and Y. Yu, "Visual saliency detection based on multiscale deep CNN features," IEEE Trans. Image Process., vol. 25, no. 11, pp. 5012–5024, Nov. 2016.
[13] Z. Lv, W. Shi, X. Zhou, and J. A. Benediktsson, "Semi-automatic system for land cover change detection using bi-temporal remote sensing images," Remote Sens., vol. 9, no. 11, p. 1112, 2017.
[14] T. Celik, "Unsupervised change detection in satellite images using principal component analysis and k-means clustering," IEEE Geosci. Remote Sens. Lett., vol. 6, no. 4, pp. 772–776, Oct. 2009.
[15] C. Wu, B. Du, and L. Zhang, "Slow feature analysis for change detection in multispectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 5, pp. 2858–2874, May 2014.
[16] H. Lyu, H. Lu, and L. Mou, "Learning a transferable change rule from a recurrent neural network for land cover change detection," Remote Sens., vol. 8, no. 6, p. 506, 2016.
[17] H. Zhang, M. Gong, P. Zhang, L. Su, and J. Shi, "Feature-level change detection using deep representation and feature change analysis for multispectral imagery," IEEE Geosci. Remote Sens. Lett., vol. 13, no. 11, pp. 1666–1670, Nov. 2016.
[18] E. M. de Oliveira Silveira, J. M. de Mello, F. W. Acerbi, Jr., and L. M. T. de Carvalho, "Object-based land-cover change detection applied to Brazilian seasonal savannahs using geostatistical features," Int. J. Remote Sens., vol. 39, no. 8, pp. 2597–2619, 2018.
[19] R. Xiao, R. Cui, M. Lin, L. Chen, Y. Ni, and X. Lin, "SOMDNCD: Image change detection based on self-organizing maps and deep neural networks," IEEE Access, vol. 6, pp. 35915–35925, 2018.
[20] B. Uamkasem, H. L. Chao, and B. Jiantao, "Regional land use dynamic monitoring using Chinese GF high resolution satellite data," in Proc. Int. Conf. Appl. Syst. Innov., 2017, pp. 838–841.
[21] A. Myronenko and X. Song, "Point set registration: Coherent point drift," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 12, pp. 2262–2275, Dec. 2010.
[22] L. Greengard and J. Strain, "The fast Gauss transform," SIAM J. Sci. Stat. Comput., vol. 12, no. 1, pp. 79–94, 2006.
[23] I. Markovsky, "Structured low-rank approximation and its applications," Automatica, vol. 44, no. 4, pp. 891–909, Apr. 2008.
[24] Y. Yang, S. H. Ong, and K. W. C. Foong, "A robust global and local mixture distance based non-rigid point set registration," Pattern Recognit., vol. 48, no. 1, pp. 156–173, 2015.
[25] J. Ma, J. Zhao, and A. L. Yuille, "Non-rigid point set registration by preserving global and local structures," IEEE Trans. Image Process., vol. 25, no. 1, pp. 53–64, Jan. 2016.
[26] K. Yang et al., "Quake warning funds on shaky ground," Science, vol. 358, no. 6368, p. 1263, 2017.
[27] S. Zhang, Y. Yang, K. Yang, Y. Luo, and S. H. Ong, "Point set registration with global-local correspondence and transformation estimation," in Proc. Int. Conf. Comput. Vis., 2017, pp. 2688–2696.
[28] S. Zhang, K. Yang, Y. Yang, and Y. Luo, "Nonrigid image registration for low-altitude SUAV images with large viewpoint changes," IEEE Geosci. Remote Sens. Lett., vol. 15, no. 4, pp. 592–596, Apr. 2018.
[29] S. Zhang, K. Yang, Y. Yang, Y. Luo, and Z. Wei, "Non-rigid point set registration using dual-feature finite mixture model and global-local structural preservation," Pattern Recognit., vol. 80, pp. 183–195, Aug. 2018.
[30] F. Song, M. Li, Y. Yang, K. Yang, X. Gao, and T. Dan, "Small UAV based multi-viewpoint image registration for monitoring cultivated land changes in mountainous terrain," Int. J. Remote Sens., vol. 39, no. 21, pp. 7201–7224, 2018.
[31] T. Dan et al., "Multifeature energy optimization framework and parameter adjustment-based nonrigid point set registration," J. Appl. Remote Sens., vol. 12, no. 3, pp. 12–27, 2018.
[32] P. A. Permatasari, A. Fatikhunnada, Liyantono, Y. Setiawan, Syartinilia, and A. Nurdiana, "Analysis of agricultural land use changes in Jombang Regency, East Java, Indonesia using BFAST method," Procedia Environ. Sci., vol. 33, pp. 27–35, Apr. 2016.
[33] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 4, pp. 509–522, Apr. 2002.
[34] J. Bohg and D. Kragic, "Learning grasping points with shape context," Robot. Auto. Syst., vol. 58, no. 4, pp. 362–377, 2010.
[35] Y. Gu, K. Ren, P. Wang, and G. Gu, "Polynomial fitting-based shape matching algorithm for multi-sensors remote sensing images," Infr. Phys. Technol., vol. 76, pp. 386–392, May 2016.
[36] R. Jonker and A. Volgenant, "A shortest augmenting path algorithm for dense and sparse linear assignment problems," Computing, vol. 38, no. 4, pp. 325–340, Nov. 1987.
[37] A. L. Yuille and N. M. Grzywacz, "A mathematical analysis of the motion coherence theory," Int. J. Comput. Vis., vol. 3, no. 2, pp. 155–175, 1989.
[38] J. Ma, J. Zhao, J. Tian, A. L. Yuille, and Z. Tu, "Robust point matching via vector field consensus," IEEE Trans. Image Process., vol. 23, no. 4, pp. 1706–1721, Apr. 2014.
[39] J. Ma, J. Zhao, Y. Ma, and J. Tian, "Non-rigid visible and infrared face registration via regularized Gaussian fields criterion," Pattern Recognit., vol. 48, no. 3, pp. 772–784, 2015.
[40] S. Ji and S. Peng, "Terminal perturbation method for the backward approach to continuous time mean–variance portfolio selection," Stochastic Process. Appl., vol. 118, no. 6, pp. 952–967, 2008.
[41] F. L. Bookstein, "Principal warps: Thin-plate splines and the decomposition of deformations," IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 6, pp. 567–585, Jun. 1989.
[42] G. E. Hinton, Training Products of Experts by Minimizing Contrastive Divergence. Cambridge, MA, USA: MIT Press, 2002.
[43] G. E. Hinton, "A practical guide to training restricted Boltzmann machines," Momentum, vol. 9, no. 1, pp. 599–619, 2010.
[44] A. Lozano-Diez, R. Zazo, D. T. Toledano, and J. Gonzalez-Rodriguez, "An analysis of the influence of deep neural network (DNN) topology in bottleneck feature based language recognition," PLoS ONE, vol. 12, no. 8, p. e0182580, 2017.
[45] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
[46] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Comput. Vis. Image Understand., vol. 110, no. 3, pp. 346–359, 2008.

XUEYAN GAO received the B.S. degree from Henan Normal University, China, in 2016. She is currently pursuing the M.S. degree with the School of Information Science and Technology, Yunnan Normal University. Her current research interests include image registration, point set registration, pattern recognition, and change detection.

TINGTING DAN received the B.S. degree from China West Normal University, China, in 2016. She is currently pursuing the M.S. degree with the School of Information Science and Technology, Yunnan Normal University. Her current research interests include image registration, point set registration, pattern recognition, and change detection.

YANG YANG received the master's degree from Waseda University, Japan, in 2007, and the Ph.D. degree from the National University of Singapore, Singapore, in 2013. He is currently an Associate Professor with the School of Information Science and Technology, Yunnan Normal University. His research interests cover image registration, remote sensing image processing, medical image processing, geography information systems, and the human masticatory system.

ZHUOQIAN YANG is currently pursuing the B.S. degree with the College of Software, Beihang University. His research interests include computer vision and image registration.

RUI YU received the B.S. degree from Harbin Huade University, China, in 2017. She is currently pursuing the M.S. degree with the School of Information Science and Technology, Yunnan Normal University. Her current research interests include image registration, point set registration, and pattern recognition.