
REMOTE SENSING LETTERS, 2017
VOL. 8, NO. 9, 839–848
https://doi.org/10.1080/2150704X.2017.1331053

A semi-supervised convolutional neural network for hyperspectral image classification
Bing Liu, Xuchu Yu, Pengqiang Zhang, Xiong Tan, Anzhu Yu and Zhixiang Xue
Institute of Surveying and Mapping, Zhengzhou, China

ABSTRACT

Convolutional neural networks (CNNs) for hyperspectral image classification can provide excellent performance when the number of labeled samples for training is sufficiently large. Unfortunately, only a small number of labeled samples are usually available for training in hyperspectral images. In this letter, a novel semi-supervised convolutional neural network is proposed for the classification of hyperspectral images. The proposed network can automatically learn features from complex hyperspectral image data structures. Furthermore, skip connection parameters are added between the encoder layers and decoder layers in order to make the network suitable for semi-supervised learning. A semi-supervised method is adopted to solve the problem of limited labeled samples. Finally, the network is trained to simultaneously minimize the sum of the supervised and unsupervised cost functions. The proposed network is evaluated on a widely used hyperspectral image data set. The experimental results demonstrate that the proposed approach provides results competitive with state-of-the-art methods.

ARTICLE HISTORY
Received 16 January 2017
Accepted 4 May 2017

1. Introduction
Hyperspectral remote sensing has become a research focus in remote sensing with the continuous improvement of the spectral resolution of remote sensors. Classification is an important part of hyperspectral image processing and application; its purpose is to assign a unique label to each pixel in the image. Hyperspectral images consist of several hundred narrow, contiguous wavelength bands and can provide a wealth of spectral and spatial information for classification. At the same time, the complex structure of hyperspectral images makes feature extraction difficult. Given the complex data structures and limited labeled samples, the classification of hyperspectral remote sensing images still faces great challenges.
In the early stage of hyperspectral image classification, several types of discriminant functions were applied, such as nearest neighbor, decision trees and linear functions. However, the main problem of these classic classifiers is their sensitivity to the Hughes effect (Bioucas-Dias et al. 2013). Support vector machines (SVMs) with kernel methods were then introduced to deal with the Hughes phenomenon and remained the mainstream classification method for a long time (Camps-Valls et al. 2005). Meanwhile, extreme learning machines (Li et al. 2015), active learning (Sun et al. 2015), sparse representation (Liu et al. 2013) and other classifiers for hyperspectral images have been investigated to obtain higher classification accuracy.


Until recently, deep learning-based methods have drawn increasing attention in remote sensing image analysis (Li et al. 2016). The stacked autoencoder (SAE) is a simple deep learning method and was first introduced into hyperspectral image classification by Chen et al. (2014). Later, a series of improved hyperspectral image classification methods based on SAE were proposed to obtain better performance (Ma, Wang, and Geng 2016). CNN was first used to extract spectral features for hyperspectral image classification, achieving better performance than SVM (Hu et al. 2015). CNNs were then used to extract spatial-spectral features for hyperspectral image classification with excellent performance (Yue et al. 2015, 2016; Ghamisi, Chen, and Zhu 2016). In general, there are a large number of parameters to be tuned in deep learning methods. In this context, the majority of deep learning-based methods for hyperspectral image classification can only yield promising results when the number of labeled samples for training is sufficiently large. A virtual-sample enhanced method was proposed to tackle the problem of limited labeled samples (Chen et al. 2016). A pixel-pair method was also proposed to significantly increase the number of training samples, so that the advantages of CNN can be exploited (Li et al. 2016). Many semi-supervised algorithms (Tuia and Gustavo 2009; Muñoz-Marí et al. 2010) have demonstrated that the use of unlabeled data can improve classification performance. However, current deep learning-based hyperspectral image classification methods do not take good advantage of the enormous amount of unlabeled data.
The main goal of this letter is to deal with the complex data structures and limited labeled samples of hyperspectral images. In more detail, the main contributions of this letter can be summarized as follows. 1) A CNN architecture is designed to directly extract spatial-spectral features from the hyperspectral image cube. 2) The ladder network is introduced into the CNN architecture in order to make the network suitable for semi-supervised learning. 3) In order to deal with limited labeled samples, the CNN is trained by a semi-supervised method to simultaneously minimize the sum of the supervised and unsupervised cost functions.
2. Proposed semi-supervised CNN


The proposed semi-supervised CNN, illustrated in Figure 1, is composed of a clean encoder, a corrupted encoder and a decoder. Gaussian noise is added to each layer of the corrupted encoder in order to make the model learn to denoise. The goal of the decoder is to estimate the denoised version of the corrupted encoder by minimizing the difference with the clean encoder. The clean encoder and the corrupted encoder share weights. Furthermore, batch normalization (Ioffe and Szegedy 2015) is applied to each pre-activation, including the topmost layer, to accelerate convergence and improve the classification accuracy.
In Figure 1, $x$ represents the original input hyperspectral signal, $\tilde{x}$ is the corrupted version of $x$, and $\hat{x}$ is the reconstruction of $x$. $z^{(l)}$, $\tilde{z}^{(l)}$ and $\hat{z}^{(l)}$ are the variable values of layer $l$ of the clean encoder, the corrupted encoder and the decoder, respectively. $\tilde{y}$ is the output label of the corrupted encoder and $y$ is the output label of the clean encoder. $\mathrm{conv}(\cdot)$ is the convolution function of the convolutional layer, $\mathrm{pooling}(\cdot)$ is the pooling function of the pooling layer and $f(\cdot)$ is the function of the fully connected layer. $C_d^{(l)}$ is the unsupervised cost of each layer and $g^{(l)}(\cdot)$ is the denoising function of decoder layer $l$. The details of training and classification associated with the network are introduced below.

2.1. Convolutional, pooling and fully connected layer for encoder


Figure 1. Architecture of the semi-supervised CNN: $\tilde{x} \to \tilde{z}^{(1)} \to \tilde{z}^{(2)} \to \tilde{z}^{(3)} \to \tilde{y}$ is the corrupted encoder, $x \to z^{(1)} \to z^{(2)} \to z^{(3)} \to y$ is the clean encoder and $\tilde{y} \to \hat{z}^{(3)} \to \hat{z}^{(2)} \to \hat{z}^{(1)} \to \hat{x}$ is the decoder.

Formally, each layer in the clean encoder is formulated as

$$z^{(l)} = N_B\bigl(W^{(l)} h^{(l-1)}\bigr), \quad l = 1, 2, 3, 4$$
$$h^{(l)} = \phi\bigl(\gamma^{(l)}(z^{(l)} + \beta^{(l)})\bigr), \quad l = 1, 2, 3 \qquad (1)$$
In equation (1), $h^{(0)} = x$, $y = z^{(4)}$ and $N_B(x_i) = (x_i - \hat{\mu}_{x_i})/\hat{\sigma}_{x_i}$ is the component-wise batch normalization, where $x_i$ is a component of $W^{(l)} h^{(l-1)}$ and $\hat{\mu}_{x_i}$ and $\hat{\sigma}_{x_i}$ are the mean and standard deviation of the minibatch, respectively. $W^{(l)}$ is the weight matrix between layer $l$ and layer $l-1$, and $\gamma^{(l)}$ and $\beta^{(l)}$ are trainable parameters. $\phi(x)$ is the softmax activation function for the output layer and the rectified linear unit (ReLU) activation function for the other layers.
We choose $K \times K \times B$ neighborhoods of a pixel as the input of the network, where $B$ is the number of hyperspectral image bands. $W^{(1)}$ is a $3 \times 3 \times B_1$ kernel with a stride of 1 for the convolutional layer and $W^{(2)}$ is a $3 \times 3 \times B_1$ kernel with a stride of 2 for the pooling layer, where $B_1$ is the number of output bands of the convolutional and pooling layers. The pooling process is achieved by convolution with $W^{(2)}$ in order to reduce the dimensionality of the intermediate representations. After the pooling layer, the features are flattened to connect with the fully connected layer.
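To make the layer configuration concrete, the following is a minimal sketch of the clean encoder, assuming TensorFlow 2 (the letter only states that TensorFlow is used; all function and variable names here are illustrative). The component-wise batch normalization and the $\gamma^{(l)}(z^{(l)} + \beta^{(l)})$ scaling follow equation (1); applying the softmax to the scaled top-layer output is our reading of the output layer.

```python
import tensorflow as tf

def batch_norm(z):
    # Component-wise batch normalization N_B(.) of equation (1): subtract the
    # minibatch mean and divide by the minibatch standard deviation.
    mu, var = tf.nn.moments(z, axes=[0])
    return (z - mu) / tf.sqrt(var + 1e-10)

def clean_encoder(x, weights, gammas, betas):
    """Clean encoder x -> z(1) -> z(2) -> z(3) -> y of Figure 1 / equation (1).

    x: (batch, K, K, B) input patches.  weights, gammas, betas: lists of the
    trainable variables of the four layers (convolution, strided-convolution
    pooling, fully connected, output)."""
    # Layer 1: 3 x 3 convolution with B1 output bands, stride 1.
    z1 = batch_norm(tf.nn.conv2d(x, weights[0], strides=1, padding='VALID'))
    h1 = tf.nn.relu(gammas[0] * (z1 + betas[0]))
    # Layer 2: pooling realised as a 3 x 3 convolution with stride 2.
    z2 = batch_norm(tf.nn.conv2d(h1, weights[1], strides=2, padding='VALID'))
    h2 = tf.nn.relu(gammas[1] * (z2 + betas[1]))
    # Flatten before the fully connected layers.
    h2_flat = tf.reshape(h2, [tf.shape(h2)[0], -1])
    # Layer 3: fully connected layer.
    z3 = batch_norm(tf.matmul(h2_flat, weights[2]))
    h3 = tf.nn.relu(gammas[2] * (z3 + betas[2]))
    # Layer 4: output layer; the softmax gives the class probabilities y.
    z4 = batch_norm(tf.matmul(h3, weights[3]))
    y = tf.nn.softmax(gammas[3] * (z4 + betas[3]))
    return [z1, z2, z3, z4], y
```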
The corrupted encoder is formulated similarly to the clean encoder. In more detail, each layer in the corrupted encoder is formulated as
$$\tilde{z}^{(l)}_{pre} = W^{(l)} \tilde{h}^{(l-1)}, \quad l = 1, 2, 3, 4$$
$$\tilde{z}^{(l)} = N_B\bigl(\tilde{z}^{(l)}_{pre}\bigr) + n^{(l)}, \quad l = 1, 2, 3 \qquad (2)$$
$$\tilde{h}^{(l)} = \phi\bigl(\gamma^{(l)}(\tilde{z}^{(l)} + \beta^{(l)})\bigr), \quad l = 1, 2, 3$$

in which $\tilde{h}^{(0)} = \tilde{x} = x + n^{(0)}$, $n^{(l)} \sim N(0, \sigma^2)$ $(l = 0, 1, 2, 3)$ is the Gaussian noise, $\tilde{y} = N_B(\tilde{z}^{(4)}_{pre})$ and the other parameters are the same as for the clean encoder. We need to collect $\tilde{z}^{(l)}_{pre}$ to calculate the unsupervised cost.
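A matching sketch of the corrupted encoder of equation (2), reusing `batch_norm` from the sketch above and again purely illustrative: it injects Gaussian noise at the input and at the first three batch-normalized pre-activations, and keeps the pre-normalization values needed later by the unsupervised cost.

```python
def corrupted_encoder(x, weights, gammas, betas, sigma=0.1):
    """Corrupted encoder of equation (2), sharing weights with the clean encoder.
    Returns the pre-normalization values z_pre (used by equation (6)),
    the noisy activations z_tilde and the noisy prediction y_tilde."""
    h = x + tf.random.normal(tf.shape(x), stddev=sigma)       # h~(0) = x + n(0)
    z_pre, z_tilde = [], []
    for l in range(4):
        if l < 2:                                             # conv / pooling layer
            pre = tf.nn.conv2d(h, weights[l],
                               strides=1 if l == 0 else 2, padding='VALID')
        else:                                                 # fully connected layer
            if l == 2:
                h = tf.reshape(h, [tf.shape(h)[0], -1])
            pre = tf.matmul(h, weights[l])
        z_pre.append(pre)
        z = batch_norm(pre)
        if l < 3:                                             # add noise n(l), layers 1-3
            z = z + tf.random.normal(tf.shape(z), stddev=sigma)
            h = tf.nn.relu(gammas[l] * (z + betas[l]))
        z_tilde.append(z)
    # Noisy output used by the supervised cost (softmax is our reading, as above).
    y_tilde = tf.nn.softmax(gammas[3] * (z_tilde[3] + betas[3]))
    return z_pre, z_tilde, y_tilde
```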

2.2. Ladder network and decoder


In general deep learning methods, unsupervised learning is only applied as pre-training, followed by normal supervised learning, and reconstruction of the inputs $z^{(l)}$ based on $\hat{z}^{(l+1)}$ at every level of the network is a common choice for the unsupervised learning. Each layer is trained by minimizing the difference between $z^{(l)}$ and $\hat{z}^{(l)}$. The reconstruction $\hat{z}^{(l)}$ is calculated from $\hat{z}^{(l+1)}$ ($\hat{z}^{(l)} = g(\hat{z}^{(l+1)})$, where $g(\cdot)$ is the reconstruction function), so $\hat{z}^{(l+1)}$ needs to retain many details to reconstruct $\hat{z}^{(l)}$ with small errors. However, supervised training makes the network focus on classification. This is the contradiction between supervised learning and unsupervised learning: unsupervised learning requires the retention of sufficient detail to reconstruct the original observation, whereas supervised learning only requires the retention of the information useful for classification (Rasmus et al. 2015b).
The ladder network proposed by Valpola (2015) adds a skip connection between each layer of the encoder and the decoder. This skip connection means that the reconstruction $\hat{z}^{(l)}$ is calculated from both $\hat{z}^{(l+1)}$ and $\tilde{z}^{(l)}$ ($\hat{z}^{(l)} = g(\hat{z}^{(l+1)}, \tilde{z}^{(l)})$, where $g(\cdot)$ can be treated as a denoising function). This serves three purposes. Firstly, it allows the network to focus on abstract invariant features at the higher levels (Rasmus, Raiko, and Valpola 2014). Secondly, it makes the network more robust to noise by learning the denoising function. Thirdly, such skip connections make it possible for higher levels of the network to leave some of the details for the lower levels to represent, which makes the network a good fit for semi-supervised learning (Rasmus et al. 2015a; Pezeshki et al. 2016; Rasmus et al. 2015b). All of the above help to improve the classification accuracy, so we apply the ladder network to our network.
The added noise follows a Gaussian distribution, so we follow Rasmus et al. (2015b) and choose the parametrization that supports optimal denoising of Gaussian latent variables. The corrupted layer variable has the form $\tilde{z}^{(l)} = z^{(l)} + n^{(l)}$, where $z^{(l)}$ is the variable value of clean encoder layer $l$ and has a Gaussian distribution with variance $\sigma_z^2$, and $n^{(l)}$ is the Gaussian noise with variance $\sigma_n^2$. The goal of the denoising function is to learn to estimate $\hat{z}^{(l)}$ from $\tilde{z}^{(l)}$ by minimizing the difference between $\hat{z}^{(l)}$ and $z^{(l)}$. When the functional form of $\hat{z}^{(l)} = g(\tilde{z}^{(l)})$ is linear, the denoising cost is minimized by a combination of the observation weighted by $v^{(l)}$ and a prior $\mu^{(l)}$, as shown in equation (3); $v^{(l)}$ and $\mu^{(l)}$ are denoising parameters to be trained.

$$\hat{z}^{(l)} = g(\tilde{z}^{(l)}) = v^{(l)}\tilde{z}^{(l)} + (1 - v^{(l)})\mu^{(l)} = (\tilde{z}^{(l)} - \mu^{(l)})\,v^{(l)} + \mu^{(l)}, \quad \text{where } v^{(l)} = \frac{\sigma_z^2}{\sigma_z^2 + \sigma_n^2} \qquad (3)$$
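As a concrete illustration (with values chosen only for illustration, not taken from the experiments), setting $\sigma_z^2 = 1$ and $\sigma_n^2 = 0.01$ in equation (3) gives

$$v^{(l)} = \frac{\sigma_z^2}{\sigma_z^2 + \sigma_n^2} = \frac{1}{1.01} \approx 0.99, \qquad \hat{z}^{(l)} \approx 0.99\,\tilde{z}^{(l)} + 0.01\,\mu^{(l)},$$

so with weak corruption the denoiser mostly trusts the observation, whereas a larger noise variance pulls the estimate toward the prior $\mu^{(l)}$.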

Furthermore, we assume that the latent variables are conditionally independent given the latent variables of the layer above. The final formulation of the denoising function is shown in equation (4), where $V^{(l+1)}$ is the weight matrix between layer $l+1$ and layer $l$ of the decoder and has the same dimensions as the transpose of $W^{(l+1)}$; $\hat{z}_i^{(l)}$, $\tilde{z}_i^{(l)}$ and $u_i^{(l)}$ are the components of $\hat{z}^{(l)}$, $\tilde{z}^{(l)}$ and $u^{(l)}$, respectively; $\mu_i(\cdot)$ and $v_i(\cdot)$ are functions of $u_i^{(l)}$; $\mathrm{sigmoid}(x) = (1 + e^{-x})^{-1}$ is the sigmoid function; and $a_{1,i}^{(l)}, \ldots, a_{10,i}^{(l)}$ are the skip connection parameters to be trained.

$$\hat{z}_i^{(l)} = g_i\bigl(\tilde{z}_i^{(l)}, u_i^{(l)}\bigr) = \bigl(\tilde{z}_i^{(l)} - \mu_i(u_i^{(l)})\bigr)\,v_i(u_i^{(l)}) + \mu_i(u_i^{(l)})$$
$$\text{where } u^{(l)} = N_B\bigl(V^{(l+1)}\hat{z}^{(l+1)}\bigr)$$
$$\mu_i(u_i^{(l)}) = a_{1,i}^{(l)}\,\mathrm{sigmoid}\bigl(a_{2,i}^{(l)} u_i^{(l)} + a_{3,i}^{(l)}\bigr) + a_{4,i}^{(l)} u_i^{(l)} + a_{5,i}^{(l)}$$
$$v_i(u_i^{(l)}) = a_{6,i}^{(l)}\,\mathrm{sigmoid}\bigl(a_{7,i}^{(l)} u_i^{(l)} + a_{8,i}^{(l)}\bigr) + a_{9,i}^{(l)} u_i^{(l)} + a_{10,i}^{(l)} \qquad (4)$$
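A short sketch of this denoising function, shown for a fully connected decoder layer (a convolutional layer would form $u^{(l)}$ with a transposed convolution instead of a matrix product); the argument `a` stands for the ten trainable skip-connection parameters and the names are illustrative:

```python
def denoise(z_tilde, u, a):
    """g_i of equation (4); `a` is a list of the ten trainable parameters
    a_1, ..., a_10, each broadcastable to the shape of the layer."""
    mu = a[0] * tf.sigmoid(a[1] * u + a[2]) + a[3] * u + a[4]
    v  = a[5] * tf.sigmoid(a[6] * u + a[7]) + a[8] * u + a[9]
    return (z_tilde - mu) * v + mu

def decoder_layer(z_tilde_l, z_hat_above, V_above, a_l):
    # u(l) = N_B(V(l+1) z_hat(l+1)); V(l+1) has the shape of W(l+1) transposed.
    u = batch_norm(tf.matmul(z_hat_above, V_above))
    return denoise(z_tilde_l, u, a_l)
```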

2.3. Semi-supervised learning


The limited availability of labeled training samples is the most challenging issue for supervised hyperspectral image classification, since the collection of labeled samples is generally difficult, expensive and time-consuming for hyperspectral images. However, a large number of unlabeled samples waiting to be classified remain unexploited. The contradiction between supervised learning and unsupervised learning means that the two could not previously be well integrated; in this context, unlabeled samples are only used for pre-training in conventional deep learning methods. The ladder network relieves the pressure to represent details in the higher layers of the model, because the decoder can recover any details discarded by the encoder through the skip connections between the encoder and the decoder. The ladder network therefore makes unsupervised learning compatible with supervised learning, so a semi-supervised learning strategy is adopted to train the parameters in order to take advantage of the great number of unlabeled samples. The unsupervised learning targets on every layer of our network can greatly improve hyperspectral image classification accuracy, as will be demonstrated in the experiments.
The final cost to be optimized consists of the supervised cost and the unsupervised cost. The supervised cost $C_c$, shown in equation (5), is the average negative log-probability of the corrupted output $\tilde{y}$ matching the target $t_n$:

$$C_c = -\frac{1}{N}\sum_{n=1}^{N} \log\bigl(P(\tilde{y} = t_n \mid x_n)\bigr) \qquad (5)$$
The unsupervised cost $C_d^{(l)}$ of each layer is the squared error between $z^{(l)}$ and $\hat{z}^{(l)}$. The final unsupervised cost is the weighted sum over all layers, as shown in equation (6):

$$C_d = \sum_{l=0}^{L} \lambda_l C_d^{(l)} = \sum_{l=0}^{L} \frac{\lambda_l}{N m_l} \sum_{n=1}^{N} \bigl\| z^{(l)}(n) - \hat{z}_{BN}^{(l)}(n) \bigr\|^2 \qquad (6)$$

In equation (6), $\hat{z}_{BN}^{(l)}(n) = (\hat{z}^{(l)} - \mu)/\sigma$, where $\mu$ and $\sigma$ are the batch mean and batch standard deviation of $\tilde{z}^{(l)}_{pre}$; $N$ is the number of samples, $m_l$ is the number of nodes in each layer and $\lambda_l$ is the unsupervised loss weight for each layer. The final cost is the sum of the supervised cost and the unsupervised cost.
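A sketch of the two costs, under the same TensorFlow assumptions as the earlier sketches (the small stabilizing constants and all names are illustrative):

```python
def supervised_cost(y_tilde, targets):
    # Equation (5): average negative log-probability of the corrupted output
    # matching the one-hot target t_n, computed on the labeled batch.
    return -tf.reduce_mean(
        tf.reduce_sum(targets * tf.math.log(y_tilde + 1e-10), axis=-1))

def unsupervised_cost(z_clean, z_hat, z_pre, lambdas):
    # Equation (6): layer-wise squared error between the clean z(l) and the
    # reconstruction z_hat(l), the latter normalized with the batch statistics
    # of z_pre(l), weighted by lambda_l and divided by the layer width m_l.
    cost = 0.0
    for z, zh, zp, lam in zip(z_clean, z_hat, z_pre, lambdas):
        mu, var = tf.nn.moments(zp, axes=[0])
        zh_bn = (zh - mu) / tf.sqrt(var + 1e-10)
        diff = tf.reshape(z - zh_bn, [tf.shape(z)[0], -1])
        m_l = tf.cast(tf.shape(diff)[1], tf.float32)   # number of nodes in layer l
        cost += lam * tf.reduce_mean(tf.reduce_sum(tf.square(diff), axis=1)) / m_l
    return cost
```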
ðlÞ ðlÞ
The parameters including W ðlÞ , ðlÞ , ðlÞ , V ðlÞ , a1;i ; . . . ; a10;i are trained by backpropagation to
optimize the total cost C ¼ Cc þ Cd. The flowchart of the proposed semi-supervised CNN is
shown in Figure 2. In more detail, 100 labeled samples and 100 unlabeled samples are input to
our network in each training process. The labeled samples are used to compute the supervised
cost and the unlabeled samples are used to compute the unsupervised cost. Then the
parameters are updated by backpropagation. All unlabeled samples are input to the clean
encoder network to get the label, namely, the classification results after the network is fully
trained. Then the classification results are matched with the ground truth map to evaluate the
classification accuracy.
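Putting the pieces together, one training step could look like the sketch below. It assumes the helpers defined in the earlier sketches plus a hypothetical `decoder` function that chains `decoder_layer` from the top of the network down to the input, as in Figure 1; the choice of the Adam optimizer is also our assumption, since the letter only reports the learning rate.

```python
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)  # learning rate from Section 3

def train_step(x_lab, t_lab, x_unlab, enc_vars, dec_params, lambdas, trainables):
    """One semi-supervised update with 100 labeled and 100 unlabeled patches,
    as described above; enc_vars = (weights, gammas, betas)."""
    with tf.GradientTape() as tape:
        # Supervised cost Cc from the corrupted output on the labeled batch.
        _, _, y_tilde = corrupted_encoder(x_lab, *enc_vars)
        cc = supervised_cost(y_tilde, t_lab)
        # Unsupervised cost Cd from the unlabeled batch: corrupted encoder,
        # clean encoder and decoder (hypothetical top-down chain of decoder_layer).
        z_pre, z_tilde, _ = corrupted_encoder(x_unlab, *enc_vars)
        z_clean, _ = clean_encoder(x_unlab, *enc_vars)
        z_hat = decoder(z_tilde, dec_params)
        cd = unsupervised_cost(z_clean, z_hat, z_pre, lambdas)
        total = cc + cd                                    # C = Cc + Cd
    grads = tape.gradient(total, trainables)
    optimizer.apply_gradients(zip(grads, trainables))
    return total
```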

3. Experimental results
The proposed semi-supervised CNN (SS-CNN) is implemented using the TensorFlow library. The results are generated on a PC equipped with an Intel Core i7-5700HQ CPU at 2.7 GHz, an Nvidia GeForce GTX 970M GPU and 32 GB of memory. The University of Pavia scene, consisting of 103 spectral bands with 610 × 340 pixels, is employed to evaluate the performance of SS-CNN. There are 42,776 labeled pixels belonging to nine classes in the University of Pavia data set; 200 labeled samples per class are randomly selected for supervised training and all the other samples, without their labels, are used for testing. The numbers of labeled training samples and testing samples are listed in Table 1. The results using different proportions of unlabeled data are shown in Figure 3, which demonstrates that the use of the enormous amount of unlabeled samples can improve the classification accuracy. Using only a small amount of unlabeled samples, however, leads to overfitting and lowers the classification accuracy, so we utilize all unlabeled samples to train the network.

Figure 2. Flowchart of the proposed semi-supervised CNN: labeled and unlabeled 9 × 9 × B samples pass through the corrupted encoder, clean encoder and decoder to form the supervised and unsupervised costs, whose sum is backpropagated to update the parameters.
The input of the SS-CNN for the University of Pavia data set is a 9 × 9 × 103 ($K = 9$, $B = 103$) neighborhood of a pixel. Selecting larger neighborhoods as the input gives better accuracy, but training the network then takes more time; the results for different neighborhood sizes are listed in Table 2. Considering that there is strong correlation between different bands of a hyperspectral image and that the labeled samples are limited, the value of $B_1$ is set to 80, smaller than $B$, in order to decrease the number of parameters. In addition, we experimented with different values of $B_1$ and found that its effect on classification accuracy is relatively small when $B_1$ (e.g. 60, 80, 120, 200) is large enough. $\sigma^2$ (the variance of the Gaussian noise) is set to 0.01. We also tested various values of $\sigma^2$, e.g. 0.5, 0.1, 0.05, 0.01, 0.005 and 0.001, for which the corresponding accuracies (%) are 85.43, 90.14, 97.87, 98.32, 98.29 and 98.31, respectively. A large $\sigma^2$ reduces the classification accuracy, so $\sigma^2$ should be set relatively small compared with the pixel values of the hyperspectral image; note that the hyperspectral image is scaled to the range 0 to 1 before training the network. The learning rate is set to 0.001 empirically and the number of epochs is set to 40. $\lambda_l$ is the weight of the unsupervised loss for each layer. We fix $\lambda_2 = 1.0$, $\lambda_3 = 1.0$, $\lambda_4 = 1.0$ and increase the value of $\lambda_1$ (e.g. 1.0, 10.0, 100.0); the classification accuracy (96.81%, 97.47%, 97.96%) increases with $\lambda_1$, which reveals that the unsupervised loss weight of the lower layers contributes more to classification performance. Consequently, we set $\lambda_1 = 10$, $\lambda_2 = 1$, $\lambda_3 = 0.1$, $\lambda_4 = 0.1$ and obtain promising results. The classification accuracy of SS-CNN with and without batch normalization is shown in Figure 4, which demonstrates that batch normalization accelerates convergence and improves the classification accuracy.
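For completeness, a small sketch of how the 9 × 9 × B input patches and the 0–1 scaling mentioned above could be prepared (plain NumPy; the reflect padding at the image borders is our assumption, since the letter does not describe border handling):

```python
import numpy as np

def scale01(cube):
    # Scale the hyperspectral cube to the range [0, 1] before training.
    return (cube - cube.min()) / (cube.max() - cube.min())

def extract_patches(cube, coords, k=9):
    """Cut k x k x B neighborhoods around the given (row, col) pixel positions.
    cube: (H, W, B) array already scaled to [0, 1]."""
    pad = k // 2
    padded = np.pad(cube, ((pad, pad), (pad, pad), (0, 0)), mode='reflect')
    return np.stack([padded[r:r + k, c:c + k, :] for r, c in coords])
```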

Table 1. Number of labeled training samples and testing samples used in the University of Pavia data set.
Class no. Class name No. of training samples No. of testing samples
1 Asphalt 200 6431
2 Meadows 200 18449
3 Gravel 200 1899
4 Trees 200 2864
5 Sheets 200 1145
6 Bare Soil 200 4829
7 Bitumen 200 1130
8 Bricks 200 3482
9 Shadows 200 747
Total 1800 40976

Figure 3. Classification accuracy for different proportions of unlabeled data.

Table 2. Classification accuracy (%) and training time (s) for different neighborhood sizes.
Size (pixels)   7 × 7    9 × 9    11 × 11   13 × 13   15 × 15   17 × 17
Accuracy (%)    97.96    98.32    98.51     98.79     98.97     99.12
Time (s)        858.1    1220.2   1383.6    1532.0    1740.4    2565.2

To demonstrate the effectiveness of SS-CNN, we compare it with several established classifiers: support vector machine (SVM), spectral-spatial classification (ISODATA-SVM) (Yuliya, Benediktsson, and Chanussot 2009), CNN (Hu et al. 2015) and CNN with pixel-pair features (CNN-PPF) (Li et al. 2016). The numbers of labeled training samples (200 per class) and testing samples are exactly the same for all classifiers. The gamma (spread of the RBF kernel) and c (parameter that controls the amount of penalty during the SVM optimization) for SVM are set to 2 and 256, respectively. For the ISODATA algorithm, Cmin is set to 9 and Cmax to 10, where Cmin and Cmax are the lower and upper bounds of the number of classes, respectively. The overall accuracy (OA) and individual class accuracies are compared in Table 3. Figure 5(a) shows the ground-truth map of the University of Pavia data set and Figure 5(b) shows the classification map obtained by SS-CNN. It is obvious that SS-CNN obtains the best classification accuracy. Traditional classifiers for hyperspectral images only use spectral information, whereas hyperspectral images can provide a wealth of spectral and spatial information for classification; SS-CNN can extract abstract and invariant spectral-spatial features from the hyperspectral data cube. Even the supervised version of our network without the ladder network (S-CNN) achieves higher accuracy than traditional classifiers such as SVM and CNN (Hu et al. 2015). The supervised CNN with ladder network (S-CNN-LN) allows the network to focus on abstract invariant features at the higher levels, which leads to higher classification accuracy. However, the OA of S-CNN and S-CNN-LN is lower than that of CNN-PPF and ISODATA-SVM with the limited labeled samples. CNN-PPF deals with the challenge of limited labeled samples through pixel-pair features and outperforms the traditional methods. SS-CNN utilizes the unlabeled samples to deal with the challenge of limited labeled samples and performs better than CNN-PPF and ISODATA-SVM. This demonstrates that the unsupervised learning targets on every layer of our network can greatly improve hyperspectral image classification accuracy. Figure 6 shows the results of SS-CNN with different numbers of labeled training samples per class, which demonstrates the effective performance of SS-CNN even with limited labeled samples.

Figure 4. Classification accuracy for different numbers of training epochs; BN denotes SS-CNN with batch normalization and Non-BN denotes SS-CNN without batch normalization.

Training and testing times of the different classifiers are shown in Table 4. Batch normalization can accelerate the convergence. The total number of trainable parameters is 40,441 in SS-CNN, compared with 81,408 and 629,648 in CNN
and CNN-PPF, respectively. Consequently, the training time of SS-CNN is much less than that of CNN and CNN-PPF. However, because of the use of the enormous amount of unlabeled samples, training the network is more time-consuming than for SVM and ISODATA-SVM.

Table 3. Class-specific accuracy (%) and overall accuracy (OA) with different techniques.
Class no. SVM CNN S-CNN S-CNN-LN ISODATA-SVM CNN-PPF SS-CNN
1 86.46 88.38 85.15 87.02 94.16 97.42 97.16
2 90.17 91.27 96.51 97.30 97.62 95.76 98.72
3 85.04 85.88 92.71 93.76 84.33 94.05 96.86
4 96.64 97.24 97.81 98.17 94.88 97.52 99.25
5 99.78 99.91 100.0 100.0 97.92 100.0 100.0
6 94.89 96.41 95.17 95.70 95.27 99.13 98.59
7 95.19 93.62 91.88 92.93 98.72 96.19 98.80
8 85.36 87.45 88.02 89.27 97.94 93.62 96.88
9 99.89 99.57 99.58 99.68 99.89 99.60 100.0
OA 90.62 92.27 93.88 94.72 96.08 96.48 98.32

Figure 5. Experiment on the University of Pavia data set: (a) ground-truth map; (b) classification map obtained by SS-CNN.

Figure 6. Classification accuracy of the different methods with different numbers of labeled samples per class.

Table 4. Training and testing time of different classifiers.
Classifiers        SVM   CNN      CNN-PPF   ISODATA-SVM   SS-CNN
Training time (s)  0.1   2153.0   14040.7   47.7 + 0.1    1220.2
Testing time (s)   1.4   0.4      16.9      1.9           1.6

Note that the training time of ISODATA-SVM consists of training the SVM and clustering by ISODATA. Owing to the enormous amount of parameters, the testing time of SS-CNN is longer than that of CNN and SVM. The final label in CNN-PPF and ISODATA-SVM is determined via a majority voting strategy, so testing CNN-PPF and ISODATA-SVM is more time-consuming than testing SS-CNN.

4. Conclusion
In this letter, a semi-supervised convolutional neural network is constructed for hyperspectral image classification. The experimental results demonstrate that the convolutional neural network with the ladder network can effectively extract spatial-spectral features from the original hyperspectral image cube, and that the semi-supervised training strategy using the enormous amount of unlabeled samples can improve classification accuracy even with a small number of labeled training samples. However, because of the use of the enormous amount of unlabeled samples, training the network is time-consuming.

Acknowledgement
We thank Prof. Paolo Gamba for providing the Pavia data set.

Funding
This work was supported by the [State Key Laboratory of Geo-information Engineering] under
Grant [SKLGIE2015-M-3-1, SKLGIE2015-M-3-2]; [National Natural Science Foundation of China]
under Grant [41201477]; and [Scientific and Technological Project in Henan Province] under
Grant [152102210014].

References
Bioucas-Dias, J. M., A. Plaza, G. Camps-Valls, P. Scheunders, N. Nasrabadi, and J. Chanussot. 2013.
“Hyperspectral Remote Sensing Data Analysis and Future Challenges.” IEEE Geoscience and
Remote Sensing Magazine 1 (2): 6–36. doi:10.1109/MGRS.2013.2244672.

Camps-Valls, G., and L. Bruzzone. 2005. “Kernel-Based Methods for Hyperspectral Image Classification.”
IEEE Transactions on Geoscience and Remote Sensing 43 (6): 1351–1362. doi:10.1109/TGRS.2005.846154.
Chen, Y., H. Jiang, L. Chunyang, X. Jia, and P. Ghamisi. 2016. “Deep Feature Extraction and Classification
of Hyperspectral Images Based on Convolutional Neural Networks.” IEEE Transactions on Geoscience
and Remote Sensing 54 (10): 6232–6251. doi:10.1109/TGRS.2016.2584107.
Chen, Y., Z. Lin, X. Zhao, G. Wang, and G. Yanfeng. 2014. “Deep Learning-Based Classification of
Hyperspectral Data.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote
Sensing 7 (6): 2094–2097. doi:10.1109/JSTARS.2014.2329330.
Ghamisi, P., Y. Chen, and X. X. Zhu. 2016. “A Self-Improving Convolution Neural Network for the
Classification of Hyperspectral Data.” IEEE Geoscience and Remote Sensing Letters 13 (10): 1537–
1541. doi:10.1109/LGRS.2016.2595108.
Hu, W., Y. Huang, L. Wei, F. Zhang, and H. Li. 2015. “Deep Convolutional Neural Networks for
Hyperspectral Image Classification.” Journal of Sensors 2015: 1–12. doi:10.1155/2015/258619.
Ioffe, S., and C. Szegedy. 2015. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift." ArXiv Preprint arXiv:1502.03167.
Li, W., C. Chen, S. Hongjun, and D. Qian. 2015. “Local Binary Patterns and Extreme Learning
Machine for Hyperspectral Imagery Classification.” IEEE Transactions on Geoscience and Remote
Sensing 53 (7): 1–13. doi:10.1109/TGRS.2014.2381602.
Li, W., W. Guodong, F. Zhang, and D. Qian. 2016. “Hyperspectral Image Classification Using Deep Pixel-
Pair Features.” IEEE Transactions on Geoscience and Remote Sensing. doi:10.1109/TGRS.2016.2603190.
Liu, J., W. Zebin, Z. Wei, L. Xiao, and L. Sun. 2013. “Spatial-Spectral Kernel Sparse Representation for
Hyperspectral Image Classification.” IEEE Journal of Selected Topics in Applied Earth Observations
and Remote Sensing 6 (6): 2462–2471. doi:10.1109/JSTARS.2013.2252150.
Ma, X., H. Wang, and J. Geng. 2016. "Spectral-Spatial Classification of Hyperspectral Image Based on Deep Auto-Encoder." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 9 (9): 4073–4085.
Muñoz-Marí, J., F. Bovolo, L. Gómez-Chova, L. Bruzzone, and G. Camps-Valls. 2010. "Semisupervised One-Class Support Vector Machines for Classification of Remote Sensing Data." IEEE Transactions on Geoscience and Remote Sensing 48 (8): 3188–3197.
Pezeshki, M., L. Fan, P. Brakel, A. Courville, and Y. Bengio. 2016. "Deconstructing the Ladder Network Architecture." In Proceedings of the International Conference on Machine Learning, 2368–2376.
Rasmus, A., T. Raiko, and H. Valpola. 2014. "Denoising Autoencoder with Modulated Lateral Connections Learns Invariant Representations of Natural Images." ArXiv Preprint.
Rasmus, A., H. Valpola, and T. Raiko. 2015a. “Lateral Connections in Denoising Autoencoders
Support Supervised Learning.” Computer Science 31 (4): 555–563.
Rasmus, A., M. Berglund, M. Honkala, H. Valpola, and T. Raiko. 2015b. "Semi-Supervised Learning with Ladder Networks." In Advances in Neural Information Processing Systems, 3546–3554. Montreal, Canada: Curran Associates, Inc.
Sun, S., P. Zhong, H. Xiao, and R. Wang. 2015. “Active Learning with Gaussian Process Classifier for
Hyperspectral Image Classification.” IEEE Transactions on Geoscience and Remote Sensing 53 (4):
1746–1760. doi:10.1109/TGRS.2014.2347343.
Tuia, D., and C.-V. Gustavo. 2009. “Semisupervised Remote Sensing Image Classification with Cluster
Kernels.” IEEE Geoscience and Remote Sensing Letters 6 (2): 224–228. doi:10.1109/LGRS.2008.2010275.
Valpola, H. 2015. "From Neural PCA to Deep Unsupervised Learning." In Advances in Independent Component Analysis and Learning Machines, 143–171.
Yue, J., S. Mao, and L. Mei. 2016. “A Deep Learning Framework for Hyperspectral Image Classification Using
Spatial Pyramid Pooling.” Remote Sensing Letters 7 (9): 875–884. doi:10.1080/2150704X.2016.1193793.
Yue, J., W. Zhao, S. Mao, and H. Liu. 2015. “Spectral-Spatial Classification of Hyperspectral Images
Using Deep Convolutional Neural Networks.” Remote Sensing Letters 6 (6): 468–477. doi:10.1080/
2150704X.2015.1047045.
Yuliya, T., J. A. Benediktsson, and J. Chanussot. 2009. “Spectral-Spatial Classification of
Hyperspectral Imagery Based on Partitional Clustering Techniques.” IEEE Transactions on
Geoscience and Remote Sensing 47 (8): 2973–2987. doi:10.1109/TGRS.2009.2016214.
