Remote Sensing Letters

ISSN: 2150-704X (Print) 2150-7058 (Online) Journal homepage: http://www.tandfonline.com/loi/trsl20

A semi-supervised convolutional neural network for hyperspectral image classification

Bing Liu, Xuchu Yu, Pengqiang Zhang, Xiong Tan, Anzhu Yu & Zhixiang Xue

To cite this article: Bing Liu, Xuchu Yu, Pengqiang Zhang, Xiong Tan, Anzhu Yu & Zhixiang Xue
(2017) A semi-supervised convolutional neural network for hyperspectral image classification,
Remote Sensing Letters, 8:9, 839-848, DOI: 10.1080/2150704X.2017.1331053

To link to this article: http://dx.doi.org/10.1080/2150704X.2017.1331053

Published online: 23 May 2017.

Institute of Surveying and Mapping, Zhengzhou, China

ABSTRACT

Convolutional neural networks (CNNs) for hyperspectral image classification can provide
excellent performance when the number of labeled training samples is sufficiently large.
Unfortunately, only a small number of labeled samples are usually available for hyperspectral
images. In this letter, a novel semi-supervised convolutional neural network is proposed for
hyperspectral image classification. The proposed network can automatically learn features from
complex hyperspectral image data structures. Furthermore, skip connection parameters are added
between the encoder layers and the decoder layers in order to make the network suitable for
semi-supervised learning. A semi-supervised method is adopted to solve the problem of limited
labeled samples. Finally, the network is trained to simultaneously minimize the sum of
supervised and unsupervised cost functions. The proposed network is evaluated on a widely used
hyperspectral image data set. The experimental results demonstrate that the proposed approach
provides results competitive with state-of-the-art methods.

ARTICLE HISTORY
Received 16 January 2017
Accepted 4 May 2017

1. Introduction
Hyperspectral remote sensing has become a research focus in remote sensing with the
continuous improvement of the spectral resolution of remote sensors. Classification is an
important topic in hyperspectral image processing and application, and its purpose is to
assign a unique label to each pixel in the image. Hyperspectral images consist of several
hundred narrow contiguous wavelength bands and can provide a wealth of spectral and spatial
information for classification. At the same time, the complex structure of hyperspectral
images makes feature extraction difficult. Given the complex data structures and limited
labeled samples, the classification of hyperspectral remote sensing images still faces great
challenges.
In the early stage of hyperspectral image classification, several types of discriminant
functions were applied, such as nearest neighbor, decision trees and linear functions.
However, the main problem of these classic classifiers is their sensitivity to the Hughes effect
(Bioucas-Dias et al. 2013). Support vector machines (SVM) with kernel methods were then
introduced to deal with the Hughes phenomenon and were the mainstream classification method
for a long time (Camps-Valls and Bruzzone 2005). Meanwhile, extreme learning machines
(Li et al. 2015), active learning (Sun et al. 2015), sparse representation (Liu et al. 2013) and
other classifiers for hyperspectral images have been investigated to obtain higher
classification accuracy.

CONTACT Bing Liu

More recently, deep learning-based methods have drawn increasing attention in remote
sensing image analysis (Li et al. 2016). The stacked autoencoder (SAE) is a simple deep learning
method and was first introduced into hyperspectral image classification by Chen et al. (2014).
Later, a series of improved hyperspectral image classification methods based on SAE were
proposed to obtain better performance (Ma, Wang, and Geng 2016). CNN was first used to
extract spectral features for hyperspectral image classification, where it performed better
than SVM (Hu et al. 2015). CNNs were then used to extract spatial-spectral features for
hyperspectral image classification with excellent performance (Yue et al. 2015; 2016;
Ghamisi, Chen, and Zhu 2016). In general, deep learning methods have a large number of
parameters to be tuned. As a result, the majority of deep learning-based methods for
hyperspectral image classification only yield promising results when the number of labeled
training samples is sufficiently large. A virtual sample enhanced method was proposed to
tackle the problem of limited labeled samples (Chen et al. 2016). A pixel-pair method was also
proposed to significantly increase the number of training samples, so that the advantage of
CNN can be exploited (Li et al. 2016). Many semi-supervised algorithms (Tuia and Gustavo 2009;
Muñoz-Marí et al. 2010) have demonstrated that unlabeled data are useful for improving
classification performance. However, current hyperspectral image classification methods based
on deep learning do not take good advantage of the enormous amount of unlabeled data.
The main goal of this letter is to deal with the complex data structures and limited
labeled samples in hyperspectral images. In more detail, the main contributions of this letter
can be summarized as follows. 1) A CNN architecture is designed to directly extract
spatial-spectral features from the hyperspectral image cube. 2) The ladder network is
introduced into the CNN architecture in order to make the network suitable for semi-supervised
learning. 3) In order to deal with limited labeled samples, the CNN is trained by a
semi-supervised method that simultaneously minimizes the sum of supervised and unsupervised
cost functions.

2. Proposed semi-supervised CNN

The proposed semi-supervised CNN illustrated in Figure 1 is composed of the clean encoder, the
corrupted encoder and the decoder. Gaussian noise is added to each layer of the corrupted
encoder in order to make the model learn to denoise. The goal of the decoder is to estimate the
denoised version of the corrupted encoder by minimizing the difference from the clean
encoder. The clean encoder and the corrupted encoder share their weights. Furthermore, batch
normalization (Ioffe and Szegedy 2015) is applied to each preactivation, including the topmost
layer, to accelerate convergence and improve classification accuracy.
In Figure 1, $x$ represents the original input hyperspectral signal, $\tilde{x}$ is the
corrupted version of $x$, and $\hat{x}$ is the reconstruction of $x$. $z^{(l)}$ is the variable
value of the clean layer $l$, $\tilde{z}^{(l)}$ is the variable value of the corrupted layer $l$,
and $\hat{z}^{(l)}$ is the variable value of the decoder layer $l$. $\tilde{y}$ is the output
label of the corrupted encoder and $y$ is the output label of the clean encoder.
$\mathrm{conv}(\cdot)$ is the convolution function of the convolutional layer,
$\mathrm{pooling}(\cdot)$ is the pooling function of the pooling layer, and $f(\cdot)$ is the
function of the fully connected layer. $C_d^{(l)}$ is the unsupervised cost of each layer, and
$g^{(l)}(\cdot)$ is the denoising function of the decoder layer $l$. The details of training and
classification associated with the network are introduced below.

2.1. Convolutional, pooling and fully connected layer for encoder

Figure 1. Architecture of the semi-supervised CNN.
$\tilde{x} \to \tilde{z}^{(1)} \to \tilde{z}^{(2)} \to \tilde{z}^{(3)} \to \tilde{y}$ is the corrupted
encoder, $x \to z^{(1)} \to z^{(2)} \to z^{(3)} \to y$ is the clean encoder, and
$\tilde{y} \to \hat{z}^{(3)} \to \hat{z}^{(2)} \to \hat{z}^{(1)} \to \hat{x}$ is the decoder.

Formally, each layer in the clean encoder is formulated as

$$
\begin{aligned}
z^{(l)} &= N_B\big(W^{(l)} h^{(l-1)}\big), & l &= 1, 2, 3, 4 \\
h^{(l)} &= \phi\big(\gamma^{(l)} (z^{(l)} + \beta^{(l)})\big), & l &= 1, 2, 3
\end{aligned}
\tag{1}
$$

In equation (1), $h^{(0)} = x$, $y = z^{(4)}$, and
$N_B(x_i) = (x_i - \hat{\mu}_{x_i}) / \hat{\sigma}_{x_i}$ is the component-wise batch
normalization, where $x_i$ is a component of $W^{(l)} h^{(l-1)}$, and $\hat{\mu}_{x_i}$ and
$\hat{\sigma}_{x_i}$ are the mean and standard deviation of the minibatch, respectively.
$W^{(l)}$ is the weight matrix between layer $l$ and layer $l-1$. $\gamma^{(l)}$ and
$\beta^{(l)}$ are trainable parameters. $\phi(\cdot)$ is the softmax activation function for the
output layer and the rectified linear unit (ReLU) activation function for the other layers.
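As a rough illustration of equation (1), the NumPy sketch below applies component-wise batch normalization followed by the scale, shift and ReLU steps for a single fully connected layer. The function names, the small `eps` constant and the toy shapes are assumptions made for illustration and are not taken from the letter.

```python
import numpy as np

def batch_norm(a, eps=1e-8):
    """Component-wise N_B: normalize each column over the minibatch."""
    mu = a.mean(axis=0, keepdims=True)
    sigma = a.std(axis=0, keepdims=True)
    return (a - mu) / (sigma + eps)  # eps is an assumed guard against division by zero

def clean_layer(h_prev, W, gamma, beta):
    """One clean-encoder layer: z = N_B(W h), h = ReLU(gamma * (z + beta))."""
    z = batch_norm(h_prev @ W.T)     # rows are samples, so W h is written h W^T
    h = np.maximum(0.0, gamma * (z + beta))
    return z, h

rng = np.random.default_rng(0)
h0 = rng.normal(size=(100, 16))      # toy minibatch: 100 samples, 16 features
W1 = rng.normal(size=(8, 16))        # toy weight matrix for a layer with 8 units
z1, h1 = clean_layer(h0, W1, gamma=np.ones(8), beta=np.zeros(8))
```

After normalization each component of `z1` has approximately zero mean and unit standard deviation over the minibatch, which is the property the corrupted encoder and the denoising cost rely on.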
We choose the $K \times K \times B$ neighborhood of a pixel as the input of the network, where
$B$ is the number of hyperspectral image bands. $W^{(1)}$ is a $3 \times 3 \times B_1$ kernel
with a stride of 1 for the convolutional layer. $W^{(2)}$ is a $3 \times 3 \times B_1$ kernel
with a stride of 2 for the pooling layer. $B_1$ is the number of output bands of the
convolutional and pooling layers. The pooling process is implemented as a convolution with
$W^{(2)}$ in order to reduce the dimensionality of the intermediate representations. After the
pooling layer, the features are flattened so that they can be fed to the fully connected layers.
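The input construction described above can be sketched as follows. The letter does not say how border pixels are handled or whether the convolutions are padded, so the reflect padding and the unpadded ("valid") convolution arithmetic below are assumptions, as are the helper names and the toy cube size.

```python
import numpy as np

def extract_patch(cube, row, col, k=9):
    """Return the k x k x B neighborhood centred on pixel (row, col).

    The cube is reflect-padded so border pixels also get a full patch
    (assumed; the padding mode is not specified in the letter).
    """
    r = k // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="reflect")
    return padded[row:row + k, col:col + k, :]

def conv_output_size(n, kernel=3, stride=1):
    """Spatial size after an unpadded ('valid') convolution (assumed padding)."""
    return (n - kernel) // stride + 1

# Toy stand-in for the hyperspectral cube, with B = 103 bands.
cube = np.random.default_rng(0).random((20, 15, 103))
patch = extract_patch(cube, row=0, col=14, k=9)

after_conv = conv_output_size(9, kernel=3, stride=1)           # 3 x 3 kernel, stride 1
after_pool = conv_output_size(after_conv, kernel=3, stride=2)  # 3 x 3 kernel, stride 2
```

Under these assumptions a 9 × 9 input shrinks to 7 × 7 after the convolutional layer and to 3 × 3 after the strided pooling convolution, so the flattened feature vector has 3 · 3 · B1 entries.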
The corrupted encoder is formulated similarly to the clean encoder. In more detail,
each layer in the corrupted encoder is formulated as

$$
\begin{aligned}
\tilde{z}_{pre}^{(l)} &= W^{(l)} \tilde{h}^{(l-1)}, & l &= 1, 2, 3, 4 \\
\tilde{z}^{(l)} &= N_B\big(\tilde{z}_{pre}^{(l)}\big) + n^{(l)}, & l &= 1, 2, 3 \\
\tilde{h}^{(l)} &= \phi\big(\gamma^{(l)} (\tilde{z}^{(l)} + \beta^{(l)})\big), & l &= 1, 2, 3
\end{aligned}
\tag{2}
$$

where $\tilde{h}^{(0)} = \tilde{x} = x + n^{(0)}$, $n^{(l)} \sim N(0, \sigma^2)$ $(l = 0, 1, 2, 3)$ is the
Gaussian noise, $\tilde{y} = N_B(\tilde{z}_{pre}^{(4)})$, and the other parameters are the same as
for the clean encoder. The values $\tilde{z}_{pre}^{(l)}$ need to be collected to calculate the
unsupervised cost.
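A minimal sketch of one corrupted-encoder layer from equation (2) is given below, assuming a fully connected layer and a noise variance of 0.01 (the value used later in the experiments). The helper names and toy shapes are hypothetical; the only difference from the clean layer is the Gaussian noise added after batch normalization.

```python
import numpy as np

def batch_norm(a, eps=1e-8):
    """Component-wise batch normalization over the minibatch (eps is assumed)."""
    return (a - a.mean(axis=0, keepdims=True)) / (a.std(axis=0, keepdims=True) + eps)

def corrupted_layer(h_prev, W, gamma, beta, noise_std, rng):
    """z_pre = W h, z_tilde = N_B(z_pre) + n, h_tilde = ReLU(gamma * (z_tilde + beta))."""
    z_pre = h_prev @ W.T
    z_tilde = batch_norm(z_pre) + rng.normal(0.0, noise_std, size=z_pre.shape)
    h_tilde = np.maximum(0.0, gamma * (z_tilde + beta))
    return z_pre, z_tilde, h_tilde

rng = np.random.default_rng(0)
noise_std = 0.1                                          # sigma^2 = 0.01
x = rng.random((100, 16))                                # toy inputs scaled to [0, 1]
x_tilde = x + rng.normal(0.0, noise_std, size=x.shape)   # corrupted input, n^(0)
W1 = rng.normal(size=(8, 16))
z_pre, z_tilde, h_tilde = corrupted_layer(
    x_tilde, W1, np.ones(8), np.zeros(8), noise_std, rng
)
```

The pre-normalization values `z_pre` are returned as well because their batch statistics are needed later when the unsupervised cost is computed.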

2.2. Ladder network and decoder

In general deep learning methods, unsupervised learning is only applied as pre-training,
followed by normal supervised learning. Reconstruction of the inputs $z^{(l)}$ based on
$\hat{z}^{(l+1)}$ at every level of the network is a common choice for unsupervised learning.
Each layer is trained by minimizing the difference between $z^{(l)}$ and $\hat{z}^{(l)}$. The
reconstruction $\hat{z}^{(l)}$ is calculated from $\hat{z}^{(l+1)}$, that is,
$\hat{z}^{(l)} = g(\hat{z}^{(l+1)})$, where $g(\cdot)$ is the reconstruction function.
$\hat{z}^{(l+1)}$ therefore needs to retain many details in order to reconstruct $\hat{z}^{(l)}$
with small errors. However, supervised training makes the network focus on classification.
This is the contradiction between supervised learning and unsupervised learning: unsupervised
learning requires the retention of sufficient detail to reconstruct the original observation,
whereas supervised learning only requires the retention of information useful for
classification (Rasmus et al. 2015).
The ladder network proposed by Valpola (2015) adds a skip connection between each layer of
the encoder and the decoder. With this skip connection, the reconstruction $\hat{z}^{(l)}$ is
calculated from both $\hat{z}^{(l+1)}$ and $\tilde{z}^{(l)}$, that is,
$\hat{z}^{(l)} = g(\hat{z}^{(l+1)}, \tilde{z}^{(l)})$, where $g(\cdot)$ can be treated as the
denoising function. This serves three purposes. Firstly, it allows the network to focus on
abstract invariant features on the higher levels (Rasmus, Raiko, and Valpola 2014). Secondly,
it makes the network more robust to noise by learning the denoising function. Thirdly, such
skip connections make it possible for the higher levels of the network to leave some of the
details for the lower levels to represent, which makes the network a good fit for
semi-supervised learning (Rasmus et al. 2015a; Pezeshki et al. 2016; Rasmus et al. 2015b). All
of the above help to improve classification accuracy, so we apply the ladder network to our
network.
The added noise follows a Gaussian distribution, so we follow Rasmus et al. (2015) and choose
the parametrization that supports optimal denoising of Gaussian latent variables. The
corrupted layer variable $\tilde{z}^{(l)}$ has the form $\tilde{z}^{(l)} = z^{(l)} + n^{(l)}$,
where $z^{(l)}$ is the variable value of the clean encoder layer $l$ and has a Gaussian
distribution with variance $\sigma_z^2$, and $n^{(l)}$ is Gaussian noise with variance
$\sigma_n^2$. The goal of the denoising function is to estimate $\hat{z}^{(l)}$ from
$\tilde{z}^{(l)}$ by minimizing the difference between $\hat{z}^{(l)}$ and $z^{(l)}$. In this
Gaussian case, the denoising cost is minimized by a denoising function
$\hat{z}^{(l)} = g(\tilde{z}^{(l)})$ that is linear. Specifically, the result can be described by
a weight $v^{(l)}$ and a prior $\mu^{(l)}$, as shown in equation (3). $v^{(l)}$ and $\mu^{(l)}$
are denoising parameters to be trained.

$$
\hat{z}^{(l)} = g(\tilde{z}^{(l)}) = v^{(l)} \tilde{z}^{(l)} + (1 - v^{(l)}) \mu^{(l)}
= (\tilde{z}^{(l)} - \mu^{(l)}) v^{(l)} + \mu^{(l)},
\quad \text{where } v^{(l)} = \sigma_z^2 / (\sigma_z^2 + \sigma_n^2)
\tag{3}
$$
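The weight in equation (3) can be checked numerically. The sketch below draws Gaussian latent values, corrupts them with Gaussian noise, and searches a grid of candidate weights for the one with the lowest mean squared denoising error. The mean, the variances, the sample count and the grid are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, var_z, var_n = 1.5, 4.0, 1.0          # illustrative prior mean and variances

z = rng.normal(mu, np.sqrt(var_z), size=200_000)             # clean latent values
z_tilde = z + rng.normal(0.0, np.sqrt(var_n), size=z.shape)  # corrupted values

def denoise(z_tilde, v, mu):
    """Linear denoiser of equation (3): (z_tilde - mu) * v + mu."""
    return (z_tilde - mu) * v + mu

v_theory = var_z / (var_z + var_n)        # sigma_z^2 / (sigma_z^2 + sigma_n^2)

# Empirical search over candidate weights in steps of 0.01.
candidates = np.linspace(0.0, 1.0, 101)
mse = [np.mean((denoise(z_tilde, v, mu) - z) ** 2) for v in candidates]
v_best = candidates[int(np.argmin(mse))]
```

With these values the formula gives a weight of 0.8, and the grid search lands at, or within one grid step of, the same value: the noisier the corrupted signal is relative to the latent variable, the more the estimate is pulled toward the prior.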

Furthermore, we assume that the latent variables are independent conditional on the
latent variables of the layer above. The final formulation of the denoising function is
shown in equation (4), where $V^{(l+1)}$ is the weight matrix between layer $l+1$ and
layer $l$ of the decoder and has the same dimension as the transpose of $W^{(l+1)}$;
$\hat{z}_i^{(l)}$, $\tilde{z}_i^{(l)}$ and $u_i^{(l)}$ are the components of $\hat{z}^{(l)}$,
$\tilde{z}^{(l)}$ and $u^{(l)}$, respectively; $\mu_i(\cdot)$ and $v_i(\cdot)$ are functions of
$u^{(l)}$; $\mathrm{sigmoid}(x) = (1 + e^{-x})^{-1}$ is the sigmoid function; and
$a_{1,i}^{(l)}, \ldots, a_{10,i}^{(l)}$ are the skip connection parameters to be trained.

$$
\begin{aligned}
\hat{z}_i^{(l)} &= g_i\big(\tilde{z}_i^{(l)}, u_i^{(l)}\big)
= \big(\tilde{z}_i^{(l)} - \mu_i(u_i^{(l)})\big)\, v_i(u_i^{(l)}) + \mu_i(u_i^{(l)}) \\
\text{where: } u^{(l)} &= N_B\big(V^{(l+1)} \hat{z}^{(l+1)}\big) \\
\mu_i(u_i^{(l)}) &= a_{1,i}^{(l)}\, \mathrm{sigmoid}\big(a_{2,i}^{(l)} u_i^{(l)} + a_{3,i}^{(l)}\big) + a_{4,i}^{(l)} u_i^{(l)} + a_{5,i}^{(l)} \\
v_i(u_i^{(l)}) &= a_{6,i}^{(l)}\, \mathrm{sigmoid}\big(a_{7,i}^{(l)} u_i^{(l)} + a_{8,i}^{(l)}\big) + a_{9,i}^{(l)} u_i^{(l)} + a_{10,i}^{(l)}
\end{aligned}
\tag{4}
$$
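Equation (4) translates almost directly into code. The sketch below is a NumPy version of the denoising function for one decoder layer, assuming the ten skip connection parameters are stored as the rows of an array `a` (a storage layout chosen here for convenience). The initial values are hypothetical and picked only so that the function starts out as the identity on the corrupted value.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def denoising_function(z_tilde, u, a):
    """Equation (4). `a` has shape (10, units); row k-1 holds a_{k,i}."""
    mu = a[0] * sigmoid(a[1] * u + a[2]) + a[3] * u + a[4]
    v = a[5] * sigmoid(a[6] * u + a[7]) + a[8] * u + a[9]
    return (z_tilde - mu) * v + mu

units = 8
rng = np.random.default_rng(0)
z_tilde = rng.normal(size=(100, units))   # corrupted encoder values (toy data)
u = rng.normal(size=(100, units))         # batch-normalized signal from the layer above

a = np.zeros((10, units))
a[9] = 1.0                                # hypothetical start: mu = 0 and v = 1
z_hat = denoising_function(z_tilde, u, a)
```

With this starting point `z_hat` equals `z_tilde`; training would then adjust the parameters so that the signal `u` from the layer above modulates both the prior mean and the weight given to the corrupted value.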

2.3. Semi-supervised learning

The limited availability of labeled training samples is the main challenge for supervised
hyperspectral image classification, since collecting labeled samples for hyperspectral images
is generally difficult, expensive and time-consuming. However, there are many unused
unlabeled samples waiting to be classified. The contradiction between supervised learning and
unsupervised learning means that the two cannot easily be integrated. For this reason,
unlabeled samples are only used for pre-training in conventional deep learning methods. The
ladder network relieves the pressure to represent details in the higher layers of the model,
as the decoder can recover any details discarded by the encoder through the skip connections
between the encoder and the decoder. The ladder network thus makes unsupervised learning
compatible with supervised learning. A semi-supervised learning strategy is therefore adopted
to train the parameters in order to take advantage of the large number of unlabeled samples.
The unsupervised learning targets on every layer of our network can greatly improve
hyperspectral image classification accuracy, as demonstrated in the experiments.
The final cost to be optimized consists of the supervised cost and the unsupervised
cost. The supervised cost $C_c$, shown in equation (5), is the average negative log
probability of the corrupted output $\tilde{y}$ matching the target $t_n$.

$$
C_c = -\frac{1}{N} \sum_{n=1}^{N} \log P(\tilde{y} = t_n \mid x_n)
\tag{5}
$$

The unsupervised cost $C_d^{(l)}$ of layer $l$ is the squared error between $z^{(l)}$ and
$\hat{z}^{(l)}$. The final unsupervised cost is the weighted sum over all layers, as shown in
equation (6).

$$
C_d = \sum_{l=0}^{L} \lambda_l C_d^{(l)}
= \sum_{l=0}^{L} \frac{\lambda_l}{N m_l} \sum_{n=1}^{N}
\left\| z^{(l)}(n) - \hat{z}_{BN}^{(l)}(n) \right\|^2
\tag{6}
$$

In equation (6), $\hat{z}_{BN}^{(l)}(n) = (\hat{z}^{(l)}(n) - \mu)/\sigma$, where $\mu$ and
$\sigma$ are the batch mean and batch standard deviation of $\tilde{z}_{pre}^{(l)}$, $N$ is the
number of samples, $m_l$ is the number of nodes in layer $l$, and $\lambda_l$ is the
unsupervised loss weight for layer $l$. The final cost is the sum of the supervised cost and
the unsupervised cost.
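The two cost terms can be sketched in a few lines of NumPy. The functions below compute the supervised cost of equation (5) from class probabilities and a single layer's term of equation (6). The function names and the tiny hand-made arrays are assumptions chosen so that the arithmetic is easy to follow.

```python
import numpy as np

def supervised_cost(probs, targets):
    """Equation (5): mean negative log probability assigned to the target class."""
    n = probs.shape[0]
    return -np.mean(np.log(probs[np.arange(n), targets]))

def layer_denoising_cost(z_clean, z_hat_bn, lam):
    """One term of equation (6): lam / (N * m_l) * sum_n ||z(n) - z_hat_BN(n)||^2."""
    n, m = z_clean.shape
    return lam / (n * m) * np.sum((z_clean - z_hat_bn) ** 2)

# Two samples, three classes: softmax outputs of the corrupted encoder (toy values).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
targets = np.array([0, 1])
c_c = supervised_cost(probs, targets)

# Two samples, four nodes: every reconstruction is off by 0.5 (toy values).
z_clean = np.zeros((2, 4))
z_hat_bn = np.full((2, 4), 0.5)
c_d = layer_denoising_cost(z_clean, z_hat_bn, lam=10.0)

total_cost = c_c + c_d
```

In the toy case the squared errors sum to 8 × 0.25 = 2, and the factor 10 / (2 × 4) gives a layer cost of 2.5; the total cost is this value plus the supervised term.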
The parameters, including $W^{(l)}$, $\gamma^{(l)}$, $\beta^{(l)}$, $V^{(l)}$ and
$a_{1,i}^{(l)}, \ldots, a_{10,i}^{(l)}$, are trained by backpropagation to minimize the total cost
$C = C_c + C_d$. The flowchart of the proposed semi-supervised CNN is shown in Figure 2. In
more detail, 100 labeled samples and 100 unlabeled samples are fed to the network in each
training iteration. The labeled samples are used to compute the supervised cost and the
unlabeled samples are used to compute the unsupervised cost. The parameters are then updated
by backpropagation. After the network is fully trained, all unlabeled samples are fed to the
clean encoder to obtain their labels, that is, the classification results. The classification
results are then compared with the ground-truth map to evaluate the classification accuracy.
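One way the paired minibatches described above might be formed is sketched below. The letter only states that 100 labeled and 100 unlabeled samples are used per step, so the sampling scheme (one pass over the shuffled unlabeled pool, with labeled samples redrawn at random for each step) is an assumption, as are the function name and the index ranges.

```python
import numpy as np

def paired_minibatches(labeled_idx, unlabeled_idx, batch=100, seed=0):
    """Yield (labeled_batch, unlabeled_batch) index pairs for one epoch.

    Assumed scheme: the unlabeled pool is shuffled and consumed once, and the
    labeled batch is redrawn at random for every step.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unlabeled_idx)
    for start in range(0, len(shuffled) - batch + 1, batch):
        labeled_batch = rng.choice(labeled_idx, size=batch, replace=False)
        yield labeled_batch, shuffled[start:start + batch]

# Hypothetical index ranges: 1800 labeled and 40,976 unlabeled samples.
labeled_idx = np.arange(1800)
unlabeled_idx = np.arange(1800, 42776)
batches = list(paired_minibatches(labeled_idx, unlabeled_idx))
```

With these pool sizes one epoch contains 409 full steps; the last 76 unlabeled samples of the shuffled pool are dropped in this sketch so that every batch has exactly 100 samples.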

3. Experimental results
The proposed semi-supervised CNN (SS-CNN) is implemented using the TensorFlow library.
The results are generated on a PC equipped with an Intel Core i7-5700HQ at 2.7 GHz, an
Nvidia GeForce GTX 970M and 32 GB of memory. The University of Pavia data set, consisting of
103 spectral bands with 610 × 340 pixels, is employed to evaluate the performance of SS-CNN.
There are 42,776 labeled pixels in nine classes in the University of Pavia data set. For
supervised training, 200 labeled samples per class are randomly selected, and all the other
samples are treated as unlabeled and used for testing. The numbers of labeled training
samples and testing samples are listed in Table 1. The results using different proportions of
unlabeled data are shown in Figure 3, which demonstrates that using a large amount of
unlabeled samples can improve the classification accuracy. However, using only a small amount
of unlabeled samples leads to overfitting, which lowers the classification accuracy. We
therefore utilize all unlabeled samples to train the network.

Figure 2. Flowchart of the proposed semi-supervised CNN.
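The sampling protocol described above (200 random labeled samples per class, the rest held out) can be sketched as follows. The per-class pixel counts are the training plus testing counts of Table 1; the function name, the seed and the synthetic label vector are assumptions for illustration.

```python
import numpy as np

def split_per_class(labels, n_train=200, seed=0):
    """Return (train_idx, test_idx) with n_train random samples from each class."""
    rng = np.random.default_rng(seed)
    train = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        train.append(rng.choice(idx, size=n_train, replace=False))
    train_idx = np.sort(np.concatenate(train))
    test_idx = np.setdiff1d(np.arange(labels.size), train_idx)
    return train_idx, test_idx

# Labeled pixels per class (training + testing counts from Table 1).
counts = [6631, 18649, 2099, 3064, 1345, 5029, 1330, 3682, 947]
labels = np.repeat(np.arange(1, 10), counts)   # synthetic label vector, classes 1..9

train_idx, test_idx = split_per_class(labels)
```

The split yields 1800 training indices and 40,976 held-out indices, matching the totals in Table 1.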
The input of the SS-CNN for the University of Pavia data set is the $9 \times 9 \times 103$
($K = 9$, $B = 103$) neighborhood of a pixel. Selecting a larger neighborhood as the input
gives better accuracy, but training the network then takes more time. The results for
different neighborhood sizes are listed in Table 2. Considering that there is strong
correlation between different bands of a hyperspectral image and that the labeled samples are
limited, $B_1$ is set to 80, which is smaller than $B$, in order to decrease the number of
parameters. In addition, we experimented with different values of $B_1$ (e.g. 60, 80, 120,
200) and found that the effect of $B_1$ on classification accuracy is relatively small when
$B_1$ is large enough. $\sigma^2$ (the variance of the Gaussian noise) is set to 0.01. We also
tested various values of $\sigma^2$, namely 0.5, 0.1, 0.05, 0.01, 0.005 and 0.001, and the
corresponding accuracies are 85.43%, 90.14%, 97.87%, 98.32%, 98.29% and 98.31%, respectively.
A larger $\sigma^2$ reduces the classification accuracy, so $\sigma^2$ should be set relatively
small compared with the pixel values of the hyperspectral image. Note that the hyperspectral
image is scaled to the range [0, 1] before training the network. The learning rate is set to
0.001 empirically. The number of epochs is set to 40. $\lambda_l$ is the weight of the
unsupervised loss for layer $l$. We fix $\lambda_2 = \lambda_3 = \lambda_4 = 1.0$ and increase
the value of $\lambda_1$ (1.0, 10.0, 100.0). The classification accuracy (96.81%, 97.47% and
97.96%, respectively) increases with $\lambda_1$, which reveals that the unsupervised loss of
the lower layer contributes more to classification performance. Consequently, we set
$\lambda_1 = 10$, $\lambda_2 = 1$, $\lambda_3 = 0.1$, $\lambda_4 = 0.1$ and obtain promising
results. The classification accuracies of SS-CNN with and without batch normalization are
shown in Figure 4, which demonstrates that batch normalization accelerates convergence and
improves classification accuracy.
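To make the relationship between the input scaling and the noise level concrete, the sketch below min-max scales a toy cube to the range [0, 1] and derives the noise standard deviation that corresponds to a variance of 0.01. Whether the scaling in the letter is global or per band is not stated; global scaling is assumed here, and the function name and toy data are hypothetical.

```python
import numpy as np

def scale_to_unit_range(cube):
    """Min-max scale the whole cube to [0, 1] (global scaling is an assumption)."""
    cube = cube.astype(np.float64)
    lo, hi = cube.min(), cube.max()
    return (cube - lo) / (hi - lo)

# Toy integer cube standing in for raw sensor values.
raw = np.random.default_rng(0).integers(0, 8000, size=(6, 5, 103))
scaled = scale_to_unit_range(raw)

noise_var = 0.01
noise_std = np.sqrt(noise_var)   # standard deviation of the injected Gaussian noise
```

A variance of 0.01 corresponds to a standard deviation of 0.1, that is, one tenth of the full range of the scaled pixel values, whereas a variance of 0.5 gives a standard deviation of about 0.71, which is comparable to the pixel values themselves.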

Table 1. Number of labeled training samples and testing samples used in the University of Pavia data set.

| Class no. | Class name | No. of training samples | No. of testing samples |
|---|---|---|---|
| 1 | Asphalt | 200 | 6431 |
| 2 | Meadows | 200 | 18449 |
| 3 | Gravel | 200 | 1899 |
| 4 | Trees | 200 | 2864 |
| 5 | Sheets | 200 | 1145 |
| 6 | Bare Soil | 200 | 4829 |
| 7 | Bitumen | 200 | 1130 |
| 8 | Bricks | 200 | 3482 |
| 9 | Shadows | 200 | 747 |
| Total | | 1800 | 40976 |

Figure 3. Classification accuracy for different proportions of unlabeled data.

Table 2. Classification accuracy (%) and training time for different neighborhood sizes.

| Size (pixels) | 7 × 7 | 9 × 9 | 11 × 11 | 13 × 13 | 15 × 15 | 17 × 17 |
|---|---|---|---|---|---|---|
| Accuracy (%) | 97.96 | 98.32 | 98.51 | 98.79 | 98.97 | 99.12 |
| Time (s) | 858.1 | 1220.2 | 1383.6 | 1532.0 | 1740.4 | 2565.2 |

To demonstrate the effectiveness of SS-CNN, we compare it with several other classifiers:
support vector machine (SVM), spectral-spatial classification (ISODATA-SVM) (Yuliya,
Benediktsson, and Chanussot 2009), CNN (Hu et al. 2015) and CNN with pixel-pair features
(CNN-PPF) (Li et al. 2016). The numbers of labeled training samples (200 samples per class) and
testing samples are exactly the same for all classifiers. The gamma (spread of the RBF kernel)
and c (the parameter that controls the amount of penalty during the SVM optimization) for SVM
are set to 2 and 256, respectively. For the ISODATA algorithm, Cmin is set to 9 and Cmax is set
to 10, where Cmin and Cmax are the lower and upper bounds of the number of classes,
respectively. The overall accuracy (OA) and the class-specific accuracies are listed in
Table 3. Figure 5(a) shows the ground-truth map of the University of Pavia data set and
Figure 5(b) shows the classification map obtained by SS-CNN. SS-CNN obtains the highest
overall accuracy (98.32%). Traditional classifiers for hyperspectral images only use spectral
information, although hyperspectral images provide a wealth of both spectral and spatial
information for classification. The SS-CNN can extract abstract and invariant
spectral-spatial features from the hyperspectral data cube. Even the supervised CNN (S-CNN),
that is, our network without the ladder network, achieves higher accuracy than classifiers
such as SVM and CNN (Hu et al. 2015). The supervised CNN with ladder network (S-CNN-LN) allows
the network to focus on abstract invariant features on the higher levels, which leads to
higher classification accuracy. However, with limited labeled samples the OAs of S-CNN and
S-CNN-LN are lower than those of CNN-PPF and ISODATA-SVM. CNN-PPF deals with the challenge of
limited labeled samples through pixel-pair features and outperforms the traditional methods.
SS-CNN utilizes the unlabeled samples to deal with this challenge and performs better than
CNN-PPF and ISODATA-SVM. This demonstrates that the unsupervised learning targets on every
layer of our network can greatly improve hyperspectral image classification accuracy.
Figure 6 shows the results of SS-CNN with different numbers of labeled training samples per
class, which demonstrate the effective performance of SS-CNN with limited labeled samples.
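The accuracy measures used in this comparison can be computed from a confusion matrix, as in the sketch below. It assumes that class-specific accuracy means the fraction of each true class that is labeled correctly, which is the usual convention but is not spelled out in the letter. The helper names and the ten toy labels are made up for illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def accuracies(cm):
    """Return (class-specific accuracies, overall accuracy)."""
    per_class = np.diag(cm) / cm.sum(axis=1)
    overall = np.trace(cm) / cm.sum()
    return per_class, overall

# Toy example with three classes and ten test pixels.
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 0, 2])
cm = confusion_matrix(y_true, y_pred, n_classes=3)
per_class, oa = accuracies(cm)
```

In the toy example eight of the ten pixels are correct, so the overall accuracy is 0.8, while the class-specific accuracies are 0.75, 1.0 and 0.75. Because the overall accuracy is weighted by class size, a method can have the best OA without being the best in every class.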

Figure 4. Classification accuracy for different numbers of training epochs. BN denotes SS-CNN with
batch normalization; Non-BN denotes SS-CNN without batch normalization.

Table 3. Class-specific accuracy (%) and overall accuracy (OA) with different techniques.

| Class no. | SVM | CNN | S-CNN | S-CNN-LN | ISODATA-SVM | CNN-PPF | SS-CNN |
|---|---|---|---|---|---|---|---|
| 1 | 86.46 | 88.38 | 85.15 | 87.02 | 94.16 | 97.42 | 97.16 |
| 2 | 90.17 | 91.27 | 96.51 | 97.30 | 97.62 | 95.76 | 98.72 |
| 3 | 85.04 | 85.88 | 92.71 | 93.76 | 84.33 | 94.05 | 96.86 |
| 4 | 96.64 | 97.24 | 97.81 | 98.17 | 94.88 | 97.52 | 99.25 |
| 5 | 99.78 | 99.91 | 100.0 | 100.0 | 97.92 | 100.0 | 100.0 |
| 6 | 94.89 | 96.41 | 95.17 | 95.70 | 95.27 | 99.13 | 98.59 |
| 7 | 95.19 | 93.62 | 91.88 | 92.93 | 98.72 | 96.19 | 98.80 |
| 8 | 85.36 | 87.45 | 88.02 | 89.27 | 97.94 | 93.62 | 96.88 |
| 9 | 99.89 | 99.57 | 99.58 | 99.68 | 99.89 | 99.60 | 100.0 |
| OA | 90.62 | 92.27 | 93.88 | 94.72 | 96.08 | 96.48 | 98.32 |

Figure 5. Experiment on the University of Pavia data set. (a) Ground-truth map. (b) Classification map
obtained by SS-CNN.

Figure 6. Classification accuracy for different methods with different numbers of labeled samples per class.

The training and testing times of the different classifiers are shown in Table 4. Batch
normalization can accelerate the convergence. So the total number of trainable parameters is
set to 40 41 in SS-CNN. The total number of trainable parameters is set to 81,408 and 629,648
in CNN and CNN-PPF, respectively. Consequently, the training time of SS-CNN is much less than
that of CNN and CNN-PPF. However, because it uses the large amount of unlabeled samples,
training the network is more time-consuming than SVM and ISODATA-SVM. Note that the training
time of ISODATA-SVM consists of the time for training the SVM and for clustering by ISODATA.
Owing to its large number of parameters, the testing time of SS-CNN is longer than that of CNN
and SVM. The final label is determined via a majority voting strategy in CNN-PPF and
ISODATA-SVM, so testing with CNN-PPF and ISODATA-SVM is more time-consuming than with SS-CNN.

Table 4. Training and testing time of different classifiers.

| Classifier | SVM | CNN | CNN-PPF | ISODATA-SVM | SS-CNN |
|---|---|---|---|---|---|
| Training time (s) | 0.1 | 2153.0 | 14040.7 | 47.7 + 0.1 | 1220.2 |
| Testing time (s) | 1.4 | 0.4 | 16.9 | 1.9 | 1.6 |

4. Conclusion
In this letter, a semi-supervised convolutional neural network is constructed for
hyperspectral image classification. The experimental results demonstrate that the
convolutional neural network with ladder network can effectively extract spatial-spectral
features from the original hyperspectral image cube, and that the semi-supervised training
strategy, which uses the large amount of unlabeled samples, can improve classification
accuracy even with a small number of labeled training samples. However, because it uses the
large amount of unlabeled samples, training the network is time-consuming.

Acknowledgement
We thank Prof. Paolo Gamba for providing the Pavia data set.

Funding
This work was supported by the [State Key Laboratory of Geo-information Engineering] under
Grant [SKLGIE2015-M-3-1, SKLGIE2015-M-3-2]; [National Natural Science Foundation of China]
under Grant [41201477]; and [Scientiﬁc and Technological Project in Henan Province] under
Grant [152102210014].

References
Bioucas-Dias, J. M., A. Plaza, G. Camps-Valls, P. Scheunders, N. Nasrabadi, and J. Chanussot. 2013.
“Hyperspectral Remote Sensing Data Analysis and Future Challenges.” IEEE Geoscience and
Remote Sensing Magazine 1 (2): 6–36. doi:10.1109/MGRS.2013.2244672.

Camps-Valls, G., and L. Bruzzone. 2005. “Kernel-Based Methods for Hyperspectral Image Classification.”
IEEE Transactions on Geoscience and Remote Sensing 43 (6): 1351–1362. doi:10.1109/TGRS.2005.846154.
Chen, Y., H. Jiang, L. Chunyang, X. Jia, and P. Ghamisi. 2016. “Deep Feature Extraction and Classification
of Hyperspectral Images Based on Convolutional Neural Networks.” IEEE Transactions on Geoscience
and Remote Sensing 54 (10): 6232–6251. doi:10.1109/TGRS.2016.2584107.
Chen, Y., Z. Lin, X. Zhao, G. Wang, and G. Yanfeng. 2014. “Deep Learning-Based Classification of
Hyperspectral Data.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote
Sensing 7 (6): 2094–2097. doi:10.1109/JSTARS.2014.2329330.
Ghamisi, P., Y. Chen, and X. X. Zhu. 2016. “A Self-Improving Convolution Neural Network for the
Classification of Hyperspectral Data.” IEEE Geoscience and Remote Sensing Letters 13 (10): 1537–
1541. doi:10.1109/LGRS.2016.2595108.
Hu, W., Y. Huang, L. Wei, F. Zhang, and H. Li. 2015. “Deep Convolutional Neural Networks for
Hyperspectral Image Classification.” Journal of Sensors 2015: 1–12. doi:10.1155/2015/258619.
Ioffe, S., and C. Szegedy. 2015. “Batch Normalization: Accelerating Deep Network Training by
Reducing Internal Covariate Shift.” Arxiv Preprint Arxiv:1502.03167 2015 37: 448–456.
Li, W., C. Chen, S. Hongjun, and D. Qian. 2015. “Local Binary Patterns and Extreme Learning
Machine for Hyperspectral Imagery Classification.” IEEE Transactions on Geoscience and Remote
Sensing 53 (7): 1–13. doi:10.1109/TGRS.2014.2381602.
Li, W., W. Guodong, F. Zhang, and D. Qian. 2016. “Hyperspectral Image Classification Using Deep Pixel-
Pair Features.” IEEE Transactions on Geoscience and Remote Sensing. doi:10.1109/TGRS.2016.2603190.
Liu, J., W. Zebin, Z. Wei, L. Xiao, and L. Sun. 2013. “Spatial-Spectral Kernel Sparse Representation for
Hyperspectral Image Classification.” IEEE Journal of Selected Topics in Applied Earth Observations
and Remote Sensing 6 (6): 2462–2471. doi:10.1109/JSTARS.2013.2252150.
Ma, X., H. Wang, and J. Geng. 2016. “Spectral-Spatial Classification of Hyperspectral Image Based on Deep
Auto-Encoder.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 9 (9):
4073–4085.
Muñoz-Marí, J., F. Bovolo, L. Gómez-Chova, L. Bruzzone, and G. Camps-Valls. 2010. “Semisupervised
One-Class Support Vector Machines for Classification of Remote Sensing Data.” IEEE Transactions
on Geoscience and Remote Sensing 48 (8): 3188–3197.
Pezeshki, M., L. Fan, P. Brakel, A. Courville, and Y. Bengio. 2016. “Deconstructing the Ladder
Network Architecture.” In editors C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R.
Garnett International Conference on Machine Learning. 2368–2376.
Rasmus, A., T. Raiko, and H. Valpola. 2014. “Denoising Autoencoder with Modulated Lateral
Connections Learns Invariant Representations of Natural Images.” Arxiv Preprint Arxiv 31(4): 55-63.
Rasmus, A., H. Valpola, and T. Raiko. 2015a. “Lateral Connections in Denoising Autoencoders
Support Supervised Learning.” Computer Science 31 (4): 555–563.
Rasmus, A., M. Berglund, M. Honkala, H. Valpola, and T. Raiko. 2015b. “Semi-Supervised Learning
with Ladder Networks.” In Advances in Neural Information Processing Systems, Montreal, Canada:
Curran Associates, Inc. 3546–3554.
Sun, S., P. Zhong, H. Xiao, and R. Wang. 2015. “Active Learning with Gaussian Process Classifier for
Hyperspectral Image Classification.” IEEE Transactions on Geoscience and Remote Sensing 53 (4):
1746–1760. doi:10.1109/TGRS.2014.2347343.
Tuia, D., and C.-V. Gustavo. 2009. “Semisupervised Remote Sensing Image Classification with Cluster
Kernels.” IEEE Geoscience and Remote Sensing Letters 6 (2): 224–228. doi:10.1109/LGRS.2008.2010275.
Valpola, H. 2015. “From Neural PCA to Deep Unsupervised Learning.” Adv. in Independent
Component Analysis and Learning Machines 2015: 143–171.
Yue, J., S. Mao, and L. Mei. 2016. “A Deep Learning Framework for Hyperspectral Image Classification Using
Spatial Pyramid Pooling.” Remote Sensing Letters 7 (9): 875–884. doi:10.1080/2150704X.2016.1193793.
Yue, J., W. Zhao, S. Mao, and H. Liu. 2015. “Spectral-Spatial Classification of Hyperspectral Images
Using Deep Convolutional Neural Networks.” Remote Sensing Letters 6 (6): 468–477. doi:10.1080/
2150704X.2015.1047045.
Yuliya, T., J. A. Benediktsson, and J. Chanussot. 2009. “Spectral-Spatial Classification of
Hyperspectral Imagery Based on Partitional Clustering Techniques.” IEEE Transactions on
Geoscience and Remote Sensing 47 (8): 2973–2987. doi:10.1109/TGRS.2009.2016214.

