Dictionary-Based, Clustered Sparse Representation for Hyperspectral Image Classification

Journal of Spectroscopy
Volume 2015, Article ID 678765, 6 pages
http://dx.doi.org/10.1155/2015/678765

Research Article

Zhen-tao Qin,1,2 Wu-nian Yang,1 Ru Yang,2 Xiang-yu Zhao,2 and Teng-jiao Yang2

1 Key Laboratory of Geo-Special Information Technology, Ministry of Land and Resources, Institute of Remote Sensing & GIS, Chengdu University of Technology, Chengdu, Sichuan 610059, China
2 Panzhihua College, Panzhihua, Sichuan 617000, China

Correspondence should be addressed to Zhen-tao Qin; [email protected] and Wu-nian Yang; [email protected]

Copyright © 2015 Zhen-tao Qin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper presents a new, dictionary-based method for hyperspectral image classification that incorporates both the spectral and contextual characteristics of a sample, which are clustered to obtain a dictionary for each pixel. Pixels in the same clustered group then share a common sparsity pattern. We calculate the image's sparse coefficients with this dictionary, which yields the sparse representation features of the remote sensing image; the sparse coefficients are then used to classify the hyperspectral images via a linear SVM. Experiments show that the proposed dictionary-based, clustered sparse coefficients create better representations of hyperspectral images, with greater overall accuracy and Kappa coefficient.
[Figure 1 appears here; its panels plot per-band spectra at example pixel locations, e.g., f_band(2, 3), f_band(7, 2), and f_band(7, 8), for pixels assigned to Group 1, Group 2, and Group 3.]
Figure 1: The pixels of a hyperspectral image partitioned into a number of different clustered groups.
representation in a spatial domain using the spatial relationships within the hyperspectral remote sensing image [11].

In this paper we propose a new, high-quality, and efficient classification technique that extends existing dictionary learning-based classification frameworks in several respects. We incorporate both the spectral and contextual characteristics of a hyperspectral sample by clustering the remote sensing image to obtain a dictionary of pixels, and we present a cluster-based dictionary learning method; pixels that belong to the same cluster group are often made up of the same materials. This property holds for various hyperspectral image singularities such as straight and corner edges, as shown in Figure 1. Using a linear SVM as the classifier, we classify the complete hyperspectral remote sensing image. We compare this cluster-based dictionary learning method with other alternatives for classification and show that it performs significantly better in terms of both accuracy and Kappa coefficient.
2. Basic Model

For a set of pixels of a hyperspectral image, let $\mathbf{x} \in \mathbb{R}^M$. The fundamental goal of the dictionary learning method is to find a set of atomic signals $\mathbf{D} = [\mathbf{d}_1, \dots, \mathbf{d}_K]$ that represent the hyperspectral data by a small number of terms in a linear generative model; that is,

$$\mathbf{x} = \mathbf{D}\mathbf{y} + \boldsymbol{\varepsilon}. \quad (1)$$

In this paper, we use lowercase letters to represent vectors (such as $\mathbf{x}$) and capital letters to represent matrices. Moreover, $\boldsymbol{\varepsilon}$ is a small residual due to modeling $\mathbf{x}$ in a linear manner with the sparse representation vector $\mathbf{y} \in \mathbb{R}^K$. The formulation of (1) is often cast as a regularized least squares optimization:

$$\arg\min_{\mathbf{D},\mathbf{Y}} \frac{1}{2}\|\mathbf{X} - \mathbf{D}\mathbf{Y}\|_F^2 + \gamma S(\mathbf{Y}), \quad (2)$$

where $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_N]$, $\mathbf{Y} = [\mathbf{y}_1, \dots, \mathbf{y}_N]$, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. The parameter $\gamma$ trades off between the data-fidelity (least-squares) term and the sparsity-based regularizer (the $\ell_1$ norm); $S(\mathbf{Y})$ can be interpreted as yielding a maximum a posteriori (MAP) estimate of the coefficients under the assumptions of Gaussian noise and an independent identically distributed (i.i.d.) prior; traditionally, a Laplacian prior is preferred, as it leads to the well-known Lasso or $\ell_1$ minimization and can be expressed as [12]

$$\arg\min_{\mathbf{Y}} \frac{1}{2}\|\mathbf{X} - \mathbf{D}\mathbf{Y}\|_F^2 + \gamma \sum_{i=1}^{N} \|\mathbf{y}_i\|_1. \quad (3)$$

Tibshirani [12] used the formulation (3) to solve problem (2). The basic problem of dictionary learning is to learn, through sparse regularization, the representations $\mathbf{y}_1, \dots, \mathbf{y}_N$, whose rows form $\mathbf{y}^{(1)}, \dots, \mathbf{y}^{(K)}$. The above optimization is convex in either $\mathbf{D}$ or $\mathbf{Y}$, but not in both. Commonly, a two-step strategy is used for this problem.

(1) Sparse Coding. In this step, $\mathbf{D}$ is fixed and the optimization is solved with regard to $\mathbf{Y}$. The objective function (3) then decouples and can be solved for each $\mathbf{y}_i$ independently; that is,

$$\arg\min_{\mathbf{y}_i} \frac{1}{2}\|\mathbf{x}_i - \mathbf{D}\mathbf{y}_i\|_2^2 + \gamma \|\mathbf{y}_i\|_1. \quad (4)$$

Several efficient algorithms have been proposed in recent years to solve (4); this paper uses the implementation provided by the SPAMS toolbox [11].
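For readers who wish to reproduce the sparse coding step without the SPAMS toolbox, the sketch below solves (4) with the iterative shrinkage-thresholding algorithm (ISTA). It is a minimal NumPy illustration, not the SPAMS implementation used in the paper; the iteration count is an illustrative assumption.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(x, D, gamma, n_iter=200):
    """Solve arg min_y 0.5*||x - D y||_2^2 + gamma*||y||_1 by ISTA.

    x : (M,) pixel spectrum; D : (M, K) dictionary with unit-norm atoms.
    """
    y = np.zeros(D.shape[1])
    # Step size 1/L, where L = ||D||_2^2 is the Lipschitz constant
    # of the gradient of the quadratic term.
    L = np.linalg.norm(D, ord=2) ** 2
    for _ in range(n_iter):
        grad = D.T @ (D @ y - x)                 # gradient of 0.5*||x - Dy||^2
        y = soft_threshold(y - grad / L, gamma / L)
    return y
```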
(2) Dictionary Update. In this step, $\mathbf{Y}$ is fixed and the optimization becomes

$$\arg\min_{\mathbf{D}} \frac{1}{2}\|\mathbf{X} - \mathbf{D}\mathbf{Y}\|_F^2 \quad \text{s.t.} \quad \|\mathbf{d}_i\|_2 \le 1 \;\; \forall i, \quad (5)$$

which is quadratic in $\mathbf{D}$. The gradient of the objective function equals $\mathbf{D}\mathbf{Y}\mathbf{Y}^T - \mathbf{X}\mathbf{Y}^T$, which vanishes for $\mathbf{D} = \mathbf{X}\mathbf{Y}^T(\mathbf{Y}\mathbf{Y}^T)^{-1}$. There are many ways of solving this problem; we use block coordinate descent (BCD) [13], which updates the dictionary atoms iteratively.
Since the objective function is strongly convex, BCD is guaranteed to reach the unique solution. The objective function restricted to atom $j$ can be expressed as $\frac{1}{2}\|\mathbf{R}_j - \mathbf{d}_j\mathbf{Y}^{jT}\|_F^2$, where $\mathbf{Y}^{jT}$ is row $j$ of $\mathbf{Y}$ and $\mathbf{R}_j = \mathbf{X} - \sum_{i \neq j}\mathbf{d}_i\mathbf{Y}^{iT}$. In order to solve for $\mathbf{d}_j$, this expression is minimized subject to the norm constraint of (5):

$$\arg\min_{\mathbf{D}} \frac{1}{2}\|\mathbf{X} - \mathbf{D}\mathbf{Y}\|_F^2 \quad \text{s.t.} \quad \|\mathbf{d}_i\|_2 \le 1 \;\; \forall i. \quad (6)$$
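As an illustration of how the BCD atom update can be carried out in practice, the following sketch minimizes (6) one atom at a time and projects each atom back onto the unit ball. It is a minimal NumPy rendering of the generic BCD scheme of [13] under the definitions above, not the authors' exact code.

```python
import numpy as np

def dictionary_update_bcd(X, D, Y, n_sweeps=1):
    """BCD sweeps over the atoms of D for fixed sparse codes Y.

    X : (M, N) data; D : (M, K) dictionary; Y : (K, N) codes.
    Minimizes 0.5*||X - D Y||_F^2 subject to ||d_j||_2 <= 1.
    """
    R = X - D @ Y                                # current residual
    for _ in range(n_sweeps):
        for j in range(D.shape[1]):
            yj = Y[j, :]                         # row j of Y, i.e. Y^{jT}
            nrm = yj @ yj
            if nrm == 0:
                continue                         # atom unused; leave unchanged
            R += np.outer(D[:, j], yj)           # form R_j by re-adding atom j
            dj = R @ yj / nrm                    # unconstrained minimizer for d_j
            dj /= max(1.0, np.linalg.norm(dj))   # project onto the unit ball
            D[:, j] = dj
            R -= np.outer(dj, yj)                # remove the updated contribution
    return D
```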
3. Clustered Sparse Representation for Hyperspectral Image Classification

Recently, Song and Jiao used the sparse representation method for hyperspectral image classification [14], in which the sparse representation coefficients $\mathbf{y}_i$ are considered to be independent of each other. Soltani-Farani et al. [11] partitioned the pixels into groups of the same size, such as group 1 in Figure 1. Yet the features of a group are not necessarily similar when the pixels are grouped by identical size, as in groups 2 and 3 of Figure 1. In order to solve this problem and further improve the classification accuracy, we propose partitioning the pixels of the hyperspectral images into a number of spatial neighborhoods, called groups, by clustering. Pixels that belong to the same cluster group are often made up of the same materials, so we assume that their representations use a common set of atoms from the dictionary. Thus, the sparse representations of the HSI pixels that belong to the same group are no longer independent. In fact, the pixels in the same cluster groups reveal hidden relationships within the spectral bands. An HSI is a collection of hundreds of images acquired simultaneously in narrow and adjacent spectral bands. In this research, $\mathbf{x}_1, \dots, \mathbf{x}_N$ denote the representations of the pixels in a hyperspectral image, and we define the cluster groups $G_1, \dots, G_n$ as nonoverlapping image patches. Figure 1 shows how the pixels of a hyperspectral image may be partitioned into a number of different groups. Under the above assumption, the sparse representation model can now be written as
$$\mathbf{X}_{G_i} = \mathbf{D}\mathbf{Y}_{G_i} + \mathbf{E}_{G_i}. \quad (7)$$

In this model, the columns of $\mathbf{Y}_{G_i}$ and $\mathbf{E}_{G_i}$ are the sparse representations and error vectors corresponding to the hyperspectral samples, respectively. In order to obtain the dictionary and the sparse representations, we introduce the convex joint sparsity-inducing $\ell_2/\ell_1$ regularizer into (2) to arrive at

$$\arg\min_{\mathbf{D},\mathbf{Y}} \frac{1}{2}\|\mathbf{X} - \mathbf{D}\mathbf{Y}\|_F^2 + \sum_{i=1}^{g} \gamma_{G_i}\|\mathbf{Y}_{G_i}\|_{2,1} \quad \text{s.t.} \quad \|\mathbf{d}_i\|_2 \le 1 \;\; \forall i, \quad (8)$$
where $\gamma_{G_i}$ is the regularization parameter for group $i$ and $\|\mathbf{Y}_{G_i}\|_{2,1}$ is the $\ell_2/\ell_1$ norm, the sum of the $\ell_2$ norms of the rows of $\mathbf{Y}_{G_i}$. In order to solve this problem, we have empirically adopted a regularized M-FOCUSS algorithm [15]. By estimating the $\ell_2$ norm of each row, we can update $\gamma_G$ according to the estimated value. Setting the gradient of the objective function of (8) to zero, we arrive at

$$\boldsymbol{\Lambda}\mathbf{D}^T\mathbf{D}\mathbf{Y}_G - \boldsymbol{\Lambda}\mathbf{D}^T\mathbf{X}_G + \gamma_G\mathbf{Y}_G = \mathbf{0}, \quad (9)$$

where $\boldsymbol{\Lambda} = \operatorname{diag}(\|\mathbf{Y}_G^{i,T}\|_2)$. Solving (9) yields $\mathbf{Y}_G = \boldsymbol{\Lambda}\mathbf{D}^T(\mathbf{D}\boldsymbol{\Lambda}\mathbf{D}^T + \gamma_G\mathbf{I})^{-1}\mathbf{X}_G$.
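To make the M-FOCUSS-style iteration concrete, the sketch below implements one pass of the group update implied by (9): reweight by the row norms of the current codes, then apply the closed-form solution $\mathbf{Y}_G = \boldsymbol{\Lambda}\mathbf{D}^T(\mathbf{D}\boldsymbol{\Lambda}\mathbf{D}^T + \gamma_G\mathbf{I})^{-1}\mathbf{X}_G$. It is a hedged illustration under simplifying assumptions (a fixed gamma_G and a small damping term eps to keep the reweighting well defined), not the authors' exact implementation.

```python
import numpy as np

def group_code_update(X_G, D, Y_G, gamma_G, eps=1e-8):
    """One reweighted closed-form update of the codes for one cluster group.

    X_G : (M, n_g) pixels of the group; D : (M, K) dictionary;
    Y_G : (K, n_g) current codes; gamma_G : group regularization weight.
    """
    M = D.shape[0]
    row_norms = np.linalg.norm(Y_G, axis=1) + eps   # l2 norm of each row of Y_G
    Lam = np.diag(row_norms)                        # Lambda = diag(||Y_G^{i,T}||_2)
    A = D @ Lam @ D.T + gamma_G * np.eye(M)         # (D Lambda D^T + gamma_G I)
    return Lam @ D.T @ np.linalg.solve(A, X_G)      # Lambda D^T A^{-1} X_G
```

Iterating this update, with the row norms recomputed after each pass, is the essence of the regularized M-FOCUSS scheme of [15].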
Since the pixels in the same cluster groups share hidden relationships within the spectral bands, as shown in Figure 1, the dictionary method is implemented as follows. We use $\mathbf{x}_1, \dots, \mathbf{x}_N$ to denote the spectral representations of the training data with respective labels $l_1, \dots, l_N$ and then apply the dictionary learning formulation of (4) to these samples to yield the corresponding sparse representations $\mathbf{y}_1, \dots, \mathbf{y}_N$ and the dictionary $\mathbf{D}$. When a new hyperspectral sample arrives, sparse coding can be applied (as in (4)) to find the corresponding sparse representation $\mathbf{y}$, which is then classified using the trained linear SVM to find the corresponding label $l$. The specific steps, sketched in code after this list, are as follows:

(1) cluster the hyperspectral image into different groups by k-means++ [16];
(2) apply the dictionary learning method, using the SPAMS toolbox to solve (4), which yields the dictionary $\mathbf{D}$ and the corresponding sparse representation coefficients $\mathbf{y}$ with respective labels $l_1, \dots, l_N$;
(3) train a linear SVM classifier on the sparse representations and their corresponding labels $l_1, \dots, l_N$;
(4) classify the remote sensing image by applying the trained linear SVM to its sparse representations.
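The following end-to-end sketch ties the four steps together using scikit-learn's k-means++ and a linear SVM. It reuses the illustrative NumPy routines defined above in place of the SPAMS toolbox; the cluster count, atom count, and regularization weight are assumptions for illustration (138 atoms echoes the Indian Pines setup below), and the clustering here uses the pixel spectra, whereas a spatial grouping could also incorporate pixel coordinates.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def cluster_dl_pipeline(X_train, labels, X_test, n_clusters=20,
                        n_atoms=138, gamma=0.1, n_dl_iters=10):
    """Cluster-DL sketch: cluster pixels, learn a dictionary, code, classify.

    X_train : (M, N) training pixel spectra as columns; labels : (N,).
    """
    # Step 1: cluster the pixels with k-means++.
    km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10)
    groups = km.fit_predict(X_train.T)

    # Step 2: alternate group-wise sparse coding and dictionary updates.
    rng = np.random.default_rng(0)
    D = X_train[:, rng.choice(X_train.shape[1], n_atoms, replace=False)]
    D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)      # unit-norm atoms
    # Warm-start the codes with least squares so the reweighting in the
    # group update is nondegenerate.
    Y = np.linalg.lstsq(D, X_train, rcond=None)[0]
    for _ in range(n_dl_iters):
        for g in np.unique(groups):
            idx = groups == g
            Y[:, idx] = group_code_update(X_train[:, idx], D, Y[:, idx], gamma)
        D = dictionary_update_bcd(X_train, D, Y)

    # Step 3: train a linear SVM on the sparse coefficients.
    svm = LinearSVC(C=1.0).fit(Y.T, labels)

    # Step 4: code the test pixels with the learned D and classify.
    Y_test = np.column_stack([sparse_code_ista(x, D, gamma) for x in X_test.T])
    return svm.predict(Y_test.T)
```

In practice, the SPAMS and LIBSVM implementations used by the authors would replace these illustrative routines.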
4. Experimental Results and Analysis

In this section, in order to validate and test the effectiveness of the proposed clustered dictionary-based algorithm, we provide experimental results on two sets of real hyperspectral images. We compare the classification accuracies against the basic SVM classification (SVM) [17]; the SVM, alongside the artificial neural network, is a widely studied machine learning method that is applied to practical problems such as classification and regression estimation. In this paper, LIBSVM 3.17 is used for the experiments. Classification accuracy depends on the choice of parameters, so all parameters (polynomial kernel degree $d$, RBF kernel parameter, regularization parameter $C$, composite kernel weight, and window width $w$) are obtained by fivefold cross validation. The spectral-contextual dictionary learning (SCDL) method was presented by Soltani-Farani et al. [11]; in that paper, the authors partitioned the pixels into groups of the same size. The clustered dictionary learning (Cluster-DL) method is presented by our team; the pixels of a hyperspectral image are partitioned into a number of different groups by k-means++. We also compare the spectral characteristics gathered from the dictionary learning method, which are made up of dictionary atoms, with the characteristics of the original spectral remote sensing images.
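As one way to realize the fivefold cross validation described above, the sketch below grid-searches the regularization parameter $C$ and the RBF kernel parameter with scikit-learn; the parameter grids are illustrative assumptions, not the values searched in the paper.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_svm(features, labels):
    """Fivefold CV over C and the RBF width for a kernel SVM baseline."""
    param_grid = {
        "C": [0.1, 1, 10, 100],            # regularization parameter C
        "gamma": [1e-3, 1e-2, 1e-1, 1.0],  # RBF kernel parameter
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(features, labels)
    return search.best_estimator_, search.best_params_
```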
[Figure 2 appears here. Panels: (a) Indian Pines image (composed of bands 50, 20, and 17); (b) ground-truth; (c) train data; (d) classification map obtained by SVM; (e) classification map obtained by SCDL; (f) classification map obtained by Cluster-DL.]
Figure 2: Indian Pines hyperspectral image and the comparison maps of the different classifications.

4.1. The 1st Experiment. The 1st experiment uses data collected over an agricultural/forested area in NW Indiana by the AVIRIS sensor, called the Indian Pines image. The image is 145 × 145 pixels and consists of 220 bands across the spectral range 0.4 to 2.5 μm; 20 noisy bands (104–108, 150–163, and 220), corresponding to the region of water absorption, have been removed. The image contains 16 ground-truth classes; the specific classes and the number of train and test samples in each class are shown in Table 1. We randomly chose 10% of the samples as training data, as shown in Figure 2(c); the remaining 90% are test data. Table 2 displays the test results, which contain the OA, AA, and Kappa coefficient. The SVM classification result is shown in Figure 2(d), whereas the classification maps obtained by the other methods can be found in Figures 2(e) and 2(f). As a means of visual comparison, we used learned dictionaries with 138 atoms (using 10% of the Indian Pines training data). Figure 3 shows the comparison of sample spectra for Alfalfa in the Indian Pines dataset with the learned dictionary atoms obtained by SCDL and Cluster-DL.

[Figure 3 appears here: a plot of reflectance (0.02–0.10) versus band index (0–200, axis label "Bands"), with curves for the Cluster-DL learned atom, the SCDL learned atom, and an Alfalfa sample.]
Figure 3: The comparison map of sample spectra for Alfalfa in the Indian Pines dataset and the learned dictionary atoms obtained by Cluster-DL and SCDL.

In Figure 3, we can see that the sample spectrum for Alfalfa in the Indian Pines dataset and the learned dictionary atoms obtained by Cluster-DL and SCDL are close to each other. Relatively speaking, the Cluster-DL atom is closer to the sample spectrum. The two obvious gaps in the spectra correspond to the regions of water absorption, which were removed.

4.2. The 2nd Experiment. In this experiment, we chose to focus on the Center of Pavia (shown in Figure 4(a)); the samples were collected in 2003 by the ROSIS sensor with a spatial resolution of 1.3 m/pixel in 115 spectral bands covering 0.43 μm to 0.86 μm.
[Figure 4 appears here. Panels: (a) Pavia Center image (composed of bands 50, 20, and 17); (b) ground-truth; (c) classification map obtained by SVM; (d) classification map obtained by SCDL; (e) classification map obtained by Cluster-DL.]
Figure 4: Pavia Center hyperspectral image and the comparison maps of the different classifications.

Figures 4(b) and 4(c) show the ground-truth and the classification map obtained by SVM, whereas Figures 4(d) and 4(e) show the classification maps obtained by SCDL and Cluster-DL; the accuracies are reported in Table 3.
Table 1: Indian Pines ground-truth classes and train/test sets.

Number  Class                          Train  Test
1       Alfalfa                        6      48
2       Corn-notill                    144    1290
3       Corn-min                       84     750
4       Corn                           24     210
5       Grass/pasture                  50     447
6       Grass/trees                    75     672
7       Grass/pasture-mowed            3      23
8       Hay-windrowed                  49     440
9       Oats                           2      18
10      Soybeans-notill                97     871
11      Soybeans-min                   247    2221
12      Soybean-clean                  62     552
13      Wheat                          22     190
14      Woods                          130    114
15      Building-grass-trees-drives    38     342
16      Stone-steel towers             10     85

Table 2: Classification accuracy and execution time (s) for AVIRIS Indian Pines for different classifiers.

Classifiers   SVM      SCDL     Cluster-DL
OA            0.7678   0.9664   0.9679
AA            0.6752   0.9401   0.9434
Kappa         0.7301   0.9617   0.9634
Time (s)      1.9765   46.204   19.4878

Table 3: Classification accuracy for Pavia Center for different classifiers.

Classifiers   SVM      SCDL     Cluster-DL
OA            0.9122   0.9488   0.9734
AA            0.8491   0.8970   0.9320
Kappa         0.8941   0.9378   0.9678
Time (s)      0.50     68.34    4.73
According to the experimental results, the clustered dictionary learning algorithm proposed in this paper can significantly improve classification accuracy. In the 1st experiment, Cluster-DL improves the classification accuracy from 0.9664, obtained without the clustered dictionary learning algorithm, to 0.9679. In the 2nd experiment, Cluster-DL improves the classification accuracy from 0.9488 to 0.9734; this means that the clustered dictionary learning algorithm has more obvious advantages when the terrain is more complex. The execution time of the SVM algorithm is less than that of the dictionary learning algorithms, which also illustrates that clustered structural dictionary learning improves the classification accuracy at the cost of increased execution time.
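For completeness, the metrics reported in Tables 2 and 3 can be computed from a confusion matrix as in the sketch below; this follows the standard definitions of OA, AA, and the Kappa coefficient rather than code from the paper.

```python
import numpy as np

def classification_metrics(conf):
    """OA, AA, and Kappa from an (n_classes x n_classes) confusion matrix.

    conf[i, j] counts test samples of true class i predicted as class j.
    """
    total = conf.sum()
    oa = np.trace(conf) / total                       # overall accuracy
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))    # average per-class accuracy
    # Kappa: observed agreement corrected for chance agreement p_e.
    pe = (conf.sum(axis=0) @ conf.sum(axis=1)) / total**2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```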
5. Conclusion and Discussion

In this paper, we have investigated clustered dictionary learning algorithms based on models of hyperspectral data for HSI classification. Our method represents a hyperspectral sample as a linear combination of a few atoms learned from the data, where identical clustered groups share the atoms of a dictionary, and the hyperspectral samples are classified by a linear SVM trained on the coefficients of this linear combination. Experiments on two sets of real HSI data confirmed this model's effectiveness for HSI classification and showed that the proposed method achieves better overall accuracy and Kappa coefficients. This is because the basic SVM classification does not take into account the relationships between the pixels, and the SCDL classification partitions the pixels into groups of the same size, whose features are not necessarily similar. In this paper, the hyperspectral image is partitioned into a number of different groups by k-means++; the pixels in the same cluster groups reveal hidden relationships within the spectral bands, which is closer to the real objects. Further research is needed in order to better understand how to integrate the spatial and spectral information of HSI and how to utilize supervised classification algorithms to improve both classification accuracy and execution time.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors were sponsored by the National Natural Science Funds (nos. 41372340 and 41071265). Sincere thanks are due to the Committee of the Development Foundation of Sciences and Technology for Geology and Minerals, Ministry of Land and Resources (MLR), China, which provided the financial support for advanced research on the project. Sincere thanks are also due to A. Soltani-Farani and Paolo Gamba for giving one of the authors very friendly help.
References

[1] G. Shaw and D. Manolakis, "Signal processing for hyperspectral image exploitation," IEEE Signal Processing Magazine, vol. 19, no. 1, pp. 12–16, 2002.
[2] A. Plaza, J. A. Benediktsson, J. W. Boardman et al., "Recent advances in techniques for hyperspectral image processing," Remote Sensing of Environment, vol. 113, supplement 1, pp. S110–S122, 2009.
[3] R. K. Robinson and S. A. Jennings, "Hyperspectral imaging on the International Space Station: an innovative approach to commercial development of space," in Proceedings of the 42nd AIAA Aerospace Sciences Meeting and Exhibit, pp. 10584–10591, Reno, Nev, USA, January 2004.
[4] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.
[5] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. fisherfaces: recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, 1997.
[6] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, New York, NY, USA, 2004.
[7] G. Camps-Valls and L. Bruzzone, "Kernel-based methods for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 43, no. 6, pp. 1351–1362, 2005.
[8] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[9] M. D. Iordache, J. Bioucas-Dias, and A. Plaza, "Dictionary pruning in sparse unmixing of hyperspectral data," in Proceedings of the 4th IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), pp. 1–4, Shanghai, China, 2012.
[10] A. S. Charles, B. A. Olshausen, and C. J. Rozell, "Learning sparse codes for hyperspectral imagery," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 5, pp. 963–978, 2011.
[11] A. Soltani-Farani, H. R. Rabiee, and S. A. Hosseini, "Spatial-aware dictionary learning for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 1, pp. 527–541, 2014.
[12] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B: Methodological, vol. 58, pp. 267–288, 1996.
[13] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, "Online learning for matrix factorization and sparse coding," Journal of Machine Learning Research, vol. 11, no. 1, pp. 19–60, 2010.
[14] X.-F. Song and L.-C. Jiao, "Classification of hyperspectral remote sensing image based on sparse representation and spectral," Journal of Electronics & Information Technology, vol. 34, no. 2, pp. 268–272, 2012.
[15] S. F. Cotter, B. D. Rao, K. Engan, and K. Kreutz-Delgado, "Sparse solutions to linear inverse problems with multiple measurement vectors," IEEE Transactions on Signal Processing, vol. 53, no. 7, pp. 2477–2488, 2005.
[16] D. Arthur and S. Vassilvitskii, "k-means++: the advantages of careful seeding," in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, 2007.
[17] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.