CS Decomposition Based Bayesian Subspace Estimation
Abstract—In numerous applications, it is required to estimate the principal subspace of the data, possibly from a very limited number of samples. Additionally, it often occurs that some rough knowledge about this subspace is available and could be used to improve subspace estimation accuracy in this case. This is the problem we address herein and, in order to solve it, a Bayesian approach is proposed. The main idea consists of using the CS decomposition of the semi-orthogonal matrix whose columns span the subspace of interest. This parametrization is intuitively appealing and allows for non-informative prior distributions of the matrices involved in the CS decomposition and very mild assumptions about the angles between the actual subspace and the prior subspace. The posterior distributions are derived and a Gibbs sampling scheme is presented to obtain the minimum mean-square distance estimator of the subspace of interest. Numerical simulations and an application to real hyperspectral data assess the validity and the performance of the estimator.

Index Terms—Bayesian inference, CS decomposition, minimum mean-square distance estimation, simulation method, Stiefel manifold, subspace estimation.

I. PROBLEM STATEMENT

Consider the linear model (1), Y = H X + N, where H is an M × p matrix whose columns span the p-dimensional subspace of interest, X is a p × K matrix whose columns correspond to the coordinates of the signal in the range space of H, and N denotes the additive noise. In this paper, contrary to plenty of source separation techniques such as non-negative matrix factorization or independent component analysis, we are not interested in factorizing Y into a product of unknown matrices. Conversely, the problem addressed in this work consists of estimating the p-dimensional subspace of interest R(H), which is spanned by the columns of H. As a consequence, without loss of generality, we assume in the sequel that the columns of H are orthonormal, i.e., H^T H = I_p. When the columns of X are independent and Gaussian distributed with zero mean and a common covariance matrix, the maximum likelihood (ML) estimate of R(H) is obtained from the p most significant left singular vectors of Y [1]. Therefore, the singular value decomposition (SVD) plays a central role in subspace estimation (in the frequentist framework), as it naturally reveals the low-rank structure of the signal. The SVD turns out to provide very accurate estimates of R(H) in most cases [3]–[5]. However, two situations of practical interest may undermine it. The first situation corresponds to the low sample regime.
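The SVD-based ML estimate described above can be sketched in a few lines of NumPy; the notation (Y for the data matrix, p for the subspace dimension) is chosen for illustration, since the original symbols are not fixed by this excerpt:

```python
import numpy as np

def svd_subspace(Y, p):
    """Return an orthonormal basis of the p-dimensional principal
    subspace of the data matrix Y (columns = snapshots)."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U[:, :p]  # p most significant left singular vectors

# toy check: noise-free rank-p data is recovered exactly
rng = np.random.default_rng(0)
M, p, K = 20, 3, 50
H = np.linalg.qr(rng.standard_normal((M, p)))[0]   # true orthonormal basis
Y = H @ rng.standard_normal((p, K))                # noise-free snapshots
H_hat = svd_subspace(Y, p)
# the two projection matrices coincide in the noise-free case
err = np.linalg.norm(H_hat @ H_hat.T - H @ H.T)
```

In the noisy, low-sample regimes discussed next, this estimate degrades, which is what motivates the Bayesian approach of the paper.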
(2)

where P_p{ · } stands for the p principal eigenvectors of the matrix between braces. The MMSD estimator thus amounts to finding the principal subspace of the posterior mean of the projection matrix H H^T given Y.¹ Note that this approach is general and independent of the conditional and prior distributions: depending on the latter, it may or may not be an easy task to obtain the MMSD estimator. In the sequel, we state our assumptions regarding H and derive its corresponding MMSD estimator. The latter will then be tested on real hyperspectral data in Section IV.

II. DATA MODEL AND SUBSPACE ESTIMATION

Let us consider the linear model (1) and let us assume that X is Gaussian distributed with independent columns, so that the probability density function of Y, conditioned on H and σ²,² is given by

(3)

Integrating out X yields the marginal likelihood

(4)

where, to obtain the last line, we have used the fact that the integral in the fourth line of (4) is that of a multivariate Gaussian distribution and hence is available in closed form. Note that p(Y | H) depends on H only through the projection matrix H H^T.

Let us turn now to the hypotheses regarding H. We assume that we have some a priori knowledge about the subspace spanned by the columns of H: this knowledge can come from some available models or can be deduced from the data itself, as in the hyperspectral imagery application. More precisely, we assume that the range space R(H) of H is close to the range space of some semi-orthogonal matrix H̄ and, without loss of generality, we will assume that H̄ = [I_p 0]^T through the paper.³

In [17], we tackled the problem by assigning the matrix H either a Bingham or a von Mises–Fisher (vMF) prior distribution. The Bingham and vMF are the most widely used distributions on the Stiefel manifold and they have proved to be relevant in a number of applications, including meteorology, biology, image, or shape analysis [20]. Moreover, there exist computationally efficient simulation tools to sample from these distributions, which makes them a sensible choice. However, they suffer from two drawbacks. First, from a user point of view, it is not obvious to set a value for the concentration parameter, since the latter is not an intuitively appealing parameter, in contrast to the angles between R(H) and R(H̄), which are more directly meaningful. Moreover, the Bingham and vMF distributions hold for the whole matrix H.

¹The true (square) distance between the subspaces is given by the sum of the squared principal angles θ_k, k = 1, …, p, between them. The distance we use herein, i.e., the sum of the squared sines of these angles, is thus different. However, the two distances are close for small values of the angles, and the distance between projection matrices is widely accepted. Moreover, using the distance between projection matrices allows one to obtain a closed-form expression for the MMSD estimator, see (2). Minimization of the true distance would not yield such a closed-form expression, since it cannot be expressed simply as a function of the matrices involved.

²The case of unknown σ² can be considered by assigning a prior distribution (typically a conjugate prior, in our case an inverse gamma distribution) to σ² and modifying accordingly the posterior distributions to be derived next.

³In the case where R(H) is close to the range space of an arbitrary semi-orthogonal matrix, the measurements in (1) can be pre-multiplied by a unitary matrix Q such that Q H̄ = [I_p 0]^T. Note that pre-multiplication by the unitary matrix Q does not modify the angles between R(H) and R(H̄) nor the distribution in (3).
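Footnote 1 contrasts the true distance (sum of squared principal angles) with the projection-matrix distance. The following sketch, assuming orthonormal-column bases (an assumption consistent with the text, though the symbols are illustrative), computes the principal angles from the SVD of the inner-product matrix and checks numerically that half the squared Frobenius distance between projection matrices equals the sum of the squared sines of the angles:

```python
import numpy as np

def principal_angles(H1, H2):
    """Principal angles between R(H1) and R(H2); both inputs are assumed
    to have orthonormal columns."""
    s = np.linalg.svd(H1.T @ H2, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

rng = np.random.default_rng(1)
M, p = 12, 3
H1 = np.linalg.qr(rng.standard_normal((M, p)))[0]
H2 = np.linalg.qr(rng.standard_normal((M, p)))[0]
theta = principal_angles(H1, H2)

# chordal squared distance between projection matrices
P1, P2 = H1 @ H1.T, H2 @ H2.T
d2_proj = 0.5 * np.linalg.norm(P1 - P2, 'fro')**2
d2_sin = np.sum(np.sin(theta)**2)   # equals d2_proj
```

For small angles, sin θ ≈ θ, which is why the two distances in footnote 1 nearly coincide when the subspaces are close.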
The choice of a distribution and a value for its concentration parameter will consequently induce a distribution for the angles, but this relation is not revealed in a straightforward and intelligible manner. In the present paper, we attempt to remedy these shortcomings with a view to obtaining a parametrization of the statistical model that directly involves the most meaningful parameters, namely the angles θ_k, k = 1, …, p, between R(H) and R(H̄). Indeed, these angles are instrumental, as the distance between R(H) and R(H̄) is directly connected to them. Furthermore, we look for a less constrained model which relies on mild assumptions, the latter concerning only the angles θ_k.

The model proposed herein is based on the CS decomposition of H, which writes [19]

(5)

so that, with the partitioning of H induced by H̄ = [I_p 0]^T, we have

(7)

Assuming a priori independence between the factors of the CS decomposition, it follows from (4) that their joint posterior distribution is given by

(10)
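The CS structure underlying (5) can be illustrated numerically. Assuming the prior basis is H̄ = [I_p 0]^T, partitioning a semi-orthogonal H into its top p rows and the remaining rows yields two blocks whose singular values are the cosines and sines of the principal angles. This toy check is not the paper's exact factorization (5), only a verification of the cos²/sin² pairing it rests on:

```python
import numpy as np

rng = np.random.default_rng(2)
M, p = 10, 3
H = np.linalg.qr(rng.standard_normal((M, p)))[0]  # semi-orthogonal: H.T @ H = I_p
H1, H2 = H[:p, :], H[p:, :]                       # partition induced by Hbar = [I_p; 0]

# singular values of the top block are cos(theta_k), sorted in decreasing order;
# those of the bottom block are sin(theta_k), sorted in decreasing order
c = np.linalg.svd(H1, compute_uv=False)
s = np.linalg.svd(H2, compute_uv=False)[::-1]     # reversed so s_k pairs with c_k

# CS structure: cos^2(theta_k) + sin^2(theta_k) = 1 for each principal angle
check = c**2 + s**2
```

The pairing works because H^T H = H1^T H1 + H2^T H2 = I_p, so the two Gram matrices share eigenvectors with eigenvalues λ and 1 − λ.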
(11)

Since the conditional posterior distribution in (11) is of the matrix Bingham–von Mises–Fisher type, the sampling scheme of Hoff [23] can be used to draw matrices from it. Let us now examine the posterior distribution of the angles

(12)

where c_k and s_k stand for the kth diagonal entries of the cosine and sine matrices of the CS decomposition. The first thing to be noted is that the variables θ_k, conditioned on the other factors and Y, are independent, and hence one needs to generate p independent random variables. Unfortunately, the distribution in (12) does not belong to any known class of distributions and, therefore, generating random variables drawn from it appears problematic.

In order to overcome this problem, we propose to resort to a Metropolis–Hastings (MH) move [21], [22]. The basic idea is to generate a random variable drawn from a proposal distribution and to accept it with a certain probability, the latter being equal to one if the candidate contributes to increase the target posterior distribution. Of course, the closer the proposal and target distributions, the higher the acceptance rate and hence the faster the convergence of the Markov chain. In order to obtain a proposal distribution in our case, we make a change of variable in (12) and come up with the distribution

(13)

Forgetting the exponential term in (13), this distribution is similar to that of a scaled beta distribution. Therefore, we choose a scaled beta distribution as a proposal distribution in a Metropolis–Hastings scheme. Through preliminary investigation, we ended up with a choice of shape parameters which turns out to provide a good approximation to (13) for low to moderate SNR. The resulting Gibbs sampling scheme is summarized in Table I.

Once the matrices have been generated, the MMSD estimator, which theoretically entails computing the posterior mean of the projection matrix, can be approximated by

(14)

Remark 1: Similarly, a maximum a posteriori (MAP) approach can be advocated, where the MAP estimator is obtained as

(15)

Note that the likelihood is maximized when H is the matrix of the p most significant left singular vectors of Y and, hence, the MAP approach is in some way linked to the SVD-based approach. Observe also that it does not make much sense to consider here a minimum mean-square error (MMSE) estimator. Indeed, the latter entails computing the posterior mean of H itself, which could be approximated by the arithmetic mean of the generated matrices. However, the range space of H is given up to right multiplication by an orthogonal matrix. Therefore, two range spaces could be close without the corresponding matrices being close. It follows that the arithmetic mean of the generated matrices could result in a poor subspace estimate, despite the fact that, individually, the subspaces spanned by each matrix might be accurate.
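The two sampling-related ingredients above can be sketched as follows. The target below is a toy unnormalized density standing in for the angle posterior, and the beta proposal mimics the scaled-beta choice described in the text, but the shape parameters and target are illustrative, not the paper's (13). The second function shows why the approximation (14) averages projection matrices rather than the draws themselves:

```python
import numpy as np

rng = np.random.default_rng(3)

def mh_beta(log_target, a, b, n_iter=4000):
    """Independence Metropolis-Hastings on (0, 1) with a Beta(a, b) proposal.
    log_target is the unnormalized log density of the target; here a toy
    stand-in for the angle posterior, NOT the paper's exact (13)."""
    def log_q(v):  # unnormalized Beta(a, b) log pdf (constant cancels in the ratio)
        return (a - 1.0) * np.log(v) + (b - 1.0) * np.log1p(-v)
    x = 0.5
    draws = np.empty(n_iter)
    for t in range(n_iter):
        cand = rng.beta(a, b)
        log_ratio = log_target(cand) - log_target(x) + log_q(x) - log_q(cand)
        if np.log(rng.uniform()) < log_ratio:
            x = cand                       # accept the candidate
        draws[t] = x
    return draws

# toy target: unnormalized Beta(3, 5) density, whose mean is 3/8
draws = mh_beta(lambda v: 2.0 * np.log(v) + 4.0 * np.log1p(-v), a=2.0, b=2.0)
mean_est = draws[1000:].mean()

def mmsd_from_draws(H_draws, p):
    """MMSD-style estimate in the spirit of (14): average the projection
    matrices of the draws, then keep the p principal eigenvectors.
    Averaging the H draws directly would be meaningless, since each H is
    only defined up to right multiplication by an orthogonal matrix."""
    P_bar = sum(H @ H.T for H in H_draws) / len(H_draws)
    _, V = np.linalg.eigh(P_bar)   # eigenvalues in ascending order
    return V[:, -p:]               # eigenvectors of the p largest eigenvalues
```

In a full Gibbs sweep, a step of this MH kind would be applied to each of the p angle variables, alternating with draws of the orthogonal factors via Hoff's sampler [23].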
III. SIMULATIONS
In this section, we use Monte Carlo simulations to assess the performance of the estimator defined previously. The performance measure will be the distance between the subspace spanned by H and the subspace spanned by Ĥ, where Ĥ stands for one of the estimates considered. More precisely, we will display the mean-square distance (MSD), which is defined as

MSD (16)

Fig. 1. Mean-square distance between true and estimated subspaces.
Fig. 5. Mean-square distance between true and estimated subspaces (SNR = 3 dB).
Fig. 6. Mean-square distance between true and estimated subspaces versus SNR.
Fig. 8. Mean-square distance between true and estimated subspaces versus SNR.
Fig. 9. Mean-square distance between true and estimated subspaces versus SNR.
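A Monte Carlo evaluation in the spirit of (16) can be sketched as follows, with the plain SVD estimator standing in for the estimators compared in the figures; the model sizes and the SNR convention (noise variance 10^(−SNR/10) with unit-variance signal coordinates) are assumptions made for illustration only:

```python
import numpy as np

rng = np.random.default_rng(4)

def msd_monte_carlo(M=20, p=3, K=5, snr_db=3.0, n_mc=200):
    """Monte Carlo estimate of the mean-square distance between the true
    subspace and its estimate; the SVD estimator is an illustrative
    stand-in for the estimators compared in Figs. 1-9."""
    sigma2 = 10.0 ** (-snr_db / 10.0)   # assumed SNR convention
    msd = 0.0
    for _ in range(n_mc):
        H = np.linalg.qr(rng.standard_normal((M, p)))[0]      # true basis
        Y = (H @ rng.standard_normal((p, K))
             + np.sqrt(sigma2) * rng.standard_normal((M, K))) # noisy snapshots
        H_hat = np.linalg.svd(Y, full_matrices=False)[0][:, :p]
        # squared projection distance = sum of squared sines of the angles
        msd += 0.5 * np.linalg.norm(H_hat @ H_hat.T - H @ H.T, 'fro')**2 / n_mc
    return msd

msd = msd_monte_carlo()
```

With K as small as 5 snapshots, the SVD estimate is noticeably degraded, which is exactly the regime where the Bayesian estimator is reported to help.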
of the soil characteristics, and thus numerous studies have focused on information retrieval from multi-band data; see, e.g., [26]–[29]. So far, a widely accepted model is that the image can be linearly decomposed as a combination of a few components, referred to as the endmembers [30]. One critical issue is thus to identify the subspace where the data lies together with the coordinates in this subspace, which provide the respective abundances, i.e., the proportions of the soil components. This can be achieved by well-known and computationally efficient techniques such as principal component analysis (PCA), a primordial asset to using the linear (or subspace) model. However, it may be argued that the linear model does not fully account for all physical phenomena that give rise to the image, e.g., the possibly non-linear mixing of the components. In order to obtain a finer image analysis, non-linear models can be investigated [30], but generally at the price of a higher computational
complexity. Furthermore, in most cases non-linear effects are
not that important and an interesting alternative is to continue
to resort to a linear model but at a local level (i.e., within a few
pixels) rather than at the full image level. Doing so, one can
characterize the data locally and track the evolution of the local
subspaces in order to assess the degree of non-linearity. The sub-
space estimation scheme developed above can fulfill this task
and it is now tested against real hyperspectral data, acquired by
the NASA spectro-imager AVIRIS over Moffett Field, CA, in
1997. More precisely, we consider a 50 × 50 sub-image, which
contains partly a lake (upper part of the sub-image) and partly
a coastal area (lower part of the sub-image) composed of soil
and vegetation, see [31] for a more detailed description. The
data is collected in spectral bands and we have thus
a total of 2500 pixels. Under the linear mixing model and
in the absence of noise, the data matrix, whose columns are the individual pixels, can be written as a product of two matrices: one factor denotes the set of endmembers, i.e., the spectral signatures which best describe the soil components, and the other gathers their respective proportions. In [31], it was shown that a small number of endmembers was sufficient to obtain an accurate description of the data. The columns of the second factor are the so-called abundances: They satisfy
Fig. 10. Moffett image: mean-square distance between the local subspaces and the global subspace.

the positivity constraint and the sum-to-one property, i.e., the entries of each abundance vector are non-negative and sum to one. The pixels thus belong to a simplex whose vertices are the endmembers [31]. Let ȳ denote the mean value of the pixels. Then, the centered data matrix belongs to a lower-dimensional subspace, which can be estimated by a number of techniques, including PCA [31].

Usually, PCA is performed on the whole image, which makes sense if the linear mixing model is in force for all pixels. Herein, we are interested in assessing the validity of this model at the pixel level. More precisely, the PCA on the whole image provides us with the "average" subspace: the pixels are then unitarily transformed so that this average subspace coincides with R(H̄), and we are interested in the distance between it and the subspace spanned by a pixel and its few nearest pixels. If this distance is very small, then it is likely that the linear model described by the average subspace is rather accurate. On the other hand, if the distance is not negligible, it may be that the average subspace does not describe accurately the scene around the pixel, or that some non-linear mixing effects occur there. Therefore, subspace estimation at the pixel level, together with distance evaluation, enables one to gain insight into the understanding of the mixing process. This is the approach we take here, and our MMSD estimator is used towards this end. To be more specific, for each pixel we use the latter and its three nearest neighbors to obtain the MMSD estimator of the local subspace. The mean-square distance between the local subspace and the average subspace is then determined to evaluate how close the local and global subspaces are. The results are shown in Fig. 10.⁵ For comparison purposes, we display in this figure the result obtained with the SVD, the SMT and the method of [17], which assumes a Bingham prior distribution for H. Fig. 10 shows that a local SVD or SMT would predict rather large differences between the local subspaces and the global one, especially for pixels in the lake area. However, it cannot be concluded that the global subspace does not apply for most of the image since, with so few pixels per local window, the subspace estimated by the SVD may not be very accurate. In contrast, the Bayesian CS-based MMSD estimator shows that the global subspace is rather accurate for the whole image (especially on the lake), except for the pixels along the transition between lake and coastal area. This seems logical, as non-linear mixing effects are more likely to occur along the shore, while the linear model is likely to apply well elsewhere. Therefore, the MMSD estimator is able to reveal the zones of the image where departure from the linear model might occur. Finally, we note that it is not intuitive to set a value for the concentration parameter of the priors in [17]: different values of it do not have a real meaning and lead to different interpretations of the image. It is much easier to set a value for the angles, a significant advantage of the CS-based model compared to the method of [17]. However, the latter is computationally less intensive. As a final comment, we would like to point out that the computational complexity of the present MMSD-CS method could be prohibitive in large dimensional problems (M large), for which more computationally efficient algorithms, such as the sparse matrix transform of [8], should be favored.

⁵Application to another image and results with a different parameter value can be found in [24].

V. CONCLUSION

In this paper, we considered the problem of subspace estimation from a possibly very limited number of snapshots under the assumption that some prior knowledge about the subspace
is available. A Bayesian statistical model was formulated to account for this situation, based on the CS decomposition of the semi-orthogonal matrix whose columns span the subspace of interest. This model was shown to rely on rather mild assumptions and, moreover, these assumptions involve meaningful and intuitively appealing quantities, namely the angles between the prior subspace and the true subspace. The minimum mean-square distance estimator was implemented through a Gibbs sampling scheme. It was shown to provide accurate estimates, in particular in the low SNR or low sample support regimes. The estimator was also successfully applied to real hyperspectral data, demonstrating its ability to reveal the limits of linear mixing models.

REFERENCES

[1] L. L. Scharf, Statistical Signal Processing: Detection, Estimation and Time Series Analysis. Reading, MA: Addison-Wesley, 1991.
[2] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[3] R. Kumaresan and D. Tufts, "Estimating the parameters of exponentially damped sinusoids and pole-zero modeling in noise," IEEE Trans. Acoust., Speech, Signal Process., vol. 30, no. 6, pp. 833–840, Dec. 1982.
[4] R. Kumaresan and D. Tufts, "Estimating the angles of arrival of multiple plane waves," IEEE Trans. Aerosp. Electron. Syst., vol. 19, no. 1, pp. 134–139, Jan. 1983.
[5] P. Stoica and A. Nehorai, "MUSIC, maximum likelihood and Cramér–Rao bound," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 5, pp. 720–741, May 1989.
[6] O. Ledoit and M. Wolf, "A well-conditioned estimator for large-dimensional covariance matrices," J. Multivar. Anal., vol. 88, no. 2, pp. 365–411, Feb. 2004.
[7] T. L. Marzetta, G. H. Tucci, and S. H. Simon, "A random matrix-theoretic approach to handling singular covariance estimates," IEEE Trans. Inf. Theory, vol. 57, no. 9, pp. 6256–6271, Sep. 2011.
[8] G. Cao, L. R. Bachega, and C. A. Bouman, "The sparse matrix transform for covariance estimation and analysis of high dimensional signals," IEEE Trans. Image Process., vol. 20, no. 3, pp. 625–640, Mar. 2011.
[9] X. Mestre, "Improved estimation of eigenvalues and eigenvectors of covariance matrices using their sample estimates," IEEE Trans. Inf. Theory, vol. 54, no. 11, pp. 5113–5129, Nov. 2008.
[10] J. Thomas, L. Scharf, and D. Tufts, "The probability of a subspace swap in the SVD," IEEE Trans. Signal Process., vol. 43, no. 3, pp. 730–736, Mar. 1995.
[11] M. Hawkes, A. Nehorai, and P. Stoica, "Performance breakdown of subspace-based methods: Prediction and cure," in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), May 2001, pp. 4005–4008.
[12] Z. Bai and J. W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, ser. Springer Series in Statistics, 2nd ed. New York: Springer-Verlag, 2010.
[13] D. Paul, "Asymptotics of sample eigenstructure for a large dimensional spiked covariance model," Stat. Sinica, vol. 17, no. 4, pp. 1617–1642, Oct. 2007.
[14] F. Benaych-Georges and R. R. Nadakuditi, "The singular values and vectors of low rank perturbations of large rectangular random matrices," 2011 [Online]. Available: http://arxiv.org/abs/1103.2221
[15] F. Benaych-Georges and R. R. Nadakuditi, "The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices," Adv. Math., vol. 211, no. 1, pp. 494–521, May 2011.
[16] A. Srivastava, "A Bayesian approach to geometric subspace estimation," IEEE Trans. Signal Process., vol. 48, no. 5, pp. 1390–1400, May 2000.
[17] O. Besson, N. Dobigeon, and J.-Y. Tourneret, "Minimum mean square distance estimation of a subspace," IEEE Trans. Signal Process., vol. 59, no. 12, pp. 5709–5720, Dec. 2011.
[18] A. Edelman, T. Arias, and S. Smith, "The geometry of algorithms with orthogonality constraints," SIAM J. Matrix Anal. Appl., vol. 20, no. 2, pp. 303–353, 1998.
[19] G. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD: The Johns Hopkins Univ. Press, 1996.
[20] K. V. Mardia and P. E. Jupp, Directional Statistics. New York: Wiley, 1999.
[21] C. P. Robert and G. Casella, Monte Carlo Statistical Methods, 2nd ed. New York: Springer-Verlag, 2004.
[22] C. P. Robert, The Bayesian Choice—From Decision-Theoretic Foundations to Computational Implementation. New York: Springer-Verlag, 2007.
[23] P. D. Hoff, "Simulation of the matrix Bingham–von Mises–Fisher distribution, with applications to multivariate and relational data," J. Comput. Graph. Stat., vol. 18, no. 2, pp. 438–456, Jun. 2009.
[24] O. Besson, N. Dobigeon, and J.-Y. Tourneret, "CS decomposition based Bayesian subspace estimation," IRIT/ENSEEIHT, Toulouse, France, Tech. Rep., 2012.
[25] C.-I Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification. New York: Kluwer, 2003.
[26] D. Manolakis, C. Siracusa, and G. Shaw, "Hyperspectral subpixel target detection using the linear mixing model," IEEE Trans. Geosci. Remote Sens., vol. 39, no. 7, pp. 1392–1409, Jul. 2001.
[27] M. Lewis, V. Jooste, and A. A. de Gasparis, "Discrimination of arid vegetation with airborne multispectral scanner hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 39, no. 7, pp. 1471–1479, Jul. 2001.
[28] B. Datt, T. R. McVicar, T. G. Van Niel, D. L. B. Jupp, and J. S. Pearlman, "Preprocessing EO-1 Hyperion hyperspectral data to support the application of agricultural indexes," IEEE Trans. Geosci. Remote Sens., vol. 41, no. 6, pp. 1246–1259, Jun. 2003.
[29] J. Plaza, R. Pérez, A. Plaza, P. Martínez, and D. Valencia, "Mapping oil spills on sea water using spectral mixture analysis of hyperspectral image data," in Chemical and Biological Standoff Detection III, J. O. Jensen and J.-M. Thériault, Eds. Bellingham, WA: SPIE, 2005, vol. 5995, pp. 79–86.
[30] N. Keshava and J. Mustard, "Spectral unmixing," IEEE Signal Process. Mag., vol. 19, no. 1, pp. 44–57, Jan. 2002.
[31] N. Dobigeon, S. Moussaoui, M. Coulon, J.-Y. Tourneret, and A. O. Hero, "Joint Bayesian endmember extraction and linear unmixing for hyperspectral imagery," IEEE Trans. Signal Process., vol. 57, no. 11, pp. 4355–4368, Nov. 2009.

Olivier Besson (S'90–M'93–SM'04) received the Ph.D. degree in signal processing and the Habilitation à Diriger des Recherches from INP Toulouse, France, in 1992 and 1998, respectively. He is currently a Professor with the Department of Electronics, Optronics and Signal of the Institut Supérieur de l'Aéronautique et de l'Espace (ISAE), Toulouse, France. His research interests are in the area of robust adaptive array processing, mainly for radar applications. Dr. Besson is a former Associate Editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING and the IEEE SIGNAL PROCESSING LETTERS. He is a member of the Sensor Array and Multichannel Technical Committee (SAM TC) of the IEEE Signal Processing Society.

Nicolas Dobigeon (S'05–M'08) was born in Angoulême, France, in 1981. He received the Eng. degree in electrical engineering from ENSEEIHT, Toulouse, France, and the M.Sc. degree in signal processing from the National Polytechnic Institute of Toulouse, France, both in 2004, and the Ph.D. degree in signal processing from the National Polytechnic Institute of Toulouse, France, in 2007. From 2007 to 2008, he was a Postdoctoral Research Associate with the Department of Electrical Engineering and Computer Science, University of Michigan. Since 2008, he has been an Assistant Professor with the National Polytechnic Institute of Toulouse (ENSEEIHT, University of Toulouse), France, within the Signal and Communication Group of the IRIT Laboratory. His research interests are centered around statistical signal and image processing, with a particular interest in Bayesian inference and Markov chain Monte Carlo (MCMC) methods.

Jean-Yves Tourneret (SM'08) received the Ingénieur degree in electrical engineering from the Ecole Nationale Supérieure d'Electronique, d'Electrotechnique, d'Informatique et d'Hydraulique, Toulouse (ENSEEIHT), France, in 1989 and the Ph.D. degree from the National Polytechnic Institute, Toulouse, France, in 1992. He is currently a Professor with the University of Toulouse (ENSEEIHT), France, and a member of the IRIT Laboratory (UMR 5505 of the CNRS). His research activities are centered around statistical signal processing, with a particular interest in Bayesian and Markov chain Monte Carlo methods. Dr. Tourneret has been involved in the organization of several conferences, including the European Conference on Signal Processing (EUSIPCO) 2002 (as the program chair), the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2006 (in charge of plenaries) and the Statistical Signal Processing Workshop (SSP) 2012 (for international liaisons). He has been a member of different technical committees, including the Signal Processing Theory and Methods (SPTM) Committee of the IEEE Signal Processing Society from 2001 to 2007 and from 2010 to present. He served as an Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING from 2008 to 2011.