
Compact Structure Hashing via Sparse and Similarity Preserving Embedding

Renzhen Ye and Xuelong Li, Fellow, IEEE

Manuscript received October 13, 2014; revised January 26, 2015; accepted March 10, 2015. Date of publication April 20, 2015; date of current version February 12, 2016. This work was supported in part by the National Basic Research Program of China (973 Program) under Grant 2012CB316400, in part by the National Natural Science Foundation of China under Grant 61125106 and Grant 61300142, and in part by the Key Research Program of the Chinese Academy of Sciences under Grant KGZD-EW-T03. This paper was recommended by Associate Editor Y. Zhao.

R. Ye is with the Center for Optical Imagery Analysis and Learning, State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China, and also with the School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710119, China.

X. Li is with the Center for Optical Imagery Analysis and Learning, State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China.

Digital Object Identifier 10.1109/TCYB.2015.2414299

Abstract—Over the past few years, fast approximate nearest neighbor (ANN) search has become desirable or even essential, e.g., in huge databases, and many hashing-based ANN techniques have therefore been presented to return the nearest neighbors of a given query from huge databases. Hashing-based ANN techniques have become popular due to their low memory cost and good computational complexity. Recently, most hashing methods have realized the importance of the relationships among the data and have exploited different structures of the data to improve retrieval performance. However, a limitation of the aforementioned methods is that the sparse reconstructive relationship of the data is neglected. In this case, few methods can find the discriminating power and the local properties of the data for learning compact and effective hash codes. To take this crucial issue into account, this paper proposes a method named special structure-based hashing (SSBH). SSBH can preserve the underlying geometric information among the data and exploit the prior information that a sparse reconstructive relationship exists among the data, for learning compact and effective hash codes. Extensive experimental results demonstrate that SSBH is more robust and more effective than state-of-the-art hashing methods.

Index Terms—Hashing, nearest neighbor search, structure sparse-based hashing.

I. INTRODUCTION

IN RECENT years, large-scale image search has attracted considerable attention because of the rapid growth of Web data, including documents, images, and videos [1]–[4]. For example, the popular photo sharing website Flickr has surpassed six billion photos, and the popular video sharing website YouTube receives more than 48 h of uploaded videos every minute. It is very important to retrieve relevant information from such massive image databases. Similarity search, also known as nearest neighbor search, has been a hot topic in information retrieval, databases, and computer science. The task of image search is to take a query image and accurately find its nearest neighbors within a large database. The direct method of finding neighbors is to search over the given database and sort the items according to their similarity to the query. However, the cost of exhaustive search becomes prohibitively expensive when the number of database items is large. Moreover, the search performance drops significantly because the original high-dimensional data must be stored. Hence, it is necessary to consider approximate nearest neighbor (ANN) techniques to make large-scale search practical.

Over the past decade, fast indexing and similarity search in large databases have attracted considerable attention, and many ANN techniques have been developed for information retrieval. Broadly, fast nearest neighbor search methods can be categorized into two families: 1) tree-based methods [5]–[9] and 2) hashing-based methods [10]–[13].

Tree-based methods exploit various tree structures, including M-trees [5], k-d trees [6], [7], cover trees [8], and metric trees [9], to find approximate spatial partitions of the feature space. These methods try to perform fast similarity search. However, they suffer from the curse of dimensionality and cannot deal with large-scale databases due to memory constraints. Moreover, their search performance is prone to drop on data with high dimensionality.

Accordingly, research on hashing-based methods has received a large amount of interest in recent years [14]–[16]. Hashing-based methods try to find a map from the space of high-dimensional data to a space of low-dimensional binary codes while preserving the topological structure of the original data space. They are promising for fast similarity search because they generate compact binary codes for a large-scale database. In this case, similar neighbors can be retrieved by returning the images from a given database within a small Hamming distance. By encoding images as a set of compact binary codes, hashing-based methods make the search operation extremely fast. Hence, hashing-based methods reduce the storage requirement and achieve fast query times. Recently, much emphasis has been directed at data-dependent or learning-based hashing methods [17]–[19]. Most recent research on data-dependent methods aims to find the inherent neighborhood structure while the original data are embedded into a low-dimensional space.
Most of the aforementioned hashing methods have realized the importance of the relationships among the data and have exploited different structure information of the data to improve retrieval performance. However, a limitation of the aforementioned methods is that the sparse reconstructive relationship of the data is neglected. Hence, few methods can find the discriminating power and the local properties of the data for learning compact and effective hash codes.

Realizing the importance of the sparse reconstructive relationship of the data for learning compact, effective hash codes, we propose a special structure-based hashing (SSBH) framework that can preserve the underlying geometric information among the data. The proposed objective function of the SSBH framework is composed of three components: 1) the combined empirical fitness term; 2) information theoretic regularization; and 3) structure sparsity representation regularization. First, we construct an objective function to balance the maximization of the empirical accuracy combined with the information theoretic term and the minimization of the sparse reconstruction error provided by each bit. Second, sparsity is introduced to encode the data domain via locality preserving embedding, which can effectively reflect the intrinsic geometric properties of the data. Finally, the L2,1 norm is introduced into a modified sparse representation framework so that the discriminative information of the hash codes can be preserved in the weight matrix.

The rest of this paper is organized as follows. The related work on several hashing methods is briefly reviewed in Section II. Then, Section III presents the proposed SSBH framework. A comprehensive set of comparison experiments is reported in Section IV, and finally Section V concludes this paper.

II. RELATED WORK

Many data-dependent methods have been developed, and they can be divided into two categories: 1) supervised methods and 2) unsupervised methods. Supervised methods exploit the label information to preserve the semantic similarity and thus improve the ability of fast similarity search for a large-scale database [17], [20]. Unsupervised hash-based methods [such as anchor graph hashing (AGH)] generate compact hash codes by employing machine learning techniques.

To exploit the label information, many supervised hashing methods have been proposed. A boosted similarity sensitive coding method was proposed to design a series of weighted hash functions by using labeled data [21]. Liu et al. [24] tried to learn efficient kernelized hash functions by utilizing pairwise labels. Other supervised hashing methods, such as a deep neural network stacked with restricted Boltzmann machines, have been developed in recent years [22]; they retain the semantic similarity structure of the data. A binary reconstructive embedding technique was proposed to learn hash functions by minimizing the reconstruction error between the metric space and the Hamming space [23]. Although existing supervised hashing techniques can improve retrieval performance due to their ability to exploit label information [10], they need many labeled images, and labeling is a tough task [24], [25].

Unsupervised methods obtain binary codes for the given data points from unlabeled data in an unsupervised way [26]; examples include locality sensitive hashing (LSH) [12], [27], [28], spectral hashing (SH) [29], principal component analysis hashing [30], and its rotational variant [16]. LSH is the most notable unsupervised hashing method and provides probabilistic guarantees of retrieving similar examples [31], [32]. An LSH-based method for retrieving similar examples under arbitrary kernel functions was presented in [13] and can be exploited with many existing useful similarity measures. Another effective unsupervised hashing method, SH, was proposed recently by Weiss et al. [29]. Compared to the SH method, the optimal code obtained using multidimensional SH (MDSH) is guaranteed to faithfully reproduce the affinities as the number of bits increases [33]. By exploring the geometric structure of the data, density sensitive hashing (DSH) avoids purely random projection selection and uses those projective functions that best agree with the distribution of the data [34].

Recently, the AGH method was proposed to design appropriate compact codes by exploiting the inherent neighborhood structure in the data [2], [35], [36]. Zhang et al. [37] proposed a novel research problem, composite hashing with multiple information sources, which incorporates the features from different information sources into the binary hashing codes. In addition, the existing state-of-the-art methods, including [16], [38], [39], and [40], have shown good performance in retrieving high-dimensional data. The method dubbed iterative quantization [16] performs a rotation of zero-centered data so as to alleviate the problem of unbalanced variances. The robust sparse hashing (RSH) method [38] and the sparse embedding and least variance encoding (SELVE) method [39] achieve good hashing with binary codes by exploiting sparse coding. A novel algorithm, named iterative expanding hashing, was proposed to exploit a very small Hamming radius and iteratively expand a few nearest candidates, and it can obtain high recall and low search time simultaneously [40].

In the following subsection, sequential learning for hashing for the application of image retrieval is briefly discussed. For later convenience, some notation is introduced. Given the database X = [x_1, x_2, ..., x_i, ..., x_N] ∈ R^{d×N}, the task of a hashing-based method is to exploit K hash functions to map a data point x_i to a K-bit hash code H(x_i) = [h_1(x_i), ..., h_K(x_i)], where h_k is the kth hash function and h_k(x) ∈ {−1, 1}.


A. Sequential Learning for Hashing Methods

Sequential projection learning for hashing [30] introduces a neighbor-pair set M and a nonneighbor-pair set C to learn hash functions, where the same bits are desired for (x_i, x_j) ∈ M and different bits for (x_i, x_j) ∈ C. In [30], a semi-supervised hashing framework is proposed to minimize the empirical error over the labeled data together with an information theoretic constraint over all data points. Without loss of generality, let us denote the matrix formed by the l labeled columns of X as X_l, and let H = [h_1, ..., h_K] be a sequence of K hash functions. The objective function in [30] for the empirical accuracy over the labeled data can be denoted as

J(H) = \sum_{k} \bigg\{ \sum_{(x_i, x_j) \in M} h_k(x_i)\, h_k(x_j) - \sum_{(x_i, x_j) \in C} h_k(x_i)\, h_k(x_j) \bigg\}.   (1)

According to [30], the empirical fitness in (1) can be expressed in a compact matrix form

J(H) = \frac{1}{2} \operatorname{Tr}\{ H(X_l)\, T\, H(X_l)^T \} \;\Rightarrow\; J(W) = \frac{1}{2} \operatorname{Tr}\{ \operatorname{sgn}(W^T X_l)\, T\, \operatorname{sgn}(W^T X_l)^T \}   (2)

where W = [w_1, ..., w_K] ∈ R^{d×K} is the sequence of projection vectors, sgn(W^T X_l) is the matrix of signs of the individual elements, and the pairwise matrix T, incorporating the pairwise labeled information from X_l, is defined as

T_{ij} = \begin{cases} 1, & (x_i, x_j) \in M \\ -1, & (x_i, x_j) \in C \\ 0, & \text{otherwise.} \end{cases}   (3)

After relaxation [26], [41], the relaxed empirical fitness term can be obtained from (2):

J(W) = \frac{1}{2} \operatorname{Tr}\{ W^T X_l T X_l^T W \}.   (4)

Measuring the empirical accuracy only on the labeled data easily leads to overfitting. Hence, it is necessary to consider the information provided by each binary bit to obtain desirable properties of the hash codes. By incorporating an information theoretic constraint into the relaxed empirical fitness term as a regularizer, the objective function is obtained as follows:

J_1(W) = \frac{1}{2} \operatorname{Tr}\{ W^T X_l T X_l^T W \} + \frac{\lambda}{2} \operatorname{Tr}\{ W^T X X^T W \}   (5)

where Tr{W^T X X^T W} is the information theoretic term. Sequential projection learning for hashing maximizes the objective function in (5) to learn the optimal W, which can then be exploited to design the hash functions.

III. l2,1-NORM REGULARIZED HASHING LEARNING VIA SIMILARITY PRESERVING EMBEDDING

In this paper, we try to find a hash function H that maps data points X_l into binary codes by using the following formula:

H(X_l) = \operatorname{sign}( W^T X_l )   (6)

where W is the projection matrix. H(X_l) maps X_l into binary codes. Hence, it is important to learn the projection matrix for deriving the final hash function. To obtain the projection matrix, a novel regularized technique is formulated by extending the existing hashing method into the proposed method in this paper. The proposed method has three features. First, the constructed hash functions capture the intrinsic attributes of the data, which generates interpretable binary codes. Second, only a linear combination of a few entries in the dictionary is exploited to reconstruct the original data; thus, the low-dimensional nature of sparse representation makes it appropriate for encoding unseen data. Finally, the proposed method can provide mappings to encode unseen data points through a linear regression function and can effectively cope with reconstructive coefficients that vary smoothly along the geodesics of the data manifold. In this case, the locality and the similarity of the data can be preserved when the binary codes of the data are generated, which makes the encoding process highly efficient.

A. Objective Function

Given the data X = [x_1, x_2, ..., x_i, ..., x_n] ∈ R^{d×n}, each sample x_i is expected to be reconstructed from a few samples of the data matrix X. Hence, we try to find a sparse representation vector s_i for each sample x_i by solving the following l1 minimization problem:

\min_{s_i} \sum_{i=1}^{n} \| s_i \|_1 \quad \text{s.t.} \quad x_i = X s_i, \; i = 1, 2, \ldots, n   (7)

where s_i is an n-dimensional vector, and the element s_{ij} is regarded as the contribution of each sample x_j to x_i. The weight matrix S = [s_1, s_2, ..., s_i, ..., s_n] not only measures the similarity among different samples, but also captures some intrinsic structure properties and the discriminative information of the data. This is because the elements are invariant to rotations and rescalings according to the constraint in (7). Moreover, the nonzero entries in the sparse matrix may help to distinguish the samples of a given class even if no class labels are provided. In this case, the sparse reconstructive weight vector over all the samples tends to include potential discriminant information. As described above, the sparse weight matrix can capture intrinsic structure properties of the data and contains natural discrimination information. One might further expect that these desirable characteristics of the high-dimensional features can be preserved in a low-dimensional Hamming space where items can be efficiently searched. In this paper, we construct hash functions based on a structure sparse representation framework such that the hash coding can be characterized by the sparse weight matrix of the data. In real applications, similar samples in the training data space are often reconstructed with different reconstruction coefficients. Moreover, the decomposition error of similar data increases in sparse linear reconstruction if the local topological structure of the data space is not considered [42], [43]. Therefore, the prior information of the locality and similarity constraint can be exploited to regularize the sparse reconstruction [43]–[45]. Based on the above discussion, we seek K hash functions {h_k}_{k=1}^{K}, which best preserve the optimal weight vectors s_i, to map a data point x_i to a K-bit hash code H(x) = [h_1(x), ..., h_K(x)]:

\min_{s_i} \sum_{i=1}^{n} \| s_i \|_1 + \beta \sum_{i=1}^{n} \sum_{j=1}^{n} \| s_i - s_j \|^2 A_{ij} \quad \text{s.t.} \quad h_k(x_i) = h_k(X s_i), \; i = 1, \ldots, n; \; k = 1, \ldots, K   (8)

where A_{ij} = exp(−‖x_i − x_j‖²/σ) (σ is the heat kernel parameter) and β is a regularization parameter.
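The locality weights A_ij and the graph Laplacian V = B − A used a few lines below in (10) can be computed directly from the data. The following NumPy sketch (our own code, with a hypothetical bandwidth sigma) is one straightforward way to build them.

    import numpy as np

    def heat_kernel_affinity(X, sigma=1.0):
        # X: d x n data matrix -> n x n affinity A_ij = exp(-||x_i - x_j||^2 / sigma)
        sq = np.sum(X ** 2, axis=0)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
        np.maximum(d2, 0.0, out=d2)        # clip tiny negatives from round-off
        return np.exp(-d2 / sigma)

    def graph_laplacian(A):
        B = np.diag(A.sum(axis=0))         # degree matrix with B_ii = sum_j A_ji
        return B - A                       # V = B - A

    X = np.random.randn(32, 200)
    A = heat_kernel_affinity(X, sigma=2.0)
    V = graph_laplacian(A)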


For convenience, the problem in (8) can be transformed as follows:

\min_{h_k, s_i} \sum_{k=1}^{K} \sum_{i=1}^{n} \| h_k(x_i) - h_k(X s_i) \|^2 + \alpha \sum_{i=1}^{n} \| s_i \|_1 + \beta \sum_{i=1}^{n} \sum_{j=1}^{n} \| s_i - s_j \|^2 A_{ij}   (9)

where α is a balance parameter. Let B be a diagonal matrix whose elements are defined as B_{ii} = \sum_j A_{ji}; then V = B − A is the graph Laplacian matrix. Equation (9) can be transformed as follows:

\min_{h_k, s_i} \sum_{k=1}^{K} \sum_{i=1}^{n} \| h_k(x_i) - h_k(X s_i) \|^2 + \alpha \| S \|_1 + \beta \operatorname{Tr}(S V S^T).   (10)

It can be seen from (10) that the low-dimensional binary embeddings preserve the original structure information of the data. Since the sparse matrix S has natural discriminating power, the hash functions learned from the feature space to the Hamming space have good discriminative power even if no supervised information is provided.

In [41], the l2,1-norm of a matrix is exploited as a rotational invariant l1-norm, and it has attracted increasing attention. Different from the flat penalty introduced by the l1-norm, the l2,1-norm term regularizes all the elements {s_i}_{i=1}^{n} corresponding to the training data as a whole and computes the l1-norm over s = [‖s^{1.}‖_2, ..., ‖s^{i.}‖_2, ..., ‖s^{n.}‖_2]^T. The samples in the original data X corresponding to the nonzero entries of the weight matrix can be automatically selected to approximate the given data vector. Therefore, when the l2,1-norm is imposed on all the elements {s_i}_{i=1}^{n}, the objective function in (10) can be formulated as follows:

\min_{h_k, s_i} \sum_{k=1}^{K} \sum_{i=1}^{n} \| h_k(x_i) - h_k(X s_i) \|^2 + \alpha \| S \|_{2,1} + \beta \operatorname{Tr}(S V S^T)   (11)

where the l2,1-norm of S is defined as ‖S‖_{2,1} = \sum_{i=1}^{n} ‖s^{i.}‖_2 and s^{i.} is the ith row of S. The objective function in (11) is not differentiable because h_k(x_i) = sgn(w_k^T x_i). Therefore, the objective function in (11) is relaxed by replacing sgn(·) with its signed magnitude and representing the objective as a function of W:

J_2(W, S) = \sum_{k=1}^{K} \sum_{i=1}^{n} \| w_k^T x_i - w_k^T X_l s_i \|^2 + \alpha \| S \|_{2,1} + \beta \sum_{i=1}^{n} \sum_{j=1}^{n} \| s_i - s_j \|^2 A_{ij}
= \sum_{k=1}^{K} \sum_{i=1}^{n} \operatorname{Tr}\{ w_k^T (x_i - X_l s_i)(x_i - X_l s_i)^T w_k \} + \alpha \operatorname{Tr}(S U S^T) + \beta \operatorname{Tr}(S V S^T)
= \operatorname{Tr}\{ W^T ( X_l X_l^T - X_l S X_l^T - X_l S^T X_l^T + X_l S^T S X_l^T ) W \} + \alpha \operatorname{Tr}(S U S^T) + \beta \operatorname{Tr}(S V S^T)
= \operatorname{Tr}\{ W^T X_l ( I - S - S^T + S^T S ) X_l^T W \} + \alpha \operatorname{Tr}(S U S^T) + \beta \operatorname{Tr}(S V S^T)   (12)

where U is a diagonal matrix whose ith diagonal element is U_{ii} = 1/(2‖s^{i.}‖_2), and s^{i.} is the ith row of S. In practice, the ith diagonal element is redefined as U_{ii} = 1/(2‖s^{i.}‖_2 + ε) (ε is a small constant) because ‖s^{i.}‖_2 may be zero. In summary, we try to learn the optimal map W and weight matrix S by minimizing the objective function

(W, S) = \arg\min_{W, S} J_2(W, S).   (13)

Although recent hashing methods obtain promising results in image retrieval, they may neglect the sparse reconstructive relationship of the data and cannot find the discriminating power and the algebraic structure of the data. Consequently, it is necessary to exploit the sparse relationship of the data points in hashing learning. In this paper, we construct an objective function that builds a balance between the maximization of the empirical accuracy combined with the information theoretic term and the minimization of the sparse reconstruction of the data points. By considering the terms in (5) and (13), the following objective function of the proposed method is presented:

(W, S) = \arg\min_{W, S} \frac{J_2(W, S)}{J_1(W)}
= \arg\min_{W, S} \frac{ \operatorname{Tr}\{ W^T X_l ( I - S - S^T + S^T S ) X_l^T W \} + \alpha \| S \|_{2,1} + \beta \operatorname{Tr}(S V S^T) }{ \frac{1}{2} \operatorname{Tr}\{ W^T X_l T X_l^T W \} + \frac{\lambda}{2} \operatorname{Tr}\{ W^T X X^T W \} }
= \arg\min_{W, S} \frac{ \operatorname{Tr}\{ W^T X_l L X_l^T W \} + \alpha \operatorname{Tr}(S U S^T) + \beta \operatorname{Tr}(S V S^T) }{ \frac{1}{2} \operatorname{Tr}\{ W^T M W \} }   (14)

where the matrix M = (1/2)(X_l T X_l^T + λ X X^T) and the matrix L = I − S − S^T + S^T S. The proposed objective function is optimized in two alternating steps: 1) learning the sparse weight matrix S (fixing W) and 2) learning the map W (fixing S).

B. Computing Map Matrix W

Fixing the sparse weight matrix S, (14) can be written as the new objective function

W = \arg\min_{W} \frac{ \operatorname{Tr}\{ W^T X_l ( I - S - S^T + S^T S ) X_l^T W \} }{ \operatorname{Tr}\{ W^T M W \} }.   (15)

For compactness, the objective function in (15) can be transformed as follows:

W = \arg\min_{W} \frac{ \operatorname{Tr}\{ W^T X_l L X_l^T W \} }{ \frac{1}{2} \operatorname{Tr}\{ W^T M W \} }.   (16)

The minimization problem in (16) can be solved as a generalized eigenvalue decomposition problem. Therefore, the optimal map matrix is obtained by computing the eigenvectors corresponding to the smallest eigenvalues.

C. Computing Sparse Weight Matrix S

If W is fixed, the minimization problem in (13) can be transformed as

S = \arg\min_{S} \operatorname{Tr}\{ W^T X_l ( I - S - S^T + S^T S ) X_l^T W \} + \alpha \| S \|_{2,1} + \beta \operatorname{Tr}(S V S^T)
= \arg\min_{S} \operatorname{Tr}\{ D ( I - S - S^T + S^T S ) D^T \} + \alpha \operatorname{Tr}(S U S^T) + \beta \operatorname{Tr}(S V S^T)
= \arg\min_{S} C(S)   (17)

where D = W^T X_l and C(S) = Tr{W^T X_l (I − S − S^T + S^T S) X_l^T W} + αTr(SUS^T) + βTr(SVS^T). Setting ∂C(S)/∂S equal to zero, namely

\frac{\partial C(S)}{\partial S} = -2 D^T D + 2 S D^T D + 2 \alpha S U + 2 \beta S V = 0   (18)

or equivalently

S = D^T D ( D^T D + \alpha U + \beta V )^{-1}.   (19)

The corresponding procedure for computing the map matrix W and the sparse weight matrix S is summarized as Algorithm 1.


Algorithm 1: Computing Map Matrix W and Sparse Weight Matrix S

Input: a set of d-dimensional training data points X_l = [x_1, ..., x_t], the parameters α and β, and the number of iterations J.

Initialize: set S_0 = I_{n×n}, where I_{n×n} is the n×n matrix of ones, and compute the diagonal matrix U_0 with diagonal entries

(U_0)_{ii} = \frac{1}{2 \| s^{0}_{i.} \|_2 + \varepsilon}, \quad i = 1, \ldots, n.

For t = 1 : J
1) Compute the map matrix W_t by solving the following minimization problem via generalized eigenvalue decomposition in the tth iteration:

W_t = \arg\min_{W_t} \frac{ \operatorname{Tr}\{ W_t^T X_l L X_l^T W_t \} }{ \frac{1}{2} \operatorname{Tr}\{ W_t^T M W_t \} }.

2) Compute the sparse weight matrix S_t by exploiting the following equation:

S_t = D_t^T D_t ( D_t^T D_t + \alpha U_t + \beta V )^{-1}

where D_t = W_t^T X_l and S_t = [s^{t}_{1.}, s^{t}_{2.}, \ldots, s^{t}_{n.}].

3) Update the diagonal matrix U_t with diagonal entries

(U_t)_{ii} = \frac{1}{2 \| s^{t}_{i.} \|_2 + \varepsilon}, \quad i = 1, \ldots, n.

End

Output: the sparse weight matrix S_J and the map matrix W_J.
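As a rough illustration of the two updates in Algorithm 1, the NumPy/SciPy sketch below implements the W-step as the generalized eigenvalue problem behind (16) and the S-step as the closed form (19), followed by the diagonal update of U. The names, the small ridge term, and the toy outer loop are our own choices and do not reproduce the authors' MATLAB implementation.

    import numpy as np
    from scipy.linalg import eigh

    def update_W(Xl, L, M, K, ridge=1e-6):
        # W-step: eigenvectors of (Xl L Xl^T, M) with the smallest eigenvalues
        A = Xl @ L @ Xl.T
        d = A.shape[0]
        # the ridge is only a numerical safeguard so both matrices stay well conditioned
        evals, evecs = eigh(A + ridge * np.eye(d), M + ridge * np.eye(d))
        return evecs[:, :K]                              # d x K map matrix

    def update_S(W, Xl, U, V, alpha, beta):
        # S-step: closed form (19) with D = W^T Xl
        D = W.T @ Xl
        DtD = D.T @ D
        return DtD @ np.linalg.inv(DtD + alpha * U + beta * V)

    def update_U(S, eps=1e-8):
        # diagonal matrix with U_ii = 1 / (2 ||s^i.||_2 + eps)
        return np.diag(1.0 / (2.0 * np.linalg.norm(S, axis=1) + eps))

    # one possible outer loop mirroring Initialize / For / End in Algorithm 1:
    # S = np.ones((n, n)); U = update_U(S)
    # for t in range(J):
    #     L = np.eye(n) - S - S.T + S.T @ S
    #     W = update_W(Xl, L, M, K)
    #     S = update_S(W, Xl, U, V, alpha, beta)
    #     U = update_U(S)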

IV. EXPERIMENTAL RESULTS

This section reports a set of experiments to verify the efficiency of the proposed method. To validate the performance of the proposed method in image retrieval, we compare eight hashing methods, including our method and seven other unsupervised state-of-the-art methods: the SELVE method [39], the DSH method [34], the complementary hashing (CH) method [26], the LSH method [12], the SH method [29], the MDSH method [33], and the unsupervised sequential projection learning for hashing (USPLH) method [30]. In our experiments, to perform a fair evaluation, all methods are run in an unsupervised manner. Similar to [34], a returned point is regarded as a true neighbor if it lies in the top two percentile of points closest (measured by the Euclidean distance in the original space) to the query. For each query, all the data points in the database are ranked according to their Hamming distances to the query. In this setting, the pairwise matrix T in (3) is not available. Similar to [30], a pseudolabel T_{ij} = 1 is assigned for a pair of samples (x_i, x_j) ∈ M and T_{ij} = −1 is assigned for samples (x_i, x_j) ∈ C, where M and C are the neighbor-pair and nonneighbor-pair sets, respectively. The two sets are defined as follows:

M = \{ (x_i, x_j) : h(x_i)\, h(x_j) = -1, \; | w^T (x_i - x_j) | \le \varepsilon \}
C = \{ (x_i, x_j) : h(x_i)\, h(x_j) = 1, \; | w^T (x_i - x_j) | \ge \xi \}.   (20)

A. Datasets and Settings

Four benchmark datasets are exploited to verify the efficiency of the proposed method. Specifically, the experimental settings are as follows.

1) CIFAR-10 Dataset: The CIFAR-10 dataset is a labeled subset of the well-known 80M tiny image collection [46]. It contains ten classes with 60 000 samples in total, and each class consists of 6000 samples of 32 × 32 color images. We randomly split this database into two parts: 1) a training set with 59 000 samples and 2) a test set with 1000 samples. The training set is exploited to learn the hash functions and construct the hash lookup. A few image samples from the CIFAR-10 dataset are shown in Fig. 1.

2) SIFT-1M Dataset: The SIFT-1M dataset consists of one million local scale invariant feature transform (SIFT) descriptors, which are extracted from random images and described in [47]. Each sample in this dataset is a 128-dimensional vector representing histograms of gradient orientations. Similar to [24], one million samples in the SIFT-1M dataset are chosen for training and an additional 10 000 are regarded as test samples.

3) 22K LabelMe Dataset: The 22K LabelMe dataset, compiled by Torralba et al. [48], contains 22 019 images and 2000 test images. The size of each image is 32 × 32 pixels, with a 512-dimensional Gist descriptor.


Fig. 1. Gallery of images from the CIFAR-10 dataset. From top to bottom, the image classes are airplane, automobile, bird, cat, deer, dog, horse, ship, and truck.

Fig. 2. Parameter selection results on the CIFAR-10 dataset: MAP varies with the parameter β for 64-bit codes.

Fig. 3. Parameter selection results on the CIFAR-10 dataset: MAP varies with the parameter α for 64-bit codes.

Fig. 4. Parameter selection results on the CIFAR-10 dataset: MAP varies with the parameter λ for 64-bit codes.
4) 100K Tiny Image Dataset: The 100K tiny image dataset consists of 100 000 images sampled from the large collection of 80 million tiny images [46]. The tiny image data aim to provide a visualization of all the nouns in the English language and were mainly collected with the Google image search engine. The size of the original images is 32 × 32 pixels, with a 384-dimensional Gist descriptor. The entire dataset is divided into two parts: 1) a test set with 4000 images and 2) a training set with 96 000 images.

B. Parameter Selection

To further verify the proposed method, we choose the CIFAR-10 dataset for a thorough study of the three important parameters in the objective function (14). In the parameter analysis, mean average precision (MAP) is exploited as the hashing performance measure to select the optimal parameters. First, we fix the parameter pair (α, λ) = (0.1, 0.1) and study how MAP varies with the parameter β when α and λ are kept unchanged. Fig. 2 plots the experimental results for β varying from 0.1 to 0.9 with 64-bit codes when fixing α = 0.1 and λ = 0.1. It can be seen from Fig. 2 that our method achieves the best MAP performance when β = 0.3. Second, we fix the parameter pair (β, λ) = (0.3, 0.1) and then select the best parameter α. Fig. 3 presents the results for α varying from 0.1 to 0.9 with 64-bit codes when fixing β = 0.3 and λ = 0.1. It can be seen from Fig. 3 that the best parameter α is 0.5. Finally, β and α are fixed at 0.3 and 0.5, respectively, and Fig. 4 shows the MAP varying with λ for 64-bit codes. In Fig. 4, we find that λ = 0.8 achieves the best performance while the other parameters are kept unchanged. Thus, the best parameter triple selected for the proposed method is (β, α, λ) = (0.3, 0.5, 0.8).

Fig. 5. Precision–recall curve on CIFAR-10 image data. (a)–(h) Performances for the hash codes of 8, 12, 16, 24, 32, 48, 64, and 128 bits, respectively.

Fig. 6. Precision curves on CIFAR-10 image data. (a)–(h) Performances for the hash codes of 8, 12, 16, 24, 32, 48, 64, and 128 bits, respectively.

C. Competitors

In this paper, the results of the proposed method are compared with seven state-of-the-art methods. Details of these methods are as follows.

1) The SELVE method [39] is one of the latest hashing methods; it learns a dictionary to encode the sparse embedding features and binarizes the coding coefficients as the hash codes.

2) The LSH method [12] is a classic unsupervised method for multitemporal data and has wide application in change detection domains.

3) The DSH method [34] is considered to be an extension of LSH. Compared with LSH, DSH avoids purely random projection selection and achieves better performance.

4) The SH method [29] is a classic method that conducts principal component analysis on the original data and exploits Laplacian eigenfunctions computed along the principal directions of the data to generate the hash codes.

5) The CH method [26] is a recent hashing method that aims to learn a series of hash functions which cross the sparse data regions and generate balanced hash buckets.

6) The MDSH method [33] is a popular method that finds binary codes so that the inner product among the codes approximates the affinity between data points.

7) The USPLH (unsupervised sequential projection learning for hashing) method [30] is an unsupervised hashing method that tries to learn efficient hash codes by a simple linear mapping that can handle semantic similarity/dissimilarity among the data.

Fig. 7. Recall curves on CIFAR-10 image data. (a)–(h) Performance for hash codes of 8, 12, 16, 24, 32, 48, 64, and 128 bits, respectively.

Fig. 8. Results on the SIFT-1M image dataset. Precision curves with (a) 8 and (b) 12 bits. Recall curves with (c) 8 and (d) 12 bits.

Fig. 9. Precision–recall curves on LabelMe image data. (a)–(d) Performance for hash codes of 8, 16, 24, and 32 bits, respectively.

Fig. 10. Precision–recall curves on tiny image data. (a)–(h) Performance for hash codes of 8, 12, 16, 24, 32, 48, 64, and 128 bits, respectively.

Fig. 11. Precision of the top 500 returned samples on different datasets using Hamming ranking. (a) CIFAR-10, (b) SIFT-1M, (c) LabelMe, and (d) tiny image datasets.

D. Experimental Results

A number of experiments, in which the number of bits is varied from 8 to 128, are conducted on the CIFAR-10, SIFT-1M, 22K LabelMe, and 100K tiny image datasets. The proposed method and the other methods are implemented in a MATLAB environment on a modern desktop computer (3.8 GHz quad-core CPU with 8 GB of random access memory) to measure the training time and compression time. The precision–recall curves on the CIFAR-10 image dataset are shown in Fig. 5, and the precision and recall curves for CIFAR-10 are listed in Figs. 6 and 7, respectively. It can be seen in Fig. 5 that the precision of all methods drops significantly when the recall increases, and vice versa. This is because the value of precision is sensitive to the true positive rate while that of recall is sensitive to the false positive rate. As shown in Fig. 6, our method has much better precision–recall curves compared with the other methods. The proposed method performs best in all cases in Figs. 5–7. The higher precision and recall in Figs. 5–7 demonstrate the advantage of our method. Moreover, compared with the other methods, the drop in precision for the proposed method is much smaller when the code length increases, which indicates fewer query failures for the proposed method. Fig. 8 presents the results of the six methods on the SIFT-1M dataset using the precision and recall curves. It can be observed in Fig. 8 that the results illustrate a significant performance improvement of the proposed method over the other methods. In Fig. 9, we show the comparison of precision–recall curves on the LabelMe image dataset. It can be seen in Fig. 9 that the proposed method confirms its superiority in search performance compared with the other methods.

The experimental results of the different methods on the tiny image dataset from 8 to 128 bits are listed in Fig. 10.

Fig. 12. Training time and compression time on the LabelMe dataset. (a) Training and (b) compression times.

Fig. 13. Sample top retrieved images for the query in (a) using 32 bits. Red rectangles denote false positives. Best viewed in color. (a) Query, (b) our method (precision: 0.69), (c) CH method (precision: 0.36), (d) LSH method (precision: 0.44), (e) SH method (precision: 0.23), (f) MDSH method (precision: 0.38), (g) USPLH method (precision: 0.58), (i) DSH method (precision: 0.44), and (j) SELVE method (precision: 0.50).

We can find that the proposed method significantly improves the performance on the tiny image dataset. This indicates that the proposed method can find the discriminating power and capture the algebraic structure among the data points. In Fig. 11, we present the experimental results on the four different image datasets to demonstrate the effectiveness of the proposed hashing method. It is not very surprising to see that the proposed hashing method obtains the best performance over all four datasets. Therefore, the proposed method leads to fewer query failures compared with the other methods. Additionally, the comparison of computational cost, including compression time and training time, for the different methods is reported in Fig. 12. The compression time indicates the encoding time of transforming the original test data into binary codes, and the training time refers to the computational cost of learning the hash functions from the training data. It can be observed in Fig. 12(a) that the LSH method needs negligible training time compared to the other methods. This is because the projections in the LSH method are not learned but randomly generated. It is not surprising that the eigenvalue decomposition-based techniques, including SELVE, CH, DSH, SH, MDSH, and our method, have similar training costs. However, the training time of the USPLH method is longer than that of the other methods since it needs to perform eigenvalue decomposition and update the pairwise label matrix. As shown in Fig. 12(b), the SH method needs a little more compression time compared with the other methods since the sinusoidal function in the compression process must be calculated. In addition, the other methods incur more compression time than the LSH method. Finally, the experimental results of the unsupervised test on a sample query from the CIFAR-10 dataset are demonstrated in Fig. 13. It can be clearly seen from Fig. 13 that the proposed method provides more visually consistent search results than the other methods.

V. CONCLUSION

In this paper, we propose an SSBH framework that can preserve the underlying geometric information among the data. The proposed objective function of SSBH is composed of three components: 1) the combined empirical fitness term; 2) information theoretic regularization; and 3) structure sparsity representation regularization.
First, we construct an objective function to build a balance between the maximization of the empirical accuracy combined with the information theoretic term and the minimization of the sparse reconstruction provided by each bit. Second, sparsity is introduced to encode the data domain via locality preserving embedding, which can effectively reflect the intrinsic geometric properties of the data. Finally, the L2,1 norm is introduced into a modified sparse representation framework so that the discriminative information of the hash codes can be preserved in the weight matrix. Experimental results on different datasets show that our method outperforms the state-of-the-art methods by a large margin.

REFERENCES

[1] M.-S. Chen, M. Lo, P. S. Yu, and H. C. Young, "Applying segmented right-deep trees to pipelining multiple hash joins," IEEE Trans. Knowl. Data Eng., vol. 7, no. 4, pp. 656–668, Aug. 1995.
[2] J. Song, Y. Yang, X. Li, Z. Huang, and Y. Yang, "Robust hashing with local models for approximate similarity search," IEEE Trans. Cybern., vol. 44, no. 7, pp. 1225–1236, Jul. 2014.
[3] L. Chen, D. Xu, I. Tsang, and X. Li, "Spectral embedded hashing for scalable image retrieval," IEEE Trans. Cybern., vol. 44, no. 7, pp. 1180–1190, Jul. 2014.
[4] Y. Yuan, X. Lu, and X. Li, "Learning hash functions using sparse reconstruction," in Proc. ACM Conf. Internet Multimedia Comput. Serv. (ICIMCS), Xiamen, China, 2014, pp. 14–18.
[5] P. Ciaccia, M. Patella, and P. Zezula, "An efficient access method for similarity search in metric spaces," in Proc. 23rd Int. Conf. Very Large Data Bases, vol. 23. Athens, Greece, 1997, pp. 426–435.
[6] C. Silpa-Anan and R. Hartley, "Optimised KD-trees for fast image descriptor matching," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Anchorage, AK, USA, 2008, pp. 1–8.
[7] J. L. Bentley, "Multidimensional binary search trees used for associative searching," Commun. ACM, vol. 18, no. 9, pp. 509–517, 1975.
[8] A. Beygelzimer, S. Kakade, and J. Langford, "Cover trees for nearest neighbor," in Proc. 23rd Int. Conf. Mach. Learn., Pittsburgh, PA, USA, 2006, pp. 97–104.
[9] J. K. Uhlmann, "Satisfying general proximity/similarity queries with metric trees," Inf. Process. Lett., vol. 40, no. 4, pp. 175–179, 1991.
[10] C. Wu, J. Zhu, D. Cai, C. Chen, and J. Bu, "Semi-supervised nonlinear hashing using bootstrap sequential projection learning," IEEE Trans. Knowl. Data Eng., vol. 25, no. 6, pp. 1380–1393, Jun. 2013.
[11] A. Gionis, P. Indyk, and R. Motwani, "Similarity search in high dimensions via hashing," in Proc. 25th Int. Conf. Very Large Data Bases (VLDB), vol. 99. Edinburgh, Scotland, 1999, pp. 518–529.
[12] M. Raginsky and S. Lazebnik, "Locality-sensitive binary codes from shift-invariant kernels," in Proc. Adv. Neural Inf. Process. Syst., Vancouver, BC, Canada, 2009, pp. 1509–1517.
[13] B. Kulis and K. Grauman, "Kernelized locality-sensitive hashing for scalable image search," in Proc. IEEE 12th Int. Conf. Comput. Vis., Kyoto, Japan, 2009, pp. 2130–2137.
[14] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[15] Y. Zhen, Y. Gao, R. Ji, D. Yeung, and X. Li, "Spectral multimodal hashing and its application to multimedia retrieval," IEEE Trans. Cybern., vol. 44, no. 7, pp. 1225–1236, Jul. 2014.
[16] Y. Gong and S. Lazebnik, "Iterative quantization: A procrustean approach to learning binary codes," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Providence, RI, USA, 2011, pp. 817–824.
[17] J. Wang, S. Kumar, and S.-F. Chang, "Sequential projection learning for hashing with compact codes," in Proc. 27th Int. Conf. Mach. Learn. (ICML), Haifa, Israel, 2010, pp. 1127–1134.
[18] X. Liu, Y. Mu, D. Zhang, B. Lang, and X. Li, "Large-scale unsupervised hashing with shared structure learning," IEEE Trans. Cybern., vol. 45, no. 3, pp. 358–369, Mar. 2015.
[19] P. Jain, B. Kulis, and K. Grauman, "Fast image search for learned metrics," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Anchorage, AK, USA, 2008, pp. 1–8.
[20] J. Wang, S. Kumar, and S.-F. Chang, "Semi-supervised hashing for scalable image retrieval," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), San Francisco, CA, USA, 2010, pp. 3424–3431.
[21] G. Shakhnarovich, "Learning task-specific similarity," Ph.D. dissertation, Dept. Electr. Eng. Comput. Sci., MIT, Cambridge, MA, USA, 2005.
[22] R. Salakhutdinov and G. Hinton, "Semantic hashing," Int. J. Approx. Reason., vol. 50, no. 7, pp. 969–978, 2009.
[23] B. Kulis and T. Darrell, "Learning to hash with binary reconstructive embeddings," in Proc. Adv. Neural Inf. Process. Syst., Vancouver, BC, Canada, 2009, pp. 1042–1050.
[24] W. Liu, J. Wang, R. Ji, Y.-G. Jiang, and S.-F. Chang, "Supervised hashing with kernels," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Providence, RI, USA, 2012, pp. 2074–2081.
[25] K. Grauman and T. Darrell, "Pyramid match hashing: Sub-linear time indexing over partial correspondences," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Minneapolis, MN, USA, 2007, pp. 1–8.
[26] H. Xu et al., "Complementary hashing for approximate nearest neighbor search," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Barcelona, Spain, 2013, pp. 1631–1638.
[27] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li, "Multi-probe LSH: Efficient indexing for high-dimensional similarity search," in Proc. 33rd Int. Conf. Very Large Data Bases, Vienna, Austria, 2007, pp. 950–961.
[28] M. Bawa, T. Condie, and P. Ganesan, "LSH forest: Self-tuning indexes for similarity search," in Proc. 14th Int. Conf. World Wide Web, Chicago, IL, USA, 2005, pp. 651–660.
[29] Y. Weiss, A. Torralba, and R. Fergus, "Spectral hashing," in Proc. Adv. Neural Inf. Process. Syst., Vancouver, BC, Canada, 2008, pp. 1753–1760.
[30] J. Wang, S. Kumar, and S. Chang, "Semi-supervised hashing for large scale search," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 12, pp. 2393–2406, Dec. 2012.
[31] Y. Mu, J. Shen, and S. Yan, "Weakly-supervised hashing in kernel space," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), San Francisco, CA, USA, 2010, pp. 3344–3351.
[32] B. Kulis, P. Jain, and K. Grauman, "Fast similarity search for learned metrics," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 12, pp. 2143–2157, Dec. 2009.
[33] Y. Weiss, R. Fergus, and A. Torralba, "Multidimensional spectral hashing," in Proc. 12th Eur. Conf. Comput. Vision (ECCV), Florence, Italy, 2012, pp. 340–353.
[34] Z. Jin, C. Li, Y. Lin, and D. Cai, "Density sensitive hashing," IEEE Trans. Cybern., vol. 44, no. 8, pp. 1362–1371, Aug. 2014.
[35] W. Liu, J. Wang, S. Kumar, and S.-F. Chang, "Hashing with graphs," in Proc. 28th Int. Conf. Mach. Learn. (ICML), Bellevue, WA, USA, 2011, pp. 1–8.
[36] L. Chen, D. Xu, I. W.-H. Tsang, and X. Li, "Spectral embedded hashing for scalable image retrieval," IEEE Trans. Cybern., vol. 44, no. 7, pp. 1180–1190, Jul. 2014.
[37] D. Zhang, F. Wang, and L. Si, "Composite hashing with multiple information sources," in Proc. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, Beijing, China, 2011, pp. 225–234.
[38] A. Cherian, S. Sra, V. Morellas, and N. Papanikolopoulos, "Efficient nearest neighbors via robust sparse hashing," IEEE Trans. Image Process., vol. 23, no. 8, pp. 3646–3655, Aug. 2014.
[39] X. Zhu, L. Zhang, and Z. Huang, "A sparse embedding and least variance encoding approach to hashing," IEEE Trans. Image Process., vol. 23, no. 9, pp. 3737–3750, Sep. 2014.
[40] Z. Jin et al., "Fast and accurate hashing via iterative nearest neighbors expansion," IEEE Trans. Cybern., vol. 44, no. 11, pp. 2167–2177, Nov. 2014.
[41] C. Ding, D. Zhou, X. He, and H. Zha, "R1-PCA: Rotational invariant L1-norm principal component analysis for robust subspace factorization," in Proc. 23rd Int. Conf. Mach. Learn., Pittsburgh, PA, USA, 2006, pp. 281–288.
[42] X. Lu, H. Yuan, P. Yan, Y. Yuan, and X. Li, "Geometry constrained sparse coding for single image super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Providence, RI, USA, 2012, pp. 1648–1655.
[43] X. Lu, Y. Yuan, and P. Yan, "Image super-resolution via double sparsity regularized manifold learning," IEEE Trans. Circuits Syst. Video Technol., vol. 23, no. 12, pp. 2022–2033, Dec. 2013.
[44] X. Lu, Y. Wang, and Y. Yuan, "Sparse coding from a Bayesian perspective," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 6, pp. 929–939, Jun. 2013.
[45] X. Lu, Y. Yuan, and P. Yuan, "Alternatively constrained dictionary learning for image super-resolution," IEEE Trans. Cybern., vol. 44, no. 3, pp. 366–377, Mar. 2014.


[46] A. Torralba, R. Fergus, and W. T. Freeman, "80 million tiny images: A large data set for nonparametric object and scene recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 11, pp. 1958–1970, Nov. 2008.
[47] D. G. Lowe, "Object recognition from local scale-invariant features," in Proc. 7th IEEE Int. Conf. Comput. Vision, vol. 2. Kerkyra, Greece, 1999, pp. 1150–1157.
[48] A. Torralba, R. Fergus, and Y. Weiss, "Small codes and large image databases for recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Anchorage, AK, USA, 2008, pp. 1–8.

Renzhen Ye is currently pursuing the Ph.D. degree with the Center for Optical Imagery Analysis and Learning, State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an, China. She is an Associate Professor with the Department of Mathematics, Huazhong Agricultural University, Wuhan, China. Her current research interests include partial differential equations, mathematical mechanization and mathematical physics, and machine learning.

Xuelong Li (M'02–SM'07–F'12) is a Full Professor with the Center for Optical Imagery Analysis and Learning, State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an, China.
