
2013 IEEE Conference on Computer Vision and Pattern Recognition

Binary Code Ranking with Weighted Hamming Distance

Lei Zhang 1,2, Yongdong Zhang 1, Jinhui Tang 3, Ke Lu 2, Qi Tian 4

1 Institute of Computing Technology, Chinese Academy of Sciences, No.6 Kexueyuan South Road, Beijing, China
2 University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, China
3 Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, China
4 University of Texas at San Antonio, San Antonio, TX, USA
{zhanglei09, zhyd}@ict.ac.cn, [email protected], [email protected], [email protected]

Abstract

Binary hashing has been widely used for efficient similarity search due to its query and storage efficiency. In most existing binary hashing methods, the high-dimensional data are embedded into Hamming space and the distance or similarity of two points is approximated by the Hamming distance between their binary codes. The Hamming distance calculation is efficient; however, in practice there are often many results sharing the same Hamming distance to a query, which makes this distance measure ambiguous and poses a critical issue for similarity search where ranking is important. In this paper, we propose a weighted Hamming distance ranking algorithm (WhRank) to rank the binary codes produced by hashing methods. By assigning different bit-level weights to different hash bits, the returned binary codes are ranked at a finer-grained binary code level. We give an algorithm to learn the data-adaptive and query-sensitive weight for each hash bit. Evaluations on two large-scale image data sets demonstrate the efficacy of our weighted Hamming distance for binary code ranking.

1. Introduction

High-dimensional similarity search is a fundamental problem in many content-based search systems [20, 23] and is widely used in related application areas such as machine learning, computer vision and data mining. To solve this problem efficiently, many methods have been proposed, such as the KD-Tree [2][18] and Locality Sensitive Hashing (LSH) [1]. Recently, binary hashing [20, 21, 16, 14, 12, 3, 6, 11] has become increasingly popular for efficient approximate nearest neighbor (ANN) search due to its good query and storage efficiency.

The goal of binary hashing is to learn binary representations for data such that the neighborhood structure in the original data space is preserved after embedding into Hamming space. Given a dataset, binary hashing generates a binary code for each data point and approximates the distance or similarity of two points by the Hamming distance between their binary codes, which means most hashing methods rank the returned results by their Hamming distances to the query. This distance measure is widely used because of its calculation efficiency. However, since the Hamming distance is discrete and bounded by the code length, in practice there will be many data points sharing the same Hamming distance to the query, and the ranking of these data points is ambiguous. This poses a critical issue for similarity search, e.g. k-nearest-neighbor search, where ranking is important. As a result, most existing binary hashing methods fall short of providing a good ranking of results.

In this paper, we propose a weighted Hamming distance ranking algorithm (WhRank) to improve the ranking performance of binary hashing methods. By assigning different bit-level weights to different hash bits, it becomes possible to rank two binary codes sharing the same Hamming distance to a query at a finer-grained binary code level, which gives binary hashing methods the ability to distinguish the relative importance of different bits. We also give an algorithm to learn a set of dynamic bit-level weights of the hash bits for a given query. By taking into account the information provided by the hash functions and the dataset, we learn a set of data-adaptive and query-sensitive bit-level weights that reveal the relative importance of different hash bits.

The rest of this paper is organized as follows. Related work is discussed in Section 2. The weighted Hamming distance ranking algorithm is proposed in Section 3 and analyzed in Section 4. Section 5 describes our experiments and Section 6 concludes the paper.
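Before moving on, the ranking ambiguity described above can be made concrete with a short numerical sketch. The code below is not from the original paper; it uses random data and random LSH-style projections with numpy (all names and sizes are illustrative) and simply counts how many database codes fall at each Hamming distance from a query. With compact codes, each distance value is shared by a large group of candidates, and an unweighted Hamming ranking leaves the order inside each group undefined.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, n = 128, 32, 100_000               # feature dimension, code length, database size

W = rng.normal(size=(d, K))              # random LSH-style projections, thresholds at 0
X = rng.normal(size=(n, d))              # toy database
q = rng.normal(size=d)                   # toy query

codes = (X @ W > 0).astype(np.uint8)     # K-bit binary codes of the database points
q_code = (q @ W > 0).astype(np.uint8)    # binary code of the query

# Hamming distance of every database code to the query code.
ham = np.count_nonzero(codes != q_code, axis=1)

# How many points share each Hamming distance?  These ties are exactly what
# an unweighted Hamming ranking cannot break.
for m, cnt in zip(*np.unique(ham, return_counts=True)):
    print(f"distance {m:2d}: {cnt} candidates")
```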
2. Related Work

With the proliferation of various kinds of data, e.g. music, image and video, in content-based search systems, fast similarity search has attracted significant attention. One classical family of methods to address this problem is the tree-based index, such as the KD-Tree [2][18]. However, this kind of method does not work well for high-dimensional data, since its performance degrades to that of a linear scan as the dimensionality increases. Recently, hashing-based methods [5, 3, 24] have been widely used for efficient similarity search in a large variety of applications due to their efficiency in terms of query speed and storage space. The goal of binary hashing is to map each dataset point to a compact binary code such that similar data points in the original data space are mapped to similar binary codes in Hamming space. One of the representative methods is Locality Sensitive Hashing (LSH) [1] and its variants [15, 9, 14, 6]. Theoretically, LSH-related methods usually require long codes to achieve good precision. However, long codes result in low recall, since the probability that similar points collide into similar binary codes decreases exponentially as the code length increases. As a result, LSH-related methods usually construct multiple tables to ensure a reasonable probability that a query will collide with its near neighbors in at least one of the tables, which leads to a long query time and increases the memory occupation. To generate more compact binary codes, many algorithms have been proposed. Semantic Hashing [17] adopts a deep generative model based on the restricted Boltzmann machine to learn hash functions that map similar points to similar binary codes. Spectral Hashing (SPH) [21] uses a spectral graph partitioning strategy for hash function learning and uses the simple analytical eigenfunction solution of 1-D Laplacians as the hash function. In PCA-Hashing (PCAH) [20], the eigenvectors corresponding to the largest eigenvalues of the dataset covariance matrix are used to get binary codes. In [3], Iterative Quantization (ITQ) is proposed to learn an orthogonal rotation matrix that refines the initial PCA projection matrix of PCAH, minimizing the quantization error of mapping the data from the original data space to Hamming space. To minimize the reconstruction error between the distances in the original data space and the Hamming distances of the corresponding binary codes, Binary Reconstructive Embedding (BRE) is proposed in [8]. Moreover, to exploit the spectral properties of the data affinity to generate better binary codes, many other algorithms, such as Semi-Supervised Sequential Projection Hashing [19], Anchor Graph Hashing [12] and Kernel-Based Supervised Hashing [11], have been developed and give commendable search performance.

In most existing binary hashing methods, including those discussed above, the returned results of a given query are simply ranked by their Hamming distance to the query. The calculation of the Hamming distance is efficient; however, since this distance metric gives each hash bit the same weight, it is unable to distinguish the relative importance of different bits and causes ambiguity in ranking. One way to alleviate this ambiguity is to assign different bit-level weights to different hash bits. The weighted Hamming distance has been used for image retrieval, including Hamming distance weighting [4] and AnnoSearch [20]. In [20], each bit of the binary code is assigned a bit-level weight, while in [4] the aim is to weight the overall Hamming distance of local features for image matching. In these works, only a single set of weights is used, either to measure the importance of each bit in Hamming space [20] or to rescale the Hamming distance for better image matching [4]. In [7], Jiang et al. propose a query-adaptive Hamming distance for image retrieval which assigns dynamic weights to hash bits, such that each bit is treated differently and dynamically. They harness a set of semantic concept classes that cover most semantic elements of image content, and different weights for each of the classes are learned with a supervised learning algorithm. To compute the bit-level weights for a given query, a k-nearest-neighbor search is first performed based on the original Hamming distance, and then a linear combination of the weights of the classes contained in the result list is used as the query-adaptive weights.

In [22], the authors propose a query-sensitive hash code ranking algorithm (QsRank) for PCA-based hashing methods. Given a query, QsRank assigns two weights to each hash bit and defines a score function to measure the confidence that the neighbors of the query are mapped to a binary code. The returned codes are ranked based on their scores. Experimental results demonstrate the efficacy of QsRank. There are three key differences between QsRank and our method WhRank. First, QsRank is developed only for PCA-based binary hashing, while WhRank can be applied to most existing binary hashing methods. Second, QsRank is developed for ε-neighbor search, which requires the structure of the original data space to be well maintained after dimension reduction, while WhRank does not. Third, QsRank makes a strong assumption about the data distribution (a uniform distribution), while WhRank only makes an assumption about the distribution of the differences between a query and its neighbors, which is more appropriate in most cases.

3. Ranking with Weighted Hamming Distance

In this section, we present the weighted Hamming distance ranking algorithm. In most binary hashing algorithms, the distance between two points is simply measured by the Hamming distance between their binary codes. This distance metric is somewhat ambiguous, since for a K-bit binary code H(p), there are (K choose m) different binary codes sharing the same distance m to H(p). In most binary hashing algorithms, each hash bit takes the same weight and makes the same contribution to the distance calculation. On the contrary, in our algorithm we give different bits different weights.
With the bit-level weights, the returned binary codes can be ranked by the weighted Hamming distance at a finer-grained binary code level rather than at the original integer Hamming distance level. The bit-level weight associated with hash bit k is denoted as ω_k. In the following, we will show that an effective bit-level weight is not only data-dependent but also query-dependent. Note that our goal is not to propose a new binary hashing method, but to give a ranking algorithm that improves the search accuracy of most existing binary hashing methods. Some notations are given below to facilitate our discussion.

Given a dataset X = {x^(i)}_{i=1}^N with x^(i) ∈ R^d, the neighbor set of x is denoted as N(x). The paradigm of binary hashing is to first use a set of linear or non-linear hash functions F = {f_k : R^d → R}_{k=1}^K to map x ∈ X to F(x) ∈ R^K, and then binarize F(x) = (f_1(x), ..., f_K(x))^T by comparing each f_k(x) with a threshold T_k to get a K-bit binary code H(x) ∈ {0, 1}^K. Hence, the binary hash function is h_k(x) = sgn(f_k(x) − T_k). We call f_k(x) the unbinarized hash value. Each dimension of H(x) is called a hash bit, and for a query q and its neighbor p, if the k-th bits of H(q) and H(p) differ, we say there is a bit-flip on hash bit k. The weighted Hamming distance between two binary codes h^(1) and h^(2) is denoted as D_H^w(h^(1), h^(2)).

3.1. Data-Adaptive Weight

We introduce the term discriminating power to denote the ability of a hash function h_k(x) to map similar data points to the same bit (0/1). A hash function h_k(x) is called discriminative if the probability of similar data points being mapped to the same bit by h_k(x) is not small (> 0.5). The more discriminative h_k(x) is, the more discriminative hash bit k is. Obviously, the discriminating power of a hash function depends on the algorithm that generates it and on the dataset used for training. In many binary hashing methods, e.g. PCAH [20], SPH [21], ITQ [3] and AGH [12], the discriminating power of different hash functions is intrinsically different. For a hash function with stronger discriminating power, it is less likely to generate different bits for two neighboring points. In other words, consider a query q and two data points p^(1), p^(2) sharing the same Hamming distance (of 1) to q, where H(p^(1)) and H(p^(2)) differ from H(q) on hash bits k1 and k2 respectively. If hash bit k1 is more discriminative than k2, then p^(1) is considered less similar to q than p^(2), since a bit-flip on hash bit k1 gives higher confidence that p^(1) is not a neighbor of q than a flip on k2 does. To make D_H^w(H(q), H(p^(1))) larger than D_H^w(H(q), H(p^(2))), ω_{k1} should be larger than ω_{k2}; that is, the more discriminative a hash bit k is, the larger its associated weight ω_k should be.

As the discriminating power of a hash bit k (i.e. of hash function h_k(x)) is related to the probability of similar points being mapped to the same bit by h_k(x), we can use the distribution of h_k(p) − h_k(q), where p ∈ N(q), to reveal how discriminative hash bit k is. However, since h_k(p) and h_k(q) are binarized, too much useful information is lost. An alternative is to use the distribution of the difference between their unbinarized hash values, i.e. s_k(p, q) = f_k(p) − f_k(q), to reveal the discriminating power. If s_k(p, q) is distributed in a small interval centered around 0, then the probability of h_k(p) = h_k(q) is high, yielding a highly discriminative hash bit k. Fig. 1 gives the distribution of s_k(p, q) for ITQ [3] and SPH [21] using 32-bit binary codes. As can be seen from this figure, the distributions are all bell-shaped, as all these binary hashing methods try to minimize the distances between similar points after hashing. As a result, ω_k is a function of the distribution of s_k, which is parameterized by its mean μ_k and standard deviation σ_k:

ω_k = g(μ_k, σ_k)    (1)

Figure 1. Histograms of the differences between the unbinarized hash values of a query and its neighbors, generated by ITQ [3] and SPH [21]. The dataset used for illustration is ANN-SIFT1M [5].

Note that hash bit k is more discriminative if σ_k is smaller; therefore, ω_k = g(μ_k, σ_k) should be monotonically non-increasing w.r.t. σ_k. An illustration for ITQ is given in the right panel of Fig. 2: as shown, the probability of a bit-flip on hash bit k increases with the standard deviation σ_k.
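As a concrete illustration of how these data-adaptive statistics can be obtained, the sketch below (not part of the original paper) estimates μ_k and σ_k of s_k(p, q) = f_k(p) − f_k(q) for every hash bit from a set of training query–neighbor pairs. It assumes a linear hashing model f(x) = x W for simplicity; the function and variable names are ours.

```python
import numpy as np

def fit_bit_statistics(W, queries, neighbors):
    """Estimate the per-bit mean and standard deviation of
    s_k(p, q) = f_k(p) - f_k(q) over query-neighbor pairs.

    W         : (d, K) projection matrix of a linear hashing model f(x) = x @ W
    queries   : (s, d) training queries
    neighbors : (s, m, d) the m neighbors of each training query
    Returns (mu, sigma), each of shape (K,).
    """
    f_q = queries @ W                      # (s, K) unbinarized hash values of the queries
    f_p = neighbors @ W                    # (s, m, K) unbinarized hash values of the neighbors
    s = (f_p - f_q[:, None, :]).reshape(-1, W.shape[1])   # all s_k(p, q) samples, (s*m, K)
    return s.mean(axis=0), s.std(axis=0)

# A bit with small sigma_k keeps the neighbors tightly around the query in the k-th
# projection, i.e. it is more discriminative and should receive a larger weight.
```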
3.2. Query-Sensitive Weight

Meanwhile, for a specific data point q, the probability of its neighbor p being mapped to a bit different from h_k(q) by hash function h_k(x) also depends on q itself. Intuitively, if |f_k(q) − T_k| is small, then after adding a random noise ñ to q, it is more likely that f_k(q + ñ) lies on the opposite side of T_k from f_k(q), which means the probability that h_k(p) differs from h_k(q) for p ∈ N(q) is high. A simple example of this intuition is shown in the left panel of Fig. 3. A query q is mapped to "101", and the binary codes of p1, p2 and p3 are "001", "111" and "110" respectively. Based on the Hamming distance, the result list of q is "001", "111", "110". However, it is more suitable to rank the hash codes "110" and "111" before "001", because q is far from the threshold of hash function h1 and it is therefore unlikely that a near neighbor of q lies on the opposite side of h1. Moreover, as shown in the left panel of Fig. 2, the probability of a bit-flip decreases as |f_k(q) − T_k| increases, for the most part.

Figure 2. The probability of a query q's neighbor p being mapped to a bit different from h(q) by a hash function h(x) = sgn(f(x) − T). The abscissa of the left panel is |f(q) − T|; the abscissa of the right panel is the standard deviation of the distribution of f(p) − f(q).

Figure 3. The left panel gives an example where the Hamming distance causes ambiguity for binary code ranking. The right panel illustrates the probability of a neighbor of q being mapped to a different bit by hash function f_k(x); T_k is the binarization threshold.

As a result, the weight ω_k in eq. (1) is not only dependent on the hash function h_k(x) but also on the specific data point q, and on this account ω_k is also a function of q. Eq. (1) can therefore be rewritten as:

ω_k(q) = g(μ_k, σ_k, q)    (2)

Moreover, the smaller |f_k(q) − T_k| is, the larger the probability of a bit-flip on hash bit k, and thus the smaller ω_k should be. Therefore, ω_k(q) should also be monotonically non-decreasing w.r.t. |f_k(q) − T_k|.

3.3. Dynamic Bit-level Weighting

In Sections 3.1 and 3.2, we showed that an effective bit-level weight is not only data-dependent but also query-dependent. In this section, we give a simple method to calculate the data-adaptive and query-sensitive bit-level weight ω_k(q) of each hash bit k for a given query q, and we show that ω_k(q) satisfies the abovementioned constraints theoretically. The intuition behind our method is as follows: given a query q and two binary codes h^(1), h^(2), after adding a random noise ñ to q, if the probability of H(q + ñ) = h^(1), denoted as Pr(h^(1)|H(q)), is larger than Pr(h^(2)|H(q)), then the data points mapped to h^(1) are considered to be more similar neighbors of q than those mapped to h^(2), which means the weighted Hamming distance D_H^w(H(q), h^(1)) should be smaller than D_H^w(H(q), h^(2)). Therefore, given a query q and a binary code h, a function parameterized by Pr(h|H(q)) is used as a probabilistic interpretation of D_H^w(H(q), h). This function should be monotonically non-increasing w.r.t. Pr(h|H(q)). Furthermore, if Pr(h|H(q)) ≈ 1, D_H^w(H(q), h) should be small, and if Pr(h|H(q)) ≈ 0, D_H^w(H(q), h) should be relatively large. A well-known function satisfying these constraints is the information entropy (i.e. the negative log-probability). As a result, given a query q, the weighted Hamming distance between H(q) and a binary code h is defined as follows:

D_H^w(H(q), h) ≈ −log Pr(h | H(q))    (3)

Assuming all the hash bits are independent [22], we have:

Pr(h | H(q)) = ∏_{k: h_k ≠ h_k(q)} Pr(h_k ≠ h_k(q)) · ∏_{k: h_k = h_k(q)} Pr(h_k = h_k(q))    (4)

where Pr(h_k ≠ h_k(q)) (denoted by Pr(Δh_k(q) ≠ 0)) is the probability that hash bit k of h is flipped compared with that of H(q), and Pr(h_k = h_k(q)) (denoted by Pr(Δh_k(q) = 0)) is the probability that hash bit k of h is not flipped. Apparently, these two probabilities depend on the specific query q and on the hash function h_k(x).

Since the weighted Hamming distance is used for ranking, the ranking of the values D_H^w(H(q), h) is more crucial than their actual values. Therefore, by dividing each Pr(h|H(q)) by ∏_{k=1}^K Pr(Δh_k(q) = 0), which does not change the ranking of the D_H^w(H(q), h), we get a modified weighted Hamming distance:

D_H^w(H(q), h) = Σ_{k ∈ S} λ_k(q)    (5)

where S is the set of hash bits on which h differs from H(q), and

λ_k(q) = log [ Pr(Δh_k(q) = 0) / Pr(Δh_k(q) ≠ 0) ] = log [ (1 − Pr(Δh_k(q) ≠ 0)) / Pr(Δh_k(q) ≠ 0) ]    (6)

Equation (6) is a monotonically decreasing function w.r.t. Pr(Δh_k(q) ≠ 0). The smaller Pr(Δh_k(q) ≠ 0) is, the smaller the probability that a data point p ∈ N(q) is mapped to a different bit by h_k(x), and thus the more discriminative hash bit k is. Therefore, λ_k(q) satisfies the constraints for the data-adaptive weight introduced in Section 3.1.
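The modified distance of eqs. (5) and (6) is simple to evaluate once the per-bit flip probabilities are available. The sketch below is a minimal illustration of these two equations (the function names are ours, not the paper's): it converts flip probabilities into the weights λ_k(q) and sums them over the bits on which a candidate code differs from the query code.

```python
import numpy as np

def bit_weights(flip_prob, eps=1e-12):
    """Eq. (6): lambda_k(q) = log((1 - Pr(flip_k)) / Pr(flip_k)).
    flip_prob : (K,) array holding Pr(Delta h_k(q) != 0) for each bit."""
    p = np.clip(flip_prob, eps, 1.0 - eps)   # guard against log(0) and division by zero
    return np.log((1.0 - p) / p)

def weighted_hamming(q_code, codes, lam):
    """Eq. (5): sum of lambda_k(q) over the bits where a code differs from H(q).
    q_code : (K,) 0/1 query code; codes : (n, K) 0/1 candidate codes."""
    diff = (codes != q_code)                 # bit-flip indicator for every candidate and bit
    return diff.astype(np.float64) @ lam     # one weighted Hamming distance per candidate

# Candidates sharing the same plain Hamming distance are now ordered by which bits
# differ: a flip on a discriminative (low flip-probability) bit costs more.
```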
To calculate Pr(Δh_k(q) ≠ 0) or Pr(Δh_k(q) = 0), the distribution of h_k(q + ñ) − h_k(q) is essential. Based on our discussion in Section 3.2, we can use the distribution of s_k(q + ñ, q) = f_k(q + ñ) − f_k(q), with density function pdf_k(s), to estimate Pr(Δh_k(q) ≠ 0). The right panel of Fig. 3 shows the probability of a neighbor p of q being mapped to a bit different from h_k(q). If f_k(q) > T_k, we have:

Pr(Δh_k(q) ≠ 0) = Pr(f_k(p) < T_k) = Pr(s_k(p, q) ≤ T_k − f_k(q)) = ∫_{−∞}^{T_k − f_k(q)} pdf_k(s) ds    (7)

The Gaussian distribution assumption for pdf_k is used for PCAH, LSH and ITQ in our experiments, since their difference distributions are all Gaussian-like, as shown in Fig. 1. Therefore, pdf_k(s) = N(μ_k, σ_k), and if f_k(q) > T_k, we have:

Pr(Δh_k(q) ≠ 0) = (1/2) [ 1 + erf( (T_k − f_k(q) − μ_k) / (σ_k √2) ) ]    (8)

Similarly, for f_k(q) < T_k:

Pr(Δh_k(q) ≠ 0) = (1/2) [ 1 − erf( (T_k − f_k(q) − μ_k) / (σ_k √2) ) ]    (9)

where erf is the Gauss error function. For SPH and AGH, we use the Laplace distribution assumption for pdf_k, i.e. pdf_k(s) = exp(−|s − μ_k| / b_k) / (2 b_k), where b_k = σ_k / √2.

In our experiments, we set ω_k(q) = λ_k(q) and denote this weighting scheme as WhRank. Given a query q, first the unbinarized hash value f_k(q) of each hash bit k is calculated. Then the adaptive weight ω_k(q) is calculated using eqs. (8), (9) and (6). Apparently, the larger σ_k is, the smaller ω_k(q) is. Moreover, the smaller |f_k(q) − T_k| is, the larger Pr(Δh_k(q) ≠ 0) is, and thus the smaller ω_k(q) is. Therefore, ω_k(q) satisfies the constraints for data-adaptive and query-sensitive weights introduced in Sections 3.1 and 3.2. For the Laplace distribution assumption and the Student's t-distribution assumption used in our experiments, these conclusions still hold. Another straightforward dynamic bit-level weighting is to set ω_k(q) = |T_k − f_k(q)| / σ_k. In our experiments, we use this weighting scheme as a natural baseline and denote it as WhRank1. Note that, since we make no assumption about the hashing method in the bit-level weight learning, our algorithm, WhRank, can be applied to different kinds of hashing methods.

4. Analysis

As shown in eq. (5), given a query q and a binary code h, D_H^w(H(q), h) can be calculated efficiently as ω(q)^T (H(q) ⊗ h), where ⊗ denotes the XOR of two binary codes and ω(q) = (ω_1(q), ω_2(q), ..., ω_K(q))^T. While the weighted distances can thus be calculated by an inner-product operation, it is actually possible to avoid this computational cost by computing the traditional Hamming distance first, and then ranking the returned binary codes based on their weighted Hamming distances to H(q). Therefore, the ranking of the returned binary codes can be obtained with minor additional cost.

To learn the μ_k and σ_k of a hash function h_k(x), we construct a training set consisting of s query points, each of which has m neighbors. The complexity of calculating the unbinarized hash values of each query and its neighbors is about O(s(m + 1)d), and the complexity of calculating μ_k and σ_k is bounded by O(3sm). Therefore, the overall training complexity of our parameter-learning stage is bounded by O(K · s(md + d + 3m)) ≈ O(Ksmd).
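To make the weight computation of eqs. (6), (8) and (9) and the re-ranking procedure described above concrete, the following sketch (our illustration, not the authors' code) assumes a linear hashing model and the Gaussian case; math.erf is the standard Gauss error function, and all other names are ours.

```python
import numpy as np
from math import erf, sqrt

def flip_probabilities(f_q, T, mu, sigma):
    """Eqs. (8)-(9) under the Gaussian assumption pdf_k = N(mu_k, sigma_k):
    the probability that a neighbor of q flips bit k.  All arguments are (K,) arrays."""
    p = np.empty_like(f_q)
    for k in range(len(f_q)):
        z = (T[k] - f_q[k] - mu[k]) / (sigma[k] * sqrt(2.0))
        cdf = 0.5 * (1.0 + erf(z))              # Pr(s_k <= T_k - f_k(q))
        # If f_k(q) > T_k a flip means falling below the threshold, otherwise above it.
        p[k] = cdf if f_q[k] > T[k] else 1.0 - cdf
    return np.clip(p, 1e-12, 1.0 - 1e-12)

def whrank(q, W, T, mu, sigma, codes, radius=2):
    """Rank database codes for query q: plain Hamming lookup first, then re-rank
    the candidates by the weighted Hamming distance of eq. (5)."""
    f_q = q @ W                                           # unbinarized hash values f_k(q)
    q_code = (f_q > T).astype(np.uint8)                   # H(q)
    p = flip_probabilities(f_q, T, mu, sigma)
    lam = np.log((1.0 - p) / p)                           # eq. (6): omega_k(q) = lambda_k(q)
    ham = np.count_nonzero(codes != q_code, axis=1)       # traditional Hamming distances
    cand = np.flatnonzero(ham <= radius)                  # candidate set from the usual lookup
    wdist = (codes[cand] != q_code).astype(float) @ lam   # eq. (5) on the candidates only
    return cand[np.argsort(wdist, kind="stable")]         # finer-grained ranking
```

Only the candidates returned by the plain Hamming lookup are re-scored here, which matches the observation above that the extra cost of the weighted ranking is minor.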
5. Experiments

5.1. Experimental Setup

Our experiments are carried out on two benchmark datasets: MNIST70K and ANN-SIFT1M. MNIST70K [10] consists of 70K 784-dimensional images, each of which is associated with a digit label from '0' to '9', and is split into a database set (i.e. training set, 60K) and a query set (10K). ANN-SIFT1M [5] consists of 1M images, each represented by a 128-dimensional SIFT descriptor [13]. It contains three vector subsets: a learning set (100K), a database set (1M) and a query set (10K). The learning subset is retrieved from Flickr images, and the database and query subsets are extracted from the INRIA Holidays images [4].

As stated in Section 3.3, our method can be applied to different kinds of binary hashing methods. In our experiments, several representative hashing methods, Locality Sensitive Hashing (LSH) [1], PCA Hashing (PCAH) [20], Iterative Quantization (ITQ) [3], Spectral Hashing (SPH) [21] and Anchor Graph Hashing (AGH) [12], are chosen to evaluate the effectiveness of WhRank. The source code generously provided by the authors and the recommended parameter settings in their papers are used in our experiments. For AGH, the number of anchors is set to 500, and the number of nearest neighbors for anchor graph construction is set to 2 for MNIST70K and 5 for ANN-SIFT1M, respectively. Note that the hash functions of LSH, PCAH and ITQ are linear, while those of SPH and AGH are nonlinear. Experimental results in Section 5.2 show that WhRank is applicable to both linear and nonlinear hashing methods. Moreover, we also compare our algorithm with QsRank [22], a recent ranking algorithm for binary codes. Since QsRank is developed only for PCA-based hashing methods, the comparisons are carried out on PCAH and ITQ.

Given a query, the top N nearest neighbors returned by ranking with the traditional Hamming distance and with our weighted Hamming distance, as well as their rankings, are different. The efficacy of WhRank can be measured by the Precision@N, Recall@N and distance error ratio@N [15], defined as:

Precision@N = (the number of similar points in the top N) / N

Recall@N = (the number of similar points in the top N) / (the number of all similar points)

error ratio@N = (1 / (N |Q|)) Σ_{q ∈ Q} Σ_{k=1}^{N} [ d(q, n_k) − d(q, n*_k) ] / d(q, n*_k)

where q ∈ Q is a query, n_k is the k-th nearest neighbor in the ranked results, and n*_k is the actual k-th nearest neighbor of q in the database set. For MNIST70K, a returned point is considered a true neighbor of a query if they share the same digit label. For ANN-SIFT1M, we use the same criterion as in [19]: a returned point is considered a true neighbor if it lies in the top 1% of points closest to the query in terms of Euclidean distance in the original space.

5.2. Experimental Results

To demonstrate the efficacy of applying our weighted Hamming distance for ranking, given a query, the returned results of each baseline hashing method are ranked by their traditional Hamming distance and by the weighted Hamming distance to the query, respectively. The Precision@N and distance error ratio@N of each ranked result list are reported to show the efficacy of WhRank (the number of returned results is predefined in our experiments, so a higher Precision@N implies a higher Recall@N; thus only the Precision@N is reported). Note that ranking with the weighted Hamming distance is only applied to the results returned by computing the traditional Hamming distance, so the additional computational cost is minor.

Since MNIST70K is fully annotated, we can use the Precision@N and Recall@N to show the efficacy of WhRank. The dataset is first embedded into Hamming space using each baseline hashing method. After that, from each digit class we randomly sample 50 images from the query set, constituting a subset containing 500 images. For each training image, we find its 1,000 neighbors in the dataset based on their digit labels. The training set and the corresponding neighbors are used for distribution parameter estimation. The rest of the query set is used as queries in our experiments. For LSH, PCAH and ITQ, the Gaussian distribution is used as the distribution assumption, while for SPH and AGH the Laplace distribution is used.

Figure 4. Evaluations of Precision@N of WhRank, WhRank1 and QsRank on MNIST70K using 32-bit binary codes. As shown in this figure, applying WhRank to rank the query results improves the retrieval accuracy of each method.

Fig. 4 gives the Precision@N on MNIST70K using 32-bit binary codes. For clarity, the results are shown in two parts. It is easy to see that, by ranking with our weighted Hamming distance (WhRank), all baseline hashing methods achieve better search performance. On average, we get a 5% higher precision for each hashing method. For SPH and PCAH, the improvements are even higher (almost 10%). Meanwhile, as shown in this figure, each baseline method combined with WhRank1 also achieves a reasonably good performance improvement, although the improvement is slightly inferior to that of WhRank (2% on average). In our subsequent experiments, this result still holds; therefore, the results of WhRank1 are not given in the subsequent figures for the sake of clarity.

Fig. 5 gives the Precision@N on MNIST70K under different code lengths. Once again, we can easily see that the performance of each baseline hashing method is improved when combined with WhRank. Moreover, as can be seen from Fig. 4 and Fig. 5(c), even with a relatively short binary code (32 bits), the retrieval accuracy of each baseline method combined with WhRank is almost the same as, and sometimes better than, that of the baseline method itself with a binary code of larger size (64 bits, 96 bits).

In the experiments on ANN-SIFT1M, for distribution parameter estimation we randomly sample 100 points from the query set as the training set, and for each training sample we find its top 5,000 nearest neighbors in the database set, measured by the Euclidean distance. For LSH, PCAH and ITQ, we still use the Gaussian distribution as the distribution assumption. For SPH, the Laplace distribution is used, and for AGH, the Student's t-distribution is used.

Since the neighborhood relationship of a data pair in ANN-SIFT1M is defined based on the Euclidean distance, we use Precision@N and distance error ratio@N to show the efficacy of ranking with our weighted Hamming distance. Fig. 6 and Fig. 7 give the evaluations of Precision@N and distance error ratio@N on ANN-SIFT1M under different code lengths, respectively. As shown in these two figures, when combined with WhRank, each method achieves a 10% higher precision on average. Moreover, the distance error ratio of each baseline method is reduced by 40% compared with the original. These experimental results demonstrate that applying WhRank to existing hashing methods yields more accurate similarity search results.
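For reference, the three measures defined at the beginning of this section can be computed from a ranked result list as in the short sketch below (our code; `ranked_ids`, `is_similar` and the Euclidean ground truth are assumptions about how the ground truth is stored, not part of the paper).

```python
import numpy as np

def precision_recall_at_n(ranked_ids, is_similar, N):
    """Precision@N and Recall@N for a single query.
    ranked_ids : database indices of the returned points, best first
    is_similar : boolean array over the database marking the true neighbors."""
    hits = np.count_nonzero(is_similar[ranked_ids[:N]])
    total = np.count_nonzero(is_similar)
    return hits / N, hits / max(total, 1)

def distance_error_ratio_at_n(q, ranked_ids, true_nn_ids, X, N):
    """Average relative gap between the distance to the k-th returned point and the
    distance to the actual k-th nearest neighbor, k = 1..N, for a single query."""
    d_ret = np.linalg.norm(X[ranked_ids[:N]] - q, axis=1)
    d_opt = np.linalg.norm(X[true_nn_ids[:N]] - q, axis=1)
    return np.mean((d_ret - d_opt) / d_opt)

# The reported error ratio@N additionally averages this per-query value over all queries in Q.
```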
Figure 5. Evaluations of Precision@N of WhRank and QsRank on MNIST70K. Code lengths: (a) 48 bits; (b) 64 bits; (c) 96 bits. As shown, the retrieval accuracy of each baseline method is improved when combined with WhRank under different code lengths.

Figure 6. Evaluations of Precision@N of WhRank and QsRank on ANN-SIFT1M. Code lengths: (a) 32; (b) 48; (c) 64; (d) 96. The retrieval accuracy of each baseline method is improved when combined with WhRank under each code length setting. Moreover, the retrieval accuracy of each method combined with WhRank is as good as, and sometimes better than, that of the same method combined with QsRank.

We also compare our algorithm with QsRank [22]. Since QsRank is developed only for PCA-based hashing methods, the comparisons are carried out on PCAH [20] and ITQ [3]. As QsRank is designed for ε-neighbor search, in our experiments on MNIST70K, given a query q and N, the search radius is set to the mean of the distances between q and all of its neighbors. On ANN-SIFT1M, the radius is set to the distance between q and its actual N-th nearest neighbor in the database set. The comparison results are reported in Fig. 4(b) to Fig. 7. As shown in these figures, the performance improvements of our algorithm are as good as, and sometimes better than, those of QsRank. One remarkable advantage of WhRank over QsRank is that the ranking model of WhRank is more general, so WhRank is also applicable to non-PCA-based hashing methods, e.g. SPH and AGH. Furthermore, WhRank can be easily applied to ε-neighbor search, while QsRank is not very effective for nearest neighbor search, since the distance between a query and its nearest neighbor is often unknown in practice.

6. Conclusion

Most existing binary hashing methods rank the returned results of a query simply by the traditional Hamming distance, which poses a critical issue for similarity search where ranking is important, since there can be many results sharing the same Hamming distance to the query. This paper proposes a weighted Hamming distance ranking algorithm (WhRank) to alleviate this ranking ambiguity. When applied to existing hashing methods, different bit-level weights are assigned to different hash bits, and the returned results can be ranked at a finer-grained binary code level rather than at the original integer Hamming distance level. We demonstrate that an effective bit-level weight is not only data-dependent but also query-dependent, and give a simple yet effective algorithm to learn the weights.

The experimental results on two large-scale image datasets containing up to one million high-dimensional data points demonstrate the efficacy of WhRank. The search performance of all evaluated hashing methods is improved when combined with WhRank. Moreover, compared with QsRank, a recent ranking algorithm for binary codes, the performance improvements of WhRank are as good as (and sometimes better than) those of QsRank. There are two remarkable advantages of WhRank over QsRank. First, WhRank can be applied to various kinds of hashing methods, while QsRank is developed only for PCA-based hashing methods. Second, as QsRank is developed for ε-neighbor search, it is not very effective for nearest neighbor search, since the distance of a query to its nearest neighbor is unknown in practice. On the contrary, WhRank can be easily applied to ε-neighbor search.
Figure 7. Evaluations of distance error ratio@N of WhRank and QsRank on MNIST70K and ANN-SIFT1M. Code lengths: (a) 32 bits; (b) 48 bits; (c) 64 bits; (d) 96 bits.

Acknowledgment

This work is supported by the National Natural Science Foundation of China (61273247, 61271428) and the National Key Technology Research and Development Program of China (2012BAH39B02). This work was supported in part to Dr. Qi Tian by ARO grant W911BF-12-1-0057, NSF IIS 1052851, Faculty Research Awards by Google, FXPAL, and NEC Laboratories of America, and a UTSA START-R Research Award (2012), respectively. This work was supported in part by NSFC 61128007 and NSFC under grant 61103059. This work was supported by the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (IDHT20130225), the China Special Fund for Meteorological-scientific Research in the Public Interest (GYHY201106044) and NSFC 61271435.

References

[1] A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In IEEE Symp. Found. Comput. Sci., pages 459–468, 2006.
[2] J. L. Bentley and B. Labo. K-d trees for semidynamic point sets. In Symp. Comput. Geom., pages 187–197, 1990.
[3] Y. Gong and S. Lazebnik. Iterative quantization: A procrustean approach to learning binary codes. In Computer Vision and Pattern Recognition, pages 817–824, 2011.
[4] H. Jégou, M. Douze, and C. Schmid. Improving bag-of-features for large scale image search. International Journal of Computer Vision, 87:316–336, 2010.
[5] H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33:117–128, 2011.
[6] J. Ji, J. Li, S. Yan, B. Zhang, and Q. Tian. Super-bit locality-sensitive hashing. In NIPS, pages 108–116, 2012.
[7] Y. Jiang, J. Wang, and S.-F. Chang. Lost in binarization: Query-adaptive ranking for similar image search with compact codes. In ICMR, page 16, 2011.
[8] B. Kulis and T. Darrell. Learning to hash with binary reconstructive embeddings. In NIPS, pages 1042–1050, 2009.
[9] B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In International Conference on Computer Vision, pages 2130–2137, 2009.
[10] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86:2278–2324, 1998.
[11] W. Liu, J. Wang, R. Ji, Y. Jiang, and S.-F. Chang. Supervised hashing with kernels. In CVPR, pages 2074–2081, 2012.
[12] W. Liu, J. Wang, S. Kumar, and S.-F. Chang. Hashing with graphs. In ICML, pages 1–8, 2011.
[13] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60:91–110, 2004.
[14] K. Min, L. Yang, J. Wright, L. Wu, X.-S. Hua, and Y. Ma. Compact projection: Simple and efficient near neighbor search with practical memory requirements. In Computer Vision and Pattern Recognition, pages 3477–3484, 2010.
[15] L. Qin, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe LSH: Efficient indexing for high-dimensional similarity search. In VLDB, pages 950–961, 2007.
[16] M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from shift-invariant kernels. In Neural Information Processing Systems, pages 1509–1517, 2009.
[17] R. Salakhutdinov and G. E. Hinton. Semantic hashing. Int. J. Approx. Reason., 50:969–978, 2009.
[18] C. Silpa-Anan and R. Hartley. Optimised KD-trees for fast image descriptor matching. In CVPR, pages 1–8, 2008.
[19] J. Wang, S. Kumar, and S.-F. Chang. Sequential projection learning for hashing with compact codes. In International Conference on Machine Learning, pages 1127–1134, 2010.
[20] X. Wang, L. Zhang, F. Jing, and W. Ma. AnnoSearch: Image auto-annotation by search. In CVPR, 2006.
[21] Y. Weiss, A. B. Torralba, and R. Fergus. Spectral hashing. In NIPS, pages 1753–1760, 2008.
[22] L. Zhang, X. Zhang, and H.-Y. Shum. QsRank: Query-sensitive hash code ranking for efficient ε-neighbor search. In CVPR, pages 2058–2065, 2012.
[23] W. Zhou, Y. Lu, H. Li, Y. Song, and Q. Tian. Spatial coding for large scale partial-duplicate web image search. In ACM Multimedia, pages 511–520, 2010.
[24] W. Zhou, Y. Lu, H. Li, and Q. Tian. Scalar quantization for large scale image search. In ACM Multimedia, pages 169–178, 2012.
