
CS 683: Advanced Algorithms    Nearest Neighbor Search

Date: 04/17/2001    Scribes: Elliot Anshelevich, Anirban Dasgupta

1 Applications of Nearest Neighbor


We now discuss an important geometric problem, nearest-neighbor search, and an approach to solving it through an application of VC-dimension. Given a set of points in a high-dimensional space, we want to find the nearest (or the k nearest) neighbors of any given query point. We give two areas where this problem arises.
Example 1.1 In image processing, digital images are often stored simply as a string of bits. One possibility is to chop the entire image into regions, compute the average colour intensity over each region, and store these averages for all the regions. The image is then represented as a vector in R^d, where d is the number of regions and is typically very large. In this case, the Euclidean distance between two image vectors gives some notion of similarity.

Example 1.2 This example relates to the vector space model of information retrieval developed by Salton. Suppose we have a reasonable vocabulary of the English language. We represent documents in English by a vector with one coordinate for each word in the vocabulary. Document i is mapped to a vector v_i ∈ R^d. Typically d is on the order of 50K to 100K for a reasonable information retrieval system. The j-th coordinate of v_i stores the number of times the j-th word from the vocabulary appears in the document. Alternatively, we may take the vectors to be Boolean, the j-th coordinate indicating whether the j-th word appears in the document at all. Again, in this case, the distance between two vectors under the L1 or L2 norm gives some idea of the similarity between the documents.
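A minimal sketch of this representation (the tiny vocabulary and documents below are made up purely for illustration):

    from collections import Counter
    import math

    vocabulary = ["algorithm", "nearest", "neighbor", "search", "image", "document"]

    def count_vector(text, vocab=vocabulary):
        """Map a document to a vector of word counts, one coordinate per vocabulary word."""
        counts = Counter(text.lower().split())
        return [counts[w] for w in vocab]

    def l1(u, v):
        return sum(abs(a - b) for a, b in zip(u, v))

    def l2(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    d1 = count_vector("nearest neighbor search is a search problem")
    d2 = count_vector("nearest neighbor search over image and document vectors")
    print(l1(d1, d2), l2(d1, d2))   # smaller distances indicate more similar documents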

1.1 Nearest Neighbor Search


Suppose we have a set of points P in R^d. Our aim is to preprocess it so that we can answer nearest neighbor queries in reasonable time. That is, given a query point q ∈ R^d, we return the nearest neighbor p_i ∈ P, i.e. a point achieving min_{p∈P} d(p, q); alternatively we might need the k nearest points to q. In our discussion we restrict ourselves to the nearest-neighbor problem only.
The complexity of this problem consists of two parts: the preprocessing time and storage requirement, and the actual query time. For the case of d = 1, i.e. when all the points are on a line, all we need to do is sort the points and store the midpoints between successive points. Then, given a query point, we do a binary search on the stored midpoints to find which interval it lies in; this directly gives the nearest neighbor. Thus the preprocessing time is O(n log n), the storage requirement is O(n), and the query time is O(log n).
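A minimal sketch of this one-dimensional scheme (sort, store midpoints, binary search at query time; the class and method names are mine):

    import bisect

    class NearestNeighbor1D:
        def __init__(self, points):
            # Preprocessing: sort the points and store midpoints of successive points.
            self.points = sorted(points)                                               # O(n log n)
            self.midpoints = [(a + b) / 2 for a, b in zip(self.points, self.points[1:])]  # O(n)

        def query(self, q):
            # Binary search over midpoints finds the interval containing q;
            # the corresponding sorted point is the nearest neighbor.   O(log n)
            i = bisect.bisect_left(self.midpoints, q)
            return self.points[i]

    nn = NearestNeighbor1D([3.0, -1.5, 7.2, 0.4, 5.1])
    print(nn.query(4.0))   # -> 3.0, the nearest stored point to 4.0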
Figure 1: One dimensional nearest neighbor. Sorted points p_1, . . . , p_5 on a line with a query point q; the interval between stored midpoints into which q falls determines its nearest neighbor.

For two dimensions, the analogue of the above algorithm is to compute the Voronoi diagram. A Voronoi diagram of a set of points {p_1, . . . , p_n} is defined as a partition of the plane into cells (closed or open polygons) such that the point p_i lies in cell i and every point of R^2 lying in cell i is nearer to p_i than to any other point of the given set. It is known that Voronoi diagrams in R^2 can be computed in time O(n log n) and require linear storage. Given a query point, we then need to find which cell it lies in. This is a planar point-location problem, and it can be answered in O(log n) time with linear storage using standard point-location structures (for example, Kirkpatrick's hierarchy or a trapezoidal decomposition).
Figure 2: Voronoi diagram of a set of planar points p_1, . . . , p_6; each cell consists of the points of the plane closer to its site than to any other site.
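For concreteness, here is a small sketch that constructs a Voronoi diagram of a planar point set with SciPy (the library and the sample points are my own choices; answering queries would additionally require a point-location structure, which is not shown):

    import numpy as np
    from scipy.spatial import Voronoi

    points = np.array([[0.0, 0.0], [2.0, 1.0], [1.0, 3.0], [4.0, 2.0], [3.0, 0.5]])
    vor = Voronoi(points)

    # Each input point owns one Voronoi cell; vertex index -1 marks an unbounded cell.
    for i, p in enumerate(points):
        cell = vor.regions[vor.point_region[i]]
        print(f"cell of point {p}: vertex indices {cell}")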

This technique does not scale well to higher dimensions. For larger d, Clarkson (1987) improved on a long sequence of previous work with an algorithm for nearest-neighbor search whose preprocessing and storage requirement is O(n^{⌈d/2⌉(1+ε)}). The query time is O(c^d log n) for a constant c. Once d is comparable to log n, this query time is no better than brute-force search, which takes no preprocessing time, linear storage, and linear query time.
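The brute-force baseline mentioned above is just a linear scan; a minimal sketch:

    import math

    def brute_force_nn(points, q):
        """Linear scan: no preprocessing, O(n) storage, O(dn) time per query."""
        return min(points, key=lambda p: math.dist(p, q))

    P = [(1.0, 2.0, 0.5), (0.0, 0.0, 0.0), (3.0, 1.0, 2.0)]
    print(brute_force_nn(P, (0.5, 0.5, 0.5)))   # -> (0.0, 0.0, 0.0)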

1.2 High dimensional nearest neighbor search


We introduce the notion of approximate nearest neighbor.
Definition 1.1 Given an initial set of points P = {p_1, . . . , p_n} and a query point q, we call p_i an ε-approximate nearest neighbor of q if d(p_i, q) ≤ (1 + ε) min_{p∈P} d(p, q).
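A small sketch that simply checks this condition for a proposed answer (the function name is mine):

    import math

    def is_eps_approx_nn(points, q, candidate, eps):
        """True if candidate is an eps-approximate nearest neighbor of q among points."""
        best = min(math.dist(p, q) for p in points)
        return math.dist(candidate, q) <= (1 + eps) * best

    P = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
    print(is_eps_approx_nn(P, (0.9, 0.9), (1.0, 1.0), 0.1))   # True: the exact nearest neighbor
    print(is_eps_approx_nn(P, (0.9, 0.9), (0.0, 0.0), 0.1))   # False: much farther than 1.1 times the optimum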
By a result of Arya et al., the ε-approximate nearest neighbor problem can be solved with preprocessing time O(n log n), storage O(n log n), and query time O(c^d ε^{-d} log n). Our goal is the following.

Figure 3: Random projection. Vectors x and y and a random direction v; θ is the angle between x and y, and ϕ the angle between v and x.

Goal. Suppose we are willing to pay high preprocessing and storage costs. Can we then achieve a query time of O(poly(1/ε) poly(d) log n)?
It turns out that this is possible using an idea based on random projections (Kleinberg 1997). The intuition behind the algorithm is that the relative order of distances between the query point and the points of the initial set P should, with good probability, be preserved under projection onto random vectors.

1.3 Random Projections


Suppose we have points p_i and p_j from the initial set of points, and let q be the query point. Take a random line L and project the vectors p_i − q and p_j − q onto L. We need to analyze the probability that the shorter vector also has the shorter projection. We present a general lemma for this.

Lemma 1.2 Suppose 0 < γ ≤ 1/2 and x, y ∈ R^d are such that ‖x‖(1 + γ) ≤ ‖y‖ (i.e. x is sufficiently shorter than y). Choose a vector v uniformly at random from the unit sphere S^{d−1} in R^d. Then Pr[|v · x| < |v · y|] ≥ 1/2 + γ/5.

Proof. It is enough to work in the 2-dimensional plane spanned by x and y. Let r = ‖x‖/‖y‖ ≤ 1/(1 + γ). Suppose the angle between v and x is ϕ and the angle between x and y is θ. The projection of x onto v has length ‖x‖ |cos ϕ| and that of y has length ‖y‖ |cos(θ − ϕ)|, so the bad event (y having the shorter projection) is cos²(θ − ϕ) ≤ r² cos² ϕ; we want to upper bound the probability of this event. It is easily seen that the worst case occurs when x and y are orthogonal, i.e. θ = π/2: the probability of the bad event increases as θ goes from 0 to π/2 and decreases again from π/2 to π, and so on. For θ = π/2 the bad event becomes cos²(π/2 − ϕ) = sin² ϕ ≤ r² cos² ϕ, i.e. |tan ϕ| ≤ r, so we need to upper bound Pr[|tan ϕ| ≤ r]. Hence Pr[bad event] = Pr[|tan ϕ| ≤ r] = 2 tan⁻¹(r)/π < 1/2, as can be seen from Figure 4. Using a Taylor expansion of tan⁻¹(r) together with r ≤ 1/(1 + γ), one can show that this probability is less than 1/2 − γ/5. Hence Pr[bad event] < 1/2 − γ/5, and the lemma follows.

Figure 4: Wedge for the "bad" event: the directions v making an angle of at most tan⁻¹(r) with ±x.
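As a sanity check (my own, not part of the original notes), the probability in Lemma 1.2 can be estimated empirically by sampling random unit vectors; a Gaussian vector normalized to unit length is uniformly distributed on the sphere:

    import numpy as np

    def prob_shorter_has_shorter_projection(x, y, trials=200_000, seed=0):
        """Estimate Pr[|v.x| < |v.y|] for v uniform on the unit sphere S^{d-1}."""
        rng = np.random.default_rng(seed)
        v = rng.standard_normal((trials, len(x)))
        v /= np.linalg.norm(v, axis=1, keepdims=True)    # normalize rows: uniform random directions
        return float(np.mean(np.abs(v @ x) < np.abs(v @ y)))

    gamma = 0.5
    x = np.array([1.0, 0.0, 0.0])
    y = np.array([0.0, 1.0 + gamma, 0.0])                # ||x||(1 + gamma) <= ||y||, orthogonal worst case
    print(prob_shorter_has_shorter_projection(x, y))     # roughly 0.63
    print(0.5 + gamma / 5)                               # the lemma's lower bound, 0.6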

From the lemma, we get the following corollary.

Corollary 1.3 Given x and y as in the lemma, the set W_{x,y} of vectors of S^{d−1} that give rise to the bad event (i.e. the projection of x is at least as long as the projection of y) is a wedge bounded by hyperplanes, of probability measure < 1/2 − γ/5.

Definition 1.4 A distinguishing set is a finite set of points V on the unit sphere in R^d such that no wedge W_{x,y} of measure < 1/2 − γ/5 (for any x and y producing such a wedge) contains at least half of V.

The point of a distinguishing set is that, by taking a majority vote over the vectors of V, we obtain a correct length comparison for any x, y ∈ R^d whose lengths differ by a factor of at least 1 + γ.
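A minimal sketch of the majority-vote comparison (here V is simply a random sample of unit vectors, which is what the ε-sample argument below justifies; the sample size and names are illustrative):

    import numpy as np

    def majority_vote_longer(x, y, V):
        """Declare which of x, y is longer by majority vote over |v.x| < |v.y| for v in V."""
        votes_for_y = np.sum(np.abs(V @ x) < np.abs(V @ y))
        return "y" if votes_for_y > len(V) / 2 else "x"

    rng = np.random.default_rng(1)
    d, m = 50, 501                                    # dimension and an (odd) number of sample directions
    V = rng.standard_normal((m, d))
    V /= np.linalg.norm(V, axis=1, keepdims=True)     # m random directions on S^{d-1}

    x = rng.standard_normal(d)
    y = rng.standard_normal(d)
    y *= 1.5 * np.linalg.norm(x) / np.linalg.norm(y)  # force ||y|| = 1.5 ||x||, i.e. gamma = 0.5
    print(majority_vote_longer(x, y, V))              # "y" with high probability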
Question: How big must V be? This is actually a VC-dimension question: the ground set is the unit sphere S^{d−1}, and the concept class C consists of all wedges.
Fact: An ε-sample for the infinite set system (S^{d−1}, wedges) with ε = γ/5 is a distinguishing set. This is clear from the definitions of distinguishing set and ε-sample. So, if d′ = VC-dim(S^{d−1}, wedges), then we can take a random sample from S^{d−1} of size |V| = O((d′/γ²) log(d′/γ) + (1/γ²) log(1/δ)), since such a sample is a γ/5-sample with probability ≥ 1 − δ. This bound is in terms of d′, however, so we need to bound d′.

Claim 1.5 d′ = O(d log d)

To prove this, notice that every wedge is obtained from four halfspaces by a fixed Boolean function, namely f(A_1, A_2, A_3, A_4) = (A_1 ∩ A_2) ∪ (A_3 ∩ A_4) (the Boolean operations here being ∩ and ∪). Therefore, Claim 1.5 follows from the following lemma:

Lemma 1.6 Let f be a Boolean function on h inputs, each input a set, and let (U, R) be a set system of VC-dimension d. Let (U, R_f) be the new set system, where R_f consists of f applied to all combinations of h members of R. Then the VC-dimension of (U, R_f) is O(dh log(dh)) if h = O(d).

Proof. If A ⊆ U has size n, then at most Σ_{i=0}^{d} (n choose i) subsets of A can be realized as intersections with members of R, by Sauer's lemma, since (U, R) has VC-dimension d. How many different intersections with A can members of R_f produce? We know that Σ_{i=0}^{d} (n choose i) ≤ n^d, so at most n^d sets of R yield distinct intersections with A. There are at most (n^d)^h = n^{dh} ways to choose the h inputs to f from among these sets, so at most n^{dh} sets of R_f yield distinct intersections with A. Hence, if A is shattered by (U, R_f), then n^{dh} ≥ 2^n, by the definition of "shattered". This means dh log n ≥ n, so n/ log n ≤ dh. Therefore n = O(dh log(dh)) is enough to ensure that A is not shattered, and so VC-dim(U, R_f) = O(dh log(dh)).
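A tiny numeric illustration of the final counting step (the values of d and h below are just examples): the largest n with n^{dh} ≥ 2^n bounds the size of any set that (U, R_f) can shatter.

    import math

    def max_shatterable_size(d, h, limit=10_000):
        """Largest n (below limit) with n**(d*h) >= 2**n; no larger set can be shattered."""
        best = 1
        for n in range(2, limit):
            if d * h * math.log2(n) >= n:
                best = n
        return best

    # Halfspaces in R^dim have VC-dimension dim + 1; a wedge uses h = 4 of them.
    for dim in (2, 10, 50):
        print(dim, max_shatterable_size(dim + 1, 4))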
