Moore Tutorial
Extract from Andrew Moore's PhD thesis: Efficient Memory-based Learning for Robot Control. PhD Thesis, Technical Report No. 209, Computer Laboratory, University of Cambridge, 1991.
Chapter 6
Kd-trees for Cheap Learning
This chapter gives a specification of the nearest neighbour algorithm. It also gives both an informal and a formal introduction to the kd-tree data structure. Then there is an explicit, detailed account of how the nearest neighbour search algorithm is implemented efficiently, followed by an empirical investigation into the algorithm's performance. Finally, there is a discussion of some other algorithms related to the nearest neighbour search.
Given an exemplar-set E and a target domain vector d, a nearest neighbour of d is any exemplar (d', r') ∈ E such that None-nearer(E, d, d'). Formally:

\[
\text{None-nearer}(E, \mathbf{d}, \mathbf{d}') \;\Leftrightarrow\; \forall (\mathbf{d}'', \mathbf{r}'') \in E \quad |\mathbf{d} - \mathbf{d}'| \le |\mathbf{d} - \mathbf{d}''| \tag{6.1}
\]

In Equation 6.1 the distance metric is Euclidean, though any other p-norm could have been used:

\[
|\mathbf{d} - \mathbf{d}'| = \sqrt{\sum_{i=1}^{k_d} (d_i - d'_i)^2} \tag{6.2}
\]
where d_i is the ith component of vector d. In the following sections I describe some algorithms to realize this abstract specification, with the additional informal requirement that the computation time should be relatively short.
Algorithm: Nearest Neighbour by Scanning.

Data Structures:
  domain-vector: A vector of k_d floating point numbers.
  range-vector: A vector of k_r floating point numbers.
  exemplar: A pair: (domain-vector, range-vector)

Input: exlist, of type list of exemplar; dom, of type domain-vector
Output: nearest, of type exemplar
Preconditions: exlist is not empty
Postconditions: if nearest represents the exemplar (d', r'), and exlist represents the exemplar set E, and dom represents the vector d, then (d', r') ∈ E and None-nearer(E, d, d').

Code:
1.  nearest-dist := infinity
2.  nearest := undefined
3.  for ex := each exemplar in exlist
3.1   dist := distance between dom and the domain of ex
3.2   if dist < nearest-dist then
3.2.1   nearest-dist := dist
3.2.2   nearest := ex
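As a concrete illustration, here is a minimal Python sketch of the scanning algorithm. The function name scan_nearest and the representation of an exemplar as a (domain_vector, range_vector) tuple are illustrative choices, not from the thesis.

```python
import math

def scan_nearest(exlist, dom):
    """Nearest neighbour by scanning: examine every exemplar in turn.

    exlist -- a non-empty list of (domain_vector, range_vector) pairs
    dom    -- the target domain vector
    """
    nearest_dist = math.inf   # Step 1: nearest-dist := infinity
    nearest = None            # Step 2: nearest := undefined
    for ex in exlist:         # Step 3: for each exemplar
        # Step 3.1: Euclidean distance to this exemplar's domain vector
        dist = math.dist(dom, ex[0])
        if dist < nearest_dist:        # Step 3.2
            nearest_dist = dist
            nearest = ex
    return nearest
```

Each query costs time proportional to the size of the exemplar list, which is what motivates the kd-tree developed below.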
dom-elt: A point from k_d-dimensional space
range-elt: A point from k_r-dimensional space
split: The splitting dimension
left: A kd-tree representing those points to the left of the splitting plane
right: A kd-tree representing those points to the right of the splitting plane

Table 6.2: The fields of a kd-tree node

Equations 6.3 to 6.5 below give a formal definition of the invariants and semantics. Informally, the exemplar-set E is represented by the set of nodes in the kd-tree, each node representing one exemplar. The dom-elt field represents the domain vector of the exemplar and the range-elt field represents the range vector. The dom-elt component is the index for the node: it splits the space into two subspaces according to the splitting hyperplane of the node. All the points in the "left" subspace are represented by the left subtree, and the points in the "right" subspace by the right subtree. The splitting hyperplane is a plane which passes through dom-elt and which is perpendicular to the direction specified by the split field. Let i be the value of the split field. Then a point is to the left of dom-elt if and only if its ith component is less than the ith component of dom-elt. The complementary definition holds for the right field. If a node has no children, then the splitting hyperplane is not required.

Figure 6.1 demonstrates a kd-tree representation of the four dom-elt points (2,5), (3,8), (6,3) and (8,9). The root node, with dom-elt (2,5), splits the plane in the y dimension into two subspaces. The point (3,8) lies in the upper subspace, that is {(x, y) | y > 5}, and so is in the right subtree. Figure 6.2 shows how the nodes partition the plane.
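For readers who prefer code to tables, the node record of Table 6.2 might be sketched in Python as follows. The class name KdNode and the use of None for an empty tree are my own conventions, not from the thesis.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class KdNode:
    dom_elt: Tuple[float, ...]    # a point from k_d-dimensional space; the node's index
    range_elt: Tuple[float, ...]  # a point from k_r-dimensional space
    split: int                    # the splitting dimension
    left: Optional["KdNode"]      # points on the "left" of the splitting plane
    right: Optional["KdNode"]     # points on the "right" of the splitting plane

# The empty kd-tree is represented here by None.
```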
The exemplar-set represented by a kd-tree is given by the function exset-rep:

\[
\text{exset-rep}(\text{empty}) = \emptyset \tag{6.3}
\]

\[
\text{exset-rep}(\langle \mathbf{d}, \mathbf{r}, \mathrm{split}, \mathrm{tree}_{\mathrm{left}}, \mathrm{tree}_{\mathrm{right}} \rangle) = \text{exset-rep}(\mathrm{tree}_{\mathrm{left}}) \cup \{(\mathbf{d}, \mathbf{r})\} \cup \text{exset-rep}(\mathrm{tree}_{\mathrm{right}}) \tag{6.4}
\]
Figure 6.1 A 2d-tree of four elements. The splitting planes are not indicated. The [2,5] node splits along the y = 5 plane and the [3,8] node splits along the x = 3 plane.
Figure 6.2 How the tree of Figure 6.1 splits up the x,y plane.
The invariant is that subtrees only ever contain dom-elts which are on the correct side of all their ancestors' splitting planes.
\[
\begin{aligned}
&\text{Is-legal-kdtree}(\text{empty}).\\
&\text{Is-legal-kdtree}(\langle \mathbf{d}, \mathbf{r}, -, \text{empty}, \text{empty} \rangle).\\
&\text{Is-legal-kdtree}(\langle \mathbf{d}, \mathbf{r}, \mathrm{split}, \mathrm{tree}_{\mathrm{left}}, \mathrm{tree}_{\mathrm{right}} \rangle) \Leftrightarrow{}\\
&\qquad \forall (\mathbf{d}', \mathbf{r}') \in \text{exset-rep}(\mathrm{tree}_{\mathrm{left}})\;.\; d'_{\mathrm{split}} \le d_{\mathrm{split}}\\
&\qquad {}\wedge\; \forall (\mathbf{d}', \mathbf{r}') \in \text{exset-rep}(\mathrm{tree}_{\mathrm{right}})\;.\; d'_{\mathrm{split}} > d_{\mathrm{split}}\\
&\qquad {}\wedge\; \text{Is-legal-kdtree}(\mathrm{tree}_{\mathrm{left}})\\
&\qquad {}\wedge\; \text{Is-legal-kdtree}(\mathrm{tree}_{\mathrm{right}})
\end{aligned} \tag{6.5}
\]
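Equations 6.3 to 6.5 translate almost mechanically into a recursive checker. This sketch assumes the KdNode record and the None-as-empty-tree convention from the earlier sketch.

```python
def exset_rep(kd):
    # Equations 6.3 and 6.4: the exemplar-set represented by a kd-tree.
    if kd is None:
        return set()
    return exset_rep(kd.left) | {(kd.dom_elt, kd.range_elt)} | exset_rep(kd.right)

def is_legal_kdtree(kd):
    # Definition 6.5: every exemplar in a subtree lies on the correct
    # side of its ancestors' splitting planes.
    if kd is None or (kd.left is None and kd.right is None):
        return True
    s = kd.split
    return (all(d[s] <= kd.dom_elt[s] for d, _ in exset_rep(kd.left))
            and all(d[s] > kd.dom_elt[s] for d, _ in exset_rep(kd.right))
            and is_legal_kdtree(kd.left)
            and is_legal_kdtree(kd.right))
```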
Algorithm: Constructing a kd-tree
Input: exset, of type exemplar-set
Output: kd, of type kdtree
Pre: None
Post: exset = exset-rep(kd) ∧ Is-legal-kdtree(kd)

Code:
1.  If exset is empty then return the empty kdtree
2.  Call the pivot-choosing procedure, which returns two values:
      ex := a member of exset
      split := the splitting dimension
3.  d := domain vector of ex
4.  exset' := exset with ex removed
5.  r := range vector of ex
6.  exset_left := {(d', r') ∈ exset' | d'_split ≤ d_split}
7.  exset_right := {(d', r') ∈ exset' | d'_split > d_split}
8.  kd_left := recursively construct kd-tree from exset_left
9.  kd_right := recursively construct kd-tree from exset_right
10. kd := ⟨d, r, split, kd_left, kd_right⟩

Proof: By induction on the length of exset and the definitions of exset-rep and Is-legal-kdtree.

Table 6.3: Constructing a kd-tree
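A minimal Python rendering of Table 6.3 follows, again assuming the KdNode record from above. The pivot choice used here (the median element along a cycling dimension) is one simple possibility of my own for illustration; the thesis's preferred pivoting strategies are discussed at the end of the chapter.

```python
def build_kdtree(exset, depth=0):
    """Construct a kd-tree from a list of (domain_vector, range_vector) pairs."""
    if not exset:                              # Step 1: empty set -> empty tree
        return None
    # Step 2 (illustrative pivot choice): cycle through the dimensions and
    # take the median element along the splitting dimension as the pivot.
    split = depth % len(exset[0][0])
    ordered = sorted(exset, key=lambda ex: ex[0][split])
    d, r = ordered[len(ordered) // 2]          # Steps 3 and 5
    rest = ordered[:len(ordered) // 2] + ordered[len(ordered) // 2 + 1:]  # Step 4
    # Steps 6-10: partition the remaining exemplars and recurse.
    return KdNode(
        dom_elt=d, range_elt=r, split=split,
        left=build_kdtree([ex for ex in rest if ex[0][split] <= d[split]], depth + 1),
        right=build_kdtree([ex for ex in rest if ex[0][split] > d[split]], depth + 1),
    )
```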
Figure 6.3 The black dot is the point which owns the leaf node containing the target (the cross). Any nearer neighbour must lie inside this circle.
Figure 6.4 The black dot is the parent of the closest found so far. In this case the black dot's other child (shaded grey) need not be searched.
The nearest neighbour search routine of Table 6.4 looks for the nearest exemplar whose domain vector lies both within a given hyperrectangle, and within the maximum distance to the target. The caller of the routine will generally specify the infinite hyperrectangle which covers the whole of Domain, and the infinite maximum distance.

Before discussing its execution, I will explain how the operations on the hyperrectangles can be implemented. A hyperrectangle is represented by two arrays: one of its minimum coordinates, the other of its maximum coordinates. To cut the hyperrectangle, so that one of its edges is moved closer to its centre, the appropriate array component is altered. To check whether a hyperrectangle hr intersects a hypersphere of radius r centred at point t, we find the point p in hr which is closest to t. Write hr_i^min for the minimum extreme of hr in the ith dimension and hr_i^max for the maximum extreme. Then p_i, the ith component of this closest point, is computed thus:

\[
p_i = \begin{cases}
hr_i^{\min} & \text{if } t_i \le hr_i^{\min} \\
t_i & \text{if } hr_i^{\min} < t_i < hr_i^{\max} \\
hr_i^{\max} & \text{if } t_i \ge hr_i^{\max}
\end{cases} \tag{6.6}
\]

The objects intersect only if the distance between p and t is less than or equal to r.
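In code, these hyperrectangle operations are a few lines. This sketch represents a hyperrectangle as a pair of coordinate lists and implements Equation 6.6 as a per-component clamp; the function names are illustrative.

```python
import math

def closest_point_in_hr(hr_min, hr_max, t):
    # Equation 6.6: clamp each component of t into [hr_min_i, hr_max_i].
    return [min(max(t_i, lo), hi) for t_i, lo, hi in zip(t, hr_min, hr_max)]

def hr_intersects_sphere(hr_min, hr_max, t, r):
    # The hyperrectangle and the hypersphere of radius r centred at t
    # intersect only if the closest point p is within r of t.
    p = closest_point_in_hr(hr_min, hr_max, t)
    return math.dist(p, t) <= r
```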
The search is depth-first, and uses the heuristic of searching first the child node which contains the target. Step 1 deals with the trivial empty-tree case, and Steps 2 and 3 assign two important local variables. Step 4 cuts the current hyperrectangle into the two hyperrectangles covering the space occupied by the child nodes. Steps 5-7 determine which child contains the target. After Step 8, when this initial child is searched, it may be possible to prove that there cannot be any closer point in the hyperrectangle of the further child. In particular, the point at the current node must be out of range. The test is made in Steps 9 and 10. Step 9 restricts the maximum radius in which any possible closer point could lie, and then the test in Step 10 checks whether there is any space in the hyperrectangle of the further child which lies within this radius.
Algorithm: Nearest Neighbour in a kd-tree
Input:
  kd, of type kdtree
  target, of type domain vector
  hr, of type hyperrectangle
  max-dist-sqd, of type float
Output:
  nearest, of type exemplar
  dist-sqd, of type float
Pre: Is-legal-kdtree(kd)
Post: Informally, the postcondition is that nearest is a nearest exemplar to target which also lies both within the hyperrectangle hr and within distance √max-dist-sqd of target. √dist-sqd is the distance of this nearest point. If there is no such point then dist-sqd contains infinity.

Code:
1.  if kd is empty then set dist-sqd to infinity and exit.
2.  s := split field of kd
3.  pivot := dom-elt field of kd
4.  Cut hr into two sub-hyperrectangles left-hr and right-hr. The cut plane is through pivot and perpendicular to the s dimension.
5.  target-in-left := target_s ≤ pivot_s
6.  if target-in-left then
6.1   nearer-kd := left field of kd and nearer-hr := left-hr
6.2   further-kd := right field of kd and further-hr := right-hr
7.  if not target-in-left then
7.1   nearer-kd := right field of kd and nearer-hr := right-hr
7.2   further-kd := left field of kd and further-hr := left-hr
8.  Recursively call Nearest Neighbour with parameters (nearer-kd, target, nearer-hr, max-dist-sqd), storing the results in nearest and dist-sqd
9.  max-dist-sqd := minimum of max-dist-sqd and dist-sqd
10. A nearer point could only lie in further-kd if there were some part of further-hr within distance √max-dist-sqd of target. if this is the case then
10.1   if (pivot − target)² < dist-sqd then
10.1.1   nearest := (pivot, range-elt field of kd)
10.1.2   dist-sqd := (pivot − target)²
10.1.3   max-dist-sqd := dist-sqd
10.2   Recursively call Nearest Neighbour with parameters (further-kd, target, further-hr, max-dist-sqd), storing the results in temp-nearest and temp-dist-sqd
10.3   If temp-dist-sqd < dist-sqd then
10.3.1   nearest := temp-nearest and dist-sqd := temp-dist-sqd

Proof: Outlined in text.

Table 6.4: The Nearest Neighbour Algorithm
If there is no such space, then no further search is necessary. If there is, then Step 10.1 checks whether the point associated with the current node of the tree is closer than the closest found so far. Then, in Step 10.2, the further child is recursively searched. The maximum distance worth examining in this further search is the distance to the closest point yet discovered.

The proof that this will find the nearest neighbour within the constraints is by induction on the size of the kd-tree. If the cutoff were not made in Step 10, then the proof would be straightforward: the point returned is the closest out of (i) the closest point in the nearer child, (ii) the point at the current node and (iii) the closest point in the further child. If the cutoff were made in Step 10, then the point returned is the closest point in the nearer child, and we can show that neither the current point, nor any point in the further child, can possibly be closer.

Many local optimizations are possible which, while not altering the asymptotic performance of the algorithm, multiply the speed by a constant factor. In particular, it is in practice possible to hold almost all of the search state globally, instead of passing it as recursive parameters.
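The following Python sketch is a fairly direct transcription of Table 6.4, assuming the KdNode record from the earlier sketches. It passes the search state as recursive parameters, which keeps it close to the table at the cost of the constant-factor overhead just mentioned; the default arguments supply the infinite hyperrectangle and infinite maximum distance, as the text prescribes.

```python
import math

def kd_nearest(kd, target, hr_min=None, hr_max=None, max_dist_sqd=math.inf):
    """Returns (nearest, dist_sqd): the nearest exemplar within the
    hyperrectangle and squared search radius, or (None, inf) if none."""
    if kd is None:                                       # Step 1
        return None, math.inf
    if hr_min is None:                                   # default: the infinite
        hr_min = [-math.inf] * len(target)               # hyperrectangle
        hr_max = [math.inf] * len(target)
    s, pivot = kd.split, kd.dom_elt                      # Steps 2-3
    # Step 4: cut hr through pivot, perpendicular to dimension s.
    left_max = list(hr_max); left_max[s] = pivot[s]
    right_min = list(hr_min); right_min[s] = pivot[s]
    if target[s] <= pivot[s]:                            # Steps 5-7
        nearer_kd, nearer_hr = kd.left, (hr_min, left_max)
        further_kd, further_hr = kd.right, (right_min, hr_max)
    else:
        nearer_kd, nearer_hr = kd.right, (right_min, hr_max)
        further_kd, further_hr = kd.left, (hr_min, left_max)
    # Step 8: search the child containing the target first.
    nearest, dist_sqd = kd_nearest(nearer_kd, target, *nearer_hr, max_dist_sqd)
    max_dist_sqd = min(max_dist_sqd, dist_sqd)           # Step 9
    # Step 10: continue only if further-hr intersects the current search ball.
    p = [min(max(t, lo), hi)
         for t, lo, hi in zip(target, further_hr[0], further_hr[1])]
    if math.dist(p, target) ** 2 <= max_dist_sqd:
        pivot_dist_sqd = math.dist(pivot, target) ** 2
        if pivot_dist_sqd < dist_sqd:                    # Step 10.1
            nearest, dist_sqd = (pivot, kd.range_elt), pivot_dist_sqd
            max_dist_sqd = dist_sqd
        temp, temp_dist_sqd = kd_nearest(further_kd, target, *further_hr,
                                         max_dist_sqd)   # Step 10.2
        if temp_dist_sqd < dist_sqd:                     # Step 10.3
            nearest, dist_sqd = temp, temp_dist_sqd
    return nearest, dist_sqd
```

A top-level query is then simply kd_nearest(tree, target), with tree built by a constructor such as the build_kdtree sketch above.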
Figure 6.5 Generally during a nearest neighbour search only a few leaf nodes need to be inspected.
Figure 6.6 A bad distribution which forces almost all nodes to be inspected.
The expected search cost was analysed in [Friedman et al., 1977]. The hypersphere within which any nearer neighbour must lie (in the two-dimensional case a circle) is shown, and the number of intersecting hyperrectangles is two. The paper shows that the expected number of intersecting hyperrectangles is independent of N, the number of exemplars. The asymptotic search time is thus logarithmic, because the time to descend from the root of the tree to the leaves is logarithmic (in a balanced tree), and then an expected constant amount of backtracking is required. However, this reasoning was based on the assumption that the hyperrectangles in the tree tend to be hypercubic in shape. Empirical evidence in my investigations has shown that this is not generally the case for their tree-building strategy. This is discussed and demonstrated in Section 6.7. A second danger is that the cost, while independent of N, is exponentially dependent on k, the dimensionality of the domain vectors. Thus theoretical analysis provides some insight into the cost, but here, empirical investigation will be used to examine the expense of nearest neighbour in practice.
The performance of the nearest neighbour search depends on the following properties:

- N, the size of the tree.
- k_dom, the dimensionality of the domain vectors in the tree. This value is the k in kd-tree.
- d_distrib, the distribution of the domain vectors. This can be quantified as the "true" dimensionality of the vectors. For example, if the vectors had three components, but all lay on the surface of a sphere, then the underlying dimensionality would be 2. In general, discovery of the underlying dimensionality of a given sample of points is extremely difficult, but for these tests it is a straightforward matter to generate such points. To make a kd-tree with underlying dimensionality d_distrib, I use randomly generated k_dom-dimensional domain vectors which lie on a d_distrib-dimensional hyperelliptical surface. The random vector generation algorithm, sketched in code after this list, is as follows: generate d_distrib random angles θ_i ∈ [0, 2π) where 0 ≤ i < d_distrib. Then let the jth component of the vector be ∏_{i=0}^{d_distrib−1} sin(θ_i + φ_ij). The phase angle φ_ij is π/2 if the jth bit of the binary representation of i is 1, and zero otherwise.
- d_target, the probability distribution from which the search target vector will be selected. I shall assume that this distribution is the same as that which determines the domain vectors. This is indeed what will happen when the kd-tree is used for learning control.
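The generation recipe for the d_distrib item above can be sketched in a few lines of Python; the function name random_point is an illustrative choice.

```python
import math
import random

def random_point(k_dom, d_distrib):
    """One k_dom-dimensional vector on a d_distrib-dimensional
    hyperelliptical surface, following the recipe in the text."""
    thetas = [random.uniform(0.0, 2.0 * math.pi) for _ in range(d_distrib)]

    def phase(i, j):
        # phi_ij is pi/2 when the j-th bit of i is set, and zero otherwise.
        return math.pi / 2.0 if (i >> j) & 1 else 0.0

    return tuple(
        math.prod(math.sin(thetas[i] + phase(i, j)) for i in range(d_distrib))
        for j in range(k_dom)
    )
```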
Figure 6.7 Number of inspections required during a nearest neighbour search against the size of the kd-tree. In this experiment the tree was four-dimensional and the underlying distribution of the points was three-dimensional.
In the following sections I investigate how performance depends on each of these properties.
Figure 6.8 Number of inspections against kd-tree size for an eight-dimensional tree with an eight-dimensional underlying distribution.
Figure 6.9 Number of inspections graphed against tree dimension. In these experiments the points had an underlying distribution with the same dimensionality as the tree.
Figure 6.10 Number of inspections graphed against underlying dimensionality for a fourteen-dimensional tree.
6.6.4 When the Target is not Chosen from the kd-tree's Distribution
In this experiment the points were distributed on a three-dimensional elliptical surface in ten-dimensional space. The target vector was, however, chosen at random from a ten-dimensional distribution. The kd-tree contained 10,000 points. The average number of inspections over 50 searches was found to be 8,396. This compares with another experiment in which both points and target were distributed in ten dimensions and the average number of inspections was only 248. The reason for the appalling performance was exemplified in Figure 6.6: if the target is very far from its nearest neighbour, then very many leaf nodes must be checked.
6.6.5 Conclusion
The speed of the search (measured as the number of distance computations required) seems to vary:

- only marginally with tree size. If the tree is sufficiently large with respect to the number of dimensions, it is essentially constant.
- very quickly with the dimensionality of the distribution of the datapoints, d_distrib.
- linearly with the number of components in the kd-tree's domain (k_dom), given a fixed distribution dimension (d_distrib).

Figure 6.11 Number of inspections graphed against tree dimension, given a constant four-dimensional underlying distribution.
There is also evidence to suggest that unless the target vector is drawn from the same distribution as the kd-tree points, performance can be greatly worsened. These results support the belief that real-time searching for nearest neighbours is practical in a robotic system, where we can expect the underlying dimensionality of the data points to be low, roughly less than ten. This need not mean that the vectors in the input space should have fewer than ten components. For data points obtained from robotic systems it will not be easy to decide what the underlying dimensionality is. However, Chapter 10 will show that the data does tend to lie within a number of low-dimensional subspaces.
The abstract range search operation on an exemplar-set finds all exemplars whose domain vectors are within a given distance of a target point:

\[
\text{range-search}(E, \mathbf{d}, r) = \{ (\mathbf{d}', \mathbf{r}') \in E \mid (\mathbf{d} - \mathbf{d}')^2 < r^2 \}
\]

This is implemented by a modified nearest neighbour search. The modifications are that (i) the initial distance is not reduced as closer points are discovered, and (ii) all discovered points within the distance are returned, not just the nearest. The complexity of this operation is shown in [Preparata and Shamos, 1985] to still be logarithmic in N (the size of E) for a fixed range size.
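A sketch of the modified search, assuming the KdNode record from the earlier sketches. For brevity it prunes using the distance to the splitting plane rather than full hyperrectangle bookkeeping, which is a weaker but still correct test.

```python
import math

def kd_range_search(kd, target, radius, found=None):
    """Collect every exemplar whose domain vector is within radius of target."""
    if found is None:
        found = []
    if kd is None:
        return found
    # Unlike nearest neighbour search, the radius is never shrunk, and
    # every qualifying exemplar is accumulated rather than just one.
    if math.dist(kd.dom_elt, target) < radius:
        found.append((kd.dom_elt, kd.range_elt))
    s = kd.split
    nearer, further = ((kd.left, kd.right) if target[s] <= kd.dom_elt[s]
                       else (kd.right, kd.left))
    kd_range_search(nearer, target, radius, found)
    # The further child can hold a qualifying point only if the splitting
    # plane is within radius of the target.
    if abs(target[s] - kd.dom_elt[s]) <= radius:
        kd_range_search(further, target, radius, found)
    return found
```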
Figure 6.12 A 2d-tree balanced using the `median of the most spread dimension' pivoting strategy.
Figure 6.13 A 2d-tree balanced using the `closest to the centre of the widest dimension' pivoting strategy.
My pivot choice algorithm is firstly to choose the splitting dimension as the longest dimension of the current hyperrectangle, and then to choose the pivot as the point closest to the middle of the hyperrectangle along this dimension. Occasionally, this pivot may even be an extreme point along its dimension, leading to an entirely unbalanced node. This is worthwhile, because it creates a large empty leaf node. It is possible, but extremely unlikely, that the points could be distributed in such a way as to cause the tree to have one empty child at every level. This would be unacceptable, and so above a certain depth threshold the pivots are chosen using the standard median technique. Selecting the median as the split and selecting the closest to the centre of the range are both O(N) operations, and so either way a tree rebalance is O(N log N).
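The strategy can be sketched as below. It assumes a finite bounding hyperrectangle (for instance the bounding box of the points); the function names are illustrative, and the depth-threshold fallback to the median technique is omitted.

```python
def bounding_box(exset):
    # The smallest hyperrectangle containing all the domain vectors.
    dims = range(len(exset[0][0]))
    return ([min(ex[0][i] for ex in exset) for i in dims],
            [max(ex[0][i] for ex in exset) for i in dims])

def choose_pivot(exset, hr_min, hr_max):
    """Split along the longest dimension of the current hyperrectangle,
    pivoting on the exemplar closest to the middle of that dimension."""
    widths = [hi - lo for lo, hi in zip(hr_min, hr_max)]
    split = widths.index(max(widths))            # longest dimension
    mid = (hr_min[split] + hr_max[split]) / 2.0  # centre of that dimension
    ex = min(exset, key=lambda e: abs(e[0][split] - mid))
    return ex, split
```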
Bibliography

[Bentley, 1980] J. L. Bentley. Multidimensional Divide and Conquer. Communications of the ACM, 23(4):214-229, 1980.

[Friedman et al., 1977] J. H. Friedman, J. L. Bentley, and R. A. Finkel. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Transactions on Mathematical Software, 3(3):209-226, September 1977.

[Omohundro, 1987] S. M. Omohundro. Efficient Algorithms with Neural Network Behaviour. Journal of Complex Systems, 1(2):273-347, 1987.

[Preparata and Shamos, 1985] F. P. Preparata and M. Shamos. Computational Geometry. Springer-Verlag, 1985.