Moore Tutorial
Extract from Andrew Moore's PhD thesis: Efficient Memory-based Learning for Robot Control. PhD Thesis, Technical Report No. 209, Computer Laboratory, University of Cambridge, 1991.
Chapter 6
Kd-trees for Cheap Learning
This chapter gives a specification of the nearest neighbour algorithm. It also gives both an informal and a formal introduction to the kd-tree data structure. Then there is an explicit, detailed account of how the nearest neighbour search algorithm is implemented efficiently, followed by an empirical investigation into the algorithm's performance. Finally, there is a discussion of some other algorithms related to the nearest neighbour search.
Given an exemplar-set E and a target domain vector d, a nearest neighbour of d is any exemplar (d', r') ∈ E such that None-nearer(E, d, d'). Formally:

\[
\text{None-nearer}(E, \mathbf{d}, \mathbf{d}') \;\Leftrightarrow\; \forall (\mathbf{d}'', \mathbf{r}'') \in E \quad |\mathbf{d} - \mathbf{d}'| \le |\mathbf{d} - \mathbf{d}''| \tag{6.1}
\]

In Equation 6.1 the distance metric is Euclidean, though any other p-norm could have been used:

\[
|\mathbf{d} - \mathbf{d}'| = \sqrt{\sum_{i=1}^{k_d} (d_i - d'_i)^2} \tag{6.2}
\]
where d_i is the ith component of vector d. In the following sections I describe some algorithms to realize this abstract specification, with the additional informal requirement that the computation time should be relatively short.
Algorithm: Nearest Neighbour by Scanning.

Data Structures:
  domain-vector: A vector of k_d floating point numbers.
  range-vector: A vector of k_r floating point numbers.
  exemplar: A pair: (domain-vector, range-vector)

Input: exlist, of type list of exemplar; dom, of type domain-vector
Output: nearest, of type exemplar
Preconditions: exlist is not empty
Postconditions: if nearest represents the exemplar (d', r'), and exlist represents the exemplar set E, and dom represents the vector d, then (d', r') ∈ E and None-nearer(E, d, d').

Code:
1.  nearest-dist := infinity
2.  nearest := undefined
3.  for ex := each exemplar in exlist
3.1   dist := distance between dom and the domain of ex
3.2   if dist < nearest-dist then
3.2.1   nearest-dist := dist
3.2.2   nearest := ex
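As a concrete illustration, here is a minimal Python sketch of the scanning algorithm. The function name scan_nearest and the representation of an exemplar as a (domain_vector, range_vector) tuple are illustrative choices, not from the thesis.

```python
import math

def scan_nearest(exlist, dom):
    """Nearest neighbour by scanning: examine every exemplar in turn.

    exlist -- a non-empty list of (domain_vector, range_vector) pairs
    dom    -- the target domain vector
    """
    nearest_dist = math.inf   # Step 1: nearest-dist := infinity
    nearest = None            # Step 2: nearest := undefined
    for ex in exlist:         # Step 3: for each exemplar
        # Step 3.1: Euclidean distance to this exemplar's domain vector
        dist = math.dist(dom, ex[0])
        if dist < nearest_dist:        # Step 3.2
            nearest_dist = dist
            nearest = ex
    return nearest
```

Each query costs time proportional to the size of the exemplar list, which is what motivates the kd-tree developed below.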
dom-elt: A point from k_d-dimensional space
range-elt: A point from k_r-dimensional space
split: The splitting dimension
left: A kd-tree representing those points to the left of the splitting plane
right: A kd-tree representing those points to the right of the splitting plane

Table 6.2: The fields of a kd-tree node

Equations 6.3 to 6.5 below give a formal definition of the invariants and semantics. Informally, the exemplar-set E is represented by the set of nodes in the kd-tree, each node representing one exemplar. The dom-elt field represents the domain vector of the exemplar and the range-elt field represents the range vector. The dom-elt component is the index for the node: it splits the space into two subspaces according to the splitting hyperplane of the node. All the points in the "left" subspace are represented by the left subtree, and the points in the "right" subspace by the right subtree. The splitting hyperplane is a plane which passes through dom-elt and which is perpendicular to the direction specified by the split field. Let i be the value of the split field. Then a point is to the left of dom-elt if and only if its ith component is less than the ith component of dom-elt. The complementary definition holds for the right field. If a node has no children, then the splitting hyperplane is not required.

Figure 6.1 demonstrates a kd-tree representation of the four dom-elt points (2,5), (3,8), (6,3) and (8,9). The root node, with dom-elt (2,5), splits the plane in the y dimension into two subspaces. The point (3,8) lies in the upper subspace, that is {(x, y) | y > 5}, and so is in the right subtree. Figure 6.2 shows how the nodes partition the plane.
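For readers who prefer code to tables, the node record of Table 6.2 might be sketched in Python as follows. The class name KdNode and the use of None for an empty tree are my own conventions, not from the thesis.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class KdNode:
    dom_elt: Tuple[float, ...]    # a point from k_d-dimensional space; the node's index
    range_elt: Tuple[float, ...]  # a point from k_r-dimensional space
    split: int                    # the splitting dimension
    left: Optional["KdNode"]      # points on the "left" of the splitting plane
    right: Optional["KdNode"]     # points on the "right" of the splitting plane

# The empty kd-tree is represented here by None.
```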
The exemplar-set represented by a kd-tree is given by the function exset-rep:

\[
\text{exset-rep}(\text{empty}) = \emptyset \tag{6.3}
\]

\[
\text{exset-rep}(\langle \mathbf{d}, \mathbf{r}, \mathrm{split}, \mathrm{tree}_{\mathrm{left}}, \mathrm{tree}_{\mathrm{right}} \rangle) = \text{exset-rep}(\mathrm{tree}_{\mathrm{left}}) \cup \{(\mathbf{d}, \mathbf{r})\} \cup \text{exset-rep}(\mathrm{tree}_{\mathrm{right}}) \tag{6.4}
\]
Figure 6.1 A 2d-tree of four elements. The splitting planes are not indicated. The [2,5] node splits along the y = 5 plane and the [3,8] node splits along the x = 3 plane.
Figure 6.2 How the tree of Figure 6.1 splits up the x,y plane.
The invariant is that subtrees only ever contain dom-elts which are on the correct side of all their ancestors' splitting planes.
\[
\begin{aligned}
&\text{Is-legal-kdtree}(\text{empty}).\\
&\text{Is-legal-kdtree}(\langle \mathbf{d}, \mathbf{r}, -, \text{empty}, \text{empty} \rangle).\\
&\text{Is-legal-kdtree}(\langle \mathbf{d}, \mathbf{r}, \mathrm{split}, \mathrm{tree}_{\mathrm{left}}, \mathrm{tree}_{\mathrm{right}} \rangle) \Leftrightarrow{}\\
&\qquad \forall (\mathbf{d}', \mathbf{r}') \in \text{exset-rep}(\mathrm{tree}_{\mathrm{left}})\;.\; d'_{\mathrm{split}} \le d_{\mathrm{split}}\\
&\qquad {}\wedge\; \forall (\mathbf{d}', \mathbf{r}') \in \text{exset-rep}(\mathrm{tree}_{\mathrm{right}})\;.\; d'_{\mathrm{split}} > d_{\mathrm{split}}\\
&\qquad {}\wedge\; \text{Is-legal-kdtree}(\mathrm{tree}_{\mathrm{left}})\\
&\qquad {}\wedge\; \text{Is-legal-kdtree}(\mathrm{tree}_{\mathrm{right}})
\end{aligned} \tag{6.5}
\]
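Equations 6.3 to 6.5 translate almost mechanically into a recursive checker. This sketch assumes the KdNode record and the None-as-empty-tree convention from the earlier sketch.

```python
def exset_rep(kd):
    # Equations 6.3 and 6.4: the exemplar-set represented by a kd-tree.
    if kd is None:
        return set()
    return exset_rep(kd.left) | {(kd.dom_elt, kd.range_elt)} | exset_rep(kd.right)

def is_legal_kdtree(kd):
    # Definition 6.5: every exemplar in a subtree lies on the correct
    # side of its ancestors' splitting planes.
    if kd is None or (kd.left is None and kd.right is None):
        return True
    s = kd.split
    return (all(d[s] <= kd.dom_elt[s] for d, _ in exset_rep(kd.left))
            and all(d[s] > kd.dom_elt[s] for d, _ in exset_rep(kd.right))
            and is_legal_kdtree(kd.left)
            and is_legal_kdtree(kd.right))
```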
Algorithm: Constructing a kd-tree
Input: exset, of type exemplar-set
Output: kd, of type kdtree
Pre: None
Post: exset = exset-rep(kd) ∧ Is-legal-kdtree(kd)

Code:
1.  If exset is empty then return the empty kdtree
2.  Call the pivot-choosing procedure, which returns two values:
      ex := a member of exset
      split := the splitting dimension
3.  d := domain vector of ex
4.  exset' := exset with ex removed
5.  r := range vector of ex
6.  exset_left := {(d', r') ∈ exset' | d'_split ≤ d_split}
7.  exset_right := {(d', r') ∈ exset' | d'_split > d_split}
8.  kd_left := recursively construct kd-tree from exset_left
9.  kd_right := recursively construct kd-tree from exset_right
10. kd := ⟨d, r, split, kd_left, kd_right⟩

Proof: By induction on the length of exset and the definitions of exset-rep and Is-legal-kdtree.

Table 6.3: Constructing a kd-tree
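A minimal Python rendering of Table 6.3 follows, again assuming the KdNode record from above. The pivot choice used here (the median element along a cycling dimension) is one simple possibility of my own for illustration; the thesis's preferred pivoting strategies are discussed at the end of the chapter.

```python
def build_kdtree(exset, depth=0):
    """Construct a kd-tree from a list of (domain_vector, range_vector) pairs."""
    if not exset:                              # Step 1: empty set -> empty tree
        return None
    # Step 2 (illustrative pivot choice): cycle through the dimensions and
    # take the median element along the splitting dimension as the pivot.
    split = depth % len(exset[0][0])
    ordered = sorted(exset, key=lambda ex: ex[0][split])
    d, r = ordered[len(ordered) // 2]          # Steps 3 and 5
    rest = ordered[:len(ordered) // 2] + ordered[len(ordered) // 2 + 1:]  # Step 4
    # Steps 6-10: partition the remaining exemplars and recurse.
    return KdNode(
        dom_elt=d, range_elt=r, split=split,
        left=build_kdtree([ex for ex in rest if ex[0][split] <= d[split]], depth + 1),
        right=build_kdtree([ex for ex in rest if ex[0][split] > d[split]], depth + 1),
    )
```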
Figure 6.3 The black dot is the point which owns the leaf node containing the target (the cross). Any nearer neighbour must lie inside this circle.
Figure 6.4 The black dot is the parent of the closest found so far. In this case the black dot's other child (shaded grey) need not be searched.
The nearest neighbour search routine of Table 6.4 looks for the nearest exemplar whose domain vector lies both within a given hyperrectangle, and within the maximum distance to the target. The caller of the routine will generally specify the infinite hyperrectangle which covers the whole of Domain, and the infinite maximum distance.

Before discussing its execution, I will explain how the operations on the hyperrectangles can be implemented. A hyperrectangle is represented by two arrays: one of its minimum coordinates, the other of its maximum coordinates. To cut the hyperrectangle, so that one of its edges is moved closer to its centre, the appropriate array component is altered. To check whether a hyperrectangle hr intersects a hypersphere of radius r centred at point t, we find the point p in hr which is closest to t. Write hr_i^min for the minimum extreme of hr in the ith dimension and hr_i^max for the maximum extreme. Then p_i, the ith component of this closest point, is computed thus:

\[
p_i = \begin{cases}
hr_i^{\min} & \text{if } t_i \le hr_i^{\min} \\
t_i & \text{if } hr_i^{\min} < t_i < hr_i^{\max} \\
hr_i^{\max} & \text{if } t_i \ge hr_i^{\max}
\end{cases} \tag{6.6}
\]

The objects intersect only if the distance between p and t is less than or equal to r.
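In code, these hyperrectangle operations are a few lines. This sketch represents a hyperrectangle as a pair of coordinate lists and implements Equation 6.6 as a per-component clamp; the function names are illustrative.

```python
import math

def closest_point_in_hr(hr_min, hr_max, t):
    # Equation 6.6: clamp each component of t into [hr_min_i, hr_max_i].
    return [min(max(t_i, lo), hi) for t_i, lo, hi in zip(t, hr_min, hr_max)]

def hr_intersects_sphere(hr_min, hr_max, t, r):
    # The hyperrectangle and the hypersphere of radius r centred at t
    # intersect only if the closest point p is within r of t.
    p = closest_point_in_hr(hr_min, hr_max, t)
    return math.dist(p, t) <= r
```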
The search is depth-first, and uses the heuristic of searching first the child node which contains the target. Step 1 deals with the trivial empty-tree case, and Steps 2 and 3 assign two important local variables. Step 4 cuts the current hyperrectangle into the two hyperrectangles covering the space occupied by the child nodes. Steps 5-7 determine which child contains the target. After Step 8, when this initial child is searched, it may be possible to prove that there cannot be any closer point in the hyperrectangle of the further child. In particular, the point at the current node must be out of range. The test is made in Steps 9 and 10. Step 9 restricts the maximum radius in which any possible closer point could lie, and then the test in Step 10 checks whether there is any space in the hyperrectangle of the further child which lies within this radius.
Algorithm: Nearest Neighbour in a kd-tree
Input:
  kd, of type kdtree
  target, of type domain vector
  hr, of type hyperrectangle
  max-dist-sqd, of type float
Output:
  nearest, of type exemplar
  dist-sqd, of type float
Pre: Is-legal-kdtree(kd)
Post: Informally, the postcondition is that nearest is a nearest exemplar to target which also lies both within the hyperrectangle hr and within distance √max-dist-sqd of target. √dist-sqd is the distance of this nearest point. If there is no such point then dist-sqd contains infinity.

Code:
1.  if kd is empty then set dist-sqd to infinity and exit.
2.  s := split field of kd
3.  pivot := dom-elt field of kd
4.  Cut hr into two sub-hyperrectangles left-hr and right-hr. The cut plane is through pivot and perpendicular to the s dimension.
5.  target-in-left := target_s ≤ pivot_s
6.  if target-in-left then
6.1   nearer-kd := left field of kd and nearer-hr := left-hr
6.2   further-kd := right field of kd and further-hr := right-hr
7.  if not target-in-left then
7.1   nearer-kd := right field of kd and nearer-hr := right-hr
7.2   further-kd := left field of kd and further-hr := left-hr
8.  Recursively call Nearest Neighbour with parameters (nearer-kd, target, nearer-hr, max-dist-sqd), storing the results in nearest and dist-sqd
9.  max-dist-sqd := minimum of max-dist-sqd and dist-sqd
10. A nearer point could only lie in further-kd if there were some part of further-hr within distance √max-dist-sqd of target. if this is the case then
10.1   if (pivot − target)² < dist-sqd then
10.1.1   nearest := (pivot, range-elt field of kd)
10.1.2   dist-sqd := (pivot − target)²
10.1.3   max-dist-sqd := dist-sqd
10.2   Recursively call Nearest Neighbour with parameters (further-kd, target, further-hr, max-dist-sqd), storing the results in temp-nearest and temp-dist-sqd
10.3   If temp-dist-sqd < dist-sqd then
10.3.1   nearest := temp-nearest and dist-sqd := temp-dist-sqd

Proof: Outlined in text.

Table 6.4: The Nearest Neighbour Algorithm
If there is no such space, then no further search is necessary. If there is, then Step 10.1 checks whether the point associated with the current node of the tree is closer than the closest found so far. Then, in Step 10.2, the further child is recursively searched. The maximum distance worth examining in this further search is the distance to the closest point yet discovered.

The proof that this will find the nearest neighbour within the constraints is by induction on the size of the kd-tree. If the cutoff were not made in Step 10, then the proof would be straightforward: the point returned is the closest out of (i) the closest point in the nearer child, (ii) the point at the current node and (iii) the closest point in the further child. If the cutoff were made in Step 10, then the point returned is the closest point in the nearer child, and we can show that neither the current point, nor any point in the further child, can possibly be closer.

Many local optimizations are possible which, while not altering the asymptotic performance of the algorithm, multiply the speed by a constant factor. In particular, it is in practice possible to hold almost all of the search state globally, instead of passing it as recursive parameters.
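The following Python sketch is a fairly direct transcription of Table 6.4, assuming the KdNode record from the earlier sketches. It passes the search state as recursive parameters, which keeps it close to the table at the cost of the constant-factor overhead just mentioned; the default arguments supply the infinite hyperrectangle and infinite maximum distance, as the text prescribes.

```python
import math

def kd_nearest(kd, target, hr_min=None, hr_max=None, max_dist_sqd=math.inf):
    """Returns (nearest, dist_sqd): the nearest exemplar within the
    hyperrectangle and squared search radius, or (None, inf) if none."""
    if kd is None:                                       # Step 1
        return None, math.inf
    if hr_min is None:                                   # default: the infinite
        hr_min = [-math.inf] * len(target)               # hyperrectangle
        hr_max = [math.inf] * len(target)
    s, pivot = kd.split, kd.dom_elt                      # Steps 2-3
    # Step 4: cut hr through pivot, perpendicular to dimension s.
    left_max = list(hr_max); left_max[s] = pivot[s]
    right_min = list(hr_min); right_min[s] = pivot[s]
    if target[s] <= pivot[s]:                            # Steps 5-7
        nearer_kd, nearer_hr = kd.left, (hr_min, left_max)
        further_kd, further_hr = kd.right, (right_min, hr_max)
    else:
        nearer_kd, nearer_hr = kd.right, (right_min, hr_max)
        further_kd, further_hr = kd.left, (hr_min, left_max)
    # Step 8: search the child containing the target first.
    nearest, dist_sqd = kd_nearest(nearer_kd, target, *nearer_hr, max_dist_sqd)
    max_dist_sqd = min(max_dist_sqd, dist_sqd)           # Step 9
    # Step 10: continue only if further-hr intersects the current search ball.
    p = [min(max(t, lo), hi)
         for t, lo, hi in zip(target, further_hr[0], further_hr[1])]
    if math.dist(p, target) ** 2 <= max_dist_sqd:
        pivot_dist_sqd = math.dist(pivot, target) ** 2
        if pivot_dist_sqd < dist_sqd:                    # Step 10.1
            nearest, dist_sqd = (pivot, kd.range_elt), pivot_dist_sqd
            max_dist_sqd = dist_sqd
        temp, temp_dist_sqd = kd_nearest(further_kd, target, *further_hr,
                                         max_dist_sqd)   # Step 10.2
        if temp_dist_sqd < dist_sqd:                     # Step 10.3
            nearest, dist_sqd = temp, temp_dist_sqd
    return nearest, dist_sqd
```

A top-level query is then simply kd_nearest(tree, target), with tree built by a constructor such as the build_kdtree sketch above.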
Figure 6.5 Generally during a nearest neighbour search only a few leaf nodes need to be inspected.
Figure 6.6 A bad distribution which forces almost all nodes to be inspected.
The expected search cost was analysed in [Friedman et al., 1977]. The hypersphere within which any nearer neighbour must lie (in the two-dimensional case a circle) is shown, and the number of intersecting hyperrectangles is two. The paper shows that the expected number of intersecting hyperrectangles is independent of N, the number of exemplars. The asymptotic search time is thus logarithmic, because the time to descend from the root of the tree to the leaves is logarithmic (in a balanced tree), and then an expected constant amount of backtracking is required. However, this reasoning was based on the assumption that the hyperrectangles in the tree tend to be hypercubic in shape. Empirical evidence in my investigations has shown that this is not generally the case for their tree-building strategy. This is discussed and demonstrated in Section 6.7. A second danger is that the cost, while independent of N, is exponentially dependent on k, the dimensionality of the domain vectors. Thus theoretical analysis provides some insight into the cost, but here, empirical investigation will be used to examine the expense of nearest neighbour in practice.
The performance of the nearest neighbour search depends on the following properties:

- N, the size of the tree.
- k_dom, the dimensionality of the domain vectors in the tree. This value is the k in kd-tree.
- d_distrib, the distribution of the domain vectors. This can be quantified as the "true" dimensionality of the vectors. For example, if the vectors had three components, but all lay on the surface of a sphere, then the underlying dimensionality would be 2. In general, discovery of the underlying dimensionality of a given sample of points is extremely difficult, but for these tests it is a straightforward matter to generate such points. To make a kd-tree with underlying dimensionality d_distrib, I use randomly generated k_dom-dimensional domain vectors which lie on a d_distrib-dimensional hyperelliptical surface. The random vector generation algorithm, sketched in code after this list, is as follows: generate d_distrib random angles θ_i ∈ [0, 2π) where 0 ≤ i < d_distrib. Then let the jth component of the vector be ∏_{i=0}^{d_distrib−1} sin(θ_i + φ_ij). The phase angle φ_ij is π/2 if the jth bit of the binary representation of i is 1, and zero otherwise.
- d_target, the probability distribution from which the search target vector will be selected. I shall assume that this distribution is the same as that which determines the domain vectors. This is indeed what will happen when the kd-tree is used for learning control.
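The generation recipe for the d_distrib item above can be sketched in a few lines of Python; the function name random_point is an illustrative choice.

```python
import math
import random

def random_point(k_dom, d_distrib):
    """One k_dom-dimensional vector on a d_distrib-dimensional
    hyperelliptical surface, following the recipe in the text."""
    thetas = [random.uniform(0.0, 2.0 * math.pi) for _ in range(d_distrib)]

    def phase(i, j):
        # phi_ij is pi/2 when the j-th bit of i is set, and zero otherwise.
        return math.pi / 2.0 if (i >> j) & 1 else 0.0

    return tuple(
        math.prod(math.sin(thetas[i] + phase(i, j)) for i in range(d_distrib))
        for j in range(k_dom)
    )
```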
Figure 6.7 Number of inspections required during a nearest neighbour search against the size of the kd-tree. In this experiment the tree was four-dimensional and the underlying distribution of the points was three-dimensional.
In the following sections I investigate how performance depends on each of these properties.
Figure 6.8 Number of inspections against kd-tree size for an eight-dimensional tree with an eight-dimensional underlying distribution.
Figure 6.9 Number of inspections graphed against tree dimension. In these experiments the points had an underlying distribution with the same dimensionality as the tree.
Figure 6.10 Number of inspections graphed against underlying dimensionality for a fourteen-dimensional tree.
6.6.4 When the Target is not Chosen from the kd-tree's Distribution
In this experiment the points were distributed on a three-dimensional elliptical surface in ten-dimensional space. The target vector was, however, chosen at random from a ten-dimensional distribution. The kd-tree contained 10,000 points. The average number of inspections over 50 searches was found to be 8,396. This compares with another experiment in which both points and target were distributed in ten dimensions and the average number of inspections was only 248. The reason for the appalling performance was exemplified in Figure 6.6: if the target is very far from its nearest neighbour, then very many leaf nodes must be checked.
6.6.5 Conclusion
The speed of the search (measured as the number of distance computations required) seems to vary:

- only marginally with tree size. If the tree is sufficiently large with respect to the number of dimensions, it is essentially constant.
- very quickly with the dimensionality of the distribution of the datapoints, d_distrib.
- linearly with the number of components in the kd-tree's domain (k_dom), given a fixed distribution dimension (d_distrib).

Figure 6.11 Number of inspections graphed against tree dimension, given a constant four-dimensional underlying distribution.
There is also evidence to suggest that unless the target vector is drawn from the same distribution as the kd-tree points, performance can be greatly worsened. These results support the belief that real-time searching for nearest neighbours is practical in a robotic system, where we can expect the underlying dimensionality of the data points to be low, roughly less than ten. This need not mean that the vectors in the input space should have fewer than ten components. For data points obtained from robotic systems it will not be easy to decide what the underlying dimensionality is. However, Chapter 10 will show that the data does tend to lie within a number of low-dimensional subspaces.
The abstract range search operation on an exemplar-set finds all exemplars whose domain vectors are within a given distance of a target point:

\[
\text{range-search}(E, \mathbf{d}, r) = \{ (\mathbf{d}', \mathbf{r}') \in E \mid (\mathbf{d} - \mathbf{d}')^2 < r^2 \}
\]

This is implemented by a modified nearest neighbour search. The modifications are that (i) the initial distance is not reduced as closer points are discovered, and (ii) all discovered points within the distance are returned, not just the nearest. The complexity of this operation is shown in [Preparata and Shamos, 1985] to still be logarithmic in N (the size of E) for a fixed range size.
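A sketch of the modified search, assuming the KdNode record from the earlier sketches. For brevity it prunes using the distance to the splitting plane rather than full hyperrectangle bookkeeping, which is a weaker but still correct test.

```python
import math

def kd_range_search(kd, target, radius, found=None):
    """Collect every exemplar whose domain vector is within radius of target."""
    if found is None:
        found = []
    if kd is None:
        return found
    # Unlike nearest neighbour search, the radius is never shrunk, and
    # every qualifying exemplar is accumulated rather than just one.
    if math.dist(kd.dom_elt, target) < radius:
        found.append((kd.dom_elt, kd.range_elt))
    s = kd.split
    nearer, further = ((kd.left, kd.right) if target[s] <= kd.dom_elt[s]
                       else (kd.right, kd.left))
    kd_range_search(nearer, target, radius, found)
    # The further child can hold a qualifying point only if the splitting
    # plane is within radius of the target.
    if abs(target[s] - kd.dom_elt[s]) <= radius:
        kd_range_search(further, target, radius, found)
    return found
```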
Figure 6.12 A 2d-tree balanced using the `median of the most spread dimension' pivoting strategy.
Figure 6.13 A 2d-tree balanced using the `closest to the centre of the widest dimension' pivoting strategy.
My pivot choice algorithm is firstly to choose the splitting dimension as the longest dimension of the current hyperrectangle, and then to choose the pivot as the point closest to the middle of the hyperrectangle along this dimension. Occasionally, this pivot may even be an extreme point along its dimension, leading to an entirely unbalanced node. This is worthwhile, because it creates a large empty leaf node. It is possible, but extremely unlikely, that the points could be distributed in such a way as to cause the tree to have one empty child at every level. This would be unacceptable, and so above a certain depth threshold the pivots are chosen using the standard median technique. Selecting the median as the split and selecting the closest to the centre of the range are both O(N) operations, and so either way a tree rebalance is O(N log N).
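The strategy can be sketched as below. It assumes a finite bounding hyperrectangle (for instance the bounding box of the points); the function names are illustrative, and the depth-threshold fallback to the median technique is omitted.

```python
def bounding_box(exset):
    # The smallest hyperrectangle containing all the domain vectors.
    dims = range(len(exset[0][0]))
    return ([min(ex[0][i] for ex in exset) for i in dims],
            [max(ex[0][i] for ex in exset) for i in dims])

def choose_pivot(exset, hr_min, hr_max):
    """Split along the longest dimension of the current hyperrectangle,
    pivoting on the exemplar closest to the middle of that dimension."""
    widths = [hi - lo for lo, hi in zip(hr_min, hr_max)]
    split = widths.index(max(widths))            # longest dimension
    mid = (hr_min[split] + hr_max[split]) / 2.0  # centre of that dimension
    ex = min(exset, key=lambda e: abs(e[0][split] - mid))
    return ex, split
```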
Bibliography

[Bentley, 1980] J. L. Bentley. Multidimensional Divide and Conquer. Communications of the ACM, 23(4):214-229, 1980.

[Friedman et al., 1977] J. H. Friedman, J. L. Bentley, and R. A. Finkel. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Transactions on Mathematical Software, 3(3):209-226, September 1977.

[Omohundro, 1987] S. M. Omohundro. Efficient Algorithms with Neural Network Behaviour. Journal of Complex Systems, 1(2):273-347, 1987.

[Preparata and Shamos, 1985] F. P. Preparata and M. Shamos. Computational Geometry. Springer-Verlag, 1985.