0% found this document useful (0 votes)
23 views24 pages

Spatial Indexing For Large Urban Data Management

The document reviews various spatial indexing techniques essential for managing large urban datasets, particularly in light of increasing urbanization and the shift towards 3D spatial data. It discusses the limitations of traditional 2D indexing methods and explores alternative approaches, including grid files, Quadtrees, k-d Trees, and space-filling curves, to enhance data retrieval efficiency. The paper aims to provide spatial professionals with insights into applicable methods for effective urban data management.

Uploaded by

zhuxiaomei.xm
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
23 views24 pages

Spatial Indexing For Large Urban Data Management

The document reviews various spatial indexing techniques essential for managing large urban datasets, particularly in light of increasing urbanization and the shift towards 3D spatial data. It discusses the limitations of traditional 2D indexing methods and explores alternative approaches, including grid files, Quadtrees, k-d Trees, and space-filling curves, to enhance data retrieval efficiency. The paper aims to provide spatial professionals with insights into applicable methods for effective urban data management.

Uploaded by

zhuxiaomei.xm
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 24
ResearchGate Review of Spatial Indexing Techniques for Large Urban Data Management e 6205 ite So stents oe sere ‘Aorertoloning is age masuleaed by Francs 09 6n28 Jar 2016 Review of Spatial Indexing Techniques for Large Urban Data Management Suhaibah Azzi, Uznir Ujang, Frangois Anton, Darka Mioc and Alias Abdul Rakman Abstract Pressure on land development in urban areas causes progressive efforts in spatial planning and management. The physical growth expansion of urban ar ces to accommodate rural migration gives a massive impact to social, economic and polities of major cities. Most of the models used in managing urban areas are moving towards sustainable urban development which to fulfill current necessities while preserving the resources for future generation. However in order to manage large amounts of urban spatial data, an efficient spatial data constellation method is, needed. With the ease of three dimensional (3D) spatial data usage in urban areas as a new source of data input, practical spatial data indexing is necessary to im- prove data retrieval and management, Current two dimensional (2D) spatial index- ing approaches seem not applicable to the current and future spatial developments ‘Therefore, the objective of this paper is to review existing spatial data indexing ap- proaches to managing large urban area datasets. Each approach will be reviewed and discussed according to the current spatial data scenarios. In addition, a 3D spatial data indexing method will be discussed as an alternative for organizing 3D spatial data, ‘Subba Ave SD GIS Research Lab, Universiti Teknologi Malaysia, Malaysia, e-mail: norsuhasbahe Urnir Vang SD GIS Research Lab, Universiti Teknologi Malaysia, Malaysia, e-mails mduzn ic@utm my Frangois Anton Natonal Space Intute, Technical University of Denmark, Denmark and Universit Teknologi Malaysia, Malaysia, e-mail: Faspac Darks Mioe Natonal Space Intute, Technical University of Denmark, Denmatk, e-mail: ioc? space Alias Abdul Rabman SD GIS Research Le, Universiti Teknologi Malaysia, Malaysia e-mail: [email protected] 2 Suhaibab Az, Uanie Ujang, Frangois Anton, Darka Mice and Alias Abdul Rahman 1 Introduction ‘The urbanization phenomenon and the growth of mega-cities are widespread around the world. The rapid process of urbanization and the growing number of mega- cities cause a lot of economical, social and ecological issues and risks. Affected by this phenomenon, planners or spatial professionals are challenged for making policies and strategies in managing urban cities towards sustainable development. ‘The increasing prominence of urban growth is parallel with the increase of spatial information, Spatial information has become indispensable for numerous aspects of turban development and management. The increasing prominence of spatial data has been due to recent developments in spatial data capturing such as remote sensing, slobal positioning system, laser scanning and drones for aerial surveying. Geographic Information Systems (GIS) offer a platform to store, manage, ana- lyze and help decision taking using urban data such as buildings. road networks surface and subsurface utilities, etc. However, the bottleneck of GIS is the manage- ‘ment of large amounts of spatial data, Some of the issues that arise are related to software and hardware limitations in handling large spatial datasets. Furthermore, spatial data come from various sources of varying sizes such as digital maps, satel= lite images, topographic maps and aerial photographs. Moreover, for urban data ‘management, total data sizes could raise up to terabytes due to the large seale, area and object density. In urban data, spatial objects such as buildings could be builtin very dense areas with densities of up to thousands of buildings per square kilome- ter, In addition, preserving the historical development of an urban arca will require a temporal data management and thus, increase the disk storage size as described in Ujang et al (2013). This issue becomes more crucial when the trends of spatial data are evolving to three-dimensional (3D) spatial data and more specific application frameworks are moving towards using 3D data as a new data input Sharkawi et al (2008); Mohamad Yusoff et al (2010). Nowadays, acquisition of 3D spatial data is much more con- venient with various scales and resolution due to acquisition technology such as ‘TLS (Terrestrial Laser Scanning), LIDAR (Light/Laser Detection and Ranging) and drone based surveying. For a practical application, these datasets need to be man- aged and maintain within a 3D spatial database, Spatial query isa basic function in GIS. In fat, spatial analysis itself originates from spatial query Kun etal (2005). For large areas of urban data management, optimizing spatial queries is needed since the large volume of data will affect the duration of data retrieval. In spatial databases, spatial indexing is one of the key techniques in improving and optimizing spatial query and analysis. From this paper. user especially spatial professional will get a basic knowledge ‘on which methods is applicable and practical in managing large dataset for differ- ent situations or cases especially for 2D and 3D dataset, There are a lot of spatial indexing structures proposed for spatial query operation. However, not all are suited to 3D application in urban data management, Sometimes the processing of 3D data is based on the framework of 2D index structure Which lead to data retsieval inef- ficiency as mentioned by Gu et al (2011). Thus, in this paper several methods will Review of Spatial Indexing Techniques for Lay ban Data Management 3 be investigated to see the potential or applicability of each structure in handling » huge amount of urban data. The effort towards 3D spatial index structure also will be discussed in this paper including several extension of index structure based on the existing R-Tree structure, From the review, some issues of the uncompleted part towards true 3D index structure will be raised. 2 Spatial Indexing Spatial indexing is a technique to optimize query processing in spatial databases by organizing records in memory space. Query performance is increased especially when dealing with many rows in a table, Number of rows will be reduced once the appropriate indexes are created and after that indexes will be executed for query processing. In traditional database system, data is ordered (or sorted) for efficient searching by using the ordering approach such as B-Tree. But this approach is lim- ited to one dimensional data such as text, numbers, strings and etc. For higher di- ‘mensional spatial data, memory space is utilized for organizing and sorting records in the database, According to Rigaux et al (2002), there are two main groups of index structures for spatial databases, which ate space-driven structures and data- driven structures. In the next section, we will discuss and illustate spatial index structures based on these two main groups. 2.1 Space Driven Structures In space driven structure, objects in 2D space ate partitioned into rectangular cells. Objects are mapped according to geomewic criterion. The simplest space driven structure, fixed grid was proposed by Bentley and Friedman (1979). The fixed grid structure divides the space to fixed number of cell with an equal size. Each cell rep- resents a block of secondary storage. This means every point in the cell is stored in the same block of secondary storage, The structure of fixed grid also could be {implemented by using an array or hashing function. From time to time the structure of fixed grid has been extended to overcome some limitation of original fixed grid ‘The original fixed grid is limited to non-uniformly distributed data which lead to overflowed cell that being used by others. In this section several space driven struc- tures will be successively presented such as grid file which is an extension of fixed arid file, Quadiree, k-d Tree and Space Filling Curve of Hilberts Curve, Grid Fite Grid file is initially designed from fixed grid for indexing points Nievergelt et al (1984). In the fixed grid, the space is decomposed into rectangular cells and the 4 Suhaibah Azri, Uanie Ujang, Frangois Anton, Darks Mioe and Alias Abdul Rahman resulting grid is an n, x, array of equal size cells. Each one of the cells « is asso- ciated with a disk page, Objects or points mapped in cel care sequentially stored in the page associated with ‘The same method applied in a grid file, Every cell eis associated with one disk page. The difference is when a cell overflows, it will split into two cells. From that split, points are consigned based on the cell they fall into. This causes the size of the partition and cells to adapt to the point distribution. The construction of grid file is best described using Figure 1 —______ y Fig. 1 Grid fle for point distribution Grid files are not limited only to point indexing. They could be utilized for rect- angle indexing. In this case, rectangles are mapped by assigning the rectangles to the cell that overlaps the rectangle with the condition of contains, intersect or in contained in it. Grid files have also been extended in different ways as reported by Gaede Gaede et al (1998) such as Extendible Cell or EXCELL Taraminen (1982), the Two-Level Grid File Hinrichs (1985), and the Twin Grid File Hutflesz et al (1988) “The grid file structure seems to meet the requirement in terms of time complex- fay and space complexity. However, there are several problems that remain, Object duplication of neighbor cells will increase the index size and removing duplicates is expensive when the result size is large. For large datasets, especially large scale Review of Spatial Indexing Techniques for Large Urban Data Management s urban datasets, the management of this data becomes complex and this will directly degrade the query response time Quadtree ‘The quadiuee is a data structure used to subdivide the 2D plane into four quad- rants, The quadrants are named North West (NW), North East (NE), South West (SW) and South East (SE). In the Quadtree structure (Figure 2), the search space is recursively decomposed into quadrants until the number of rectangles overlapping, each quadrant is less than the page capacity. The Quadiree could be utilized for data representation such as point, area, line and curve. Fig. 2 Quadicce strata aken from Gomes etal (2009) A lot of applications used Quadtrees for detecting collisions, view frustum culling of terrain data, fractal image analysis and more, In previous research, 6 Suhaiba Az, Uanie Ujang, Frangois Anton, Darka Mioe and Alias Abdul Rahman (Quadtrees were applied to binary images Klinger and Dyer (1976), computation of ‘geometric properties Ranade and Shneier (1981), edge enhancement Ranade (1981) and distance transformation Samet (1982). Based on this, several scholars Samet et al (1984), Peuquet (1984), Mark and Lauzon (1985) and Mark (1986) have pro- posed the use of Quadiee in GIS. Recent work by Zhang et al (2011), has adopted the Quadiree data structure for large scale geospatial data with a new design of a Quadtree structure termed as Bitplane Quadtrce (BQ-Tree). 16-bits of the NASA MODIS image were tested in Zhang et al (2011) with 22,658 x 15,586 cells in 190 milliseconds, which is 1.86 billion cells per second, While handling raster data seems promising for Bit- planeQuadiree, a limitation on geometry and topology is reported in Mark (1986); Tobler and Chen (1986). These papers show that the Quadtree has been experi- ‘mented for large areas by partitioning the space into a set of mutually exclusive and collectively exhaustive patches or frames. The concept of the quadrant is applied within the patches of space which is very easy to accommodate with GIS systems. However, from this experiment, the Quadtree does encounter a problem when it ‘was applied on large portions or Earths surface, This problem is due to the problem of tessellated squares on a sphere surface, Thus to overcome this problem, Mark (1986)suggested for continuous or path-connected area of surface the modification concept of Quadiree is necessary. The same problem is also reported by Tobler and Chen (1986). In their experiment of they were using Quaduee for global storage information. From the report, itis mentioned that if one attempts to adapt this tech- nique on ellipsoidal surface, there will be some complications on the data storage. This is due to the facet edges which are curves on the sphere is not coincident with the latitude-longitude lines and sub-facet edges are not even great circles kd Tree According to Bentley (1975), the k-d Tree stores k-dimensional points in subdi- vided rectangular parallelepipeds. Bach rectangular parallelepiped corresponds to one node of the k-d Tree. The whole region of interest is the root ofthe tree. The internal nodes are subdivided by rectangles orthogonal to one of the d axes of co- ordinates (for each level i of the k-d-tree, the axis of coordinates orthogonal to the dividing rectangle is the axis of coordinate x) into two rectangular parallelepipeds. ‘The points of the original rectangular parallelepiped that have the x; coordinate lower than or equal to the x coordinate of the dividing rectangle are represented by the left subtree, while the others are represented by the right subtree (see Figure 3). For each node in the k-d Tree, it will consist of records and two pointers. Pointers are either null or point to other nodes, The insertion of a new point will result into the splitting of an existing node into two new nodes, Range searching starts from the root, and then checks whether the stored node overlaps with the right or the left region of the subtree. For the overlapping search region, the procedute is repeated until the leaf level ban Data Management 7 Fig. 3 ked Tee by Gomes etal (2009), kd Trees are efficient in performing nearest neighbor searches such as list of points at distance lower than D from point X ot find the nearest points of P from X. When executing such a query, the corresponding leaf node of point X will be detected by traversing the k-d-tree from the root until the leaf, The point stored in the leaf node will be processed and nearby leaf will be scanned. When the distance from X to the leaf is higher than the point found so far, it is the time to stop seazch- ing. Algorithm of k-d Tree is the most widely used algorithm for nearest neighbor search Friedman et al (1977). However, the search implementation is ineffective and degrading when space dimensionality is increased Oosterom (1999). Sample et al (2001) and Nene and Nayar (1997), Another problem of k-d Tree mentioned by Beerentzen et al (2002) is tha its structure depends on the order of inserted points In the case of k points require k levels; a searching will took a linear time to com= plete the process. A solution to this problem is by balancing the ree structure using adaptive k-d Tree. Adaptive k-d Tyee splits each set of points in the whole region of 8 Suhaiba Azi, Uanie Ujang, Frangois Anton, Darka Mice and Alias Abdul Rahman interest into two equal regions, By having this division, it is ensure that the adaptive k-d Tree isa balance tree (see Figure 4) However, the balancing enforcement by the adaptive k-d Tree took the flexibility of the structure to insert or delete points. This is due tothe prohibitions of standard balancing strategies based on rotation, The r0- tations could not be used because it would destoy the rule ata given depth of tee However, most of the recent extension of k-d Tree such as generalized k-d Tree has loosens the rule. Therefore, insertion or point deletions in adaptive generalized k-d ‘Tree are allowed and in the same time satisfy the balancing property. The algorithm fof k-d Tree has been improved from time to time to suit applications such as retriev- ing images from a medical image collection Robinson et al (1996) and mobile robot navigation Li etal (2008), > Se Foe DCAH Fig. An Adaptive ke Tre by Barentzen etal (2002) Space filling curve A space filling curve is a continuous function with domain the unit interval (0, 1] and a range that could be lying in a nD topological, uniform or Euclidean space. Space filling curves allow a proximity problem to be reduced from higher-dimensional spaces to the one-dimensional real line. Such a problem usually involves a query search and sorting in one-dimensional space. Many applications can benefit from space filling curves, but a common application of space filling curves is the stor- ‘age and retrieval of multidimensional data in the database. Proximity problems for which space filling curves have been used regularly, are approximate nearest neigh- bor search or finding closest points from one location. Until now, there are alot of Review of Spatial Indexing Techniques for Large Urban Data Management 9 variants of space filling curves such as the Hilbert Curve, Dragon Curve, Gosper Curve, Moore Curve, Z-Order curve and many more. ‘The space filling curve was discovered by Giuseppe Peano in 1890 and called the Peanos Curve Peano (1890). From his finding, a curve must pass through ev- ery point on a plane at least once. But, before his finding, in 1878, George Can- tor has demonstrated a one-to-one correspondence between unit interval and unit square Thiele and Wos (2002). In 1879, Netto added that any such mapping could not be continuous Bocarra (2007). In 1891, David Hilbert discovered a Hilbert Curve structure which isa simpler variant of the same idea of Peanos Kevin (2008) ‘The Hilbert Curve subdivides squares into smaller squares instead of Peanos, nine smaller squares. Figure 5 shows four iterations of Peanos Curve, Fig. § First four iterations of Peano Curve When David Hilbert discovered a variation of Peanos Curve, he divided each side of the unit square into two parts equally. This yields the square into four smaller squares, From the divided squares, each of them is divided into another four smaller squares, and so on, For each part ofthis division, a curve will raverse all the squares. Hilberts Curve normally starts at the lower left division and ends at the lower right division. According to its movement and process, the Hilbert Curve could be ex- pressed in base 2. ‘The Hilbert Space Filling Curve could be described as an underlying grid, an NN x N artay of cells where NV = 2". Hilberts enumeration of the squares is shown in Figure 6 for W = 1,2,3. Note that the first square is always at the lower left comer and the last square is always in the lower right comer. Fig. 6 The fst hee sages in generating Hilber Curve 10 Suhaibab Azi, Uanie Ujang, Frangois Anton, Darks Mioe and Alias Abdul Rahman From the divided square, we assume that the Nx N cells coordinates start at (0,0) in the lower left comer. From the assumptions, the Hilbert Curve has comers at integers ranging from 0 to 2" — 1 in both x and y, And if we took a positive direction along the curve the location from (x,y) = (0,0) 10 (2 ~ 1, 0) Ina lage data management Hilberts Curve responded positively on the perfor- ‘mance.Castro and Burns (2007) have experimented with the Hilberts Curve with the visualization of a large datasets. From their finding, it indicates that Hilberts curve produces acceptable quality of mapping and in the same time, achieving faster vi- sualization especially for online visualization of very large data sets. Apart from Hilberts Curve performance, its limitation should be considered in developing ap- plications. For an instance Anders (2009) experimented a data vector with Iength of 18M to demonstrate the use of Hilberts Curve. In this experiment, the data vector plotted with its whole length condensed to width of plot. The obvious drawback of Hilberts Curve in this experiment is that itis rather hard to relate a position on the plot back to a position of the vector. This limits applicability when absolute positions are of interest but is not an issue if one is interested in judging relative positions, ive, spacing, homogeneity, ec. Another drawbacks of Hilberts Curve mentioned by Chung et al (2007); Kamata et al (1996) is the representation of Hilberts Curve Based on its representation some homogeneous segments may be rather long. In their experiments, seven bits are needed for encoding the length one single segment. 2.1 Data-Driven Structures Data Driven structures is a structure that based on partitioning the set of objects and thus adapt to the distribution of these objects. In this section R-Tree data structure has been chosen as an example of data driven structures. The R-Tree structure itself is adapted to the rectangle distribution inthe plane which is a concept of data driven structures RTree The R-Tree is a balanced tee found by Guttman (1984), The structure of R-Tree is applicable to arbitrary spatial objects by aggregating minimum bounding regions {rnbr) and organizes the aggregates ina wee structure. Each node in the R-Tre sruc- ture corresponds to a disk page. Every leaf node contains an array of leaf entuies. ‘And every leaf entry must bea pic (by, object) mbr sof the forma (a1T7- 4%), where d isthe dimension of the search space. Non-leaf nodes contain an array of node enties. According to Guttman (1984) R-Tree structures must satisfy the fol- lowing properties: ‘© Every leaf node in the tee contains between m and M index records, where m € (0,M/2) except for the root Review of Spatial Indexing Techniques for Large Urban Data Management u ‘© Each index record in a leaf node contain entries of the form (I, tuple-identiier), where Tis the smallest bounding reclangular parallelepiped of the n-dimensional data object ‘© Non-leaf nodes contain entries of the form (f, child-pointer), where I is a n- dimensional rectangular parallelepiped. Every non-leaf node has between m and ‘M children unless itis the root, © For each leaf entry (1 child-pointer), [is the minimal bounding box or smallest rectangle of object stored in the child node. ‘© The root node has at Ieast two children unless itis a leaf, ‘© Allleaves appear on at most two different levels, That means the tree is balanced. Fig. 7 R-Tice rpretentaton from Manolopouls a 2006) Figure 7 illustrates the R-Tree structure. The three minimum bounding rectangles of the root node Ra, Rb and Re are represented by thick lines. Traditionally, R-Tree performance has been evaluated for range query such as NN (Nearest Neighbor) queries (Roussopoulos et al, 1995) and join query Brinkhoff etal (1993). From the eport of Papadopoulos and Manolopoulos (1997)), the R-Tree structure shows the efficiency in answering Nearest Neighbor queries. In real world applications, NN queries are applied in various fields such as pattern recognition, clustering analy sis, content based image retrieval and more. Unfortunately, just like other indexing structures, R-Trees have their own limitations due to their poor scaling characteris tic, According to Manolopoulos et al (2006), the original R-Tree has two important disadvantages: 1. For a point query, the R-Tree structure may execute an investigation from root to leaf level which leads to performance degradation especially when MBRs over- lap. 2, Large rectangles (MBR) may increase the overlap area which leads to perfor- ‘mance deterioration during query execution, ‘These limitations or disadvantages were addressed i research works towards R- ‘Tree variants, such as R' Tree by Sells etal (1987), R*Tree by Beckmann et al 2 ‘Subaibah Azz, Unie Ujang, Frangois Anton, Darka Mioe and Alias Abdul Rahman (1990), Hilbert-R Tree by Kamel and Faloutsos (1994) and R k-d Tree by Anand- hhakumar et al (2010). Every variant is altempting (o tackle some limitation of the original R-Tree. For instance, the R~ Tree variant by Sells et al (1987) is dedicated to increase search performance for point query by reducing the number of disk ac- cess and attempts to have zero overlapping. This variation of R-Tree has been dis- cussed by Hwang et al (2003), Manolopoulos et al (2006) and Balasubramanian and Sugumaran (2012). Several other R-Tree variants Were introduced such as the Ri Tree, R*Tree, Packed R-Tree, Hilbert R-Tree, R*Q Tree and Historical R-Tree, Lakshmi et al, 2012 classified the structure of R-Tree variance in three categories: process change, hybridization and extension. The classification is shown in Figure 8 BS ea Fig. 8 ReTree Variants by Balasubramanian and Sugumaran 2012) ‘The R-Tree structure has been widely used in spatial database management sys ems due (o its efficiency in organizing spatial objects of arbitrary dimension, Be- sides that, using R-Tree index will save both disk space and time to build and main- tain the index. Bottom-up build performance also will be improved. This is due to reducing memory and temporary database space usage. Commercial RDBMSs such as Oracle and Informix, also adopting an R-Tree structure in their spatial database according to Ravada and Sharma (1999); Kothuti et al (2002). Due to this reputa- tion, the R-Tree structure and its variants attracted a lot of attention from researchers Review of Spatial Indexing Techniques for Large Urban Data Management B and commercial database corapanies. The simplicity of the structure and its ability to handle spatial objects efficiently are (wo tempting reasons for incorporating the structure of R-Tree in a spatial database. In modern computing intensive applica tions, an efficient method for handling spatial data is of great importance Ravada and Sharma (1999) 3 3D Spatial Indexing ‘The state-of-the-art of 3D spatial indexing evolved when 3D data are widely used in various fields and applications. For example, spatial professionals can simulate the noise, heat and pollution spreading in virtual cities; architects can use a 3D dataset to design buildings and visualize the sceneries with real textures and wility man- agement companies use 3D spatial databases for the integration of above ground, underground, indoor and outdoor objects. In order to retrieve those data efficiently, a 3D spatial index is required in managing, retrieving and optimizing records in spatial databases. The need of 3D spatial index for 3D data management has been ‘mentioned in Wang et al (2010); Zhu et al (2007); Deren et al (2004); Arens et al (2005); Zlatanova (2000); Korfter et al (2000). As we discussed in the previous section, there are many different indexes in han- dling 2D spatial objects such as k-d Tree, quadtree and grid file. However, apart from the k-d tree, they are not suitable to 3D applications. Thus, most of the research on. 83D spatial index is done just by extending the 2D spatial index such as R-Tree to 3D R-Tree and Quadiree to Octree, The transition of 2D case to the 3D case is not straightforward, We could not assume objects in 2D are the same as in 3D space. For example, the geometry of objects in 2D is different when it transforms into 3D. Because of that, the relationships between objects in 2D sometimes aze different in 3D. When it comes to spatial indexing, 3D data that was processed based on the expansion of 2D spatial index may affect the data retrieval during queries Gu et al (2011), As we know, a query is very significant to the efficient management of the database especially for large datasets of large scale arcas. For example, in a commercial database management system such as Oracle, 3D R-Tree is implemented when dealing with 3D data Murray (2009); Ravada and Sharma (1999). The concept of 3D R-Tree is not much different from the origi- nal R-Tree, What makes it capable of handling 3D objects is its extension to the third dimension of the MBR to the MBV (Minimum Bounding Volume). In the 3D ReTree, the geometry of candidates that satisfy a query window will be identified through the volume box (MBV), or in other words, a 3D rectangular parallelepiped. Figure 9 shows a MBV in a 3D R-Tree. Just like the original R-Tree, the tee shape is autonomously optimized during data insertion and the leaf nodes are balanced to make sure the performance is stable. However, when the R-Tree is extended into 3D space, the MBVs of sibling nodes tend to frequently overlap, and MBVs among. nodes can even contain each other. Overlapping of MBVs is the main reason for the low efficiency of query performance due to multi-path queties. Therefore, using 3D 4 ‘Suhaibah Az, Umie Ujang, Frangois Anton, Daska Mioe and Alias Abdul Rahman ReTrees for 3D applications is not successfully promising. It needs to be optimized for modified according to features of 3D space. Several research works such as Zhu et al (2007); Jun and Shengnan (2011); Gong (2011) try to enhance the 3D R-Tree in order to minimize the overlapping among siblings. 3 cope ¢ tae Fig. 9. MBV in R-Tree Suuctue in Gong eta (2009) Due to critical overlap of sibling nodes and uneven size of nodes, Zhu etal (2007) attempts to minimize the overlap and optimize the clustering algorithm by introduc ing k-means clustering algorithm to put forward an improved 3D R-Tree. From his experiments, the performance of 3D R-Tree is increased by using an improved al- gorithm where the overlap is minimized drastically while balancing the volume of nodes. Figure 10 shows & comparison of the original algorithm of 3D R-Tree and this improved algorithm, Whilst, Jun Jun and Shengnan (2011) suggested splitting inegular 3D objects into several small parts in order to improve the overlapping. node issue. In 2009, Gong et al (2009) reported two sub-algorithms of R-Tree that make the 3D R-Tree to be well-shaped, The two sub-algorithms are namely node- choosing and node-split procedures. Node-choosing is dedicated to finding a leat node for new inserted objects and node-split is responsible for spitting overflow nodes into smaller ones. Using his algorithm, fewer R-Tree nodes are accessed in ‘memory, which increases the query performance. Experimental results of Gongs work are shown in Figure 11. From the experiment, the improved algorithm is far better over the original R-Tree and R*-Tree Although improving static 3D R-Trees increases the efficiency in query perfor- ‘mance, there are several factors that could not be ignored and must be considered in the implementation of spatial indexes, 3D R-Trees for different applications will result in different performances. For example, applications with large amounts of data still face low retrieval efficiency even wien improved 3D R-Tree is applied Review of Spatial Indexing Techniques for Large Urban Data Management Is Fig. 10 Comparison of 3D Ree methods (a) Orginal algorithm of 3 R-Tree and (b) Improved algorithm in Zu el (2007) Comparison of 3D R-tree 2 400 200 cecal Ae fi-teae a eer lrr-tzee m4 43 dist lostinsl tees) 6 86 Fig. I Improved 3D R-Tee algorithm over Orginal or Cassie R-Teee and R*-Tree by Gong eal m9) According to Gu etal (2011), using a single index structure for 3D spatial data is not an ideal method, Thus, the combination of several index structure is proposed by many scholars (see Wang et al 2010); Gu etal (2011); Song etal (2006); He et al (2009)). A combination of two or more index stractue is also known as hybrid indexing. By combining each other limitations and strengths, index structures in by- brid indexing would push the limits of data retrioval ficiency. Some examples of hybrid indexing of 3D objects are ORSI Gu etal (2011) and LOD.SKDR Tree He etal (2009), ORSI Gu etal (2011) is built based on an Octree and a R-Tree index structures. The principle of ORS: 1. The area of interest is divided by Octree. Then the first level index is established by setting up a threshold for subspace k, where k is the number of objects in sub-space S. When the number of objects in subspace S is greater than k, Octtee division is repeated, 6 ‘Subaibah Az, Unie Ujang, Prangois Anton, Darka Mioe and Alias Abdul Rahman 2, Secondary index is build based on R-tree, Each subspace is pointed by one pointer, If there is no R-Tree in the subspace, the pointer is empty. In an R-Tree rode, adjacent or close spatial objects are gathered to improve query efficiency. wm fanc[ 1 [|r | count | Lever | cr,B, (CPyeMBBy count | Lever [or,sime, | 07.5, OP MB, Fig. 12 ORSI scucture by Guet al (2011) Figure 12 shows the ORSI index structure Gu et al (2011). Based on the oper- ational performance of ORSI on R-Tree, average access on inserting, deleting and retrieving indicates that the performance of ORSI is overall better than classic R- ‘Trees. Figure 13 shows the comparison of ORSI and classic R-Trees. et Fig. 18 Comparison of ORS and R-Tree in Gu etal (2011) As mentioned in previous paragraph, different structures may be used for dif- ferent applications. For the applications such as underground utility management, Review of Spatial Indexing Techniques for Large Urban Data Management "7 geological analysis and other underground space entity object may need a different index structure, Any current index structure could not effectively deal with large scale complicated geological data, The index structure of multi-level spatial index or BCSR-Tree in He etal (2009) and 3D RR-Tree in Wang et al (2010) is dedicated to handling the retrieval of 3D underground objects. For large scale underground data, Liu etal (2010) proposed LOD-SKDR Tce structure for management of large scale database, The LOD-SKDR Tyee is a combination of SKD-Tree, R-Tree and information of object’s LOD (Level of Detail). According to Liu et al (2010) the LOD-SKDR Tree utilizes the object’s number as a bifurcation bound target to im- prove the balance of SKD-Tree. While in the meantime, R-Tree and texture schedul- ing of SKD-Tree content are restricted to reduce time on insertion or deletion. A. LOD object model is also used to improve real time rendering in the 3D scene. Figure 14 shows a complicated 3D underground scenaio - utility pipeline in urban Fig. 14 Complicated Underground Scenes: Underground Pipeline in Urban Area Tn managing large raw datasets acquired using LADAR, spatial indexing plays an important ole by managing points in spatial databases. With the rapid development of LiDAR acquisition, a variety of spatial data will continue to be accumulated The aim of database management in manipulating those data is by having efficient high-quality rendering of point clouds. Many scholars and researchers attempt to investigate suitable indexing data structures for handling these huge massive point clouuls. The hybrid indexing approach on LIDAR data proposed in Liu et al (2008) is specialized for point cloud visualization. In their work, Liu etal. 2008 integrate a ked Tiee and an Octtee for a 3,200,079 point cloud on display time. Wang and Guo (2012) has presented the QMBB spatial index which has both a 2D and 3D index strctures. QMBB Tee represents Quadiuee segmentation as 2D index based on.a 2D range image. Each one ofthe corresponding point image of MBR is then calculated as a MBV in 3D space. Based on the reslt, the rendering speed is up ‘© 10 milion points per second. There are more research developments of hybrid indexes for point clouls such as Gong (2011); Koo and Shin (2005); Sehiin etal 8 Suhaiba Azi, Uanie Ujang, Frangois Anton, Darka Mice and Alias Abdul Rahman (2009, 2013). These researches proved the capability of hybrid indexing in efficient data management for point clouds. 4 Conclusion In this paper, we presented several index structure based on space driven structure and data driven structure, We briefly described their structure and their implements- tion along with large volume data application such as, large urban data management and point clouds data management. Limitation and advantages of each structure are also revealed. Based on the several methods of 2D and 3D spatial indexing, itis clearly shown that spatial indexing is capable in handling a large amount of dataset especially for urban data management, However, the structuse of each spatial index- ing is unique and only suits for certain situation or case, For an instance, continuous area of surface is not suitable for tessellated square based index structure such as ‘Quadtrce, Among all the data structure R-Tree is the most widely used spatial index: structure in commercial database which serving many applications and multidimen- sional data. In this paper, several extension of R-Tree index structure based on 3D. hhas been discussed in the last section of this paper. It could be seen that most of the structute are developed and tested for point dataset and lack of the example and effort for volume based data, Based on the discussions and some example given, it indicates that there is more space for 3D spatial index to be improved especially in the aspect of complex and large data management, A survey on spatial index struc ture in this paper may give some view and idea on future development of spatial indexing for large data management. Acknowledgements Major funding for this esearch was provided by the Ministry of Higher Zé eatin Malaysia and patally funded by the Land Surveyors Boaed of Malaysia. This research vas done while te third author was a Visting Pull Professor atthe Faculty of Geoinformation and eal Estate at Universiti Teknologi Malaysia for which the thd authors very thank References Anandhakumar P, Priyadarshini J, Monisha C, Sugirtha K, Raghavan $ (2010) Location based hybrid indexing structure - r k-d tree, In; 2010 First Interna- tional Conference on Integrated Intelligent Computing (ICHC), pp 140-145, DOL 10.1109iciie 2010.56 Anders $ (2009) Visualization of genomic data with the hilbert curve. Bioin- formatics 25(10):1231-1235, DOI 10.1093/bioinformatics/onp152, URL nformat ics .oxfordjournals.org/content/25/ cract, http! //bioinformatics .oxfordjournals content /25/10/1231.full.paf htm Review of Spatial Indexing Techniques for Large Urban Data Management 9 Arens C, Stoter J, van Oosterom P (2005) Modelling 34 spatial ob- jects in a geo-dbms using a 3D primitive. Computers & Geosciences 31Q}:165-177, DOI _hitp:/dx.doi.org/10.1016f.cageo.2004.05.013, URL hee: / /awe cience/article/p: 5008830040400192x Balasubramanian L, Sugumaran M (2012) Article: A state-of-art in R-Tree variants {or spatial indexing. Intemational Journal of Computer Applications 42(20):35— 41, published by Foundation of Computer Science, New York, USA. Beerentzen J, Gravesen J, Anton F, Aanszs H (2012) Spatial Data Indexing and Point Location, In: Guide to Computational Geometry Processing, Springer, London, pp 213-225, DOT 10,1007/978-1-4471-4075-7.12, URL»: org/10.1007/978-1-4471-4075-7_12 Beckmann N, Kriegel HP, Schneider R, Seeger B (1990) The R*-Tree: an efficient ‘and robust access method for points and rectangles. SIGMOD Ree 19(2):322-331 Bentley JL (1975) Multidimensional binary search trees used for associative search- ing. Commun ACM 18(9):509-517 Bentley JL, Friedman JH (1979) Data structures for range searching. ACM Comput 397-409, DOT 10,1145/356789.356797, URL http: //doi acm. 45/356789. 35678 s//dx.doi sms. In: Essentials of Mathematica, vol 1, ger New York, pp 407-416, DOI 10.1007/978-0-387-49514-9.23 Brinkoif T, Kriegel HP. Sceger B (1993) Efficient processing of spatial joins using trees, SIGMOD Rec 22(2):237-246 Castro J, Burns $ (2007) Online Data Visualization of Multidimensional Databases ‘Using the Hilbert SpaceFilling Curve, Lecture Notes in Computer Science, vol 4370, Springer Berlin Heidelberg, chap 9, pp 92-109 Chung KL, Huang YL, Liu YW (2007) Efficient algorithms for coding hilbert ‘curve of arbitrary-sized image and application to window query. Informa- tion Sciences 177(10):2130 ~ 2151, DOT http://dx doi.org/10.1016/.ins.2006.12. 003, URL http: //nww.scien e/article/ pii/s0020025506003732 Deren L, Qing Z. Qiang L, Peng X (2004) From 2d and 34 gis for eybercty. Geo- spatial Information Science 7(1):1-5 Friedman JH, Bentley JL, Finkel RA (1977) An algorithm for finding best matches in logarithmic expected time. ACM Trans Math Softw 3(3):209-226, DOT 10. 1145/385744.355745, URL http: //doi.acm.org/10.1145/355744 355745 Gaede V, G Oliver (1998) Multidimensional access methods. ACM Comput Surv 30(2):170-231 Gomes ALP, Voiculescu, Irina, Jorge, Joaquim, Wyvill, Brian, Galbraith, Cal- Jum (2009) 2 - Spatial Data Structures. In: Implicit Curves and Surfaces Mathematics, Data Structures and Algorithms, Springer, London, pp 41 — 62, DOT 10.1007/978-1-84882-406-S\.2, URL hetp://ax.doi .org/ 1007/978-1-84882-406-5_2 Jirect.com/sei 20 Suhaibab Azi, Uanie Ujang, Frangois Anton, Darka Mioe and Alias Abdul Rahman Gong J, Ke 8, Li X, Qi $ (2009) A hybrid 3D spatial access method based on quadirees and R-rees for globe data. In: International Symposium on Spatial Analysis, Spatial-Temporal Data Modeling, and Data Mining, DOT 10,1117/12. 837594, URL http: //dx-doi.org/10.1117/12. 837594 Gong J, Zhu Q, Zhang H, Li X, Zhou D (2011) An adaptive control method of lods for 3D scene based on ree index. Acta Geodaetica et Cartographica Sinica (Assue 4):Page 531-534 Gu W, Wang J, Shi H, Liu ¥ (2011) Research on a hybrid spatial index structure Journal of Computational Information Systems 7(11}:3972-3978 Guttman A (1984) R-trees: a dynamic index structure for spatial searching. SIG- MOD Ree 14(2):47-57 He Z, Liu G, Weng Z, Wa C (2009) Three-dimensional spatial indexing method of ‘complicated geological scene. In: 17th International Conference on Geoinformat- ies, 2009, IBEE, 1, pp 1-4 Hinrichs K (1985) Implementation of the grid fle: Design concepts and experience. BIT Numerical Mathematics 25(4):569-$92 Hutflesz. A, Six HW, Widmayer P (1988) Twin grid files: space optimizing ac- ‘cess schemes. SIGMOD Ree 17(3):183-190, DOT 10.1145/971701,50222, URL http://doi .2em.org/10.1145/971 701 50222 Hwang $, Kwon K, Cha S, Lee B (2003) Performance Evaluation of Main-Memory Retree Variants, Lecture Notes in Computer Science, vol 2750, Springer Berlin Heidelberg, chap 2, pp 10-27 Jun G, Shengnan K (2011) 3D spatial query implementation method based on tree. In: International Conference on Remote Sensing. Environment and Transporta- tion Engineering (RSET), 2011, IEBE, 1, pp 2828-2831 Kamata S, Niimi M, Kawaguchi E (1996) A gray image compression using a hilbert sean, In: Proceedings of the 13th International Conference on Pattern Recogni- tion, 1996., 3, pp 905-909, DOT 10.1109/iepr.1996.547299 Kamel I, Faloutsos C (1994) Hilbert R-Tree: An improved R-Tree using fractals Kevin B (2008) Organizing point sets. PAD thesis, Freie Universitat Berlin Klinger A, Dyer CR (1976) Experiments on picture representation us- ing regular decomposition. Computer Graphics and Image Processing 5(1):68-105, DOT http://dx doi org/10.1016/S0146-664X(76)80006-8, URL jencedirect .com/science/article/pii/ 148664X76800058 Koo YM, Shin BS (2005) An Efficient Point Rendering Using Octree and Texture Lookup, Lecture Notes in Computer Science, vol 3482, Springer Berlin Heidel- berg, chap 127, pp 1187-1196 Kothurt RKV, Ravada S, Abugoy D (2002) Quadttee and rtree indexes in oracle spatial: a comparison using gis data Kun Z, Xiu-guo L, Hui ¥ (2005) Research on Spatial Index Structure of LOD-OR ‘Tree in 3D GIS. Bulletin of Surveying and Mapping S(1):27-29 Li MH, Hong BR, Cai ZS, Piao SH, Huang QC (2008) Novel indoor mobile robot navigation using monocular vision, Engineering Applications of Attficial In- telligence 21(3):485 — 497, DOT hup:/idx.doi.org/10.1016/).engappai.2007.05. Review of Spatial Indexing Techniques for Large Urban Data Management FE 003, URL hetp://www.sciencedirect pii/80852197607000607 Liv H, Huang 7, Zhan Q, Lin P (2008) A database approach to very large LIDAR data management. In: International Society of Photogranometry and Remote Sens- ing QSPRS), ISPRS Beijing XXXVIL463-468 Liu ¥, Liu G, He Z (2010) Spatial index technology for multi-scale and large scale spatial data In: 18th International Conference on Geoinformatics, 2010, IEEE, 1. ppt Kofler M., Gervauty M. (2000) R-rees for organizing and visualizing 3d gis databases. The Journal of Visualization and Computer Anima- tion 11G)128-143, DOT doi:10.1002/1099-1778(200007)11:3\&'\#060; 129: ‘AID-VIS227 \&e\#062,3.0.C0;2-T, URL http: / /wmw. ingentaconnect con/ content / jus /vis/2000/00000011/00000003/art00227 Manolopoulos ¥, Nanopoulos A, Papadopoulos AN, Theodordis ¥ (2006) R-Tiees: theory and applications. Springer Mark DM (1986) The use of quactrees in geographic information systems and spa tial data handling, Hardware, Data Capture and Management Techniques, Auto- Carto London 1:517-526 Mark DM, Lauzon JP (1985) Approaches for quadtree-based geographic informa- tion systems at continental or global scales. In: Proceedings of Auto-Carto, 7, pp 355-365 Mohamad Yusoff I, Uznie Ujang M, Abdul Rabman A (2010) 3D Volumetsic Soft ‘Geo-objects for Dynamic Urban Runoff Modeling, Developments in 3D Geo- Information Sciences, vol 1, Springer Berlin Heidelberg, chap 18, pp 200-219 Murray € (2009) Oracle spatial developer's guide, L1g release 1 (11.1) Nene SA, Nayar SK (1997) A simple algorithm for nearest neighbor scar in high dimensions. Patter Analysis and Machine Intelligence, IEEE Transactions on 19(9):989-1008, Nievergelt J, Hinterberger HY, Seveik K.C. (1984) The Grid File: An Adaptable, Symmetiic Multikey File Structure. ACM Trans. Database Syst 9(1):38-71 DOI 10.1145/348.318586, URL htcp://do!.aem.org/10.1145/348. 318586 osterom PV (1999) Spatial access methods edited by longley, goodchild, maguire en thind, john wileypages385-400 (vol. 1), 1999. Geographical information sys- tems 1:385~400 Papadopoulos A, Manolopoulos Y (1997) Performance of nearest ncigh- bor queties in ruees. In; Database Theory ICDT "97, Lecture Notes in Computer Science, vol 1186, Springer Berlin Heidelberg, pp 34108, DOL 10,1007/3-540-62222-5.59, URL http: //ax.dos .org/10.1007/ 3-$40-62222-5_59 Peano G (1890) Sur une courbe, qui remplit toute une aire plane. Mathematische Annalen 36(1):157-160 Peuguet DY (1984) Data structures fora knowledge-based geographic information system. In: Proc. Spatial Data Handling 2 Suhaiba Azri, Uanie Ujang, Frangois Anton, Darka Mioe and Alias Abdul Rahman Ranade $ (1981) Use of quadtrees for edge enhancement. IEEE Transactions on ‘Systems Man and Cybernetics 11:370-373 Ranade S, Shneier M (1981) Using quadtrees to smooth images. TE ‘on Systems Man and Cybernetics 11:373-376 Ravada S, Sharma J (1999) Oracle8i Spatial: Experiences with Extensible Databases, Lecture Notes in Computer Science, vol 1651, Springer Berlin Hei- delberg, chap 21, pp 355-359 Rigaux P, Scholl M, Voisard A (2002) 6 - spatial access methods In: Spatial Databases, Morgan Kaufmann, San Francisco, pp 201- 266, DOT _ hittp://dx doi.org/10.1016/B978- 155860588-6/50008-7, URL http://www. sciencedi rect .com/science/article/p p9781558605886500087 Robinson G, Tagare H, Duncan J, Jaffe C (1996) Medical image collection indexing: ‘Shape-based retrieval using kd-tres. Computerized Medical lmaging and Graph- ies 20(4):209-217, DOL hittp://dx.doi.org/10.1016/S0895-6111(96)00014-6, URL http://www. sciencedix Jarticle/pii/ 80895511196000146 Roussopoulos N, Kelley S, Vincent E (1995) Nearest neighbor queries. SIGMOD Ree 24(2):71-79 ‘Samet HT (1982) Distance transform for images represented by quadtrees. Pattern ‘Analysis and Machine Intelligence, IEE Transactions on (3):298-303 Samet H, Rosenfeld A, Shaffer CA, Webber RE (1984) A geographic information system using quadtrees. Pattern Recognition 17(6):647~656, DOI hitp://dx.doi ‘org/10,1016/0031-3203(84)90018-9, URL nctp: //uww. scleacedizect com/science/article/pii/0031320384900185 ‘Sample N, Haines M, Arnold M, Purcell T (2001) Optimizing search strategies in kd trees. In: Fifth WSES/IEEE World Multiconference on Circuits, Systems, Com- ‘munications & Computers (CSCC 2001) Schén B, Bertolotto M, Laefer DF, Morrish $ (2009) Storage, manipulation, and visualization of lidar data, In: Proceedings of the 3rd ISPRS International Work- shop 3D-ARCH 2009:" 3D Virtual Reconstruction and Visualization of Complex Architectures”, International Society of Photogrammetry and Remote Sensing, XXXVII-SAWL ‘Schon B, Mosa ASM, Laefer DF, Bertolotto M (2013) Octree-based indexing for 34 pointclouds within an oracle spatial dbms, Computers & Geosciences 51(0):430— 438 Sellis TK, Roussopoulos N, ‘ulti-dimensional objects Sharkawi KH, Ujang MU, Abdul-Rahman A (2008) 3D navigation system for vis tual reality based on 3D game engine. In: The International Archives of the Pho- togrammetry, Remote Sensing and Spatial Information Sciences, 2008, ISPRS. XXXVI, pp 513-518, Song Xy, Zhow Xw, Wang Yh (2006) Research on spatial index structure of hybrid twee in 3D GIS. Journal of Shenyang Jianzhu University (Natural Seience) 3:030 ‘Transactions con/scie! loutsos C (1987) The R+-Tree: A dynamic index for Review of Spatial Indexing Techniques for Large Urban Data Management 2 ‘Tamminen M (1982) The extendible cell method for closest point problems. BIT ‘Numerical Mathematics 22(1):27-41 ‘Thiele R, Wos L (2002) Hilber’s twenty-fourth problem, Journal of Automated Reasoning 29(1):67-89 ‘Tobler W, Chen Zt (1986) A quadtree for global information storage. Geographical “Analysis 18(4):360-371 ‘Ujang U, Abdul-Rahman A (2013) Teraporal Three-Dimensional Ontology for Geo- graphical Information Science (GIS) - A Review: Journal of Geographic Informa- tion System 5(3):314-323, DOT 10.4236/jgis.2013.53030, URL http://www. scirp.org/ journal /PaperDown load. aspx?paperID=33280 Wang Y, Guo M (2012) An integrated spatial indexing of huge point im- age model. ISPRS - International Archives of the Photogrammetry, Re- ‘mote Sensing and Spatial Information Sciences XXXIX-B4:397-401. DOI 10.5194/sprsarchives-XXXIX-B4-397-2012, URL attp://www. int~arch-photogranm-remote~sens-spatial-inf-scei.net/ XXXIX-B4/397/2012/ Wang Y, Sheng ¥, Zhou L, Guo F, Zhao L. (2010) An underground space object- oriented three-dimensional hybrid spatial indexing method, In: 18th International Conference on Geoinformatics, 2010, IEEE, 1, pp I-S ‘Zhang J, You S, Gruenwald L (2011) Parallel quadtree coding of large-scale raster geospatial data on gpepus, In: Proceedings of the 19th ACM SIGSPATIAL In- ternational Conference on Advances in Geographic Information Systems, ACM, New York, NY, USA, GIS "11, pp 457-460, DOT 10.1145/2093973,2094047, URL ht

You might also like