6. Database-Oriented Storage Based on LMDB and Linear Octree for Massive Block Model
Abstract: Data organization requires high efficiency for the large amounts of data used in digital mine systems. A new method of
storing massive block model data is proposed that meets the characteristics of a database, including ACID compliance, concurrency
support, data sharing, and efficient access. Each block model is organized as a linear octree and stored in LMDB (lightning
memory-mapped database). Geological attributes can be queried at any point in 3D space by means of a comparison algorithm for
location codes and a conversion algorithm from the address code of geometric space to the location code of storage. The performance
and robustness of querying geological attributes in a 3D spatial region are greatly enhanced by the transformation from 3D to 2D and
by 2D grid scanning to screen the inner and outer points. Experimental results showed that this method can access massive block
model data while meeting the database characteristics. The method with LMDB is at least 3 times faster than that with etree,
especially for reads. In addition, the larger the amount of data processed, the more efficient the method is.
Key words: block model; linear octree; lightning memory-mapped database; mass data access; digital mine; etree
Foundation item: Projects (41572317, 51374242) supported by the National Natural Science Foundation of China; Project (2015CX005) supported by the
Innovation Driven Plan of Central South University, China
Corresponding author: Lin BI; Tel: +86-731-88877665; E-mail: [email protected]
DOI: 10.1016/S1003-6326(16)64377-7
Lin BI, et al/Trans. Nonferrous Met. Soc. China 26(2016) 2462−2468 2463
The size of a node at level k is $2^{31-k}$ basic units. A
block model with 13 bytes as the location code is
sufficient to express any scale of mine and meets the
accuracy requirements. For example, for a mine of
1000 km, the basic unit length can be 0.5 mm.
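The 1000 km example can be checked with a few lines of arithmetic. The sketch below assumes that the root node is level 0 and that the finest level is 31, as implied by the node size expression; the function names are illustrative and not taken from the paper.

```python
# Assumption: root node is level 0, finest level is 31 (from the 2^(31-k) node size).
MAX_LEVEL = 31

def node_size_units(k):
    """Edge length of a level-k node, in basic units."""
    return 2 ** (MAX_LEVEL - k)

def extent_km(unit_mm):
    """Edge length of the root node in kilometres for a given basic unit (mm)."""
    return node_size_units(0) * unit_mm / 1e6  # 1 km = 1e6 mm

print(node_size_units(31))   # finest node: one basic unit
print(extent_km(0.5))        # root extent with a 0.5 mm basic unit
```

With a 0.5 mm basic unit the root node spans about 1073.7 km, which covers the 1000 km mine quoted in the text.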
Fig. 7 Preorder traversal of octree: (a) Comparison algorithm diagram; (b) Preorder traversal; (c) Location code
robustness of the polyhedron rasterization are important factors. The key technology of discretization is the fast and accurate judgment of the relation between the points and the polyhedron. BI et al [4] optimized the ray method to judge the internal and external relations with the OBB tree and node intersection test; the performance improved, but the robustness is poor. JING et al [20] improved the robustness with the Feito-Torres method, but the performance declined. LI et al [21] used flood-fill to reduce the time complexity to O(n); however, a large number of triangle intersection operations need to be carried out. To solve these problems, the point-in-polyhedron decision problem is transformed into a point-in-polygon problem, that is, the 3D problem is transformed into 2D problems. As a result, the problem is greatly simplified. The ray method is a common method to judge the relationship between a point and a polygon [22]. Based on the idea of the ray method, we propose a new method, 2D grid scanning, to screen the inner and outer points. The process is as follows:
1) Rasterize the target 3D space based on the unit size of the block model and create a 3D grid;
2) Cut the polyhedron with a plane through the center of each layer and create 2D contour lines;
3) Classify the pixels of every layer as inner or outer by the raster scan method.
Among them, the raster scanning method is the most important. Assuming that the numbers of rows, columns and layers of the 3D grid are all N ($N = 2^i$, where i is a positive integer), its process is as follows:
1) Initialize all elements with the "outside" mark;
2) Produce a ray $R_{x_j}$ from $x = x_j$ parallel to the y-axis, where $x_j$ is the x-component of the coordinate of column j;
3) Compare $x_j$ with the x-components $(x_1, x_2)$ of the end points of every contour line segment on layer k. The line segment intersects $R_{x_j}$ if $x_1 < x_j < x_2$ (without loss of generality, assuming $x_1 < x_2$);
4) Calculate the y-component of the intersection using formula (2), with $x = x_j$:
$$y = y_1 + (y_2 - y_1)\,\frac{x - x_1}{x_2 - x_1} \qquad (2)$$
Find all the intersections and sort their y-component values in ascending order;
5) Make "intersection pairs" from the y-component values as $(y_0, y_1), (y_2, y_3), \ldots, (y_{c-2}, y_{c-1})$, where c is the number of intersections;
6) Set the "internal" flag for the elements whose center is located within one of the "intersection pairs";
7) Repeat from step 2) until all layers are processed.
As can be seen from the above, the algorithm does not have to calculate the intersection, but only makes a simple judgment. As a result, the performance and robustness of the algorithm are improved greatly.

5 Experiments and analysis

The storage method was implemented in VC++ and run on a 64-bit Windows 7 OS with an Intel(R) Core(TM) i5-4750 @ 3.20 GHz CPU, 4 GB of memory and a 7200 RPM HDD. The extent of the mine was (3450.0 m, 3862.0 m, 812.0 m); its structure model is shown in Fig. 8(a) and its block model in Fig. 8(c). The query region is shown in Fig. 8(b); its size was (1361.0 m, 1462.0 m, 715.0 m), and its block model is shown in Fig. 8(d). A comparative analysis was carried out between LMDB and etree in three different scenarios: the same fields but different numbers of blocks, the same number of blocks but different fields, and different key-comparator functions.

5.1 Experiment 1: Same fields but different numbers of blocks

The fields remained unchanged (3 doubles and one string of 32 bytes), while the number of blocks was increased about eightfold at each step (the octree level was incremented by 1). The two storage methods were compared in terms of data size, time consumed to create, time consumed to query, and so on. The results are shown in Table 1.
As can be seen, LMDB occupies more storage space than etree. The reason is that LMDB can save variable-length records and needs to hold the lengths of the key and value. Nonetheless, variable-length keys and values are an advantage of LMDB, since they can be used to store data collections of multiple data types. LMDB is much faster than etree, and the larger the number of blocks, the more obvious the tendency. The main reason is that the entire LMDB database is mapped into virtual memory and all data fetches are performed via direct access to the mapped memory instead of through intermediate buffers and copies. In addition, the screening takes little time and does not increase dramatically as the number of blocks increases, because of the raster scanning method.

5.2 Experiment 2: Same number of blocks but different fields

In this experiment, fields of different lengths (56, 112 and 168 bytes) were adopted and the number of blocks remained unchanged. We performed the same comparative analysis as above. The results are shown in Table 2.
The results showed that the time consumed by reading and writing increased with the amount of data. The time consumed by screening remained about the same because screening is only related to the number of blocks to be queried.
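The raster scanning steps 1) to 6) for a single layer can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' VC++ implementation: the function name `scan_layer`, the square cell size `cell`, the grid `origin` and the representation of contour lines as pairs of 2D end points are all hypothetical choices made for the example.

```python
def scan_layer(segments, n, cell=1.0, origin=(0.0, 0.0)):
    """Mark the cells of one n x n layer whose centres lie inside the contour.

    segments: list of ((x1, y1), (x2, y2)) contour line segments on this layer.
    Returns inside[row][col] as booleans (row along y, col along x).
    """
    inside = [[False] * n for _ in range(n)]           # step 1: all "outside"
    for j in range(n):                                  # step 2: one ray per column
        xj = origin[0] + (j + 0.5) * cell               # x of the column centre
        ys = []
        for (x1, y1), (x2, y2) in segments:
            if x1 > x2:                                 # enforce x1 < x2
                x1, y1, x2, y2 = x2, y2, x1, y1
            if x1 < xj < x2:                            # step 3: segment crosses the ray
                # step 4: formula (2) evaluated at x = xj
                ys.append(y1 + (y2 - y1) * (xj - x1) / (x2 - x1))
        ys.sort()                                       # ascending y-components
        for y_in, y_out in zip(ys[0::2], ys[1::2]):     # step 5: intersection pairs
            for i in range(n):                          # step 6: flag enclosed centres
                yc = origin[1] + (i + 0.5) * cell
                if y_in < yc < y_out:
                    inside[i][j] = True
    return inside

# A 2 x 2 square contour inside a 4 x 4 grid of unit cells.
square = [((1, 1), (3, 1)), ((3, 1), (3, 3)), ((3, 3), (1, 3)), ((1, 3), (1, 1))]
layer = scan_layer(square, 4)
```

Step 7) corresponds to calling `scan_layer` once per layer with that layer's contour lines. The strict inequality of step 3) is kept as stated in the text; a contour vertex lying exactly on a ray would need extra handling that this sketch does not attempt.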
Fig. 8 Mining and query region: (a) Structure model of ore; (b) Structure model of query region; (c) Block model of ore; (d) Block
model of query region
Table 1 Results of same fields but different numbers of blocks

| Number of blocks | DB | DB size/MB | Create time/s | Query time/s | Screen time/s |
|---|---|---|---|---|---|
| 42517 | Etree | 4.169 | 0.022 | 0.011 | 0.637 |
| 42517 | LMDB | 4.968 | 0.033 | 0.005 | 0.637 |
| 339265 | Etree | 32.689 | 0.549 | 0.184 | 0.687 |
| 339265 | LMDB | 39.136 | 0.316 | 0.061 | 0.687 |
| 2715545 | Etree | 261.421 | 7.691 | 2.373 | 0.928 |
| 2715545 | LMDB | 312.968 | 3.654 | 0.601 | 0.928 |

Note: DB is the storage method. One screen time is reported per number of blocks and applies to both storage methods.

Table 2 Results of same number of blocks but different fields

| Field length/byte | DB | DB size/MB | Create time/s | Query time/s | Screen time/s |
|---|---|---|---|---|---|
| 56 | Etree | 261.421 | 7.813 | 2.404 | 0.890 |
| 56 | LMDB | 312.968 | 3.772 | 0.602 | 0.890 |
| 112 | Etree | 484.305 | 12.338 | 3.327 | 0.887 |
| 112 | LMDB | 540.600 | 10.174 | 0.679 | 0.887 |
| 168 | Etree | 702.705 | 15.858 | 4.217 | 0.889 |
| 168 | LMDB | 774.476 | 13.213 | 0.749 | 0.889 |

octants of an octree in increasing location code, and the order of octants is exactly the same as that produced by the preorder traversal of the octree. The preorder traversal of the octree tracks a "z" pattern (z-order) [23,24], as shown in Fig. 9, and the z curve has good spatial clustering [25].

Table 3 Results of different key-comparator functions

| Comparison type | DB | DB size/MB | Create time/s | Query time/s |
|---|---|---|---|---|
| Memcmp | Etree | 913.009 | 20.374 | 4.176 |
| Memcmp | LMDB | 942.449 | 26.750 | 0.840 |
| Integer-wise | Etree | 913.009 | 12.319 | 0.834 |
| Integer-wise | LMDB | 942.449 | 15.275 | 0.272 |
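The statement that increasing location code order coincides with the preorder traversal of the octree can be illustrated on a small full octree. The sketch below assumes a Morton-style location code built by interleaving the coordinate bits, with the child index taken as (z bit, y bit, x bit); this encoding and the names `morton3` and `preorder_leaves` are assumptions for illustration, and the paper's own location code (Fig. 7(c)) may be laid out differently.

```python
def morton3(x, y, z, depth):
    """Interleave the bits of (x, y, z), most significant level first.

    Assumed encoding: each level contributes one octal digit (z bit, y bit, x bit).
    """
    code = 0
    for level in range(depth - 1, -1, -1):
        octant = ((((z >> level) & 1) << 2)
                  | (((y >> level) & 1) << 1)
                  | ((x >> level) & 1))
        code = (code << 3) | octant
    return code

def preorder_leaves(depth, x=0, y=0, z=0):
    """Yield the leaf coordinates of a full octree in preorder (children 0..7)."""
    if depth == 0:
        yield (x, y, z)
        return
    for octant in range(8):
        dx, dy, dz = octant & 1, (octant >> 1) & 1, (octant >> 2) & 1
        yield from preorder_leaves(depth - 1, 2 * x + dx, 2 * y + dy, 2 * z + dz)

DEPTH = 2
leaves = list(preorder_leaves(DEPTH))                      # preorder traversal
by_code = sorted(leaves, key=lambda p: morton3(*p, DEPTH))  # increasing location code
print(leaves == by_code)
```

Under this encoding the two orderings are identical, so a store sorted by key, such as a B+tree, keeps spatially adjacent octants close together on disk.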
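Abstract: To meet the technical requirements for efficient access to massive data in digital mine system applications, a new method of storing massive block model data is proposed.
The storage technique satisfies the characteristics of a database: ACID compliance, concurrency support, data sharing and efficient access. Block models are organized by the linear octree method
and stored in LMDB (lightning memory-mapped database). Through a location code comparison algorithm and a conversion algorithm from the address code of geometric space to the location code
of storage space, the geological attributes at any point in 3D space can be queried efficiently. By transforming from 3D to 2D and applying the 2D grid scanning method
for screening inner and outer points, the 3D problem of geological attribute query is converted into a 2D problem, and its performance and robustness are significantly improved. Experimental results show that this method
can efficiently access massive block model data and satisfies the characteristics of a database. Compared with the etree method, the LMDB method is at least 3 times faster,
especially when reading data, and the larger the data volume, the more obvious the effect.
Key words: block model; linear octree; lightning memory-mapped database (LMDB); massive data access; digital mine; etree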
(Edited by Yun-bin HE)