Bloom Filter Based Index For Query Over Encrypted Character Strings in Database
into elements. In this way, we minimize the number of elements in subsets u and r.

Subset r is the most frequently used, so it is important to construct a proper r for a character string. We split the string in succession: every run of adjacent characters of length len is taken as an element of the set r, and the size of r is

$$N_r = \sum_{w_i \in w} \left( \mathrm{length}(w_i) - len + 1 \right),$$

where w is the set of words of the string. The bigger len is, the better the relationship between characters is expressed, but len also restricts the match pattern; in other words, we cannot execute a query with a match pattern shorter than len. When len is 1, u = r, and we can get m […]

[Figure: String s = "a1a2…an" → Triple Construction → compress u & r with Bloom Filter Algorithm → m-bit array filled with '0' or '1' → Turn into numeric]
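The construction of r and the count N_r described above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: it assumes words are separated by whitespace, keeps repeated substrings so that the number of elements equals N_r, and treats a word shorter than len as contributing nothing. The function names are hypothetical.

```python
from typing import List


def split_words(s: str) -> List[str]:
    """Split a character string into words (assumption: whitespace-separated)."""
    return s.split()


def build_r(s: str, gram_len: int) -> List[str]:
    """Return every run of gram_len adjacent characters inside each word.

    Repeated substrings are kept, so len(result) equals N_r.
    """
    elements: List[str] = []
    for w in split_words(s):
        # A word of length L contributes L - gram_len + 1 substrings (none if L < gram_len).
        for i in range(len(w) - gram_len + 1):
            elements.append(w[i:i + gram_len])
    return elements


def n_r(s: str, gram_len: int) -> int:
    """N_r = sum over words w_i of (length(w_i) - len + 1), clamped at 0 for short words."""
    return sum(max(len(w) - gram_len + 1, 0) for w in split_words(s))


print(build_r("special request", 3))
# ['spe', 'pec', 'eci', 'cia', 'ial', 'req', 'equ', 'que', 'ues', 'est']
print(n_r("special request", 3))
# 10
```

A larger gram_len keeps more of the character order in each element, but a match pattern shorter than gram_len yields no elements at all, which is the restriction on match patterns stated above.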
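The last two stages of the pipeline in the figure (compressing the elements into an m-bit array with a Bloom filter, then turning the array into a number) could look roughly like the sketch below. The hash family (SHA-256 over a seed index plus the element), the bit ordering, and all names are assumptions made for illustration; the triple construction step and the elements of u are not reproduced.

```python
import hashlib
from typing import Iterable, List


def bit_positions(element: str, m: int, k: int) -> List[int]:
    """Map an element to k bit positions in [0, m) (assumed hash family: seeded SHA-256)."""
    positions = []
    for seed in range(k):
        digest = hashlib.sha256(f"{seed}:{element}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % m)
    return positions


def bloom_encode(elements: Iterable[str], m: int, k: int) -> int:
    """Set the bits of every element in an m-bit array and return the array as an integer."""
    value = 0
    for e in elements:
        for p in bit_positions(e, m, k):
            value |= 1 << p  # bit p of the integer stands for position p of the array
    return value


index_value = bloom_encode(["spe", "pec", "eci", "cia", "ial"], m=31, k=1)
print(index_value, format(index_value, "031b"))
```

Holding the filter as a single integer makes it easy to compare with bitwise operations at query time; with m = 31, the value that appears in the experiment plots, every encoded value fits in a signed 32-bit integer.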
[…] function has

$$f = 1 - \left(1 - \frac{1}{m}\right)^{n} \approx 1 - e^{-n/m}, \qquad k = 1.$$

The closer $(\ln 2)\cdot(m/n)$ is to 1, the lower the false positive probability. So if n is large, a larger m is needed; otherwise the false positive probability will be high.

5. The experiment and the analysis

The purpose of our experiments is to verify the effect of the Bloom filter parameters and to test query performance. Following the TPC-H benchmark, the database is automatically created at scale = 1 with the Benchmark Factory for Database tool, and tables ORDERS and PART are used as the experimental data source. The encryption algorithm is DES with a key length of 128; the programming language is Java; the environment is Windows XP on a P4 2.66 GHz CPU with 1 GB RAM, and the database is Oracle 10g.

In the first group of experiments, we test the effect of different Bloom filter parameters, including the length of the index m, the number of hash functions k, and the length of the […] recorded.

We verify the relation between $f_Q$ and m on the o_comment column of the ORDERS table; as Figure 3 shows, $f_Q$ gets smaller as m gets larger.

Figure 4 is about queries on the encrypted ORDERS table, where the encrypted column is o_comment, whose average length is less than 50; the false query probability is smallest when k = 1. Figure 5 is about queries over the PART table, where the encrypted column is p_container, whose average length is less than 10; when k = 2, $f_Q$ approaches zero. This validates the conclusion that the closer k gets to $(\ln 2)\cdot(m/n)$, the smaller the false positive probability gets. As we also see from the two graphs, the character strings in PART are shorter than those in ORDERS, and the false positive proportion is smaller under the same query conditions (the same m, k and match string).

In the second group of experiments, we compare our method with other methods. Figure 6 concerns queries on ORDERS.o_comment; the four points on the x-axis, from left to right, represent 'request', 'unusual', 'special' and 'furiously'. Comparing the pairs coding method with the Bloom filter method at k = 1 shows that splitting a string into words is effective. We also compare our method with the extended pairs coding method and the traditional method on column PART.p_container, where the six points represent 'BAG', 'BOX', 'PKG', 'CAN', 'PACK' and 'DRUM'; the resulting improvement in query performance is shown in Figure 7.

[Figure 4. The effect of k on the false positive proportion (1). x-axis: k = 1 to 4; y-axis: false query proportion; series: %request%, %unusual%, %special%, %furiously%.]
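The false positive function f and the rule that k should be close to (ln 2)(m/n), both stated above, can be checked numerically. The sketch below uses the standard Bloom filter estimate f = (1 − (1 − 1/m)^(kn))^k, which reduces to the expression above when k = 1. The values m = 31 and n = 10 or 50 are illustrative choices, not data from the experiments.

```python
import math


def fp_exact(m: int, n: int, k: int) -> float:
    """Standard Bloom filter false positive estimate (1 - (1 - 1/m)^(k*n))^k."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


def fp_approx(m: int, n: int, k: int) -> float:
    """Exponential approximation (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> float:
    """Real-valued number of hash functions minimising the approximate rate."""
    return math.log(2) * m / n


m = 31  # illustrative filter length
for n in (10, 50):  # illustrative element counts
    for k in range(1, 5):
        print(n, k, round(fp_exact(m, n, k), 4), round(fp_approx(m, n, k), 4))
    print("n =", n, "k* =", round(optimal_k(m, n), 2))
```

With these numbers, (ln 2)(31/10) ≈ 2.15 and, among k = 1 to 4, the estimate is smallest at k = 2; for n = 50 the real-valued optimum drops below 1, so k = 1 is the best integer choice. This mirrors the qualitative pattern described above, where shorter strings tolerate more hash functions.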
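To make the false query proportion f_Q concrete, the sketch below filters rows by their Bloom filter values for a like %string% query and then checks the plaintext, as a client would after decryption. It assumes that a row is returned when every bit of the query's filter value is also set in the row's value (the usual Bloom filter membership test, which a database could evaluate with a bitwise AND) and that f_Q is the fraction of returned rows that do not actually contain the match string. The hash family is assumed, the row strings are made-up examples, encryption is omitted, and all names are hypothetical.

```python
import hashlib
from typing import Iterable, List, Tuple


def grams(s: str, gram_len: int) -> List[str]:
    """Runs of gram_len adjacent characters inside each whitespace-separated word."""
    return [w[i:i + gram_len] for w in s.split() for i in range(len(w) - gram_len + 1)]


def encode(elements: Iterable[str], m: int, k: int) -> int:
    """Bloom filter over the elements as an m-bit integer (assumed seeded SHA-256)."""
    value = 0
    for e in elements:
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{e}".encode("utf-8")).digest()
            value |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return value


def like_query(rows: List[str], pattern: str, m: int, k: int,
               gram_len: int) -> Tuple[List[str], float]:
    """Filter rows by index bits, then verify on plaintext; return (hits, f_Q)."""
    mask = encode(grams(pattern, gram_len), m, k)
    # Candidate rows: all bits of the query mask are set in the row's filter value.
    candidates = [r for r in rows if (encode(grams(r, gram_len), m, k) & mask) == mask]
    hits = [r for r in candidates if pattern in r]  # client-side check after decryption
    f_q = (len(candidates) - len(hits)) / len(candidates) if candidates else 0.0
    return hits, f_q


rows = ["special request", "unusual deposits", "furiously final", "pending"]
hits, f_q = like_query(rows, "special", m=31, k=1, gram_len=3)
print(hits, f_q)
```

Because a Bloom filter has no false negatives, every row that truly contains the pattern survives the bit test; only the extra candidates raise f_Q. A pattern shorter than gram_len produces an empty mask, so every row becomes a candidate, which is the restriction on match patterns noted earlier.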
[Figure 5. The effect of k on the false positive proportion (2). Panel title: the effect of k, m=31, part.p_container; x-axis: like %string% (points 1 to 6); y-axis: false query proportion; series: K=1, K=2, K=3, K=4.]

[Figure 6 (caption not recoverable). Series: bloom filter (k=1, m=31) and pairs coding; y-axis: false query proportion.]

[Figure 7. Comparison of different index methods. x-axis: part.container like %string% (points 1 to 6); y-axis: false query proportion.]

6. Conclusion

We have proposed a Bloom filter based index to support fuzzy queries over encrypted character strings. Firstly, the index allows queries over the whole encrypted database to be performed with a small computation cost. In addition, we analyzed the parameters of the Bloom filter and determined them according to the length of the sensitive data, so as to obtain the minimal false positive probability. Experimental results show improved performance compared with conventional queries and the pairs coding method.