2018 Camaleao Busca Nuvem
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2018.2810297, IEEE Access
Abstract: Searchable symmetric encryption (SSE) is a widely popular cryptographic technique that supports the search functionality over encrypted data on the cloud. Despite its usefulness, however, most existing SSE schemes leak the search pattern, from which an adversary is able to tell whether two queries are for the same keyword. In recent years, it has been shown that the search pattern leakage can be exploited to launch attacks to compromise the confidentiality of the client's queried keywords. In this paper, we present a new SSE scheme which enables the client to search encrypted cloud data without disclosing the search pattern. Our scheme uniquely bridges together the advanced cryptographic techniques of chameleon hashing and indistinguishability obfuscation. In our scheme, the secure search tokens for plaintext keywords are generated in a randomized manner, so it is infeasible to tell whether the underlying plaintext keywords are the same given two secure search tokens. In this way, our scheme avoids using deterministic secure search tokens, which are the root cause of the search pattern leakage. We provide rigorous security proofs to justify the security strengths of our scheme. In addition, we also conduct extensive experiments to demonstrate the performance. Although our scheme for the time being is not immediately applicable due to the current inefficiency of indistinguishability obfuscation, we are aware that research endeavors on making indistinguishability obfuscation practical are actively ongoing, and the practical efficiency improvement of indistinguishability obfuscation will directly lead to the applicability of our scheme. Our work is a new attempt that pushes forward the research on SSE with concealed search pattern.

Index Terms: Searchable symmetric encryption, cloud computing, search pattern leakage, chameleon hashing, indistinguishability obfuscation.

J. Yao is with the School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China, and also with the Department of Computer Science, City University of Hong Kong, Hong Kong (e-mail: [email protected]).
Y. Zheng and C. Wang are with the Department of Computer Science, City University of Hong Kong, Hong Kong, China, and with the City University of Hong Kong Shenzhen Research Institute, Shenzhen 518057, China (e-mail: [email protected], [email protected]).
X. Gui is with the School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China (e-mail: [email protected]).
Corresponding authors: C. Wang ([email protected]); X. Gui (xl-[email protected]).
2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

Nowadays it is widely popular to outsource data storage to cloud services such as Google Drive, Dropbox, and more. Despite the well-understood benefits, however, data outsourcing to the cloud also naturally raises critical privacy concerns [1]. Indeed, data breaches occur frequently in cloud services [2]. For data protection, a plausible approach is to encrypt the data before outsourcing. However, simply applying data encryption will invalidate the fundamentally important search functionality, hindering the effective utilization of the outsourced data and degrading the service experience.

To address the dilemma of data privacy and data utilization, the cryptographic technique of secure searchable encryption has been proposed and has been widely studied in the literature [3], which enables the client to perform search over the outsourced encrypted data. Secure searchable encryption schemes can be either symmetric-key-based or public-key-based. Compared with public-key-based searchable encryption, searchable symmetric encryption (SSE), which builds an encrypted index, presents much more practical cost efficiency [4], and has attracted wide attention in recent years (e.g., [5]–[7], to list just a few).

However, most existing SSE schemes are typically built with security trade-offs of access pattern leakage and search pattern leakage. Roughly speaking, the access pattern refers to the search result, which indicates which files contain the queried keyword, while the search pattern reveals whether two queries are for the same keyword. Unfortunately, in recent years, it has been shown that the leakages of access pattern and search pattern can be exploited to compromise the confidentiality of the outsourced dataset and queried keywords (e.g., [8]–[12], to list just a few). Therefore, it is of critical importance to address the access pattern leakage and search pattern leakage when SSE is used for encrypted search.

While the access pattern leakage can be well mitigated via introducing dummy data points to the dataset and encrypted index, hiding the search pattern is more challenging and little work has been done before. Prior works act as valuable data points in the design space of hiding the search pattern in SSE, yet they require heuristic parameter tuning for security [12], or suffer from security issues [13], or work under a multi-cloud architecture [14]. Detailed discussion will be given in Section VI. To the best of our knowledge, hiding the search pattern in SSE is still challenging and remains to be fully explored.

In this paper, we present a new SSE scheme which enables the client to search the encrypted cloud data without disclosing the search pattern. As indicated by existing works, the search pattern leakage essentially originates from the fact that the secure search tokens for the queried keywords are generated in a deterministic manner. Hence, our first main idea is to generate the secure search tokens in a randomized way. This means that the secure search tokens are randomized ones even for the same keyword at different queries. Consequently, given two secure search tokens, the cloud is not able to tell whether their underlying plaintext keywords are the same. With this main idea, the challenge is then how to ensure that we can still get the correct search result when secure randomized
search tokens are used for encrypted search. To tackle this challenge, our idea is to devise a secure mapping mechanism which should be able to securely map the randomized secure search tokens to the corresponding deterministic versions that can be used to correctly search the encrypted index.

Specifically, our scheme uniquely bridges together the advanced cryptographic primitives of chameleon hashing [15] and indistinguishability obfuscation (iO) [16], [17]. At a high level, we mainly rely on the chameleon hashing technique for the generation of secure randomized search tokens. Then, on the cloud side, we rely on the iO technique to securely map the randomized tokens to deterministic ones for correct encrypted search. In our scheme it is assured that the cloud server is oblivious to both the mapping procedure and the search procedure. This means that the cloud server is not able to obtain the deterministic tokens mapped from the randomized tokens, and also does not observe which entries of the encrypted index are accessed during search. Note that the search pattern may also be leaked if the cloud can observe which entries of the index have been accessed over time [12]. Therefore, our scheme achieves strong protection for the search pattern. In the end, the cloud obtains the search result and returns it to the client. We provide formal security proofs to rigorously justify the security guarantees of our scheme. In addition, we conduct extensive experiments to demonstrate the performance. It is shown that the generation of secure randomized search tokens in our proposed scheme is very efficient, taking just tens of milliseconds. In our experiments, we also demonstrate the performance of iO. Although currently the iO technique is inefficient, related research endeavors on practical iO are actively ongoing. The performance of our scheme relies on the underlying cryptographic technique iO. So, the practical performance improvement of iO will directly lead to the applicability of our scheme.

The remainder of this paper is organized as follows. Section II presents some preliminaries. Section III gives our problem statement. Section IV elaborates on the details of our scheme. Section V presents the experiment results. Section VI discusses the related work. Section VII concludes the whole paper.

II. PRELIMINARIES

A. Notations

Given a matrix $A$, $A_i$ denotes the $i$th row in $A$, and $A_{i,j}$ denotes the $j$th element in the $i$th row of $A$. Let $|A|$ denote the number of elements in the matrix $A$. Similarly, given a vector $V$, $V_i$ denotes the $i$th element in $V$, and $|V|$ denotes the number of elements in the vector $V$. The concatenation of a string $a$ and a string $b$ is denoted as $a \parallel b$. Let $\lambda$ be a security parameter. We say that a function $\upsilon : \mathbb{N} \to \mathbb{N}$ is negligible in $\lambda$ if, for any positive polynomial $p(\cdot)$ and sufficiently large $\lambda$, $\upsilon(\lambda) < 1/p(\lambda)$. We take $negl(\lambda)$ as a negligible function in $\lambda$. Given a subset $S \subseteq \chi$ and a permutation $\phi : \chi \to \chi$, $\phi(S) = \{\phi(i) \mid i \in S\}$ means each element $i \in S$ is replaced by $\phi(i)$.

In addition, we introduce some notations related to searchable symmetric encryption. We informally treat a file as a set of keywords. Therefore, we write $w \in f$ to denote that a file $f$ contains the keyword $w$. Given a file collection $F$, we order the files to uniquely identify a file by specifying $f_i$ for $i \in [1, n]$ and order the total keywords $W$ by specifying $w_i$ for $i \in [1, m]$ after removing the duplicate keywords. For the $i$-th query, we denote $T_j$ as the encryption of the queried keyword $w_{ij}$. Here $w_{ij}$ means that the queried keyword at the $i$-th query is the $j$-th one in $W$. Therefore, the set of tokens generated over a period of time is denoted as $T = \{T_1, \cdots, T_q\}$. The search result for a keyword $w$ is denoted as $C(w)$, which refers to a set of ciphertexts $C$ of files that contain the query keyword $w$.

B. Chameleon Hashing

A chameleon hashing function is a collision-resistant hashing function with a key pair $(sk, hk)$ [15]. The key $hk$ is used to compute the hash value for a message, while the key $sk$ can be used to find a collision for that message. Formally, a chameleon hashing function consists of the following algorithms:

ParamGen($1^\lambda$): This algorithm takes as input a security parameter $\lambda$ and outputs the system parameters $SP$.

KGen($SP$): This algorithm takes as input the system parameters $SP$ and outputs a key pair $(sk, hk)$.

$CH_H(hk, m, r)$: This algorithm takes as input the key $hk$, a message $m$, and a random integer $r$, and outputs the hash value $h = CH_H(hk, m, r)$.

$CH_F(sk, m_1, r_1, m_2)$: This algorithm takes as input the secret key $sk$, a message $m_1$, a random integer $r_1$, and another message $m_2$, and outputs another integer $r_2$ that makes the following equation hold:

$$CH_H(hk, m_1, r_1) = CH_H(hk, m_2, r_2)$$

A chameleon hashing function satisfies the following two properties:
• Collision resistance: Without the key $sk$, there exists no efficient algorithm for finding a collision.
• Semantic security: For all pairs of messages $m_1$ and $m_2$, the random hash values $CH_H(hk, m_1, r_1)$ and $CH_H(hk, m_2, r_2)$ are computationally indistinguishable.

C. Indistinguishability Obfuscation

An indistinguishability obfuscator for a circuit class $\{Cir_\lambda\}$ is a PPT uniform algorithm satisfying the following conditions [16], [17]:
• $iO(\lambda, Cir)$ preserves the functionality of $Cir$. That is, for any $Cir \in Cir_\lambda$, if we compute $Cir' = iO(\lambda, Cir)$, then $Cir'(x) = Cir(x)$ for all inputs $x$.
• For any $\lambda$ and any two circuits $Cir_0, Cir_1 \in Cir_\lambda$ with the same functionality, the circuits $iO(\lambda, Cir_0)$ and $iO(\lambda, Cir_1)$ are indistinguishable.

III. PROBLEM STATEMENT

A. Searchable Symmetric Encryption Definition

An SSE scheme is a collection of six polynomial-time algorithms:
IndexGen($K, F$): This index building algorithm takes as input the secret key material $K$ and the file collection $F$, and outputs an inverted index matrix $\gamma$.

Token($K, w$): This token generation algorithm takes as input the secret key material $K$ and the query keyword $w$, and outputs a search token $T$.

Search($T, \gamma, EDB$): This search algorithm takes as input a search token $T$, the encrypted index $\gamma$, and the encrypted database $EDB$, and outputs all the encrypted files containing the query keyword, i.e., $C(w)$.

Enc($K, F$): This encryption algorithm takes as input the secret key material $K$ and the file collection $F = \{f_1, \cdots, f_n\}$, and outputs an encrypted database $EDB = \{i, c_i\}$, where $i \in [1, n]$, and $c_i$ is the ciphertext of file $f_i$.

Dec($K, c_i$): This decryption algorithm takes as input the secret key material $K$ and a ciphertext $c_i$, and outputs a file $f_i$.

An SSE scheme is correct if and only if, for all key material $K$ output by KeyGen, encrypted index $\gamma$ output by IndexGen, encrypted database $EDB$ output by Enc, and any $w \in W$, the following equation holds:

$$Search(Token(K, w), \gamma, EDB) = C(w)$$

and $w \notin Dec(K, C) \setminus Dec(K, C(w))$, i.e., $w$ should not be contained in the files which are not contained in the search result $C(w)$.

We now introduce the leakage definitions in SSE. Before giving the leakage definition, we introduce some auxiliary notions which are used to define the leakages. We first define a history which records the interaction between the client and the cloud server. Since each interaction phase mainly involves a file collection and a sequence of keywords submitted to Token, and both of them need to be concealed, we use both of them to form the history.

Definition 1. (History) Let $F$ be a file collection. A $q$-query history over $F$ is a tuple $H = (F, \mathbf{w})$ which consists of the file collection and a $q$-query word vector $\mathbf{w} = (w_{i_1}, \cdots, w_{i_q})$.

Definition 2. (Access pattern) Let $F$ be a file collection. The access pattern is a tuple $AP(H) = (F(w_{i_1}), \cdots, F(w_{i_q}))$ based on the history $H = (F, \mathbf{w})$.

Definition 3. (Search pattern) Let $F$ be a file collection. The search pattern, based on a $q$-query history $H = (F, \mathbf{w})$, is a symmetric binary matrix $SP(H)$. For $1 \le i, j \le q$, the element in the $i$th row and $j$th column is 1 if $w_i = w_j$, and 0 otherwise.

Let $F$ be a file collection. We define the leakage function $L$ that describes the leakages in building the index and searching over the index based on the history $H = (F, \mathbf{w})$.

Definition 4. (Leakages) $L_1(\gamma, F)$: Given the index $\gamma$ and the file collection $F$ as input, the leakage function outputs $|W|$, $|F|$, and $|f_i|$. $L_2(\gamma, C, T)$: Given the encrypted index $\gamma$, the ciphertext collection $C$, and a sequence of tokens $T$ as input, the leakage function outputs the search pattern $SP(H)$ and the access pattern $AP(H)$.

B. System Model

The basic paradigm of applying SSE to enable search over encrypted data is shown in Fig. 1. Typically there are two types of entities, the client and the cloud server. The client has a set of files to be outsourced to the cloud server. However, the client does not want the cloud server to know the content of the files. Therefore, in a setup phase, the client encrypts the files and produces an encrypted index supporting search by some secret key material $K$. Then, the client outsources the ciphertext collection $C$ and an encrypted index $\gamma$ to the cloud server. To search the encrypted outsourced files, the client generates search queries by calling a token generation function which takes as input the secret key material $K$ and a keyword $w$, and outputs a search token $T$. Upon receiving the search token $T$, the cloud server conducts search based on the token and the encrypted index $\gamma$, and returns the search result to the client, without knowing any information about either the content of files or the queried keyword. So, after search, the client receives a collection of encrypted files which contain the queried keyword $w$, and performs decryption to obtain the matched files.

[Figure: Client and Cloud Server. 1. Encrypted files + encrypted index; 2. Encrypted search token; 3. Search program; 4. Search result.]
Fig. 1. Basic paradigm of applying SSE to encrypted search.

C. Threat Model

Consistent with most existing works on encrypted search (e.g., [5], [18], [19], to list just a few), we consider the cloud server as a semi-honest adversary. This means that the cloud server honestly follows the designated operations in SSE, yet tries to infer the private file content and queried keywords of the client, based on what it observes along the workflow. In particular, the cloud server may exploit the access pattern leakage and search pattern leakage which are inherent in most existing SSE constructions. Considering that attacks based on the access pattern leakage can be well mitigated by directly introducing dummy data points [9], we will focus on the threat posed by exploiting the search pattern leakage. In particular, the adversary observes a set of $q$ tokens $T = \{T_1, \cdots, T_q\}$ submitted by the client, and attempts to recover the queried keywords. It has been shown in prior works [12] that if the adversary observes search tokens generated through some deterministic algorithm, the adversary is able to recover the plaintext keywords based on auxiliary information like public query logs. Therefore, we aim to deliver a new SSE construction in which tokens are not generated in a deterministic way, so that the search pattern can hardly be exploited for potential attacks.
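To make the threat concrete, the following minimal sketch shows how deterministic tokens expose the search pattern matrix of Definition 3. It is an illustration only: the token function (HMAC-SHA256 over the keyword), the fixed key, and the sample query sequence are assumptions chosen for the example and are not taken from any scheme discussed in this paper.

```python
import hashlib
import hmac


def deterministic_token(key: bytes, keyword: str) -> bytes:
    """Hypothetical deterministic token: a PRF (HMAC-SHA256) of the keyword."""
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()


def search_pattern(items):
    """Return the q-by-q binary matrix with entry 1 where items[i] == items[j]."""
    return [[1 if a == b else 0 for b in items] for a in items]


key = b"\x01" * 32  # illustrative fixed key, not a secure choice
queries = ["invoice", "meeting", "invoice", "budget", "meeting"]

# What the cloud server observes: one token per query, never the keywords.
tokens = [deterministic_token(key, w) for w in queries]

# Comparing tokens for equality is enough to rebuild the search pattern.
sp_from_tokens = search_pattern(tokens)
sp_from_keywords = search_pattern(queries)
```

Here `sp_from_tokens` is computed without any access to the plaintext keywords, yet it coincides with `sp_from_keywords`. Randomized tokens are meant to break exactly this equality test.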
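The chameleon hashing algorithms of Section II-B, on which randomized token generation relies, can be illustrated with a toy discrete-logarithm instantiation. This is a sketch under stated assumptions, not the authors' implementation: the group parameters (p = 2039, q = 1019, g = 4) are far too small for real use, the construction may differ from the one in [15], HMAC-SHA256 stands in for the pseudorandom function G, and the `b"|"` separator before the counter is an arbitrary encoding choice. It requires Python 3.8 or later for modular inverses via `pow`.

```python
import hashlib
import hmac
import secrets

# Toy group: p = 2q + 1 with p and q prime; g = 4 generates the order-q subgroup.
P, Q, G_GEN = 2039, 1019, 4


def kgen():
    """Return (sk, hk) with hk = g^sk mod p."""
    sk = secrets.randbelow(Q - 1) + 1  # sk in [1, q-1], hence invertible mod q
    return sk, pow(G_GEN, sk, P)


def ch_hash(hk, m, r):
    """Hashing algorithm: g^m * hk^r mod p, with m and r taken in Z_q."""
    return (pow(G_GEN, m % Q, P) * pow(hk, r % Q, P)) % P


def ch_forge(sk, m1, r1, m2):
    """Collision finding: return r2 with ch_hash(hk, m1, r1) == ch_hash(hk, m2, r2)."""
    return (r1 + (m1 - m2) * pow(sk, -1, Q)) % Q


def prf(key, data):
    """Stand-in for the PRF G: HMAC-SHA256 reduced into Z_q."""
    digest = hmac.new(key, data, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % Q


sk, hk = kgen()
k1 = secrets.token_bytes(32)
keyword = b"budget"

# Index side: one fixed hash value per keyword, under randomness r.
r = secrets.randbelow(Q)
pi = ch_hash(hk, prf(k1, keyword), r)

# Query side: a new (t, r_new) pair per counter value, each hashing to pi.
tokens = []
for counter in range(3):
    t = prf(k1, keyword + b"|" + str(counter).encode())
    tokens.append((t, ch_forge(sk, prf(k1, keyword), r, t)))

all_match = all(ch_hash(hk, t, r_new) == pi for t, r_new in tokens)
```

The collision works because the hash exponent is m + sk·r modulo q, so choosing r2 = r1 + (m1 − m2)/sk keeps that exponent unchanged. Only the holder of `sk` can compute this shift, which is why the client, and not the cloud server, can produce tokens that map to an existing index entry.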
for securely mapping the randomized tokens to deterministic ones at the cloud side for encrypted search. In particular, our construction mainly includes two phases. The first phase is about how to build an encrypted searchable index, and the second phase is about how to generate the tokens at the client side and perform search at the cloud side. Note that, like most existing searchable encryption designs (e.g., [5], [18], [19]), the encryption of files can be done by any standard encryption technique, e.g., AES, which is independent of the building of the encrypted index and the search procedure. So, we do not explicitly handle file encryption in our construction. In what follows, we give the details of our construction.

A. The Proposed Scheme

Phase 1. Algorithm 1 presents the details of building the encrypted index. For practical consideration, we follow the framework in [8] to build an encrypted index that supports keyword search with sub-linear search time. Specifically, we take each keyword as the primary key and associate each keyword with files by a binary vector. Each element in the binary vector represents the relationship between a keyword and a file. We use "1" to represent that the file contains the keyword, and "0" otherwise. For each keyword $w_i$, the client encrypts it as $\pi_i = CH_H(hk, G(k_1, w_i), r_i)$, where $k_1$ is the private key for a pseudorandom function $G(\cdot)$ and $r_i$ is a random number. Then, for each binary vector $\delta_i$ associated with the keyword $w_i$, the client encrypts it as $\delta_i' = Enc(CH_H(hk, G(k_2, w_i), r_i), \delta_i)$, where $k_2$ is also a secret key and $Enc(\cdot)$ is a symmetric-key encryption algorithm. Finally, for strong protection, the client applies a permutation function to the rows in the encrypted index. In addition, the client also builds an iO program for later use in the search phase.

Algorithm 1 (steps 14 to 16)
14: $\delta_i' = Enc(CH_H(hk, G(k_2, w_i), r_i), \delta_i)$
15: end for
16: $\gamma_i = \{\pi_{\phi(i)}, \delta_{\phi(i)}'\}$

Phase 2. In this phase, given a query keyword $w_i$, the client generates the search token and sends it to the cloud for encrypted search. In order to generate randomized tokens even for the same keyword in each query, we take the concatenation of the query keyword and a counter $ct$ representing the search frequency of the keyword as the input to the token generation algorithm. Algorithm 2 presents the details of token generation. Given the keyword $w_i$ to be queried, the client first looks up the local memory to find the random number $r_i$, and uses it to generate two new random numbers $r_{i1} = CH_F(sk, G(k_1, w_i), r_i, G(k_1, w_i \parallel ct_i))$ and $r_{i2} = CH_F(sk, G(k_2, w_i), r_i, G(k_2, w_i \parallel ct_i))$ by using the chameleon collision-finding function, where $sk$ is a secret key for the chameleon hash function and $w_i \parallel ct_i$ is the concatenation of keyword $w_i$ with a counter denoting its search frequency. Then, the client forms the search token for the query keyword $w_i$ as $(t_1, r_{i1}, t_2, r_{i2})$, where $t_1 = G(k_1, w_i \parallel ct_i)$ and $t_2 = G(k_2, w_i \parallel ct_i)$.

Upon receiving the search token, the cloud server then performs search based on the encrypted searchable index and an iO program. The iO program is used for obfuscating the search procedure. In particular, it is used to (i) securely map the random search token to the form that can be matched against the encrypted index, and (ii) hide which entry of the encrypted index is accessed during search. Throughout the search procedure, the cloud server is oblivious to the recovered token and also the entry of the index that has been accessed. Note that the search pattern may also be leaked if the cloud server can observe which entries of the index have been accessed over time [12]. Therefore, our design achieves strong protection for the search pattern.

Algorithm 3 presents the details of searching encrypted data at the cloud side. The cloud server calls the iO program, and obtains the search result. Inside the iO program, the key $hk$ is embedded as a constant in advance by the client. This iO program first uses the chameleon hash function to recover the hash values $HV_1 = CH_H(hk, t_1, r_{i1})$ and $HV_2 = CH_H(hk, t_2, r_{i2})$. Here, $HV_1 = CH_H(hk, t_1, r_{i1})$ is the recovered token which can be matched against the encrypted index. Recall that the encrypted version of the keyword $w_i$ in the index is $CH_H(hk, G(k_1, w_i), r_i)$, and $CH_H(hk, G(k_1, w_i), r_i) = CH_H(hk, t_1, r_{i1})$ based on the property of the chameleon hash function. Therefore, the recovered token $HV_1 = CH_H(hk, t_1, r_{i1})$ can locate the right
entry in the index. Subsequently, $HV_2 = CH_H(hk, t_2, r_{i2})$ is used to decrypt the corresponding $\delta_t'$. Finally, the iO program outputs the recovered file identifiers and the cloud server returns the corresponding file ciphertexts to the client.

Note that the security of iO ensures that the cloud server is oblivious to the search procedure and the embedded key $hk$ is also kept confidential. So, the cloud server only obtains the search result through the search procedure. Recall that leakage from the search result is access pattern leakage and can be well mitigated by directly introducing dummy data points [9]. Our focus is on defending against search pattern leakage.

Algorithm 2 Generating secure search token
Input: Secret keys: $(k_1, k_2, sk)$; Query keyword: $w_i$; Counter: $ct_i$.
Output: Secure search token: $T$.
1: $t_1 = G(k_1, w_i \parallel ct_i)$
2: $r_{i1} = CH_F(sk, G(k_1, w_i), r_i, G(k_1, w_i \parallel ct_i))$
3: $t_2 = G(k_2, w_i \parallel ct_i)$
4: $r_{i2} = CH_F(sk, G(k_2, w_i), r_i, G(k_2, w_i \parallel ct_i))$
5: $T = (t_1, r_{i1}, t_2, r_{i2})$

Algorithm 3 Searching encrypted data at the cloud side
Input: Token: $T = (t_1, r_{i1}, t_2, r_{i2})$; Index: $\gamma$.
Output: Search result: $C(w_i)$.
1: Execute an iO program with inputs $T$ and $\gamma$
iO program (with constant $hk$):
2: $HV_1 = CH_H(hk, t_1, r_{i1})$
3: $HV_2 = CH_H(hk, t_2, r_{i2})$
4: if $\pi_t = HV_1$ then
5:   Output $Dec(HV_2, \delta_t')$.
6: else
7:   Output $\perp$.
8: end if

B. Security Analysis

We now provide formal proofs to show the security of our scheme. In particular, we follow the security framework in prior works [12], [20] for analysis. The security framework proposed therein is based on a leakage function $L$. While revealing the leakage function to an adversary, the security framework ensures that the adversary should not learn any further information beyond the leakage function itself when the adversary observes a sequence of tokens submitted adaptively. Since our proposed construction does not disclose the search pattern, we modify the security framework to fit our design. The leakage function $L = \{L_1, L_2\}$ in our proposed scheme is given as follows:

$$L_1 = (|W|, |F|, |f_i|)$$
$$L_2 = AP(H)$$

Definition 5. Let $\Pi = (KGen, Build, Search)$ be our secure index-based search scheme which uses chameleon hashing to generate secure search tokens and relies on iO for cloud-side secure search. Given the leakage function $L$, we define the following experiments with an adversary $A$ and a simulator $S$.

$Real_A(\lambda)$: The client runs KGen to generate the private keys. The adversary $A$ selects a set of files, and asks the client to generate the index, the iO program, and the ciphertexts via the algorithm Build. Then $A$ performs a polynomial number $q$ of adaptive queries, and asks the client for the secure search tokens and the resulting file ciphertexts via the algorithm Search. Finally, $A$ produces a bit as the output.

$Sim_{A,S}(\lambda)$: The adversary $A$ selects a set of files, and $S$ simulates an index, an iO program, and the ciphertexts for $A$ based on $L_1$. Then, $A$ performs a polynomial number $q$ of adaptive queries. From $L_2$, $S$ returns simulated tokens and file ciphertexts. Finally, $A$ produces a bit as the output.

We say that $\Pi$ is $L$-secure against adaptive chosen-keyword attacks if for all polynomial-time adversaries $A$, there exists a simulator $S$ such that $|\Pr[Real_A(\lambda) = 1] - \Pr[Sim_{A,S}(\lambda) = 1]| \le negl(\lambda)$, where $negl(\lambda)$ is a negligible function in $\lambda$.

We now prove that our proposed scheme is secure against adaptive chosen-keyword attacks with respect to the characterized leakages.

Theorem 1. The proposed encrypted search scheme $\Pi$ is $L$-secure against adaptive chosen-keyword attacks, if $\phi$ is a secure pseudorandom permutation, $CH$ is a secure hash function, $iO_{Search}$ is a secure indistinguishability obfuscator, and $Enc$ is a PRF-based encryption.

Proof. Given the leakage $L_1$, the simulator $S$ generates the simulated index $\gamma'$, which is indistinguishable from the real index $\gamma$. In particular, the real index $\gamma$ and the simulated index $\gamma'$ have the same size, and the bit length of a real index entry and of a simulated one is the same. However, $S$ generates random bit strings for each entry. For the file ciphertexts, $S$ also generates random bit strings with the same size as the file ciphertexts. In addition, $S$ generates the iO program $iO_{Search}'$, which is indistinguishable from $iO_{Search}$.

Now, we need to show how to simulate the $q$ adaptive queries $\{Q_i\}_{i=1}^{q}$ made by the adversary. For each query $Q_i$, the simulator responds with the simulated search token $T^*$, where $T^*$ is a 4-tuple of random bit strings, i.e., $T^* = (t_1^*, r_1^*, t_2^*, r_2^*)$. Then, $S$ operates the chameleon hashing function $CH$ (inside the $iO_{Search}'$ program) as a random oracle to first randomly point at an entry of the simulated index, so $HV_1 = CH_H(hk^*, t_1^*, r_1^*)$. Meanwhile, the simulator sets $HV_2$ to $CH_H(hk^*, t_2^*, r_2^*)$. Then, $S$ operates a random oracle so that $Dec(HV_2, \delta') = \delta$, where $\delta$ is given by the leakage $L_2$. This indicates the search result is identical to that from searching the real index. We note that this can be achieved if the encryption algorithm $Enc$ is instantiated as $(r, H(HV_2 \parallel r) \oplus \delta)$, where $r$ is a random bit string and $H$ is a random oracle. This is actually a widely adopted technique for achieving adaptive security for SSE [19].

The above analysis shows that the real index and the simulated one, the real tokens and the simulated ones, and the encrypted files and the simulated ciphertexts are computationally indistinguishable. Meanwhile, the search results from the real index and the simulated one are identical, so they are
indistinguishable as well. Therefore, it can be concluded that the adversary is unable to distinguish the outputs of the real experiment and the ideal experiment.

V. EXPERIMENTS

A. Experiments Setup

All experiments were conducted on a cluster and a desktop. The desktop has 16 cores and 64 GB of memory and runs Ubuntu 16.04. The cluster contains 1 server as master and 3 servers as slaves. Each slave has 40 cores and 400 GB of memory and runs CentOS 7.3. We use Python 2.7.6, JDK 1.8, Spark 2.1.0, HBase 1.2.5, and Hadoop 2.7.3. The dataset used in our experiments is the Enron email dataset¹, which was collected and prepared by the CALO Project. It contains the email files of 150 different users, including sent emails, contacts, deleted items, etc. We choose three sub-datasets from the email files of all the users as the experiment datasets. The first dataset contains 10,000 documents. The second dataset contains 100,000 documents. The third dataset contains all the documents in the Enron dataset. Hereafter, for ease of presentation, we refer to these three datasets as dataset I, dataset II, and dataset III, respectively.

¹ Enron Email Dataset: http://www.cs.cmu.edu/~./enron/

In the experiments, we develop a distributed storage system to implement our proposed scheme. Note that we implement the index setup phase on the cluster side, because the number of email files is huge and it is hard for a desktop to handle the full dataset, which has 517,401 files and 348,935 keywords after removing the stop words.

B. System Implementation

Storage system. The storage system is written in Java and is implemented on HBase. We choose HBase for the following reasons. First, the encrypted index and the Enron dataset are too large to store centrally, whereas HBase partitions the data into regions controlled by a cluster of RegionServers. Second, if the index were stored centrally, searching it would take $O(m)$ time for text matching. With HBase, text matching takes far less time than that. This is mainly due to the three-level addressing mode in HBase (i.e., the root table, the meta table, and the region) and the lexicographic ordering of records. Third, when the user submits a multi-keyword query to the cloud, HBase can process the tokens simultaneously.

File encryption/decryption. Since the security of file encryption does not affect the security of the token, it is outside our scope. To keep the focus on our main contribution, we simply use ECB encryption with PKCS7 padding, via javax.crypto.Cipher, to encrypt the files.

Index setup. Index setup is written in Java with the javax.crypto library and the JPBC library, and runs on Spark to build an inverted index that speeds up document retrieval. We choose Spark for the following reasons. On the one hand, we need to pre-process each email file to parse each keyword and remove the stop words. Spark can perform these operations on the files in parallel, so it can dramatically accelerate the index setup phase. On the other hand, the index has to be built iteratively (i.e., the result of inserting the keywords of the first email file into the index is used as a reference for inserting the keywords of the second email file), and Spark can keep the intermediate results in memory, which further accelerates the setup phase. We set up the cluster with 1 server as master and 3 servers as workers. The index setup job runs with 9 executors, each with 5 cores and 20 GB of memory. We use a RowKey/Data pair to store the inverted index. “RowKey” stores the encrypted keyword, and “value” in “Data” stores an encrypted “binary vector” (i.e., the relationship between the keyword and the file collection). Index setup first splits each file in the collection into words on whitespace and uses Stanford's database to remove the stop words, establishing the inverted index in bitmap style. Then, in order to generate the secure index, we use AES encryption together with the chameleon hash function to encrypt “RowKey” and to generate the private key for the symmetric encryption used to encrypt “Data”. The chameleon hash construction we use is the one in [15], as shown in Algorithms 4 and 5. Since the client generates the token with the CHF function, which takes $G(w)$ and its corresponding $r$ as input to compute the collision, we need to store the relationship between $G(w_i)$ and $r_i$ at the client side when building the index, where $i = 1, \dots, m$. It is worth noting that, to reduce the storage size at the client side, we use the same $r_i$ in both $\mathrm{CHH}(G(k_1, w_i), r_i)$ and $\mathrm{CHH}(G(k_2, w_i), r_i)$. Therefore, we only need to store the relationship between each keyword and $r_i$. To accelerate the lookup of $r_i$ in the token generation phase, we store this relationship in a hash table. Last, index setup encrypts the “value” with the symmetric encryption AES, which takes the “binary vector” as the input message and $\mathrm{CHH}(G(k_2, w_i), r_i)$ as the secret key. To prevent the adversary from looking up a hash value dictionary to obtain the private keys $k_1$ and $k_2$, we use a “salt” in secret key generation for the above-mentioned encryptions.

Token generation. Token generation is written in Java, using javax.crypto.Cipher for AES encryption and the JPBC library to generate the cyclic group, and runs on the desktop. It first concatenates the submitted keyword $w_i$ with the retrieval count $ct$. Then, it uses AES to encrypt the concatenation $w_i \| ct$ under the two keys $k_1$ and $k_2$, and searches the local hash table of $(w, r)$ pairs to find the $r_i$ corresponding to $w_i$. Next, it uses CHF to forge the random value $r_{i1}$ from the private key $x$, the ciphertext $G(k_1, w_i \| ct)$, and the pair $(G(k_1, w_i), r_i)$. Similarly, it forges the random value $r_{i2}$ from the same private key $x$, the ciphertext $G(k_2, w_i \| ct)$, and the pair $(G(k_2, w_i), r_i)$. Finally, it combines the forged random values with $G(k_1, w_i \| ct)$ and $G(k_2, w_i \| ct)$ to generate the token

$T = \big(G(k_1, w_i \| ct),\ r_{i1},\ G(k_2, w_i \| ct),\ r_{i2}\big).$

Retrieval. Retrieval is written in Java and Python and is performed over HBase. It retrieves the files containing the keyword in the token. We set up the cluster with 1 HMaster and 3 HRegionServers. Each HRegionServer has 40 cores and 400 GB of memory. Retrieval first computes the hash value $HV_1$ by inputting $(G(k_1, w_i \| ct), r_{i1})$ to the chameleon hash function. It then looks up the Zookeeper to find the region containing the “RowKey”
2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
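As a rough illustration of the bitmap-style inverted index described under "Index setup", the following Python sketch maps each keyword to a binary vector whose j-th bit records whether file j contains the keyword. It covers only the plaintext index before any encryption. The stop-word set, the function name `build_bitmap_index`, and the sample files are assumptions made for this example and are not taken from the paper's implementation, which runs in Java on Spark.

```python
from typing import Dict, List

# Hypothetical stop-word set for illustration only; the paper relies on
# a Stanford stop-word resource that is not reproduced here.
STOP_WORDS = {"the", "a", "of", "and", "to", "is"}


def build_bitmap_index(files: List[str]) -> Dict[str, List[int]]:
    """Map each keyword to a binary vector over the file collection.

    Bit j of the vector is 1 if file j contains the keyword, else 0.
    """
    index: Dict[str, List[int]] = {}
    for j, text in enumerate(files):
        # Split on whitespace and de-duplicate the words of this file.
        for word in set(text.lower().split()):
            if word in STOP_WORDS:
                continue
            vector = index.setdefault(word, [0] * len(files))
            vector[j] = 1
    return index


# Illustrative three-file collection (made up for this sketch).
files = ["meeting at noon", "the noon report", "report of the meeting"]
index = build_bitmap_index(files)
```

In the scheme described in the text, each keyword would then be replaced by its encrypted "RowKey" and each binary vector by its AES-encrypted "value"; both steps are omitted here.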
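The token-generation and retrieval steps rest on the trapdoor-collision property of a chameleon hash: whoever holds the private key $x$ can forge a new random value so that a fresh message hashes to the value stored at index time. The Python sketch below shows this for a discrete-log-based chameleon hash of the form $g^m y^r \bmod p$, in the style of the construction cited as [15]. The toy parameters, the SHA-256 mapping used in place of the paper's AES-based $G$, and all function names are assumptions for illustration; this is not the paper's Java/JPBC code and the parameters are far too small to be secure.

```python
import hashlib
import secrets

# Toy group parameters (assumption): P = 2*Q + 1 with P, Q prime,
# and G_GEN generating the subgroup of order Q. Not secure sizes.
P, Q, G_GEN = 2879, 1439, 4


def to_exponent(data: bytes) -> int:
    """Map a byte string to an exponent in Z_Q (stand-in for the AES output G(k, w))."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def keygen():
    """Return (x, y): trapdoor x in [1, Q-1] and public key y = g^x mod P."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G_GEN, x, P)


def chameleon_hash(y: int, m: int, r: int) -> int:
    """Compute g^m * y^r mod P."""
    return (pow(G_GEN, m, P) * pow(y, r, P)) % P


def forge(x: int, m: int, r: int, m_new: int) -> int:
    """Use trapdoor x to find r_new with hash(m_new, r_new) == hash(m, r)."""
    # Solve m + x*r = m_new + x*r_new (mod Q) for r_new.
    return (r + (m - m_new) * pow(x, -1, Q)) % Q


x, y = keygen()
m_index = to_exponent(b"keyword")        # value hashed at index setup
r_index = secrets.randbelow(Q)           # r stored at the client
m_query = to_exponent(b"keyword||ct=1")  # fresh per-query value
r_query = forge(x, m_index, r_index, m_query)
```

Under these assumptions, `chameleon_hash(y, m_query, r_query)` equals `chameleon_hash(y, m_index, r_index)`, so the server can recompute the stored value from a token that differs on every query for the same keyword.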
TABLE I
COST OF THE ENCRYPTED INDEX

Dataset   Setup time (s)   Index size (MB)
I         121.818          8.4
II        1123.656         120
III       1638.115         584

[Figure residue: plots with y-axis "Time (ms)" and x-axis "Dataset" (I, II, III); captions not recovered.]
Fig. 4. Memory usage in branching program generation. [x-axis: No. of inputs & No. of gates]
Fig. 7. File size of a branching program. [x-axis: No. of inputs & No. of gates]
[Figure residue: plot with y-axis "Time (ms)", scale ×10^5; caption not recovered.]

each query. In [13], Gajek et al. propose to use constrained functional encryption to hide the search pattern. Since the proposed scheme relies on operations in bilinear group to
extensive experiments for evaluation. The performance of our scheme relies on the underlying cryptographic technique iO. Although the iO technique is currently inefficient, research on practical iO is actively ongoing [41]. Practical performance improvements of iO will directly improve the performance and applicability of our scheme. We emphasize that our scheme presents a new attempt that pushes forward the research on SSE with concealed search pattern.

ACKNOWLEDGEMENTS

This work was supported in part by the Research Grants Council of Hong Kong under Project CityU 11276816, Project CityU 11212717, and Project CityU C1008-16G, the Innovation and Technology Commission of Hong Kong under Project ITS/168/17, the National Natural Science Foundation of China under Project 61572412 and Project 61472316, and the Key Research and Development Projects of Shaanxi Province under Project 2017ZDXM-GY-011.

REFERENCES

[1] K. Liang, C. Su, J. Chen, and J. K. Liu, “Efficient multi-function data sharing and searching mechanism for cloud-based encrypted data,” in Proc. of ACM AsiaCCS, 2016.
[2] J. Hughes, “Data breaches in the cloud: Who’s responsible?” Online at http://www.govtech.com/security/Data-Breaches-in-the-Cloud-Whos-Responsible.html, 2014.
[3] C. Bösch, P. H. Hartel, W. Jonker, and A. Peter, “A survey of provably secure searchable encryption,” ACM Computing Surveys, vol. 47, no. 2, pp. 18:1–18:51, 2014.
[4] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, “Privacy-preserving multi-keyword ranked search over encrypted cloud data,” IEEE Trans. on Parallel and Distributed Systems, vol. 25, no. 1, pp. 222–233, 2014.
[5] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, “Searchable symmetric encryption: Improved definitions and efficient constructions,” in Proc. of ACM CCS, 2006.
[6] S. Kamara, C. Papamanthou, and T. Roeder, “Dynamic searchable symmetric encryption,” in Proc. of ACM CCS, 2012.
[7] D. Cash, S. Jarecki, C. Jutla, H. Krawczyk, M.-C. Roşu, and M. Steiner, “Highly-scalable searchable symmetric encryption with support for boolean queries,” in Proc. of CRYPTO, 2013.
[8] M. S. Islam, M. Kuzu, and M. Kantarcioglu, “Access pattern disclosure on searchable encryption: Ramification, attack and mitigation,” in Proc. of NDSS, 2012.
[9] D. Cash, P. Grubbs, J. Perry, and T. Ristenpart, “Leakage-abuse attacks against searchable encryption,” in Proc. of ACM CCS, 2015.
[10] M. Naveed, S. Kamara, and C. V. Wright, “Inference attacks on property-preserving encrypted databases,” in Proc. of ACM CCS, 2015, pp. 644–655.
[11] D. Pouliot and C. V. Wright, “The shadow nemesis: Inference attacks on efficiently deployable, efficiently searchable encryption,” in Proc. of ACM CCS, 2016, pp. 1341–1352.
[12] C. Liu, L. Zhu, M. Wang, and Y. A. Tan, “Search pattern leakage in searchable encryption: Attacks and new construction,” Information Sciences, vol. 265, no. 5, pp. 176–188, 2014.
[13] S. Gajek, “Dynamic symmetric searchable encryption from constrained functional encryption,” in Cryptographers' Track at the RSA Conference, 2016.
[14] J. Li, D. Lin, A. C. Squicciarini, J. Li, and C. Jia, “Towards privacy-preserving storage and retrieval in multiple clouds,” IEEE Trans. on Cloud Computing, vol. 5, no. 3, pp. 499–509, 2017.
[15] H. Krawczyk and T. Rabin, “Chameleon signatures,” in Proc. of NDSS, 2000.
[16] S. Garg, C. Gentry, S. Halevi, M. Raykova, A. Sahai, and B. Waters, “Candidate indistinguishability obfuscation and functional encryption for all circuits,” in Proc. of FOCS, 2013.
[17] A. Sahai and B. Waters, “How to use indistinguishability obfuscation: Deniable encryption, and more,” in Proc. of ACM STOC, 2014.
[18] C. Wang, N. Cao, K. Ren, and W. Lou, “Enabling secure and efficient ranked keyword search over outsourced cloud data,” IEEE Trans. on Parallel and Distributed Systems, vol. 23, no. 8, pp. 1467–1479, 2012.
[19] X. Yuan, H. Cui, X. Wang, and C. Wang, “Enabling privacy-assured similarity retrieval over millions of encrypted records,” in Proc. of ESORICS, 2015.
[20] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, “Searchable symmetric encryption: Improved definitions and efficient constructions,” Journal of Computer Security, vol. 19, no. 5, pp. 895–934, 2011.
[21] S. Garg, E. Miles, P. Mukherjee, A. Sahai, A. Srinivasan, and M. Zhandry, “Secure obfuscation in a weak multilinear map model,” in Theory of Cryptography Conference, 2016.
[22] S. Banescu, M. Ochoa, N. Kunze, and A. Pretschner, “Idea: Benchmarking indistinguishability obfuscation–a candidate implementation,” in International Symposium on Engineering Secure Software and Systems. Springer, 2015, pp. 149–156.
[23] S. Badrinarayanan, E. Miles, A. Sahai, and M. Zhandry, “Post-zeroizing obfuscation: New mathematical tools, and the case of evasive circuits,” in Proc. of EUROCRYPT, 2016.
[24] C. Guan, K. Ren, F. Zhang, F. Kerschbaum, and J. Yu, “Symmetric-key based proofs of retrievability supporting public verification,” in Proc. of ESORICS, 2015.
[25] Y. Zhang, C. Xu, X. Liang, H. Li, Y. Mu, and X. Zhang, “Efficient public verification of data integrity for cloud storage systems from indistinguishability obfuscation,” IEEE Trans. on Information Forensics and Security, vol. 12, no. 3, pp. 676–688, 2017.
[26] E. Stefanov, C. Papamanthou, and E. Shi, “Practical dynamic searchable encryption with small leakage,” in Proc. of NDSS, 2014.
[27] P. Golle, J. Staddon, and B. Waters, “Secure conjunctive keyword search over encrypted data,” in Proc. of ACNS, 2004.
[28] J. H. Cheon, K. Han, C. Lee, H. Ryu, and D. Stehlé, “Cryptanalysis of the multilinear map over the integers,” in Proc. of EUROCRYPT, 2015.
[29] J.-S. Coron, C. Gentry, S. Halevi, T. Lepoint, H. K. Maji, E. Miles, M. Raykova, A. Sahai, and M. Tibouchi, “Zeroizing without low-level zeroes: New MMAP attacks and their limitations,” in Annual Cryptology Conference. Springer, 2015, pp. 247–266.
[30] Y. Hu and H. Jia, “Cryptanalysis of GGH map,” in Proc. of EUROCRYPT, 2016.
[31] Z. Brakerski, C. Gentry, S. Halevi, T. Lepoint, A. Sahai, and M. Tibouchi, “Cryptanalysis of the quadratic zero-testing of GGH,” IACR Cryptology ePrint Archive, vol. 2015, p. 845, 2015.
[32] S. Halevi, “Graded encoding, variations on a scheme,” IACR Cryptology ePrint Archive, vol. 2015, p. 866, 2015.
[33] J. H. Cheon, P.-A. Fouque, C. Lee, B. Minaud, and H. Ryu, “Cryptanalysis of the new CLT multilinear map over the integers,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 2016, pp. 509–536.
[34] B. Minaud and P.-A. Fouque, “Cryptanalysis of the new multilinear map over the integers,” IACR Cryptology ePrint Archive, vol. 2015, p. 941, 2015.
[35] B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan, and K. Yang, “On the (im)possibility of obfuscating programs,” Journal of the ACM, vol. 59, no. 2, p. 6, 2012.
[36] D. Apon, Y. Huang, J. Katz, and A. J. Malozemoff, “Implementing cryptographic program obfuscation,” Cryptology ePrint Archive, Report 2014/779, 2014. http://eprint.iacr.org, Tech. Rep., 2015.
[37] M. Sauerhoff, I. Wegener, and R. Werchner, “Relating branching program size and formula size over the full binary basis,” in Proc. of STACS, 1999.
[38] D. Boneh and M. Zhandry, “Multiparty key exchange, efficient traitor tracing, and more from indistinguishability obfuscation,” in Proc. of CRYPTO, 2014.
[39] D. Boneh, D. J. Wu, and J. Zimmerman, “Immunizing multilinear maps against zeroizing attacks,” IACR Cryptology ePrint Archive, vol. 2014, p. 930, 2014.
[40] E. Miles, A. Sahai, and M. Zhandry, “Annihilation attacks for multilinear maps: Cryptanalysis of indistinguishability obfuscation over GGH13,” in Proc. of CRYPTO, 2016.
[41] P. Ananth, D. Gupta, Y. Ishai, and A. Sahai, “Optimizing obfuscation: Avoiding Barrington’s theorem,” in Proc. of ACM CCS, 2014.