Data Secure De-Duplication and Recovery Based on Public Key Encryption With Keyword Search
ABSTRACT In the current era of information explosion, users' demand for data storage keeps increasing, and storing data on the cloud has become the first choice of users and enterprises. Cloud storage makes it easy for users to back up and share data and effectively reduces their storage expenses. However, because the duplicate data of different users are stored multiple times, the storage utilization of cloud servers drops sharply. Duplicate data stored in plaintext can be removed directly, but cloud servers are only semi-trusted, so data usually need to be encrypted before storage to protect user privacy. In this paper, we focus on how to achieve secure de-duplication and data recovery over ciphertext for different users: we determine whether a public key searchable encryption index and a trapdoor match in the ciphertext state to achieve secure de-duplication. For a duplicate file, the data user's re-encryption key for that file is appended to the ciphertext chain table of the stored copy. The cloud server uses the re-encryption key to generate the specified transformed ciphertext, and the data user decrypts the transformed ciphertext with its private key to recover the file. Security analysis and experimental simulation show that the proposed scheme is secure and efficient.
I. INTRODUCTION
As a major service provided by cloud computing technology, cloud storage enables users to back up and share data easily and quickly, which can efficiently reduce users' storage expenses and improve work efficiency. With the increasing maturity of cloud computing technology, there are many Cloud Service Providers (CSPs) in the market, such as Baidu Cloud, Amazon Cloud, and other well-known CSPs. Users upload and store their confidential data in the data storage center of the cloud server, which is managed and maintained by the CSP, but with this comes the frequent occurrence of cloud computing security issues. Enterprises and individual users store personal files, business contracts, user transaction records, environmental and geographic data, and other sensitive data on the cloud server. However, user privacy leaks and sensitive data leaks have emerged, and some CSPs have even sold user data for corporate profit. The issue of data security in cloud storage therefore deserves widespread attention.
Big data and cloud computing are developing rapidly, with an explosion of data from users around the world, resulting in a dramatic increase in demand for cloud servers. An effective solution to the storage of massive amounts of data is data de-duplication. For plaintext data, the equality test can be achieved by direct comparison, but user data involves personal privacy, and uploading or storing it in plaintext form on cloud servers can cause privacy leakage. Encrypting data can protect user privacy effectively. In practical scenarios, however, different users use different keys to encrypt their files, and there are random parameters in the encryption, so the ciphertexts generated from the same file differ. By directly comparing ciphertexts, we cannot determine whether files are duplicates, and cloud servers will store
multiple encrypted copies of the same file, which will put huge storage pressure on cloud servers. Therefore, there is an urgent need to design a secure de-duplication scheme for data encrypted with different keys in multi-user scenarios.
Currently, convergent encryption [1] is widely used to construct secure data de-duplication systems, but convergent encryption also faces various dangers such as data leakage, faking attacks, and chosen-plaintext attacks [2], [3], [4]. Since the encryption key used in convergent encryption is generated from the hash value of the user's data file, multiple files of the same user will generate multiple different keys, causing a key management problem [5]. The operations of encryption and de-duplication affect each other. If different users encrypt data with the same key, they generate the same ciphertext, and secure data de-duplication can be achieved by directly comparing ciphertexts, but this causes the key management problem. If different users use different keys to encrypt data, the key management problem can be effectively reduced, but it is difficult to achieve the equality test. Therefore, how different users can encrypt data with the same key without communicating with each other, thus producing the same ciphertext after encrypting the same data, and how users can recover their data, are the main research directions of this paper.
In this paper, Public Key Encryption with Keyword Search (PEKS) is used to detect file duplicates by matching keywords with trapdoors, and Proxy Re-encryption (PRE) is used for data recovery [6]. The scheme is mainly divided into data de-duplication and data recovery. For data de-duplication, the data owner needs to upload the file ciphertext, file tag, and re-encryption key to the cloud server; the file tag points to the file ciphertext, and the re-encryption key is stored in the corresponding ciphertext chain table. When the test result is True, it means that the file is already stored in the server. The data user does not need to upload the ciphertext again but only needs to upload the re-encryption key of the specified file to the corresponding ciphertext chain table, which can effectively reduce the storage overhead of the cloud server. When the test result is False, it means that the file is not stored in the server, and the data owner needs to upload the ciphertext, file tag, and re-encryption key. Regarding data recovery, users only need to store the user key locally, not the file key of each file. The user key can be generated locally without introducing a key generation center (KGC), which avoids key substitution attacks and malicious KGC attacks [7]. The user initiates a request to the cloud server to obtain the file, and the cloud server uses the user's re-encryption key in the ciphertext chain table to re-encrypt and generate the transformed ciphertext. The transformed ciphertext is sent to the user, and the user can decrypt the file using his private key.

A. RELATED WORKS
The continuous development of cloud storage technology has brought new opportunities to many industries, and data on the cloud has become an immediate need for the digital transformation of traditional industries. How to effectively improve the space utilization of cloud storage is one of the problems that cloud storage urgently needs to solve. A large amount of duplicate data exists in the massive data stored on cloud servers many times. The main ideas for solving the problem of duplicated data storage are to improve the compression rate of stored files and to perform secure data de-duplication. This paper focuses on how to achieve secure data de-duplication in a multi-user environment.
Secure data de-duplication can be classified into file-level de-duplication and block-level de-duplication according to the granularity of de-duplication, and into client de-duplication and server de-duplication according to the de-duplication entity. In this paper, we focus on file-level secure de-duplication on the server. To protect the privacy of users, server de-duplication requires users to encrypt data files before uploading them to cloud servers. Douceur et al. [1] proposed convergent encryption, which can effectively balance data de-duplication and data encryption to achieve secure de-duplication in the ciphertext. It computes the hash value of a file as the key for encrypting that file. The same file will generate the same key, and encrypting the same file with the same key will generate the same ciphertext, thus allowing a direct comparison of the duplicity of files in the ciphertext state. Through this mechanism, we can see that the key generated from the file has no randomness, and each file will generate a key, which leads to the problem of key management. Bellare et al. [8] designed the variant algorithms HCE1, HCE2, and HCE3 of convergent encryption by analyzing the security of convergent encryption [1], improving its security and efficiency. Convergent encryption uses the file hash value as the encryption key and determines directly from the ciphertext whether the file is duplicated. Message-Locked Encryption (MLE) is a further improvement of convergent encryption. MLE generates a tag for the file for de-duplication and does not rely exclusively on the file hash to generate the file encryption key. The encryption key is generated by mapping the file with a deterministic function, which is not resistant to brute-force attacks on predictable information, and the encryption key and the tag are independent and not related in any way. To solve this problem, Keelveedhi et al. [9] also proposed the DupLESS server-assisted secure data de-duplication scheme, which effectively improves the randomness of deterministic ciphertexts. Abadi et al. [10] constructed MLE2 for lock-dependent messages based on MLE to improve the security of data de-duplication. Liu et al. [11] constructed a secure data de-duplication system based on a key exchange protocol without relying on an additional server, but the system has large communication and computational overheads and requires most users to be online at the same time. Puzio et al. [12] proposed PerfectDedup to perform secure de-duplication based on the popularity of data, combined with the property of perfect hashing to ensure the confidentiality of data. Li et al. [13] proposed a rekeying-aware encrypted de-duplication storage
system, where the data owner only needs to re-encrypt part of the message using the convergent all-or-nothing transform (CAONT) to achieve secure de-duplication and effectively reduce the computational overhead of the system. Li et al. [14] proposed CDStore, an enhanced secret sharing scheme based on convergent encryption, which takes deterministic hash values as the input of secret sharing and supports de-duplication. Tang et al. [15] proposed a secure de-duplication scheme based on threshold re-encryption, which can resist side-channel attacks while effectively reducing computational overhead. Gao et al. [16] proposed a secure de-duplication scheme without trusted third parties, with hierarchical encryption based on prevalence and privacy. Kan et al. [17] proposed an identity-based proxy re-encryption scheme that achieves secure data de-duplication and recovery by combining data de-duplication with user access privileges. Yuan et al. [18] found that REED [14] suffers from a stub-reserved attack and constructed a new secure de-duplication algorithm using CAONT and a Bloom filter to resist this attack.
PRE was first proposed by Blaze et al. [19]. PRE enables data sharing without revealing the data owner's key: using PRE, a proxy server can transform a ciphertext encrypted under the data owner's key into a transformed ciphertext that can be decrypted with the data user's key, thus protecting the data owner's key while enabling data sharing. Lu and Li [20] proposed a pairing-free proxy re-encryption scheme that can meet the application requirements of devices with limited computing power. According to practical application scenarios, PRE schemes applicable to the IoT, cloud computing, and other settings [21], [22], [23] have been proposed. The application of electronic medical records in medical institutions suffers from problems such as information leakage, and Liu et al. [24] used proxy re-encryption and sequential multi-signature to solve this problem.
PEKS was first proposed by Boneh et al. [6] to allow the server to search the ciphertext without knowing the plaintext message. Fang et al. [25] constructed a PEKS scheme in the standard security model that resists keyword guessing attacks. Lu et al. [26] showed that certificate-based searchable encryption not only resists keyword guessing attacks but also supports implicit authentication and needs no secure channel. Guo and Yau [27] proposed a PEKS scheme that satisfies indistinguishability of trapdoors. Qin et al. [28] introduced an improved CI model that enables public key authenticated encryption with keyword retrieval to resist fully chosen keyword to ciphertext-keyword attacks in a multi-user environment. Zhang et al. [29] first proposed public key encryption with bidirectional keyword search, which has practical applications in various scenarios such as email systems. Chen et al. [30], inspired by the Diffie-Hellman key exchange algorithm, constructed a dual-server public-key authenticated encryption with keyword search scheme based on Chen et al. [31], where the system requires not only dual-server public keys for the generation of keyword ciphertexts and trapdoors but also the public keys between users to ensure that only authenticated users can search the ciphertext. Lu et al. [32] devised a lightweight public key authenticated encryption with keyword search scheme, which is suitable for resource-constrained mobile devices.

B. OUR CONTRIBUTION
In this paper, we construct a secure data de-duplication and recovery scheme by combining public key searchable encryption and proxy re-encryption. The contributions of this paper are as follows:
1) A secure data de-duplication scheme based on public key encryption with keyword search is constructed to realize secure data de-duplication in a multi-user environment, which can effectively save the storage space of cloud servers. The scheme in this paper performs server-side de-duplication, which can achieve secure de-duplication without requiring users to be online, so its application scenario is more flexible.
2) This paper uses proxy re-encryption to achieve user data recovery. The server uses the re-encryption key stored in the ciphertext chain table for re-encryption, and the user can decrypt and recover the files using only his private key, so there is no need to save each file key, which effectively reduces the key management problem.
3) The only entities involved in the interaction are users and cloud servers, and no KGC is introduced, so key substitution attacks and malicious KGC attacks can be effectively avoided. Meanwhile, against malicious servers that can obtain the file tags and file ciphertexts of arbitrary files, the scheme achieves one-wayness under the chosen file attack.
4) Massive data are stored in the cloud server, so the efficiency of the equality test in the overall de-duplication process is critical. Through experimental simulation analysis, for a database storing 5000 files, secure de-duplication takes 42.86 s, about one third of the time consumed by the scheme in the paper [17].

C. ORGANIZATION
The rest of the paper is organized as follows. Section II presents the algorithm and specific design. Section III analyzes the security of the scheme. Section IV simulates the scheme to analyze its performance. Finally, a conclusion is presented in Section V.

II. ALGORITHM AND SCHEME DESIGN
A. SYSTEM MODEL
As shown in Fig 1, the entities involved in this scheme are the data user and the CSP; these entities are described below.
Data user: The data user encrypts the file and uploads it to the CSP to ensure the user's privacy. The user is distinguished
into data owner (DO) and data user (DU) according to the status at the time of upload: the DO needs to upload the ciphertext, file tag, and re-encryption key, while the DU only needs to upload the re-encryption key for the file. When the data user recovers the file, the CSP uses the corresponding re-encryption key in the ciphertext chain table to generate the transformed ciphertext. The user can use his private key to decrypt the transformed ciphertext and recover the file.
CSP: The CSP stores the ciphertexts and file tags uploaded by users and establishes the file tag index. When a user uploads a test tag, a matching operation is performed to determine whether it corresponds to a duplicate file. For non-duplicate files, the user needs to store the ciphertext, the file tag, and the user's re-encryption key for the file in the cloud server. For duplicate files, users only need to store their re-encryption keys for the files in the corresponding ciphertext chain table. When the user sends a file recovery request, the CSP uses the corresponding re-encryption key in the ciphertext chain table to generate a transformed ciphertext and sends it to the user.

B. ALGORITHM DESIGN
The scheme in this paper involves only the user and the cloud server. The overall workflow is divided into the two phases of data de-duplication and data recovery and contains a total of 10 algorithms, which are described as follows.
Setup(k) → params: Given a security parameter k, return the public parameters params = {q, g, e, Z, G1, G2, H1, H2, H3}, where G1 and G2 are two cyclic groups with prime order q, g is a generator of G1, e : G1 × G1 → G2 is a bilinear map, and Z = e(g, g). In addition, three hash functions are defined: H1 : {0, 1}* → Zq*, H2 : {0, 1}* → G1, and H3 : G2 → Zq*.
FileKey(params, F) → (SK_F, PK_F): This algorithm is run by the user. On input a file F, the user computes the private key SK_F = f = H1(F) and the public key PK_F = Z^f = Z^{H1(F)}, then outputs the file key pair (PK_F, SK_F).
UserKey(params) → (SK_u, PK_u): This algorithm is run by the user. The user chooses a random element a from the set Zq* as the user private key SK_u and computes the user public key PK_u = g^a, then outputs the user key pair (PK_u, SK_u). The user key pair is generated by the user without the help of a KGC, and the key information is not involved in the interaction with the server but is kept only by the user, thus ensuring the privacy of the user key.
ReKey(SK_F, PK_u) → RK_{F→u}: This algorithm is run by the user. It inputs the file private key SK_F and the user public key PK_u, then outputs the user's re-encryption key for file F, RK_{F→u} = g^{a·f}. RK_{F→u} is sent to the cloud server and stored in the ciphertext chain table; it is used only for the specified user to recover the specified file owned by that user.
FileTag(params, F, SK_F) → Tag_F: This algorithm is run by the user. It inputs the file, chooses a random element r1 from the set Zq*, and extracts the keyword w = H2(F) from the file F. It outputs the file tag Tag_F = (T1, T2) for file F, where T1 = g^{r1} and T2 = H3(e(w, g^{r1·f})). The user sends Tag_F to the cloud server, and the tag serves as a unique identifier for the file. When the tag is stored in the server, it indicates that the file already exists and does not need to be uploaded repeatedly.
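To make the algebra of these algorithms easier to follow, the following minimal Python sketch models the pairing setting symbolically: an element g^x of G1, or Z^y = e(g, g)^y of G2, is represented only by its exponent modulo q, so the bilinear map becomes modular multiplication. The prime q, the SHA-256-based hash constructions, and all names are illustrative assumptions of ours rather than the implementation used for the paper's experiments, and the model is of course not cryptographically secure.

```python
# Toy, non-cryptographic model of the pairing setting described above.
# g^x in G1 and Z^y = e(g, g)^y in G2 are represented by the exponents x and y
# modulo q, so e(g^x, g^y) = Z^(x*y) becomes modular multiplication.
import hashlib
import secrets

q = 2**127 - 1  # stand-in prime group order (illustrative only)

def _h(tag: bytes, data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(tag + data).digest(), "big") % q or 1

def H1(F: bytes) -> int:            # H1 : {0,1}* -> Zq*
    return _h(b"H1", F)

def H2(F: bytes) -> int:            # H2 : {0,1}* -> G1, returned as the exponent of g
    return _h(b"H2", F)

def H3(z: int) -> int:              # H3 : G2 -> Zq*
    return _h(b"H3", z.to_bytes(16, "big"))

def pair(x: int, y: int) -> int:    # e(g^x, g^y) = Z^(x*y)
    return (x * y) % q

def FileKey(F: bytes):
    f = H1(F)                       # SK_F = f = H1(F)
    return f, f                     # PK_F = Z^f, represented here by the exponent f

def UserKey():
    a = secrets.randbelow(q - 1) + 1
    return a, a                     # SK_u = a, PK_u = g^a (exponent a)

def ReKey(SK_F: int, PK_u: int) -> int:
    return (SK_F * PK_u) % q        # RK_{F->u} = (g^a)^f = g^(a*f)

def FileTag(F: bytes, SK_F: int):
    r1 = secrets.randbelow(q - 1) + 1
    w = H2(F)                       # keyword w = H2(F)
    T1 = r1                         # T1 = g^r1
    T2 = H3(pair(w, (r1 * SK_F) % q))   # T2 = H3(e(w, g^(r1*f)))
    return T1, T2
```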
Enc(F, PK_F) → C: This algorithm is run by the user. It inputs the file F and the file public key, then returns the ciphertext C. The algorithm runs as below:
1) Select a random element r2 from the set Zq* and compute C1 = g^{r2}.
2) Compute C2 = F · e(g, g)^{f·r2} = F · Z^{f·r2}.
The algorithm outputs the ciphertext C = (C1, C2) = (g^{r2}, F · Z^{f·r2}). For non-duplicate files, the user needs to encrypt them using this algorithm and then upload them to the cloud server for storage.
TestTag(F) → T_F: This algorithm is run by the user. It inputs the file F, then computes the test tag T_F = H2(F)^{H1(F)}. The test tag is used to verify whether the corresponding file already exists in the cloud server.
Test(T_F, Tag_F) → ⊥: This algorithm is run by the cloud server. It inputs the test tag T_F and the file tag Tag_F; the cloud server verifies whether H3(e(T_F, T1)) = T2 holds. When the equation holds, it indicates that the file is duplicated, so the user does not need to upload the ciphertext, but only needs to generate the re-encryption key RK_{F→u} of the duplicate file for itself by the ReKey algorithm and store the key in the ciphertext chain table of the corresponding ciphertext.
ReEnc(C, RK_{F→u}) → C′: This algorithm is run by the cloud server. It inputs the original ciphertext C and the re-encryption key RK_{F→u}, then computes the transformed ciphertext C′ = (C1′, C2′), where C1′ = e(C1, RK_{F→u}) = Z^{r2·a·f} and C2′ = C2. When the user needs to obtain the duplicate file, he only needs to send a request to the server, which finds the corresponding re-encryption key in the corresponding ciphertext chain table, uses it to generate the transformed ciphertext, and sends it to the user.
ReDec(C′, SK_u) → F: This algorithm is run by the user. When the user receives the transformed ciphertext C′ from the server, the user can simply use his private key SK_u to compute F = C2′ / (C1′)^{1/a}.

C. CORRECTNESS ANALYSIS
1) CORRECTNESS OF DE-DUPLICATION
The user needs to perform de-duplication before uploading files to the server, and when a file is duplicated, there is no need to upload it. The user computes the test tag T_{F′} and uses the test tag T_{F′} and the file tag Tag_F as input to the algorithm Test to determine whether the file F′ is duplicated. The specific de-duplication process is as follows. Using the bilinear pairing properties, it follows that

  H3(e(T_{F′}, T1)) = H3(e(H2(F′), g^{f·r1})) = H3(e(H2(F′), g)^{f·r1}),
  T2 = H3(e(H2(F), g^{f·r1})) = H3(e(H2(F), g)^{f·r1}).

The above equations show that if the test tag corresponds to the same file as the file tag, then H3(e(H2(F′), g)^{f·r1}) = H3(e(H2(F), g)^{f·r1}), so the equation H3(e(T_{F′}, T1)) = T2 holds. Since the hash function is collision-resistant, when H2(F′) ≠ H2(F) it means F′ ≠ F. Therefore, the output of the algorithm Test is true for the same file and false for a different file, thus determining whether the file is a duplicate.

2) CORRECTNESS OF RE-DECRYPTION
For the user who owns the file, the ciphertext chain table corresponding to the ciphertext stored in the cloud server should hold the re-encryption key of that user. This re-encryption key is generated from the file encryption key and the user's public key, forming a one-to-one correspondence between the file and the user's identity. Therefore, the user can decrypt the corresponding transformed ciphertext with his private key. The specific decryption calculation is as follows:

  C2′ / (C1′)^{1/a} = F · e(g, g)^{f·r2} / e(C1, RK_{F→u})^{1/a}
                    = F · e(g, g)^{f·r2} / e(g^{r2}, g^{a·f})^{1/a}
                    = F · e(g, g)^{f·r2} / e(g, g)^{f·r2}
                    = F.

The above equation shows that if and only if the user has the file, he can use his private key to recover the file F correctly, while for users who do not have access to the file, the generated transformed ciphertext cannot be decrypted.
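Continuing the same toy exponent-only model introduced earlier, the sketch below fills in the remaining algorithms and checks the two correctness properties just derived: the Test equation holds exactly for a duplicate file, and ReDec applied to ReEnc recovers the encrypted value. Modelling a "file" inside Enc as a G2 element Z^F is an assumption made purely for illustration; none of this reflects the paper's actual implementation.

```python
# Toy exponent-only model (see the earlier sketch): g^x and Z^y are stored as
# exponents modulo q, and e(g^x, g^y) = Z^(x*y) is modular multiplication.
import hashlib
import secrets

q = 2**127 - 1  # illustrative prime order, not a real pairing group

def _h(tag, data):
    return int.from_bytes(hashlib.sha256(tag + data).digest(), "big") % q or 1

H1 = lambda F: _h(b"H1", F)                      # {0,1}* -> Zq*
H2 = lambda F: _h(b"H2", F)                      # {0,1}* -> G1 (exponent of g)
H3 = lambda z: _h(b"H3", z.to_bytes(16, "big"))  # G2 -> Zq*
pair = lambda x, y: (x * y) % q                  # e(g^x, g^y) = Z^(x*y)
rand = lambda: secrets.randbelow(q - 1) + 1

def Enc(F_exp, PK_F):
    # C = (C1, C2) = (g^r2, F * Z^(f*r2)); the toy "file" is a G2 element Z^F_exp
    r2 = rand()
    return r2, (F_exp + PK_F * r2) % q

def TestTag(F):
    return (H2(F) * H1(F)) % q                   # T_F = H2(F)^H1(F)

def Test(TF, Tag):
    T1, T2 = Tag
    return H3(pair(TF, T1)) == T2                # H3(e(T_F, T1)) ?= T2

def ReEnc(C, RK):
    C1, C2 = C
    return pair(C1, RK), C2                      # C1' = e(C1, RK) = Z^(r2*a*f), C2' = C2

def ReDec(C_prime, SK_u):
    C1p, C2p = C_prime
    return (C2p - C1p * pow(SK_u, -1, q)) % q    # F = C2' / (C1')^(1/a)

def FileTag(F, f):
    r1 = rand()
    return r1, H3(pair(H2(F), (r1 * f) % q))     # (T1, T2) = (g^r1, H3(e(w, g^(r1*f))))

# End-to-end check of both correctness properties in the toy model.
F = b"some file content"
f = H1(F)                                        # SK_F = f, PK_F = Z^f (exponent f)
a = rand()                                       # SK_u = a, PK_u = g^a
RK = (a * f) % q                                 # RK_{F->u} = g^(a*f)

assert Test(TestTag(F), FileTag(F, f))                     # duplicate file: test succeeds
assert not Test(TestTag(b"another file"), FileTag(F, f))   # different file: test fails
F_exp = rand()                                             # toy "file" as Z^F_exp
assert ReDec(ReEnc(Enc(F_exp, f), RK), a) == F_exp         # re-decryption recovers F
print("de-duplication test and re-decryption behave as derived above")
```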
D. WORK PROCESS
This scheme realizes secure data de-duplication and data recovery. This subsection describes the workflow, which is divided into two phases: data de-duplication and data recovery.
Phase 1. Data de-duplication
The workflow of the data de-duplication phase is depicted in Fig 2. The data user generates a file tag Tag_F based on the file F and uploads it to the cloud server to form the file tag index. When the data user uploads a file, a test tag T_F is generated and sent to the CSP, which uses the matching relationship between Tag_F and T_F to determine whether there is a file tag that matches the test tag uploaded by the user, and sends the test result Test(T_F, Tag_F) to the data user. If there exists a matching file tag, the result is true, meaning that a duplicate file is already stored in the CSP; the user does not need to encrypt the file, but only needs to upload the file re-encryption key for this user to the ciphertext chain table of the corresponding file in the cloud server. If there is no matching file tag, the result is false, meaning that there is no duplicate file in the CSP; the user, as the data owner, then needs to generate the file ciphertext, the file tag, and the re-encryption key for the user and upload them to the cloud server.
Phase 2. Data recovery
The workflow of the data recovery phase is depicted in Fig 3. The data user sends a request to get the file, and the CSP queries whether the user's re-encryption key RK_{F→u} exists in the ciphertext chain table of the file F. When the user's re-encryption key exists in the ciphertext chain table of the file F, the cloud server re-encrypts the original ciphertext C of the file, generates the transformed ciphertext C′, and sends it to the user. After receiving the transformed ciphertext, the user can use the private key SK_u to decrypt the file F. When the user's re-encryption key does not exist in the ciphertext chain table of the file F, it means that the user cannot access the file F.
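The server-side bookkeeping implied by this workflow can be sketched as follows. The tag index and the per-file ciphertext chain table are modelled as plain dictionaries, and the cryptographic operations Test and ReEnc are passed in as callables; the class and field names are our own illustrative choices and are not taken from the paper.

```python
# Illustrative bookkeeping for the two phases; the cryptography itself is abstracted away.
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

@dataclass
class StoredFile:
    tag: tuple                                    # Tag_F = (T1, T2), the index entry
    ciphertext: tuple                             # C = (C1, C2), stored once per file
    chain_table: Dict[str, int] = field(default_factory=dict)  # user id -> RK_{F->u}

@dataclass
class CloudServer:
    test: Callable[[int, tuple], bool]            # Test(T_F, Tag_F) -> bool
    reenc: Callable[[tuple, int], tuple]          # ReEnc(C, RK_{F->u}) -> C'
    files: Dict[int, StoredFile] = field(default_factory=dict)

    # Phase 1: de-duplication. The server scans the tag index with the test tag.
    def find_duplicate(self, test_tag: int) -> Optional[int]:
        for fid, rec in self.files.items():
            if self.test(test_tag, rec.tag):
                return fid
        return None

    def upload(self, user: str, test_tag: int, rekey: int,
               tag: Optional[tuple] = None, ciphertext: Optional[tuple] = None) -> int:
        fid = self.find_duplicate(test_tag)
        if fid is None:                           # Test is false: store the full upload
            fid = len(self.files)
            self.files[fid] = StoredFile(tag=tag, ciphertext=ciphertext)
        self.files[fid].chain_table[user] = rekey  # Test is true: only append RK_{F->u}
        return fid

    # Phase 2: recovery. Re-encrypt with the requesting user's key, if it is on file.
    def recover(self, user: str, fid: int) -> Optional[tuple]:
        rec = self.files.get(fid)
        if rec is None or user not in rec.chain_table:
            return None                           # no re-encryption key: no access
        return self.reenc(rec.ciphertext, rec.chain_table[user])
```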
III. SECURITY PROOF
A. SECURITY MODEL
In this paper, we consider an insider attacker: a malicious CSP server. This type of attacker complies with the execution of the protocol but can obtain the file tag and ciphertext of any file. In addition, the attacker can obtain the test tags and re-encryption keys of files. For the target file, the attacker is not allowed to obtain the test tag of the file. If the attacker knows the private key of a user, the attacker is also not allowed to obtain the re-encryption key of the target file to that user. For this type of attacker, the formal definition of the security model of the scheme is given below.
Setup. The challenger generates the system parameters params = {q, g, e, Z, G1, G2, H1, H2, H3} from Setup and generates the key pair of the challenge user from UserKey. The challenger sends params and PK_u to the attacker A.
Phase 1. A makes the following queries, which are answered as follows.
• File key queries: Input a file F; return the key pair (PK_F, SK_F) of the file F.
• File tag queries: Input a file F; return the file tag Tag_F of the file F.
• Ciphertext queries: Input a file F; return the ciphertext C_F of the file F.
• Re-encryption key queries: Input a file F and a user public key PK_u; return the re-encryption key RK_{F→u}.
• Test tag queries: Input a file F; return the test tag T_F of the file F.
In the above queries, for any file other than the challenge file F*, the attacker can complete the various queries based on the relevant algorithms.
Challenge. The challenger randomly selects a challenge file F* from the file space G2, computes the file tag Tag_{F*} and ciphertext C_{F*}, and returns them to A.
Phase 2. A can make queries related to any file in the same way as in Phase 1, but cannot make the following queries related to the challenge file F*:
• the file private key of the challenge file F*;
• the re-encryption key from the challenge file to another user whose private key is known (A can obtain the re-encryption key RK_{F*→u} from the challenge file F* to the challenge user);
• the test tag of the challenge file F*.
Guess. A returns a file F ∈ G2. If F = F*, the attacker succeeds; otherwise the attacker fails.
For any Probabilistic Polynomial-Time (PPT) attacker, a scheme is said to satisfy one-wayness under the chosen file attack if the probability of the attacker succeeding in the above game is negligible.

B. SECURITY ANALYSIS
SBDH problem. Given a security parameter k, a group G with order q, a generator g of G, randomly choose a, b, c ∈ Zq*. On input a tuple (g, g^a, g^b, g^c, g^{1/a}, g^{bc/a}) ∈ G1^6, compute e(g, g)^{abc}. For a PPT adversary A, the advantage is

  Adv_A^{SBDH}(k) = Pr[ A(g, g^a, g^b, g^c, g^{1/a}, g^{bc/a}) → e(g, g)^{abc} ] ≤ ε;

if ε is negligible, the SBDH problem is hard to solve by A.
Theorem 1: If the SBDH problem is hard to solve, then under the random oracle model the scheme satisfies one-wayness under the chosen file attack. Specifically, if there exists an algorithm A that attacks the one-wayness of the scheme with probability ε, then another algorithm B can be constructed that solves the SBDH problem instance with probability at least

  (1/2) · (1/(2q3)) + (1/(2q3)) · (ε − 1/q),

where q3 denotes the number of times the attacker queries the hash function H3.
Proof. Assume that A is a PPT attacker against the one-wayness of the scheme. Given an instance (g, g^a, g^b, g^c, g^{1/a}, g^{bc/a}) ∈ G1^6 of the SBDH problem and a bilinear pairing e : G1 × G1 → G2, if the attacker can successfully attack the one-wayness of the scheme with probability ε, then a simulator algorithm B can be constructed that solves the above problem with at least the probability stated in Theorem 1. Algorithm B simulates the execution of the game as follows.
Setup. Simulator B generates the system parameters params = {q, g, e, Z, G1, G2, H1, H2, H3} based on the SBDH problem instance, where Z = e(g, g). H1 : {0, 1}* → Zq*, H2 : {0, 1}* → G1, and H3 : G2 → Zq* are the three hash functions
selected by the simulator. The simulator sets the public key of the challenge user u* as PK_{u*} = g^{1/a} according to the SBDH problem instance and implicitly defines its private key as SK_{u*} = 1/a. In addition, the simulator randomly selects R ∈ G2 and k1 ∈ Zq* and implicitly defines F* = R · e(g, g)^{−k1·abc} as the challenge file. Finally, the simulator sends params and PK_{u*} to the attacker.
Phase 1. When the attacker makes the following queries, the simulator responds as follows.
• Hash queries. The simulator builds three hash lists L1 = {(∗, ∗)}, L2 = {(∗, ∗)}, and L3 = {(∗, ∗)} with initially empty entries. When the attacker asks the hash function H1 for the hash value of an element x_{1,i}, the simulator first checks whether the pair (x_{1,i}, h_{1,i}) exists in list L1 and returns h_{1,i} if it does; otherwise, it randomly selects h_{1,i} ∈ Zq* and returns it to the attacker.
• File key queries. When the attacker asks for the key of file F, the simulator returns (PK_F, SK_F) = (Z^{H1(F)}, H1(F)) according to FileKey. In particular, when the attacker asks for the public key of the challenge file F*, the simulator computes PK_{F*} = e(g^b, g^c) and returns it to the attacker.
• File tag queries. When the attacker asks for the tag of file F, the simulator computes the file tag Tag_F of F according to FileTag and returns it to the attacker.
• File ciphertext queries. When the attacker asks for the ciphertext of file F, the simulator computes the ciphertext C_F according to Enc and returns it to the attacker.
• Re-encryption key queries. When the attacker asks with u ≠ u*, the simulator computes the re-encryption key RK_{F→u} according to ReKey and returns it to the attacker. If the attacker asks for the re-encryption key from the challenge file to the challenge user, the simulator sets RK_{F*→u*} = g^{bc/a} and returns it to the attacker.
• Test tag queries. When the attacker asks for the test tag of file F, the simulator computes the test tag T_F according to TestTag and returns it to the attacker.
Phase 2. The simulator answers the attacker's queries as in Phase 1, but the attacker cannot make the following queries:
• A asks for the private key SK_{F*} of the challenge file F*.
• A asks for the re-encryption key of the challenge file F* to other users.
• A asks for the test tag of the challenge file F*.
Guess. The attacker returns a guess file F ∈ G2. The simulator randomly selects a bit b ∈ {0, 1}. If b = 0, the simulator computes

  T* = (R / F)^{1/k1}.

Otherwise, the simulator randomly selects an element (x_{3,i}, h_{3,i}) from the hash list L3, computes T* = x_{3,i}^{1/r2}, and uses T* as the solution to the challenge SBDH problem.
Success probability analysis of the simulator. If the attacker selects a file F ≠ F* in the query phase, then the simulator's hash queries, file key queries, file tag queries, file ciphertext queries, re-encryption key queries, and test tag queries are answered in the same way as in the original security model. Since the challenge file F* is randomly and independently selected by the simulator, the probability that the attacker selects a file F equal to the challenge file in each query phase does not exceed 1/q. The correctness of the challenge tag and challenge ciphertext is analyzed below.
Because C1* = (g^a)^{k1} and F* = R · e(g, g)^{−k1·abc}, it follows that C2* = F* · e(g, g)^{f·r2} = R · e(g, g)^{−k1·abc} · e(g, g)^{k1·abc} = R. The challenge file ciphertext in the simulated game is consistent with the challenge ciphertext distribution in the original model.
Because T1* = g^{r1} and H2(F*) = (g^a)^{k2}, it follows that e(H2(F*), g^{r1·bc}) = e(g, g)^{abc·r2}. If the attacker has never asked H2 about the hash of e(g, g)^{abc·r2}, then the simulator randomly chooses T2* ∈ Zq*, which is consistent with the challenge tag distribution in the original model. If the attacker has asked H2 about the hash of e(g, g)^{abc·r2}, the simulator randomly chooses x_{3,i} from the query list; then x_{3,i}^{1/r2} is a solution to the SBDH problem with probability at least 1/q3.
Let E denote the event "the attacker asks H2 about the hash of e(g, g)^{abc·r2}"; we prove below that the probability of E occurring is at least ε − 1/q. When E does not occur, Pr[F = F* | ¬E] = 1/q, because the challenge tag and challenge ciphertext are then completely independent of the challenge file. Because

  Pr[F = F*] = Pr[F = F* | E] · Pr[E] + Pr[F = F* | ¬E] · Pr[¬E] ≤ Pr[E] + (1/q) · Pr[¬E],

it follows that

  Pr[E] ≥ (Pr[F = F*] − 1/q) / (1 − 1/q) ≥ ε − 1/q.

In summary, the analysis shows that if the attacker breaks the security of the scheme with probability ε, the simulator obtains a solution to the SBDH problem with probability at least

  (1/2) · (1/(2q3)) + (1/(2q3)) · (ε − 1/q).

IV. PERFORMANCE EVALUATION
A. THEORETICAL ANALYSIS
The scheme proposed in this paper involves data users and cloud servers as entities, and data users are divided into data owners and data users based on the order in which they upload files. The entities involved in the paper [17] are the data owner, the data user, the cloud server, and the KGC. Key generation in [17] requires the participation of the KGC, while the keys in the scheme of this paper are generated by the user without the need for a trusted third party. Although key generation increases the computational overhead of the client, it avoids KGC attacks [7]. In Table 1, for the sake of comparative analysis with the paper [17], the algorithms in the paper [17] and in this paper are grouped into
the ten algorithms listed in Table 1 according to their functions. The correspondence between the algorithms of the scheme in this paper and the algorithms in the paper [17] is as follows: UserKey is equivalent to Key generation, FileTag is equivalent to Data tag generation, TestTag is equivalent to the test tag generation within Ownership challenge and deduplication, Test is equivalent to the equality test performed by the CSP in Ownership challenge and deduplication, and the remaining algorithms correspond directly to the algorithms in Table 1. E denotes an exponentiation operation, H denotes a hash operation, P denotes a pairing operation, and '-' denotes that the algorithm does not exist. Since the paper [17] uses user keys for file encryption, it does not need to execute the FileKey algorithm, while the scheme in this paper uses proxy re-decryption for both data owner and data user decryption, so it does not need to execute Dec.
The computational overheads incurred in UserKey and ReEnc are basically the same for both schemes. The computational overhead of this paper's scheme in generating test tags increases by one exponentiation compared to the paper [17], which leads to an increase in computational overhead. For each execution of the Test algorithm, our scheme needs to perform one bilinear pairing operation, whereas the paper [17] requires two bilinear pairing operations, so there is a significant difference in efficiency when the equality test is performed on a large amount of data. Moreover, the computational overheads of the encryption algorithm and the re-decryption algorithm in this paper are smaller than those in the paper [17]. Therefore, the theoretical analysis shows that the overall computational overhead of our scheme is lower.
To compare the computational overhead of each algorithm in the de-duplication system, the system is divided into the ten algorithms shown in Fig 4. As shown in Fig 4, the computational overhead of each algorithm is derived from simulations in which the file size used is 1 MB. It can be seen that the computational overhead generated by our scheme and the paper [17] in generating user keys and in re-encryption is basically the same. Since the equality test in the de-duplication system requires the Test algorithm to be run for each file in the cloud storage, in order to improve the efficiency of the equality test, our scheme increases the computational overhead of FileTag so as to reduce the computational overhead of Test, and the overhead of Test is about 1/3 of that of the paper [17]. For the TestTag, Enc, ReKey, and ReDec algorithms, the simulation results show that our scheme is better than the paper [17]. In terms of the overall execution efficiency of the system, the time required for a single process is about 82.2 ms in our scheme, while the paper [17] requires about 137.65 ms. It can be seen that the computational overhead of the scheme in this paper is lower than that in the paper [17].
FIGURE 4. Average operation time of each phase of the scheme.
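Because a duplicate check runs Test once against every stored tag, its cost grows linearly with the number of stored files, which is the behaviour reported for the equality test below (Fig 6). The rough sketch that follows illustrates that linear scan with the toy exponent-only model used earlier; the absolute timings it prints are meaningless and are not comparable to the measurements in this section.

```python
# Rough illustration only: the duplicate check scans the whole tag index, one Test per
# stored tag, so its cost is linear in the number of files. Toy model, not the paper's code.
import hashlib, secrets, time

q = 2**127 - 1
_h = lambda t, d: int.from_bytes(hashlib.sha256(t + d).digest(), "big") % q or 1
H1 = lambda F: _h(b"H1", F)
H2 = lambda F: _h(b"H2", F)
H3 = lambda z: _h(b"H3", z.to_bytes(16, "big"))
pair = lambda x, y: (x * y) % q
rand = lambda: secrets.randbelow(q - 1) + 1

def file_tag(F):
    f, r1 = H1(F), rand()
    return r1, H3(pair(H2(F), (r1 * f) % q))      # (T1, T2)

def test(TF, tag):
    T1, T2 = tag
    return H3(pair(TF, T1)) == T2                 # H3(e(T_F, T1)) ?= T2

for n in (1000, 2000, 3000, 4000, 5000):
    index = [file_tag(b"file-%d" % i) for i in range(n)]
    probe = (H2(b"not stored") * H1(b"not stored")) % q   # test tag of an absent file
    start = time.perf_counter()
    hits = sum(test(probe, tag) for tag in index)         # full scan of the tag index
    print(n, "tags scanned,", hits, "matches,", round(time.perf_counter() - start, 4), "s")
```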
The scheme reduces one exponentiation when generating the test tag compared with the paper [17].
FIGURE 5. Computational overhead of generating file tags for different file sizes.
We also test the impact of different numbers of files on the equality test of the system. In the process of the test, the cloud server compares the uploaded test tag with all the file tags in the database to determine whether there are duplicate files in the database, and based on the judgment result, the user performs the next operation. In order to analyze the performance advantages and disadvantages of this paper's scheme and the comparison scheme in the process of the equality test, we store 1000, 2000, 3000, 4000, and 5000 file tags in the database, where the size of each file is 1 MB. The results obtained through simulation analysis are shown in Fig 6. It can be seen from Fig 6 that the time consumed for the equality test increases linearly with the number of files in the database. For a database with a large number of files, our scheme therefore has obvious advantages over the paper [17] in practical applications.
FIGURE 6. Computational overhead of the duplication test for different file numbers.

V. CONCLUSION
Secure data de-duplication is of great value in cloud storage, and it can effectively improve the space utilization of cloud storage systems. In this paper, a secure data de-duplication and recovery scheme based on PEKS is constructed: the matching relationship between the keyword and the trapdoor of searchable encryption is used to achieve the file equality test in the ciphertext state, and proxy re-encryption is used to achieve data recovery. Since the de-duplication process of a single file requires the execution of multiple equality test algorithms depending on the size of the database, this scheme is designed to keep the computational overhead of this algorithm as low as possible. The experimental simulation results show that the scheme in this paper has good performance in a cloud storage system. At present, scholars have made some achievements in the study of secure data de-duplication and have applied it to practical scenarios. This paper conducts in-depth research based on the previous work, but there are still some shortcomings; for example, the current scheme only supports the equality test at the file level. In the future, the main consideration is the de-duplication rate: when a data user has two files with only minor differences, this paper will determine them as different files, which reduces the de-duplication rate.

REFERENCES
[1] J. R. Douceur, A. Adya, W. J. Bolosky, P. Simon, and M. Theimer, "Reclaiming space from duplicate files in a serverless distributed file system," in Proc. 22nd Int. Conf. Distrib. Comput. Syst., Vienna, Austria, 2002, pp. 617–624.
[2] A. Agarwala, P. Singh, and P. K. Atrey, "DICE: A dual integrity convergent encryption protocol for client side secure data deduplication," in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC), Banff, AB, Canada, Oct. 2017, pp. 2176–2181.
[3] P. Anderson and L. Zhang, "Fast and secure laptop backups with encrypted deduplication," in Proc. 24th Large Installation Syst. Admin. Conf., San Jose, CA, USA, 2010, pp. 1–12.
[4] D. Harnik, B. Pinkas, and A. Shulman-Peleg, "Side channels in cloud services: Deduplication in cloud storage," IEEE Security Privacy, vol. 8, no. 6, pp. 40–47, Nov./Dec. 2010.
[5] J. Li, X. Chen, M. Li, J. Li, P. P. C. Lee, and W. Lou, "Secure deduplication with efficient and reliable convergent key management," IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 6, pp. 1615–1625, Jun. 2014.
[6] D. Boneh, G. D. Crescenzo, R. Ostrovsky, and G. Persiano, "Public key encryption with keyword search," in Proc. Int. Conf. Theory Appl. Cryptograph. Techn., C. Cachin and J. Camenisch, Eds. Interlaken, Switzerland, 2004, pp. 506–522.
[7] M. H. Au, J. Chen, J. K. Liu, Y. Mu, D. S. Wong, and G. Yang, "Malicious KGC attacks in certificateless cryptography," in Proc. 2nd ACM Symp. Inf., Comput. Commun. Secur., New York, NY, USA, 2007, pp. 302–311.
[8] M. Bellare, S. Keelveedhi, and T. Ristenpart, "Message-locked encryption and secure deduplication," in Proc. 32nd Annu. Int. Conf. Theory Appl. Cryptograph. Techn., T. Johansson and P. Q. Nguyen, Eds. Athens, Greece, 2013, pp. 296–312.
[9] S. Keelveedhi, M. Bellare, and T. Ristenpart, "DupLESS: Server-aided encryption for deduplicated storage," in Proc. 22nd USENIX Secur. Symp., S. T. King, Ed. Washington, DC, USA, 2013, pp. 179–194.
[10] M. Abadi, D. Boneh, I. Mironov, A. Raghunathan, and G. Segev, "Message-locked encryption for lock-dependent messages," in Proc. 33rd Annu. Cryptol. Conf., Santa Barbara, CA, USA, 2013, pp. 374–391.
[11] J. Liu, N. Asokan, and B. Pinkas, "Secure deduplication of encrypted data without additional independent servers," in Proc. 22nd ACM SIGSAC Conf. Comput. Commun. Secur., Denver, CO, USA, Oct. 2015, pp. 874–885.
[12] P. Puzio, R. Molva, M. Önen, and S. Loureiro, "PerfectDedup: Secure data deduplication," in Proc. 10th Int. Workshop, 4th Int. Workshop, Vienna, Austria, 2015, pp. 150–166.
[13] J. Li, C. Qin, P. P. C. Lee, and J. Li, "Rekeying for encrypted deduplication storage," in Proc. 46th Annu. IEEE/IFIP Int. Conf. Dependable Syst. Netw., Toulouse, France, Jun. 2016, pp. 618–629.
[14] M. Li, C. Qin, J. Li, and P. P. C. Lee, "CDStore: Toward reliable, secure, and cost-efficient cloud storage via convergent dispersal," IEEE Internet Comput., vol. 20, no. 3, pp. 45–53, May 2016.
[15] X. Tang, L. Zhou, W. Shan, and D. Liu, "Threshold re-encryption based secure deduplication method for cloud data with resistance against side channel attack," J. Commun., vol. 41, no. 6, p. 14, 2020.
[16] W. Gao, H. Xian, and R. Cheng, "A cloud data deduplication method based on double-layered encryption and key sharing," Chin. J. Comput., vol. 44, no. 11, pp. 2203–2215, 2021.
[17] G. Kan, C. Jin, H. Zhu, Y. Xu, and N. Liu, "An identity-based proxy re-encryption for data deduplication in cloud," J. Syst. Archit., vol. 121, Dec. 2021, Art. no. 102332.
[18] H. Yuan, X. Chen, J. Li, T. Jiang, J. Wang, and R. H. Deng, "Secure cloud data deduplication with efficient re-encryption," IEEE Trans. Services Comput., vol. 15, no. 1, pp. 442–456, Jan. 2022.
[19] M. Blaze, G. Bleumer, and M. Strauss, "Divertible protocols and atomic proxy cryptography," in Proc. Int. Conf. Theory Appl. Cryptograph. Techn., Espoo, Finland, 1998, pp. 127–144.
[20] Y. Lu and J. Li, "A pairing-free certificate-based proxy re-encryption scheme for secure data sharing in public clouds," Future Gener. Comput. Syst., vol. 62, pp. 140–147, Sep. 2016.
[21] K. O. Agyekum, Q. Xia, E. Sifah, J. Gao, H. Xia, X. Du, and M. Guizani, "A secured proxy-based data sharing module in IoT environments using blockchain," Sensors, vol. 19, no. 5, p. 1235, Mar. 2019.
[22] Q. Wang, W. Li, and Z. Qin, "Proxy re-encryption in access control framework of information-centric networks," IEEE Access, vol. 7, pp. 48417–48429, 2019.
[23] H. Hong and Z. Sun, "Sharing your privileges securely: A key-insulated attribute based proxy re-encryption scheme for IoT," World Wide Web, vol. 21, no. 3, pp. 595–607, 2018.
[24] X. Liu, J. Yan, S. Shan, and R. Wu, "A blockchain-assisted electronic medical records by using proxy reencryption and multisignature," Secur. Commun. Netw., vol. 2022, Feb. 2022, Art. no. 6737942.
[25] L. Fang, W. Susilo, C. Ge, and J. Wang, "Public key encryption with keyword search secure against keyword guessing attacks without random oracle," Inf. Sci., vol. 238, pp. 221–241, Jul. 2013.
[26] Y. Lu, J. Li, and Y. Zhang, "Secure channel free certificate-based searchable encryption withstanding outside and inside keyword guessing attacks," IEEE Trans. Services Comput., vol. 14, no. 6, pp. 2041–2054, Nov. 2021.
[27] L. F. Guo and W. C. Yau, "Efficient secure-channel free public key encryption with keyword search for EMRs in cloud storage," J. Med. Syst., vol. 39, p. 11, Feb. 2015.
[28] B. Qin, H. Cui, X. Zheng, and D. Zheng, "Improved security model for public-key authenticated encryption with keyword search," in Proc. 15th Int. Conf., Q. Huang and Y. Yu, Eds. Guangzhou, China, 2021, pp. 19–38.
[29] W. Zhang, B. Qin, X. Dong, and A. Tian, "Public-key encryption with bidirectional keyword search and its application to encrypted emails," Comput. Standards Interfaces, vol. 78, Oct. 2021, Art. no. 103542.
[30] B. Chen, L. Wu, S. Zeadally, and D. He, "Dual-server public-key authenticated encryption with keyword search," IEEE Trans. Cloud Comput., vol. 10, no. 1, pp. 322–333, Jan. 2022.
[31] R. Chen, Y. Mu, G. Yang, F. Guo, and X. Wang, "A new general framework for secure public key encryption with keyword search," in Proc. 20th Australas. Conf., E. Foo and D. Stebila, Eds. Brisbane, QLD, Australia, 2015, pp. 59–76.
[32] Y. Lu and J. Li, "Lightweight public key authenticated encryption with keyword search against adaptively-chosen-targets adversaries for mobile devices," IEEE Trans. Mobile Comput., vol. 21, no. 12, pp. 4397–4409, Dec. 2022.

LE LI received the B.S. degree from the Xi'an University of Posts and Telecommunications, Xi'an, China, in 2020, where he is currently pursuing the M.S. degree. His research interest includes cloud storage security.

DONG ZHENG received the M.S. degree in mathematics from Shaanxi Normal University, Xi'an, China, in 1988, and the Ph.D. degree in communication engineering from Xidian University, Xi'an, in 1999. He was a Professor with the School of Information Security Engineering, Shanghai Jiao Tong University, Shanghai, China. He is currently a Professor with the Xi'an University of Posts and Telecommunications and is also connected with the National Engineering Laboratory for Wireless Security, Xi'an. His research interests include provable security and new cryptographic technology.

HAOYU ZHANG is currently pursuing the M.S. degree with the Xi'an University of Posts and Telecommunications. Her research interests include public key searchable encryption and the SM9 algorithm.

BAODONG QIN is currently a Professor with the Xi'an University of Post and Telecommunications. His research interests include cryptography and cloud computing security.