Data Secure De-Duplication and Recovery Based on Public Key Encryption With Keyword Search
ABSTRACT In the current era of information explosion, users' demand for data storage keeps increasing, and storing data on the cloud has become the first choice of users and enterprises. Cloud storage makes it easy for users to back up and share data and effectively reduces their storage expenses. However, because the duplicate data of different users are stored multiple times, the storage utilization of cloud servers drops sharply. Duplicate data stored in plaintext can be removed directly, but cloud servers are only semi-trusted, so data usually need to be encrypted before storage to protect user privacy. In this paper, we focus on how to achieve secure de-duplication and data recovery over ciphertext for different users: we determine whether a public key searchable encryption index and a trapdoor match in the ciphertext state to achieve secure de-duplication. For a duplicate file, the data user's re-encryption key for that file is appended to the ciphertext chain table of the stored copy. The cloud server uses the re-encryption key to generate the specified transformed ciphertext, and the data user decrypts the transformed ciphertext with its private key to recover the file. Security analysis and experimental simulation show that the proposed scheme is secure and efficient.
I. INTRODUCTION
As a major service provided by cloud computing technology, cloud storage enables users to back up and share data easily and quickly, which can efficiently reduce users' storage expenses and improve work efficiency. With the increasing maturity of cloud computing technology, there are many Cloud Service Providers (CSPs) in the market, such as Baidu Cloud, Amazon Cloud, and other well-known CSPs. Users upload and store their confidential data in the data storage center of the cloud server, which is managed and maintained by the CSP, but with this comes the frequent occurrence of cloud computing security issues. Enterprises and individual users store personal files, business contracts, user transaction records, environmental and geographic data, and other sensitive data on the cloud server. However, user privacy leaks and sensitive data leaks have emerged, and some CSPs have even sold user data for corporate profit. The issue of data security in cloud storage therefore deserves widespread attention.
Big data and cloud computing are developing rapidly, with an explosion of data from users around the world, resulting in a dramatic increase in demand for cloud servers. An effective solution to the storage of massive amounts of data is data de-duplication. For plaintext data, the equality test can be achieved by direct comparison, but user data involves personal privacy, and uploading or storing it in plaintext form on cloud servers can cause privacy leakage. Encrypting data can protect user privacy effectively. In practical scenarios, however, different users use different keys to encrypt their files, and there are random parameters in the encryption, so the ciphertexts generated from the same file differ. By directly comparing ciphertexts, we cannot determine whether files are duplicates, and cloud servers will store
multiple encrypted copies of the same file, which will put huge storage pressure on cloud servers. Therefore, there is an urgent need to design a secure de-duplication scheme for data encrypted with different keys in multi-user scenarios.
Currently, convergent encryption [1] is widely used to construct secure data de-duplication systems, but convergent encryption also faces various dangers such as data leakage, faking attacks, and chosen-plaintext attacks [2], [3], [4]. Since the encryption key used in convergent encryption is generated from the hash value of the user's data file, multiple files of the same user will generate multiple different keys, causing a key management problem [5]. The operations of encryption and de-duplication affect each other. If different users encrypt data with the same key, they generate the same ciphertext, and secure data de-duplication can be achieved by directly comparing ciphertexts, but this causes the key management problem. If different users use different keys to encrypt data, the key management problem can be effectively reduced, but it is difficult to achieve the equality test. Therefore, how different users can encrypt data with the same key without communicating with each other, thus producing the same ciphertext after encrypting the same data, and how users can recover their data, are the main research directions of this paper.
In this paper, Public Key Encryption with Keyword Search (PEKS) is used to detect file duplicates by matching keywords with trapdoors, and Proxy Re-encryption (PRE) is used for data recovery [6]. The scheme is mainly divided into data de-duplication and data recovery. For data de-duplication, the data owner needs to upload the file ciphertext, file tag, and re-encryption key to the cloud server; the file tag points to the file ciphertext, and the re-encryption key is stored in the corresponding ciphertext chain table. When the test result is True, it means that the file is already stored in the server. The data user does not need to upload the ciphertext again but only needs to upload the re-encryption key of the specified file to the corresponding ciphertext chain table, which can effectively reduce the storage overhead of the cloud server. When the test result is False, it means that the file is not stored in the server, and the data owner needs to upload the ciphertext, file tag, and re-encryption key. Regarding data recovery, users only need to store the user key locally, not the file key of each file. The user key can be generated locally without introducing a key generation center (KGC), which avoids key substitution attacks and malicious KGC attacks [7]. The user initiates a request to the cloud server to obtain the file, and the cloud server uses the user's re-encryption key in the ciphertext chain table to re-encrypt and generate the transformed ciphertext. The transformed ciphertext is sent to the user, and the user can decrypt the file using his private key.

A. RELATED WORKS
The continuous development of cloud storage technology has brought new opportunities to many industries, and data on the cloud has become an immediate need for the digital transformation of traditional industries. How to effectively improve the space utilization of cloud storage is one of the problems that cloud storage urgently needs to solve. A large amount of duplicate data exists in the massive data stored on cloud servers many times. The main ideas for solving the problem of duplicated data storage are to improve the compression rate of stored files and to perform secure data de-duplication. This paper focuses on how to achieve secure data de-duplication in a multi-user environment.
Secure data de-duplication can be classified into file-level de-duplication and block-level de-duplication according to the granularity of de-duplication, and into client de-duplication and server de-duplication according to the de-duplication entity. In this paper, we focus on file-level secure de-duplication on the server. To protect the privacy of users, server de-duplication requires users to encrypt data files before uploading them to cloud servers. Douceur et al. [1] proposed convergent encryption, which can effectively balance data de-duplication and data encryption to achieve secure de-duplication in the ciphertext. It computes the hash value of a file as the key for encrypting that file. The same file will generate the same key, and encrypting the same file with the same key will generate the same ciphertext, thus allowing a direct comparison of the duplicity of files in the ciphertext state. Through this mechanism, we can see that the key generated from the file has no randomness, and each file will generate a key, which leads to the problem of key management. Bellare et al. [8] designed the variant algorithms HCE1, HCE2, and HCE3 of convergent encryption by analyzing the security of convergent encryption [1], improving its security and efficiency. Convergent encryption uses the file hash value as the encryption key and determines directly from the ciphertext whether the file is duplicated. Message-Locked Encryption (MLE) is a further improvement of convergent encryption. MLE generates a tag for the file for de-duplication and does not rely exclusively on the file hash to generate the file encryption key. The encryption key is generated by mapping the file with a deterministic function, which is not resistant to brute-force attacks on predictable information, and the encryption key and the tag are independent and not related in any way. To solve this problem, Keelveedhi et al. [9] also proposed the DupLESS server-assisted secure data de-duplication scheme, which effectively improves the randomness of deterministic ciphertexts. Abadi et al. [10] constructed MLE2 for lock-dependent messages based on MLE to improve the security of data de-duplication. Liu et al. [11] constructed a secure data de-duplication system based on a key exchange protocol without relying on an additional server, but the system has large communication and computational overheads and requires most users to be online at the same time. Puzio et al. [12] proposed PerfectDedup to perform secure de-duplication based on the popularity of data, combined with the property of perfect hashing to ensure the confidentiality of data. Li et al. [13] proposed a rekeying-aware encrypted de-duplication storage
system, where the data owner only needs to re-encrypt part of the message using the convergent all-or-nothing transform (CAONT) to achieve secure de-duplication and effectively reduce the computational overhead of the system. Li et al. [14] proposed CDStore, an enhanced secret sharing scheme based on convergent encryption, which takes deterministic hash values as the input of secret sharing and supports de-duplication. Tang et al. [15] proposed a secure de-duplication scheme based on threshold re-encryption, which can resist side-channel attacks while effectively reducing computational overhead. Gao et al. [16] proposed a secure de-duplication scheme without trusted third parties, with hierarchical encryption based on prevalence and privacy. Kan et al. [17] proposed an identity-based proxy re-encryption scheme that achieves secure data de-duplication and recovery by combining data de-duplication with user access privileges. Yuan et al. [18] found that REED [14] suffers from a stub-reserved attack and constructed a new secure de-duplication algorithm using CAONT and a Bloom filter to resist this attack.
PRE was first proposed by Blaze et al. [19]. PRE enables data sharing without revealing the data owner's key: using PRE, a proxy server can transform a ciphertext encrypted under the data owner's key into a transformed ciphertext that can be decrypted with the data user's key, thus protecting the data owner's key while enabling data sharing. Lu and Li [20] proposed a pairing-free proxy re-encryption scheme that can meet the application requirements of devices with limited computing power. According to practical application scenarios, PRE schemes applicable to the IoT, cloud computing, and other settings [21], [22], [23] have been proposed. The application of electronic medical records in medical institutions suffers from problems such as information leakage, and Liu et al. [24] used proxy re-encryption and sequential multi-signature to solve this problem.
PEKS was first proposed by Boneh et al. [6] to allow the server to search the ciphertext without knowing the plaintext message. Fang et al. [25] constructed a PEKS scheme in the standard security model that resists keyword guessing attacks. Lu et al. [26] showed that certificate-based searchable encryption not only resists keyword guessing attacks but also supports implicit authentication and needs no secure channel. Guo and Yau [27] proposed a PEKS scheme that satisfies indistinguishability of trapdoors. Qin et al. [28] introduced an improved CI model that enables public key authenticated encryption with keyword retrieval to resist fully chosen keyword to ciphertext-keyword attacks in a multi-user environment. Zhang et al. [29] first proposed public key encryption with bidirectional keyword search, which has practical applications in various scenarios such as email systems. Chen et al. [30], inspired by the Diffie-Hellman key exchange algorithm, constructed a dual-server public-key authenticated encryption with keyword search scheme based on Chen et al. [31], where the system requires not only dual-server public keys for the generation of keyword ciphertexts and trapdoors but also the public keys between users to ensure that only authenticated users can search the ciphertext. Lu et al. [32] devised a lightweight public key authenticated encryption with keyword search scheme, which is suitable for resource-constrained mobile devices.

B. OUR CONTRIBUTION
In this paper, we construct a secure data de-duplication and recovery scheme by combining public key searchable encryption and proxy re-encryption. The contributions of this paper are as follows:
1) A secure data de-duplication scheme based on public key encryption with keyword search is constructed to realize secure data de-duplication in a multi-user environment, which can effectively save the storage space of cloud servers. The scheme in this paper performs server-side de-duplication, which can achieve secure de-duplication without requiring users to be online, so its application scenario is more flexible.
2) This paper uses proxy re-encryption to achieve user data recovery. The server uses the re-encryption key stored in the ciphertext chain table for re-encryption, and the user can decrypt and recover the files using only his private key, so there is no need to save each file key, which effectively reduces the key management problem.
3) The only entities involved in the interaction are users and cloud servers, and no KGC is introduced, so key substitution attacks and malicious KGC attacks can be effectively avoided. Meanwhile, against malicious servers that can obtain the file tags and file ciphertexts of arbitrary files, the scheme achieves one-wayness under the chosen file attack.
4) Massive data are stored in the cloud server, so the efficiency of the equality test in the overall de-duplication process is critical. Through experimental simulation analysis, for a database storing 5000 files, secure de-duplication takes 42.86 s, about one third of the time consumed by the scheme in the paper [17].

C. ORGANIZATION
The rest of the paper is organized as follows. Section II presents the algorithm and specific design. Section III analyzes the security of the scheme. Section IV simulates the scheme to analyze its performance. Finally, a conclusion is presented in Section V.

II. ALGORITHM AND SCHEME DESIGN
A. SYSTEM MODEL
As shown in Fig 1, the entities involved in this scheme are the data user and the CSP; these entities are described below.
Data user: The data user encrypts the file and uploads it to the CSP to ensure the user's privacy. The user is distinguished
into data owner (DO) and data user (DU) according to the status at the time of upload: the DO needs to upload the ciphertext, file tag, and re-encryption key, while the DU only needs to upload the re-encryption key for the file. When the data user recovers the file, the CSP uses the corresponding re-encryption key in the ciphertext chain table to generate the transformed ciphertext. The user can use his private key to decrypt the transformed ciphertext and recover the file.
CSP: The CSP stores the ciphertexts and file tags uploaded by users and establishes the file tag index. When a user uploads a test tag, a matching operation is performed to determine whether it corresponds to a duplicate file. For non-duplicate files, the user needs to store the ciphertext, the file tag, and the user's re-encryption key for the file in the cloud server. For duplicate files, users only need to store their re-encryption keys for the files in the corresponding ciphertext chain table. When the user sends a file recovery request, the CSP uses the corresponding re-encryption key in the ciphertext chain table to generate a transformed ciphertext and sends it to the user.

B. ALGORITHM DESIGN
The scheme in this paper involves only the user and the cloud server. The overall workflow is divided into the two phases of data de-duplication and data recovery and contains a total of 10 algorithms, which are described as follows.
Setup(k) → params: Given a security parameter k, return the public parameters params = {q, g, e, Z, G1, G2, H1, H2, H3}, where G1 and G2 are two cyclic groups with prime order q, g is a generator of G1, e : G1 × G1 → G2 is a bilinear map, and Z = e(g, g). In addition, three hash functions are defined: H1 : {0, 1}* → Zq*, H2 : {0, 1}* → G1, and H3 : G2 → Zq*.
FileKey(params, F) → (SK_F, PK_F): This algorithm is run by the user. On input a file F, the user computes the private key SK_F = f = H1(F) and the public key PK_F = Z^f = Z^{H1(F)}, then outputs the file key pair (PK_F, SK_F).
UserKey(params) → (SK_u, PK_u): This algorithm is run by the user. The user chooses a random element a from the set Zq* as the user private key SK_u and computes the user public key PK_u = g^a, then outputs the user key pair (PK_u, SK_u). The user key pair is generated by the user without the help of a KGC, and the key information is not involved in the interaction with the server but is kept only by the user, thus ensuring the privacy of the user key.
ReKey(SK_F, PK_u) → RK_{F→u}: This algorithm is run by the user. It inputs the file private key SK_F and the user public key PK_u, then outputs the user's re-encryption key for file F, RK_{F→u} = g^{a·f}. RK_{F→u} is sent to the cloud server and stored in the ciphertext chain table; it is used only for the specified user to recover the specified file owned by that user.
FileTag(params, F, SK_F) → Tag_F: This algorithm is run by the user. It inputs the file, chooses a random element r1 from the set Zq*, and extracts the keyword w = H2(F) from the file F. It outputs the file tag Tag_F = (T1, T2) for file F, where T1 = g^{r1} and T2 = H3(e(w, g^{r1·f})). The user sends Tag_F to the cloud server, and the tag serves as a unique identifier for the file. When the tag is stored in the server, it indicates that the file already exists and does not need to be uploaded repeatedly.
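To make the algebra of these algorithms easier to follow, the following minimal Python sketch models the pairing setting symbolically: an element g^x of G1, or Z^y = e(g, g)^y of G2, is represented only by its exponent modulo q, so the bilinear map becomes modular multiplication. The prime q, the SHA-256-based hash constructions, and all names are illustrative assumptions of ours rather than the implementation used for the paper's experiments, and the model is of course not cryptographically secure.

```python
# Toy, non-cryptographic model of the pairing setting described above.
# g^x in G1 and Z^y = e(g, g)^y in G2 are represented by the exponents x and y
# modulo q, so e(g^x, g^y) = Z^(x*y) becomes modular multiplication.
import hashlib
import secrets

q = 2**127 - 1  # stand-in prime group order (illustrative only)

def _h(tag: bytes, data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(tag + data).digest(), "big") % q or 1

def H1(F: bytes) -> int:            # H1 : {0,1}* -> Zq*
    return _h(b"H1", F)

def H2(F: bytes) -> int:            # H2 : {0,1}* -> G1, returned as the exponent of g
    return _h(b"H2", F)

def H3(z: int) -> int:              # H3 : G2 -> Zq*
    return _h(b"H3", z.to_bytes(16, "big"))

def pair(x: int, y: int) -> int:    # e(g^x, g^y) = Z^(x*y)
    return (x * y) % q

def FileKey(F: bytes):
    f = H1(F)                       # SK_F = f = H1(F)
    return f, f                     # PK_F = Z^f, represented here by the exponent f

def UserKey():
    a = secrets.randbelow(q - 1) + 1
    return a, a                     # SK_u = a, PK_u = g^a (exponent a)

def ReKey(SK_F: int, PK_u: int) -> int:
    return (SK_F * PK_u) % q        # RK_{F->u} = (g^a)^f = g^(a*f)

def FileTag(F: bytes, SK_F: int):
    r1 = secrets.randbelow(q - 1) + 1
    w = H2(F)                       # keyword w = H2(F)
    T1 = r1                         # T1 = g^r1
    T2 = H3(pair(w, (r1 * SK_F) % q))   # T2 = H3(e(w, g^(r1*f)))
    return T1, T2
```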
Enc(F, PK_F) → C: This algorithm is run by the user. It inputs the file F and the file public key, then returns the ciphertext C. The algorithm runs as below:
1) Select a random element r2 from the set Zq* and compute C1 = g^{r2}.
2) Compute C2 = F · e(g, g)^{f·r2} = F · Z^{f·r2}.
The algorithm outputs the ciphertext C = (C1, C2) = (g^{r2}, F · Z^{f·r2}). For non-duplicate files, the user needs to encrypt them using this algorithm and then upload them to the cloud server for storage.
TestTag(F) → T_F: This algorithm is run by the user. It inputs the file F, then computes the test tag T_F = H2(F)^{H1(F)}. The test tag is used to verify whether the corresponding file already exists in the cloud server.
Test(T_F, Tag_F) → ⊥: This algorithm is run by the cloud server. It inputs the test tag T_F and the file tag Tag_F; the cloud server verifies whether H3(e(T_F, T1)) = T2 holds. When the equation holds, it indicates that the file is duplicated, so the user does not need to upload the ciphertext, but only needs to generate the re-encryption key RK_{F→u} of the duplicate file for itself by the ReKey algorithm and store the key in the ciphertext chain table of the corresponding ciphertext.
ReEnc(C, RK_{F→u}) → C′: This algorithm is run by the cloud server. It inputs the original ciphertext C and the re-encryption key RK_{F→u}, then computes the transformed ciphertext C′ = (C1′, C2′), where C1′ = e(C1, RK_{F→u}) = Z^{r2·a·f} and C2′ = C2. When the user needs to obtain the duplicate file, he only needs to send a request to the server, which finds the corresponding re-encryption key in the corresponding ciphertext chain table, uses it to generate the transformed ciphertext, and sends it to the user.
ReDec(C′, SK_u) → F: This algorithm is run by the user. When the user receives the transformed ciphertext C′ from the server, the user can simply use his private key SK_u to compute F = C2′ / (C1′)^{1/a}.

C. CORRECTNESS ANALYSIS
1) CORRECTNESS OF DE-DUPLICATION
The user needs to perform de-duplication before uploading files to the server, and when a file is duplicated, there is no need to upload it. The user computes the test tag T_{F′} and uses the test tag T_{F′} and the file tag Tag_F as input to the algorithm Test to determine whether the file F′ is duplicated. The specific de-duplication process is as follows. Using the bilinear pairing properties, it follows that

  H3(e(T_{F′}, T1)) = H3(e(H2(F′), g^{f·r1})) = H3(e(H2(F′), g)^{f·r1}),
  T2 = H3(e(H2(F), g^{f·r1})) = H3(e(H2(F), g)^{f·r1}).

The above equations show that if the test tag corresponds to the same file as the file tag, then H3(e(H2(F′), g)^{f·r1}) = H3(e(H2(F), g)^{f·r1}), so the equation H3(e(T_{F′}, T1)) = T2 holds. Since the hash function is collision-resistant, when H2(F′) ≠ H2(F) it means F′ ≠ F. Therefore, the output of the algorithm Test is true for the same file and false for a different file, thus determining whether the file is a duplicate.

2) CORRECTNESS OF RE-DECRYPTION
For the user who owns the file, the ciphertext chain table corresponding to the ciphertext stored in the cloud server should hold the re-encryption key of that user. This re-encryption key is generated from the file encryption key and the user's public key, forming a one-to-one correspondence between the file and the user's identity. Therefore, the user can decrypt the corresponding transformed ciphertext with his private key. The specific decryption calculation is as follows:

  C2′ / (C1′)^{1/a} = F · e(g, g)^{f·r2} / e(C1, RK_{F→u})^{1/a}
                    = F · e(g, g)^{f·r2} / e(g^{r2}, g^{a·f})^{1/a}
                    = F · e(g, g)^{f·r2} / e(g, g)^{f·r2}
                    = F.

The above equation shows that if and only if the user has the file, he can use his private key to recover the file F correctly, while for users who do not have access to the file, the generated transformed ciphertext cannot be decrypted.
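Continuing the same toy exponent-only model introduced earlier, the sketch below fills in the remaining algorithms and checks the two correctness properties just derived: the Test equation holds exactly for a duplicate file, and ReDec applied to ReEnc recovers the encrypted value. Modelling a "file" inside Enc as a G2 element Z^F is an assumption made purely for illustration; none of this reflects the paper's actual implementation.

```python
# Toy exponent-only model (see the earlier sketch): g^x and Z^y are stored as
# exponents modulo q, and e(g^x, g^y) = Z^(x*y) is modular multiplication.
import hashlib
import secrets

q = 2**127 - 1  # illustrative prime order, not a real pairing group

def _h(tag, data):
    return int.from_bytes(hashlib.sha256(tag + data).digest(), "big") % q or 1

H1 = lambda F: _h(b"H1", F)                      # {0,1}* -> Zq*
H2 = lambda F: _h(b"H2", F)                      # {0,1}* -> G1 (exponent of g)
H3 = lambda z: _h(b"H3", z.to_bytes(16, "big"))  # G2 -> Zq*
pair = lambda x, y: (x * y) % q                  # e(g^x, g^y) = Z^(x*y)
rand = lambda: secrets.randbelow(q - 1) + 1

def Enc(F_exp, PK_F):
    # C = (C1, C2) = (g^r2, F * Z^(f*r2)); the toy "file" is a G2 element Z^F_exp
    r2 = rand()
    return r2, (F_exp + PK_F * r2) % q

def TestTag(F):
    return (H2(F) * H1(F)) % q                   # T_F = H2(F)^H1(F)

def Test(TF, Tag):
    T1, T2 = Tag
    return H3(pair(TF, T1)) == T2                # H3(e(T_F, T1)) ?= T2

def ReEnc(C, RK):
    C1, C2 = C
    return pair(C1, RK), C2                      # C1' = e(C1, RK) = Z^(r2*a*f), C2' = C2

def ReDec(C_prime, SK_u):
    C1p, C2p = C_prime
    return (C2p - C1p * pow(SK_u, -1, q)) % q    # F = C2' / (C1')^(1/a)

def FileTag(F, f):
    r1 = rand()
    return r1, H3(pair(H2(F), (r1 * f) % q))     # (T1, T2) = (g^r1, H3(e(w, g^(r1*f))))

# End-to-end check of both correctness properties in the toy model.
F = b"some file content"
f = H1(F)                                        # SK_F = f, PK_F = Z^f (exponent f)
a = rand()                                       # SK_u = a, PK_u = g^a
RK = (a * f) % q                                 # RK_{F->u} = g^(a*f)

assert Test(TestTag(F), FileTag(F, f))                     # duplicate file: test succeeds
assert not Test(TestTag(b"another file"), FileTag(F, f))   # different file: test fails
F_exp = rand()                                             # toy "file" as Z^F_exp
assert ReDec(ReEnc(Enc(F_exp, f), RK), a) == F_exp         # re-decryption recovers F
print("de-duplication test and re-decryption behave as derived above")
```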
D. WORK PROCESS
This scheme realizes secure data de-duplication and data recovery. This subsection describes the workflow, which is divided into two phases: data de-duplication and data recovery.
Phase 1. Data de-duplication
The workflow of the data de-duplication phase is depicted in Fig 2. The data user generates a file tag Tag_F based on the file F and uploads it to the cloud server to form the file tag index. When the data user uploads a file, a test tag T_F is generated and sent to the CSP, which uses the matching relationship between Tag_F and T_F to determine whether there is a file tag that matches the test tag uploaded by the user, and sends the test result Test(T_F, Tag_F) to the data user. If there exists a matching file tag, the result is true, meaning that a duplicate file is already stored in the CSP; the user does not need to encrypt the file, but only needs to upload the file re-encryption key for this user to the ciphertext chain table of the corresponding file in the cloud server. If there is no matching file tag, the result is false, meaning that there is no duplicate file in the CSP; the user, as the data owner, then needs to generate the file ciphertext, the file tag, and the re-encryption key for the user and upload them to the cloud server.
Phase 2. Data recovery
The workflow of the data recovery phase is depicted in Fig 3. The data user sends a request to get the file, and the CSP queries whether the user's re-encryption key RK_{F→u} exists in the ciphertext chain table of the file F. When the user's re-encryption key exists in the ciphertext chain table of the file F, the cloud server re-encrypts the original ciphertext C of the file, generates the transformed ciphertext C′, and sends it to the user. After receiving the transformed ciphertext, the user can use the private key SK_u to decrypt the file F. When the user's re-encryption key does not exist in the ciphertext chain table of the file F, it means that the user cannot access the file F.
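The server-side bookkeeping implied by this workflow can be sketched as follows. The tag index and the per-file ciphertext chain table are modelled as plain dictionaries, and the cryptographic operations Test and ReEnc are passed in as callables; the class and field names are our own illustrative choices and are not taken from the paper.

```python
# Illustrative bookkeeping for the two phases; the cryptography itself is abstracted away.
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

@dataclass
class StoredFile:
    tag: tuple                                    # Tag_F = (T1, T2), the index entry
    ciphertext: tuple                             # C = (C1, C2), stored once per file
    chain_table: Dict[str, int] = field(default_factory=dict)  # user id -> RK_{F->u}

@dataclass
class CloudServer:
    test: Callable[[int, tuple], bool]            # Test(T_F, Tag_F) -> bool
    reenc: Callable[[tuple, int], tuple]          # ReEnc(C, RK_{F->u}) -> C'
    files: Dict[int, StoredFile] = field(default_factory=dict)

    # Phase 1: de-duplication. The server scans the tag index with the test tag.
    def find_duplicate(self, test_tag: int) -> Optional[int]:
        for fid, rec in self.files.items():
            if self.test(test_tag, rec.tag):
                return fid
        return None

    def upload(self, user: str, test_tag: int, rekey: int,
               tag: Optional[tuple] = None, ciphertext: Optional[tuple] = None) -> int:
        fid = self.find_duplicate(test_tag)
        if fid is None:                           # Test is false: store the full upload
            fid = len(self.files)
            self.files[fid] = StoredFile(tag=tag, ciphertext=ciphertext)
        self.files[fid].chain_table[user] = rekey  # Test is true: only append RK_{F->u}
        return fid

    # Phase 2: recovery. Re-encrypt with the requesting user's key, if it is on file.
    def recover(self, user: str, fid: int) -> Optional[tuple]:
        rec = self.files.get(fid)
        if rec is None or user not in rec.chain_table:
            return None                           # no re-encryption key: no access
        return self.reenc(rec.ciphertext, rec.chain_table[user])
```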
III. SECURITY PROOF
A. SECURITY MODEL
In this paper, we consider an insider attacker: a malicious CSP server. This type of attacker complies with the execution of the protocol but can obtain the file tag and ciphertext of any file. In addition, the attacker can obtain the test tags and re-encryption keys of files. For the target file, the attacker is not allowed to obtain the test tag of the file. If the attacker knows the private key of a user, the attacker is also not allowed to obtain the re-encryption key of the target file to that user. For this type of attacker, the formal definition of the security model of the scheme is given below.
Setup. The challenger generates the system parameters params = {q, g, e, Z, G1, G2, H1, H2, H3} from Setup and generates the key pair of the challenge user from UserKey. The challenger sends params and PK_u to the attacker A.
Phase 1. A makes the following queries, which are answered as follows.
• File key queries: Input a file F; return the key pair (PK_F, SK_F) of the file F.
• File tag queries: Input a file F; return the file tag Tag_F of the file F.
• Ciphertext queries: Input a file F; return the ciphertext C_F of the file F.
• Re-encryption key queries: Input a file F and a user public key PK_u; return the re-encryption key RK_{F→u}.
• Test tag queries: Input a file F; return the test tag T_F of the file F.
In the above queries, for any file other than the challenge file F*, the attacker can complete the various queries based on the relevant algorithms.
Challenge. The challenger randomly selects a challenge file F* from the file space G2, computes the file tag Tag_{F*} and ciphertext C_{F*}, and returns them to A.
Phase 2. A can make queries related to any file in the same way as in Phase 1, but cannot make the following queries related to the challenge file F*:
• the file private key of the challenge file F*;
• the re-encryption key from the challenge file to another user whose private key is known (A can obtain the re-encryption key RK_{F*→u} from the challenge file F* to the challenge user);
• the test tag of the challenge file F*.
Guess. A returns a file F ∈ G2. If F = F*, the attacker succeeds; otherwise the attacker fails.
For any Probabilistic Polynomial-Time (PPT) attacker, a scheme is said to satisfy one-wayness under the chosen file attack if the probability of the attacker succeeding in the above game is negligible.

B. SECURITY ANALYSIS
SBDH problem. Given a security parameter k, a group G with order q, a generator g of G, randomly choose a, b, c ∈ Zq*. On input a tuple (g, g^a, g^b, g^c, g^{1/a}, g^{bc/a}) ∈ G1^6, compute e(g, g)^{abc}. For a PPT adversary A, the advantage is

  Adv_A^{SBDH}(k) = Pr[ A(g, g^a, g^b, g^c, g^{1/a}, g^{bc/a}) → e(g, g)^{abc} ] ≤ ε;

if ε is negligible, the SBDH problem is hard to solve by A.
Theorem 1: If the SBDH problem is hard to solve, then under the random oracle model the scheme satisfies one-wayness under the chosen file attack. Specifically, if there exists an algorithm A that attacks the one-wayness of the scheme with probability ε, then another algorithm B can be constructed that solves the SBDH problem instance with probability at least

  (1/2) · (1/(2q3)) + (1/(2q3)) · (ε − 1/q),

where q3 denotes the number of times the attacker queries the hash function H3.
Proof. Assume that A is a PPT attacker against the one-wayness of the scheme. Given an instance (g, g^a, g^b, g^c, g^{1/a}, g^{bc/a}) ∈ G1^6 of the SBDH problem and a bilinear pairing e : G1 × G1 → G2, if the attacker can successfully attack the one-wayness of the scheme with probability ε, then a simulator algorithm B can be constructed that solves the above problem with at least the probability stated in Theorem 1. Algorithm B simulates the execution of the game as follows.
Setup. Simulator B generates the system parameters params = {q, g, e, Z, G1, G2, H1, H2, H3} based on the SBDH problem instance, where Z = e(g, g). H1 : {0, 1}* → Zq*, H2 : {0, 1}* → G1, and H3 : G2 → Zq* are the three hash functions
selected by the simulator. The simulator sets the public key of the challenge user u* as PK_{u*} = g^{1/a} according to the SBDH problem instance and implicitly defines its private key as SK_{u*} = 1/a. In addition, the simulator randomly selects R ∈ G2 and k1 ∈ Zq* and implicitly defines F* = R · e(g, g)^{−k1·abc} as the challenge file. Finally, the simulator sends params and PK_{u*} to the attacker.
Phase 1. When the attacker makes the following queries, the simulator responds as follows.
• Hash queries. The simulator builds three hash lists L1 = {(∗, ∗)}, L2 = {(∗, ∗)}, and L3 = {(∗, ∗)} with initially empty entries. When the attacker asks the hash function H1 for the hash value of an element x_{1,i}, the simulator first checks whether the pair (x_{1,i}, h_{1,i}) exists in list L1 and returns h_{1,i} if it does; otherwise, it randomly selects h_{1,i} ∈ Zq* and returns it to the attacker.
• File key queries. When the attacker asks for the key of file F, the simulator returns (PK_F, SK_F) = (Z^{H1(F)}, H1(F)) according to FileKey. In particular, when the attacker asks for the public key of the challenge file F*, the simulator computes PK_{F*} = e(g^b, g^c) and returns it to the attacker.
• File tag queries. When the attacker asks for the tag of file F, the simulator computes the file tag Tag_F of F according to FileTag and returns it to the attacker.
• File ciphertext queries. When the attacker asks for the ciphertext of file F, the simulator computes the ciphertext C_F according to Enc and returns it to the attacker.
• Re-encryption key queries. When the attacker asks with u ≠ u*, the simulator computes the re-encryption key RK_{F→u} according to ReKey and returns it to the attacker. If the attacker asks for the re-encryption key from the challenge file to the challenge user, the simulator sets RK_{F*→u*} = g^{bc/a} and returns it to the attacker.
• Test tag queries. When the attacker asks for the test tag of file F, the simulator computes the test tag T_F according to TestTag and returns it to the attacker.
Phase 2. The simulator answers the attacker's queries as in Phase 1, but the attacker cannot make the following queries:
• A asks for the private key SK_{F*} of the challenge file F*.
• A asks for the re-encryption key of the challenge file F* to other users.
• A asks for the test tag of the challenge file F*.
Guess. The attacker returns a guess file F ∈ G2. The simulator randomly selects a bit b ∈ {0, 1}. If b = 0, the simulator computes

  T* = (R / F)^{1/k1}.

Otherwise, the simulator randomly selects an element (x_{3,i}, h_{3,i}) from the hash list L3, computes T* = x_{3,i}^{1/r2}, and uses T* as the solution to the challenge SBDH problem.
Success probability analysis of the simulator. If the attacker selects a file F ≠ F* in the query phase, then the simulator's hash queries, file key queries, file tag queries, file ciphertext queries, re-encryption key queries, and test tag queries are answered in the same way as in the original security model. Since the challenge file F* is randomly and independently selected by the simulator, the probability that the attacker selects a file F equal to the challenge file in each query phase does not exceed 1/q. The correctness of the challenge tag and challenge ciphertext is analyzed below.
Because C1* = (g^a)^{k1} and F* = R · e(g, g)^{−k1·abc}, it follows that C2* = F* · e(g, g)^{f·r2} = R · e(g, g)^{−k1·abc} · e(g, g)^{k1·abc} = R. The challenge file ciphertext in the simulated game is consistent with the challenge ciphertext distribution in the original model.
Because T1* = g^{r1} and H2(F*) = (g^a)^{k2}, it follows that e(H2(F*), g^{r1·bc}) = e(g, g)^{abc·r2}. If the attacker has never asked H2 about the hash of e(g, g)^{abc·r2}, then the simulator randomly chooses T2* ∈ Zq*, which is consistent with the challenge tag distribution in the original model. If the attacker has asked H2 about the hash of e(g, g)^{abc·r2}, the simulator randomly chooses x_{3,i} from the query list; then x_{3,i}^{1/r2} is a solution to the SBDH problem with probability at least 1/q3.
Let E denote the event "the attacker asks H2 about the hash of e(g, g)^{abc·r2}"; we prove below that the probability of E occurring is at least ε − 1/q. When E does not occur, Pr[F = F* | ¬E] = 1/q, because the challenge tag and challenge ciphertext are then completely independent of the challenge file. Because

  Pr[F = F*] = Pr[F = F* | E] · Pr[E] + Pr[F = F* | ¬E] · Pr[¬E] ≤ Pr[E] + (1/q) · Pr[¬E],

it follows that

  Pr[E] ≥ (Pr[F = F*] − 1/q) / (1 − 1/q) ≥ ε − 1/q.

In summary, the analysis shows that if the attacker breaks the security of the scheme with probability ε, the simulator obtains a solution to the SBDH problem with probability at least

  (1/2) · (1/(2q3)) + (1/(2q3)) · (ε − 1/q).

IV. PERFORMANCE EVALUATION
A. THEORETICAL ANALYSIS
The scheme proposed in this paper involves data users and cloud servers as entities, and data users are divided into data owners and data users based on the order in which they upload files. The entities involved in the paper [17] are the data owner, the data user, the cloud server, and the KGC. Key generation in [17] requires the participation of the KGC, while the keys in the scheme of this paper are generated by the user without the need for a trusted third party. Although key generation increases the computational overhead of the client, it avoids KGC attacks [7]. In Table 1, for the sake of comparative analysis with the paper [17], the algorithms in the paper [17] and in this paper are grouped into
the ten algorithms listed in Table 1 according to their functions. The correspondence between the algorithms of the scheme in this paper and the algorithms in the paper [17] is as follows: UserKey is equivalent to Key generation, FileTag is equivalent to Data tag generation, TestTag is equivalent to the test tag generation within Ownership challenge and deduplication, Test is equivalent to the equality test performed by the CSP in Ownership challenge and deduplication, and the remaining algorithms correspond directly to the algorithms in Table 1. E denotes an exponentiation operation, H denotes a hash operation, P denotes a pairing operation, and '-' denotes that the algorithm does not exist. Since the paper [17] uses user keys for file encryption, it does not need to execute the FileKey algorithm, while the scheme in this paper uses proxy re-decryption for both data owner and data user decryption, so it does not need to execute Dec.
The computational overheads incurred in UserKey and ReEnc are basically the same for both schemes. The computational overhead of this paper's scheme in generating test tags increases by one exponentiation compared to the paper [17], which leads to an increase in computational overhead. For each execution of the Test algorithm, our scheme needs to perform one bilinear pairing operation, whereas the paper [17] requires two bilinear pairing operations, so there is a significant difference in efficiency when the equality test is performed on a large amount of data. Moreover, the computational overheads of the encryption algorithm and the re-decryption algorithm in this paper are smaller than those in the paper [17]. Therefore, the theoretical analysis shows that the overall computational overhead of our scheme is lower.
To compare the computational overhead of each algorithm in the de-duplication system, the system is divided into the ten algorithms shown in Fig 4. As shown in Fig 4, the computational overhead of each algorithm is derived from simulations in which the file size used is 1 MB. It can be seen that the computational overhead generated by our scheme and the paper [17] in generating user keys and in re-encryption is basically the same. Since the equality test in the de-duplication system requires the Test algorithm to be run for each file in the cloud storage, in order to improve the efficiency of the equality test, our scheme increases the computational overhead of FileTag so as to reduce the computational overhead of Test, and the overhead of Test is about 1/3 of that of the paper [17]. For the TestTag, Enc, ReKey, and ReDec algorithms, the simulation results show that our scheme is better than the paper [17]. In terms of the overall execution efficiency of the system, the time required for a single process is about 82.2 ms in our scheme, while the paper [17] requires about 137.65 ms. It can be seen that the computational overhead of the scheme in this paper is lower than that in the paper [17].
FIGURE 4. Average operation time of each phase of the scheme.
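Because a duplicate check runs Test once against every stored tag, its cost grows linearly with the number of stored files, which is the behaviour reported for the equality test below (Fig 6). The rough sketch that follows illustrates that linear scan with the toy exponent-only model used earlier; the absolute timings it prints are meaningless and are not comparable to the measurements in this section.

```python
# Rough illustration only: the duplicate check scans the whole tag index, one Test per
# stored tag, so its cost is linear in the number of files. Toy model, not the paper's code.
import hashlib, secrets, time

q = 2**127 - 1
_h = lambda t, d: int.from_bytes(hashlib.sha256(t + d).digest(), "big") % q or 1
H1 = lambda F: _h(b"H1", F)
H2 = lambda F: _h(b"H2", F)
H3 = lambda z: _h(b"H3", z.to_bytes(16, "big"))
pair = lambda x, y: (x * y) % q
rand = lambda: secrets.randbelow(q - 1) + 1

def file_tag(F):
    f, r1 = H1(F), rand()
    return r1, H3(pair(H2(F), (r1 * f) % q))      # (T1, T2)

def test(TF, tag):
    T1, T2 = tag
    return H3(pair(TF, T1)) == T2                 # H3(e(T_F, T1)) ?= T2

for n in (1000, 2000, 3000, 4000, 5000):
    index = [file_tag(b"file-%d" % i) for i in range(n)]
    probe = (H2(b"not stored") * H1(b"not stored")) % q   # test tag of an absent file
    start = time.perf_counter()
    hits = sum(test(probe, tag) for tag in index)         # full scan of the tag index
    print(n, "tags scanned,", hits, "matches,", round(time.perf_counter() - start, 4), "s")
```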
The scheme reduces one exponentiation when generating the test tag compared with the paper [17].
FIGURE 5. Computational overhead of generating file tags for different file sizes.
We also test the impact of different numbers of files on the equality test of the system. In the process of the test, the cloud server compares the uploaded test tag with all the file tags in the database to determine whether there are duplicate files in the database, and based on the judgment result, the user performs the next operation. In order to analyze the performance advantages and disadvantages of this paper's scheme and the comparison scheme in the process of the equality test, we store 1000, 2000, 3000, 4000, and 5000 file tags in the database, where the size of each file is 1 MB. The results obtained through simulation analysis are shown in Fig 6. It can be seen from Fig 6 that the time consumed for the equality test increases linearly with the number of files in the database. For a database with a large number of files, our scheme therefore has obvious advantages over the paper [17] in practical applications.
FIGURE 6. Computational overhead of the duplication test for different file numbers.

V. CONCLUSION
Secure data de-duplication is of great value in cloud storage, and it can effectively improve the space utilization of cloud storage systems. In this paper, a secure data de-duplication and recovery scheme based on PEKS is constructed: the matching relationship between the keyword and the trapdoor of searchable encryption is used to achieve the file equality test in the ciphertext state, and proxy re-encryption is used to achieve data recovery. Since the de-duplication process of a single file requires the execution of multiple equality test algorithms depending on the size of the database, this scheme is designed to keep the computational overhead of this algorithm as low as possible. The experimental simulation results show that the scheme in this paper has good performance in a cloud storage system. At present, scholars have made some achievements in the study of secure data de-duplication and have applied it to practical scenarios. This paper conducts in-depth research based on the previous work, but there are still some shortcomings; for example, the current scheme only supports the equality test at the file level. In the future, the main consideration is the de-duplication rate: when a data user has two files with only minor differences, this paper will determine them as different files, which reduces the de-duplication rate.

REFERENCES
[1] J. R. Douceur, A. Adya, W. J. Bolosky, P. Simon, and M. Theimer, "Reclaiming space from duplicate files in a serverless distributed file system," in Proc. 22nd Int. Conf. Distrib. Comput. Syst., Vienna, Austria, 2002, pp. 617–624.
[2] A. Agarwala, P. Singh, and P. K. Atrey, "DICE: A dual integrity convergent encryption protocol for client side secure data deduplication," in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC), Banff, AB, Canada, Oct. 2017, pp. 2176–2181.
[3] P. Anderson and L. Zhang, "Fast and secure laptop backups with encrypted deduplication," in Proc. 24th Large Installation Syst. Admin. Conf., San Jose, CA, USA, 2010, pp. 1–12.
[4] D. Harnik, B. Pinkas, and A. Shulman-Peleg, "Side channels in cloud services: Deduplication in cloud storage," IEEE Security Privacy, vol. 8, no. 6, pp. 40–47, Nov./Dec. 2010.
[5] J. Li, X. Chen, M. Li, J. Li, P. P. C. Lee, and W. Lou, "Secure deduplication with efficient and reliable convergent key management," IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 6, pp. 1615–1625, Jun. 2014.
[6] D. Boneh, G. D. Crescenzo, R. Ostrovsky, and G. Persiano, "Public key encryption with keyword search," in Proc. Int. Conf. Theory Appl. Cryptograph. Techn., C. Cachin and J. Camenisch, Eds. Interlaken, Switzerland, 2004, pp. 506–522.
[7] M. H. Au, J. Chen, J. K. Liu, Y. Mu, D. S. Wong, and G. Yang, "Malicious KGC attacks in certificateless cryptography," in Proc. 2nd ACM Symp. Inf., Comput. Commun. Secur., New York, NY, USA, 2007, pp. 302–311.
[8] M. Bellare, S. Keelveedhi, and T. Ristenpart, "Message-locked encryption and secure deduplication," in Proc. 32nd Annu. Int. Conf. Theory Appl. Cryptograph. Techn., T. Johansson and P. Q. Nguyen, Eds. Athens, Greece, 2013, pp. 296–312.
[9] S. Keelveedhi, M. Bellare, and T. Ristenpart, "DupLESS: Server-aided encryption for deduplicated storage," in Proc. 22nd USENIX Secur. Symp., S. T. King, Ed. Washington, DC, USA, 2013, pp. 179–194.
[10] M. Abadi, D. Boneh, I. Mironov, A. Raghunathan, and G. Segev, "Message-locked encryption for lock-dependent messages," in Proc. 33rd Annu. Cryptol. Conf., Santa Barbara, CA, USA, 2013, pp. 374–391.
[11] J. Liu, N. Asokan, and B. Pinkas, "Secure deduplication of encrypted data without additional independent servers," in Proc. 22nd ACM SIGSAC Conf. Comput. Commun. Secur., Denver, CO, USA, Oct. 2015, pp. 874–885.
[12] P. Puzio, R. Molva, M. Önen, and S. Loureiro, "PerfectDedup: Secure data deduplication," in Proc. 10th Int. Workshop, 4th Int. Workshop, Vienna, Austria, 2015, pp. 150–166.
[13] J. Li, C. Qin, P. P. C. Lee, and J. Li, "Rekeying for encrypted deduplication storage," in Proc. 46th Annu. IEEE/IFIP Int. Conf. Dependable Syst. Netw., Toulouse, France, Jun. 2016, pp. 618–629.
[14] M. Li, C. Qin, J. Li, and P. P. C. Lee, "CDStore: Toward reliable, secure, and cost-efficient cloud storage via convergent dispersal," IEEE Internet Comput., vol. 20, no. 3, pp. 45–53, May 2016.
[15] X. Tang, L. Zhou, W. Shan, and D. Liu, "Threshold re-encryption based secure deduplication method for cloud data with resistance against side channel attack," J. Commun., vol. 41, no. 6, p. 14, 2020.
[16] W. Gao, H. Xian, and R. Cheng, "A cloud data deduplication method based on double-layered encryption and key sharing," Chin. J. Comput., vol. 44, no. 11, pp. 2203–2215, 2021.
[17] G. Kan, C. Jin, H. Zhu, Y. Xu, and N. Liu, "An identity-based proxy re-encryption for data deduplication in cloud," J. Syst. Archit., vol. 121, Dec. 2021, Art. no. 102332.
[18] H. Yuan, X. Chen, J. Li, T. Jiang, J. Wang, and R. H. Deng, "Secure cloud data deduplication with efficient re-encryption," IEEE Trans. Services Comput., vol. 15, no. 1, pp. 442–456, Jan. 2022.
[19] M. Blaze, G. Bleumer, and M. Strauss, "Divertible protocols and atomic proxy cryptography," in Proc. Int. Conf. Theory Appl. Cryptograph. Techn., Espoo, Finland, 1998, pp. 127–144.
[20] Y. Lu and J. Li, "A pairing-free certificate-based proxy re-encryption scheme for secure data sharing in public clouds," Future Gener. Comput. Syst., vol. 62, pp. 140–147, Sep. 2016.
[21] K. O. Agyekum, Q. Xia, E. Sifah, J. Gao, H. Xia, X. Du, and M. Guizani, "A secured proxy-based data sharing module in IoT environments using blockchain," Sensors, vol. 19, no. 5, p. 1235, Mar. 2019.
[22] Q. Wang, W. Li, and Z. Qin, "Proxy re-encryption in access control framework of information-centric networks," IEEE Access, vol. 7, pp. 48417–48429, 2019.
[23] H. Hong and Z. Sun, "Sharing your privileges securely: A key-insulated attribute based proxy re-encryption scheme for IoT," World Wide Web, vol. 21, no. 3, pp. 595–607, 2018.
[24] X. Liu, J. Yan, S. Shan, and R. Wu, "A blockchain-assisted electronic medical records by using proxy reencryption and multisignature," Secur. Commun. Netw., vol. 2022, Feb. 2022, Art. no. 6737942.
[25] L. Fang, W. Susilo, C. Ge, and J. Wang, "Public key encryption with keyword search secure against keyword guessing attacks without random oracle," Inf. Sci., vol. 238, pp. 221–241, Jul. 2013.
[26] Y. Lu, J. Li, and Y. Zhang, "Secure channel free certificate-based searchable encryption withstanding outside and inside keyword guessing attacks," IEEE Trans. Services Comput., vol. 14, no. 6, pp. 2041–2054, Nov. 2021.
[27] L. F. Guo and W. C. Yau, "Efficient secure-channel free public key encryption with keyword search for EMRs in cloud storage," J. Med. Syst., vol. 39, p. 11, Feb. 2015.
[28] B. Qin, H. Cui, X. Zheng, and D. Zheng, "Improved security model for public-key authenticated encryption with keyword search," in Proc. 15th Int. Conf., Q. Huang and Y. Yu, Eds. Guangzhou, China, 2021, pp. 19–38.
[29] W. Zhang, B. Qin, X. Dong, and A. Tian, "Public-key encryption with bidirectional keyword search and its application to encrypted emails," Comput. Standards Interfaces, vol. 78, Oct. 2021, Art. no. 103542.
[30] B. Chen, L. Wu, S. Zeadally, and D. He, "Dual-server public-key authenticated encryption with keyword search," IEEE Trans. Cloud Comput., vol. 10, no. 1, pp. 322–333, Jan. 2022.
[31] R. Chen, Y. Mu, G. Yang, F. Guo, and X. Wang, "A new general framework for secure public key encryption with keyword search," in Proc. 20th Australas. Conf., E. Foo and D. Stebila, Eds. Brisbane, QLD, Australia, 2015, pp. 59–76.
[32] Y. Lu and J. Li, "Lightweight public key authenticated encryption with keyword search against adaptively-chosen-targets adversaries for mobile devices," IEEE Trans. Mobile Comput., vol. 21, no. 12, pp. 4397–4409, Dec. 2022.

LE LI received the B.S. degree from the Xi'an University of Posts and Telecommunications, Xi'an, China, in 2020, where he is currently pursuing the M.S. degree. His research interest includes cloud storage security.

DONG ZHENG received the M.S. degree in mathematics from Shaanxi Normal University, Xi'an, China, in 1988, and the Ph.D. degree in communication engineering from Xidian University, Xi'an, in 1999. He was a Professor with the School of Information Security Engineering, Shanghai Jiao Tong University, Shanghai, China. He is currently a Professor with the Xi'an University of Posts and Telecommunications and is also connected with the National Engineering Laboratory for Wireless Security, Xi'an. His research interests include provable security and new cryptographic technology.

HAOYU ZHANG is currently pursuing the M.S. degree with the Xi'an University of Posts and Telecommunications. Her research interests include public key searchable encryption and the SM9 algorithm.

BAODONG QIN is currently a Professor with the Xi'an University of Post and Telecommunications. His research interests include cryptography and cloud computing security.