
J Supercomput (2019) 75:92–108
https://doi.org/10.1007/s11227-015-1598-2

A method for achieving provable data integrity in cloud computing

Aiping Li1 · Shuang Tan1 · Yan Jia1

1 Computer School, National University of Defense Technology, Changsha 410073, China
Corresponding author: Aiping Li, [email protected]

Published online: 9 January 2016
© Springer Science+Business Media New York 2016

Abstract By storing data files in the cloud, users can make use of cloud computing
techniques, thereby reducing hardware investment and simplifying data management.
However, ensuring data integrity when using unreliable service providers is a problem.
In this paper, we propose a novel method for provable data integrity (PDI) aimed at
clients with data stored in untrusted servers in cloud computing environments. An
advantage of this model is the low client cost since a constant amount of metadata
is generated. Based on a bilinear group, we propose a simple, efficient audit service
for public verification of untrusted outsourced storage. Unlike existing PDI
methods, ours explicitly accounts for the cost of generating verification
metadata at the client. Furthermore, our method supports data dynamics and public
verifiability. Finally, the results of extensive experiments conducted to evaluate the
performance of our method demonstrate that our method achieves high efficiency.

Keywords Cloud security · Provable data integrity · Third party auditor · Provable
data possession · Proof of retrievability

1 Introduction

Cloud refers to computing resources, hardware and/or software, residing
on a remote machine, that are delivered to the end user as a service over a network,
the most pervasive example of which is the Internet. Cloud computing is a model
for enabling convenient, on-demand network access to a shared pool of configurable
computing resources (for example, networks, servers, storage, applications, and services)
that can be rapidly provisioned and released with minimal management effort or
service provider interaction [9]. Several vendors offer products that act as management
front ends for public clouds; Amazon, for example, provides brokers and
management consoles for applications deployed over the Amazon cloud offering.
Cloud computing has introduced many challenges to the security and performance
of the overall system. One of the biggest concerns is that the integrity of the data in the
cloud cannot be effectively guaranteed, for the following reasons [7]. First, owing to
users' loss of control over data in cloud computing, traditional cryptographic checking
methods cannot be used directly to protect data security. Therefore, the problem of
verifying the integrity of the data in the cloud becomes even more challenging. Second,
cloud storage services, which periodically have to deal with both software and
hardware failures, may choose to hide the presence of data errors for their own benefit.
Finally, to save money or storage space, cloud service providers may use an offline
method to store rarely accessed data files, or even deliberately delete such data to
reclaim space. As a result, effective protection of the correctness of outsourced data is
lacking under the existing cloud storage models. It is thus natural for users to want to
find efficient ways of performing periodic integrity verifications without a local copy
of data files, over a prolonged period of time.
Many researchers have proposed methods to ensure remote data integrity in
cloud computing environments. In these works, integrity checking protocols are
expected to satisfy the following requirements: (i) stateless verification: the
verifier does not need to maintain and update any state between checking phases,
since such state is difficult to maintain in complex cloud computing environments;
(ii) unbounded use of queries: considering the large size of the outsourced data
and the user's constrained resources, the scheme should allow the verifier to
issue an unbounded number of audit-protocol queries; (iii) public
verification: the client can resort to an independent third party auditor (TPA) to
verify the correctness of the data stored in the cloud on its behalf;
(iv) dynamic operation support: the outsourced data might not only
be accessed but also updated by the users, so supporting dynamic operations
is vital to the practical application of storage outsourcing services; and (v)
batch auditing: to improve audit efficiency, the TPA should be able to handle
multiple tasks from different clients simultaneously. These techniques let a checking
scheme ensure the correctness of data in the cloud, but they do not guarantee that the
designed scheme is highly efficient. Considering the variety of access devices, e.g., mobile
devices, the client does not wish to allocate a great deal of computational resources
to either the initialization or the verification phase when executing an auditing protocol.
Unfortunately, existing work focuses mainly on how to carry out an audit efficiently
in the verification phase, while the cost of the initialization phase has received little
attention until now.
We propose a simple yet efficient method to ensure the integrity of users' data in
this paper. Based on some desirable properties of a finite group, our scheme achieves
better performance in reducing the user's cost in the initialization phase. Usually, an
exponentiation operation in the bilinear group is more expensive than other normal
operations, e.g., addition and multiplication. Thus, we attempt to substitute certain
exponentiation operations with normal operations, which greatly reduces the cost of
generating verification metadata at the client.
Our study considers the cost of the initialization phase when executing an auditing
protocol in this field. Our contributions are summarized as follows. Firstly, compared
with existing work, which only considers auditing efficiency at the verifier's side, the
challenge-response protocol in our work is also highly efficient during initialization of
the checking protocol. Secondly, similar to most prior work on ensuring remote data
integrity, the new scheme also supports efficient dynamic operations on data blocks,
e.g., modification, insertion, and deletion; however, our scheme attempts to reduce
the storage cost of the verifier by introducing an improved index-hash table. Finally,
extensive security and performance analyses show that the proposed scheme is highly
efficient and secure.
The rest of the paper is organized as follows. Related work is summarized in
Sect. 2. Section 3 introduces the preliminaries of our scheme. In Sect. 4, we explain the
system and threat models, while Sect. 5 presents the detailed design of our scheme,
with an analysis of its security given in Sect. 6. Finally, performance results are
discussed in Sect. 7, and Sect. 8 concludes the paper.

2 Related work

Juels and Kaliski first studied a proof of retrievability (POR) scheme [8] in which
spot-checking and error-correcting codes are used to ensure the possession and
retrievability of data files on remote nodes. Unfortunately, this scheme can only handle
a limited number of queries and does not support public auditability.
Ateniese et al. [2] were the first to consider public auditability in their defined prov-
able data possession (PDP) model for ensuring possession of data files on untrusted
storage servers. Their scheme uses the RSA-based homomorphic linear authenticator
for auditing outsourced data. However, this scheme can only handle static storage files
because of its dependence on an index of blocks. Subsequently, Tan et al.
[12] proposed a dynamic version of the original PDP scheme, using homomorphic
linear authenticators that aggregate n signatures into one signature and reduce the cost
of communication. However, this protocol does not support fully dynamic
data operations, i.e., it only allows very basic block operations with limited functionality,
and block insertions are not supported. To solve this problem and support fully
dynamic operations, Erway et al. [6] introduced a dynamic PDP scheme that uses a skip
list to achieve efficient communication and computation for a file consisting of n blocks.
Based on the works of Juels and Kaliski [8] and Ateniese et al. [2], Shacham
and Waters [10] proposed an improved POR scheme constructed from Boneh–Lynn–
Shacham (BLS) signatures [3] with full proofs of security provided by the security
model defined in [8]. Similar to the construction in [2], this scheme uses publicly
verifiable homomorphic linear authenticators that aggregate signatures into a single
signature, thereby reducing the cost of communication. However, this protocol does
not support dynamic operations either.
In other related works, various POR schemes and models [5,13,14,16] have
explored different variants of POR with private auditability. Wang et al. [14] proposed


a dynamic PDP scheme for cloud storage, while Zhu et al. [17] proposed the first
interactive POR scheme to prevent fraudulence of the prover and leakage of verified
data.

3 Background

We present the problem definition and the necessary preliminaries in this section.


Definition 1 Bilinear maps [4].
Our method is based on a bilinear map, often called a pairing in other works.
Typically, such pairings are constructed over supersingular elliptic curves or abelian
varieties, e.g., the Weil or Tate pairing. Here, we simply describe the
properties of bilinear maps and their related deductions.
Let G, G_T be two multiplicative cyclic groups of prime order p. A bilinear map is
a map e : G × G → G_T with the following properties:
• Bilinear: e(Q^a, R^b) = e(Q, R)^{ab} for all Q, R ∈ G and a, b ∈ Z_p.
• Non-degenerate: e(Q, R) ≠ 1 for some Q, R ∈ G.
• Computable: there is an efficiently computable algorithm for computing the map e.
Other useful conclusions:
• For any Q, R ∈ G, we have e(Q, R) = e(R, Q).
• For a ∈ Z_p and Q, R ∈ G, we have e(Q^a, R) = e(Q, R^a).

Definition 2 Computational Diffie–Hellman (CDH) problem [12].

Given P, P^a, P^b ∈ G, as well as an admissible pairing e : G × G → G_T, compute P^{ab}.
The CDH assumption holds in G if no polynomial-time algorithm has an advantage
of at least ε in solving the CDH problem in G, which means it is computationally
infeasible to solve the CDH problem in G.

Definition 3 Compact proofs of retrievability (CPOR) [11].

Given a data file F, the client splits it into n blocks F = (m_1, ..., m_n) ∈ Z_p^n for
some large prime p. Let e : G × G → G_T be a bilinear map as described above, g be a
generator of G, u ∈ G be another random element, and H : {0, 1}* → G be a BLS hash
function. The client's secret key is x ←R Z_p, and the public key is computed as v = g^x.
For each block i ∈ [1, n], the signature on block m_i is σ_i = [H(i) · u^{m_i}]^x.
Upon receiving the request chal = {(i, v_i)}, the prover computes and sends back

\sigma = \prod_{(i, v_i) \in chal} \sigma_i^{v_i} \qquad \text{and} \qquad \mu = \sum_{(i, v_i) \in chal} v_i \cdot m_i .

The verifier checks the proof {σ, μ} using the following equation:

e(\sigma, g) \stackrel{?}{=} e\Big( \prod_{(i, v_i) \in chal} H(i)^{v_i} \cdot u^{\mu},\; v \Big). \qquad (1)

Obviously, CPOR is constructed from BLS signatures.


Fig. 1 Data fragment structure

Tradeoff between storage and communication In [10], Shacham and Waters introduced
a parameter that gives a tradeoff between storage overhead and response length. As
shown in Fig. 1, the client first splits file F into n blocks, with each block m_i consisting
of s ≥ 1 sectors {m_{i1}, ..., m_{is}} ∈ Z_p, where each sector has the same size as a block
in the scheme described in Definition 3. As each block has only one signature, we can reduce
the storage overhead of the verification metadata to 1/s of that of the scheme described
in Definition 3. However, the length of the server's response is about (1 + s) times greater
than before. If s = 1, the scheme has the minimum communication cost and the maximum
storage overhead.
The other impact of parameter s is that its value affects the efficiency of the scheme's
initialization phase. The greater the value of s, the fewer blocks we have, and the less
time we need to spend generating verification metadata during the initialization phase
of the checking scheme. Moreover, with the introduction of parameter s, the signature
on block m_i becomes

\sigma_i = \Big[ H(i) \cdot \prod_{j=1}^{s} u_j^{m_{ij}} \Big]^{x}.

Obviously, the computation cost of the signature on this block is higher than before (as
described in Definition 3). As G is a multiplicative cyclic group of prime order p, we can
find another generator u ∈ G and set u_j = u^{\alpha_j} for each u_j ∈ G, where \alpha_j ∈ Z_p.
As a result, we can use u^{\sum_{j=1}^{s} \alpha_j \cdot m_{ij}} in place of
\prod_{j=1}^{s} u_j^{m_{ij}} while computing \sigma_i, thereby greatly
reducing the computation cost of the signature on the block at the client side.
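To see the saving concretely, the short Python sketch below (ours, not the authors' implementation; it uses a deliberately tiny group, the order-1019 subgroup of Z*_2039, chosen only for illustration) verifies that u^{Σ_j α_j·m_ij} equals ∏_j u_j^{m_ij}, so the s exponentiations in G can indeed be replaced by s multiplications and additions in Z_p followed by a single exponentiation.

import random

Q, P, u = 2039, 1019, 4   # toy group: u = 4 generates the subgroup of prime order P in Z_Q^*

s = 8                                                 # sectors per block
alpha = [random.randrange(1, P) for _ in range(s)]    # secret alpha_j in Z_p
m = [random.randrange(P) for _ in range(s)]           # sectors m_i1, ..., m_is of one block
u_j = [pow(u, a, Q) for a in alpha]                   # public bases u_j = u^{alpha_j}

# Naive form: s exponentiations in G, one per sector.
naive = 1
for base, sector in zip(u_j, m):
    naive = naive * pow(base, sector, Q) % Q

# Substituted form: s multiplications/additions in Z_p, then a single exponentiation in G.
exponent = sum(a * sector for a, sector in zip(alpha, m)) % P
fast = pow(u, exponent, Q)

assert naive == fast
print("both forms agree:", naive == fast)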

4 Problem statement

4.1 System model

The data storage model in a cloud computing environment, illustrated in Fig. 2, consists
of three entities: the cloud user (U), who can customize the cloud service and has many
data files to store in the cloud; the cloud service provider (CSP), who provides a storage
or computing service for the users and has great computing power and vast storage
space; and the third party auditor (TPA), who has vast experience in auditing and
performs audit tasks on behalf of users. The TPA can greatly alleviate the auditing
cost of users.

Fig. 2 Data storage model

4.2 Threat model

Shacham and Waters proposed a security model for such checking systems in [10];
under this model, no polynomial-time algorithm should be able to cheat the verifier with
non-negligible probability. Under the definition of this security model, the client or
verifier periodically challenges the storage server to determine whether the data stored
in the cloud are intact. In general, there are two kinds of threats to the integrity
of data in the cloud. First, the adversary can forge a signature for each block and
then return a forged proof that passes the checking mechanism. Second, the
adversary can forge the aggregated information computed over the challenged data blocks.

4.3 Design goals

To design an efficient public checking scheme for cloud data storage under the
aforementioned model, our protocol should achieve the following security and performance
guarantees.
• Public auditability: our scheme should allow the TPA to verify the integrity of the
cloud data on behalf of the users.
• Storage correctness: no cheating cloud server can pass the TPA's verification
without storing all the user data.
• Dynamic operation: our scheme allows users to perform block-level operations on
the outsourced data and ensures the correctness of the updated file.
• Batch auditing: our scheme allows the TPA to handle multiple auditing tasks from
different users concurrently.
• Lightweight: considering the constrained computation capability of users' access
devices, our scheme allows the user to carry out the initialization with minimum
computation overhead.

5 Implementation

5.1 Definition and framework

A public checking scheme is a collection of four polynomial-time algorithms
(KeyGen, SigGen, GenProof, CheckProof), the details of which are given below.

KeyGen(1^λ) → (pk, sk). This is a probabilistic key generation algorithm that is
executed by the client. The input is a security parameter λ, and the outputs are a public
key pk and a secret key sk.

SigGen(pk, sk, F) → Φ. This algorithm generates the verification metadata of
data file F at the client. The inputs are the key pair (pk, sk) and a file F, which is
an ordered collection of blocks {m_i}. The output is the signature set Φ, which is an
ordered collection of signatures {σ_i} on {m_i}.

GenProof(F, Φ, chal) → P. This algorithm is executed by the server. The inputs
are a file F, its signatures {σ_i}_{1≤i≤n}, and a challenge chal. According to the block
indices specified in chal, the output is a data integrity proof P.

CheckProof(pk, chal, P) → {"0", "1"}. This algorithm is executed by the verifier
(the TPA or the client) to check the proof of stored data correctness. The inputs are the
public key pk, the challenge chal, and the proof P. The algorithm returns "1" if the
integrity of the data file is intact, and "0" otherwise.

Running a public checking protocol consists of two phases, Setup and Check,
explained below.

Setup The client first invokes KeyGen(·) to initialize the secret key and public key.
Then, the verification metadata Φ is generated by executing SigGen(·). If we want
the protocol to be fault-tolerant, the raw data file should be pre-processed with an
error-correcting code before invoking SigGen(·). Finally, the client stores data
file F and the verification metadata Φ in the cloud server, and removes them from
local storage.

Check The TPA sends a challenge message chal to the cloud server to verify the
integrity of data file F. According to chal, the server computes an integrity proof
as a function of the stored data file F and the metadata Φ by invoking GenProof(·),
and returns it. Upon receipt of the proof, the TPA verifies it and determines
whether the file is intact.
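Restated as an interface, the four algorithms look roughly as follows. This Python skeleton is our own illustration (the authors' implementation, described in Sect. 7, is written in C with the PBC library); all type and parameter names are placeholders.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PublicKey:
    v: int            # v = g^x
    u: List[int]      # sector bases u_1, ..., u_s sent to the TPA

@dataclass
class SecretKey:
    x: int

Challenge = List[Tuple[int, int]]   # chal = [(i, v_i), ...]
Proof = Tuple[List[int], int]       # ({mu_j}_{1<=j<=s}, sigma)

def key_gen(security_param: int) -> Tuple[PublicKey, SecretKey]:
    """Probabilistic key generation, run by the client."""
    ...

def sig_gen(pk: PublicKey, sk: SecretKey, blocks: List[List[int]]) -> List[int]:
    """Run by the client: return the signature set Phi = {sigma_i} over the file's blocks."""
    ...

def gen_proof(blocks: List[List[int]], phi: List[int], chal: Challenge) -> Proof:
    """Run by the server: aggregate the challenged blocks and their signatures into a proof."""
    ...

def check_proof(pk: PublicKey, chal: Challenge, proof: Proof) -> bool:
    """Run by the TPA (or the client): return True iff the proof satisfies the checking equation."""
    ...

The Setup phase corresponds to key_gen followed by sig_gen at the client; the Check phase corresponds to gen_proof at the server and check_proof at the TPA.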
5.2 Scheme details

Let G and G_T be multiplicative cyclic groups of prime order p, g be a generator of G,
e : G × G → G_T be a bilinear map, and H : {0, 1}* → G be a hash function that maps
strings uniformly to G.


During the Setup stage, the user first generates a random key pair {sk, pk} by
executing KeyGen, where sk = x ←R Z_p and pk = g^x. Then, F is split into n blocks,
and each block is further split into s sectors. Therefore, data file F can be presented
as the matrix

F = \begin{pmatrix} \vec{m}_1 \\ \vdots \\ \vec{m}_n \end{pmatrix}
  = \begin{pmatrix} m_{11} & \cdots & m_{1s} \\ \vdots & \ddots & \vdots \\ m_{n1} & \cdots & m_{ns} \end{pmatrix} \in Z_p^{n \times s}.

Now, the client chooses s random elements α_1, ..., α_s ←R Z_p and computes
u_j = g^{α_j} ∈ G for 1 ≤ j ≤ s. For each m_i, 1 ≤ i ≤ n, the client runs SigGen to compute

\sigma_i = \Big( H(id_i) \cdot g^{\sum_{j=1}^{s} \alpha_j \cdot m_{ij}} \Big)^{sk}, \quad 1 \le i \le n, \qquad (2)

where id_i = filename || i is the identifier of block m_i. The set of tags is
denoted by Φ = {σ_i}_{1≤i≤n}. After computing the file's signature set, the client sends F
together with Φ to the cloud, and deletes them from local storage. In addition, the
client sends the verification information {filename, n, u_1, ..., u_s} to the TPA.
During the Check stage, the TPA verifies the integrity of file F by interacting with
the cloud server. Before issuing the challenge, he must first generate the challenge
request chal, which specifies the positions of the blocks to be checked in this challenge
phase. The TPA randomly picks a c-element subset I = {s_1, ..., s_c} of the set [1, n].
For each i ∈ I, the TPA chooses a random element v_i ← Z_p. The verifier then sends
chal = {(i, v_i)}_{s_1 ≤ i ≤ s_c} to the cloud server.
Upon receiving the request chal = {(i, v_i)}_{s_1 ≤ i ≤ s_c} from the TPA, the cloud server
runs GenProof to generate the proof of stored data correctness. Concretely speaking,
the proof consists of two parts, the aggregated signature value and the linear
combinations of the specified blocks, the details of which are given as:

\mu_j = \sum_{i=s_1}^{s_c} v_i \cdot m_{ij} \quad (1 \le j \le s), \qquad \sigma = \prod_{i=s_1}^{s_c} \sigma_i^{v_i}. \qquad (3)

The cloud server sets ({\mu_j}_{j=1}^{s}, \sigma) as the response proof and returns it to the TPA.
On receiving the response, the TPA runs CheckProof to validate the response by checking

e(\sigma, g) \stackrel{?}{=} e\Big( \prod_{(i, v_i) \in chal} H(id_i)^{v_i} \cdot \prod_{j=1}^{s} u_j^{\mu_j},\; v \Big). \qquad (4)
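The following self-contained Python sketch (ours; it is not the authors' C/PBC implementation) runs the whole Setup/Check flow of Eqs. (2)–(4) in a toy group, the order-1019 subgroup of Z*_2039, with a placeholder hash and file name. Because this toy group has no bilinear pairing, the verifier below checks the equivalent relation σ = (∏ H(id_i)^{v_i} · ∏ u_j^{μ_j})^x directly using the secret key x; in the real scheme, the pairing in Eq. (4) lets the TPA perform the same check publicly from v = g^x alone.

import hashlib, random

Q, P, g = 2039, 1019, 4            # toy group: <g> has prime order P inside Z_Q^*

def H(ident):
    """Toy hash into the group: hash the identifier to an exponent and raise g to it."""
    e = int.from_bytes(hashlib.sha256(ident.encode()).digest(), "big") % (P - 1)
    return pow(g, e + 1, Q)        # e + 1 in [1, P-1] avoids the identity element

# --- Setup (client) --------------------------------------------------------
n, s = 6, 4                                           # blocks and sectors per block
x = random.randrange(1, P)                            # secret key sk = x
v = pow(g, x, Q)                                      # public key v = g^x (a pairing-based verifier would use only this)
alpha = [random.randrange(1, P) for _ in range(s)]
u = [pow(g, a, Q) for a in alpha]                     # u_j = g^{alpha_j}
F = [[random.randrange(P) for _ in range(s)] for _ in range(n)]   # file blocks m_ij

def sig_gen(i, block):                                # Eq. (2)
    expo = sum(a * m for a, m in zip(alpha, block)) % P
    return pow(H(f"file.txt||{i}") * pow(g, expo, Q) % Q, x, Q)

phi = [sig_gen(i, F[i]) for i in range(n)]            # verification metadata Phi

# --- Check (TPA challenges, server proves, verifier checks) ----------------
chal = [(i, random.randrange(1, P)) for i in random.sample(range(n), 3)]

def gen_proof(chal):                                  # Eq. (3), run by the server
    mu = [sum(v_i * F[i][j] for i, v_i in chal) % P for j in range(s)]
    sigma = 1
    for i, v_i in chal:
        sigma = sigma * pow(phi[i], v_i, Q) % Q
    return mu, sigma

def check_proof(chal, mu, sigma):                     # Eq. (4), checked "in the exponent" via x
    rhs = 1
    for i, v_i in chal:
        rhs = rhs * pow(H(f"file.txt||{i}"), v_i, Q) % Q
    for j in range(s):
        rhs = rhs * pow(u[j], mu[j], Q) % Q
    return sigma == pow(rhs, x, Q)

mu, sigma = gen_proof(chal)
print("proof verifies:", check_proof(chal, mu, sigma))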

6 Security analysis

This section discusses the security of our scheme, including its correctness and
unforgeability.


Correctness The scheme is correct only if Eq. (4) holds for every honestly generated
proof; the proof is straightforward, and a short derivation is given below.
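Using the definitions of Sect. 5.2 (this derivation is our restatement; the original text omits it), for an honestly computed response,

\begin{aligned}
e(\sigma, g) &= e\Big(\prod_{(i,v_i)\in chal} \sigma_i^{v_i},\, g\Big)
             = e\Big(\prod_{(i,v_i)\in chal} \big(H(id_i)\, g^{\sum_{j=1}^{s}\alpha_j m_{ij}}\big)^{x\,v_i},\, g\Big)\\
            &= e\Big(\prod_{(i,v_i)\in chal} H(id_i)^{v_i}\cdot \prod_{j=1}^{s} u_j^{\sum_i v_i m_{ij}},\, g^{x}\Big)
             = e\Big(\prod_{(i,v_i)\in chal} H(id_i)^{v_i}\cdot \prod_{j=1}^{s} u_j^{\mu_j},\, v\Big),
\end{aligned}

so Eq. (4) holds whenever σ and the μ_j are computed from the intact file.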
Unforgeability It is computationally infeasible for an untrusted cloud or adversary to
forge a proof that can pass the verification in our scheme, except by responding with the
correct values σ and {μ_j}_{1≤j≤s}.
Proof If an attacker can cheat our scheme, then he can forge an incorrect proof
that passes the verification Eq. (4). In other words, the attacker responds with an
incorrect value of {μ_j}_{1≤j≤s} or of σ, or both. Next, we use a series of games to
prove that these situations can never occur in our scheme.
Game 1 In this game, we wish to prove that no adversary can forge a value σ' that
passes verification together with values {μ_j'}_{1≤j≤s} differing from the honest ones.
Let P = ({μ_j}_{1≤j≤s}, σ) be the expected response that would have been obtained from
an honest prover. By the correctness of our scheme, we know that the expected response
satisfies the verification equation given as:
e(\sigma, g) = e\Big( \prod_{(i, v_i) \in chal} H(id_i)^{v_i} \cdot \prod_{j=1}^{s} u_j^{\mu_j},\; v \Big). \qquad (5)

Assume the adversary's response is P' = ({μ_j'}_{1≤j≤s}, σ'), which also passes Eq. (4), as
follows:

e(\sigma', g) = e\Big( \prod_{(i, v_i) \in chal} H(id_i)^{v_i} \cdot \prod_{j=1}^{s} u_j^{\mu_j'},\; v \Big). \qquad (6)

Obviously, if μ_j' = μ_j for all j, we obtain σ' = σ, which contradicts our assumption in
this game. Thus, there exists at least one value μ_j' that is different from μ_j. For each
1 ≤ j ≤ s, we define Δμ_j = μ_j' − μ_j. With the help of this adversary, we can
construct a simulator to solve the CDH problem.
Given g, g^α, h ∈ G, the simulator outputs the value of h^α. Assume A is a forger that
(t, q_H, q_S, ε)-breaks the signature scheme. We construct a simulator B that (t', ε')-
breaks the CDH problem.
Setup B executes algorithm KeyGen to obtain a public key pk = g^α and a private
key sk = α. The adversary A is given pk.
Query on oracle H_O The adversary A can query the random oracle H at any time. To
respond to these queries, B keeps a list of tuples ⟨id_i, w, b, c⟩, as explained below. We
refer to this list as the L_1-list, which is initially empty. When an identifier id_i is
submitted to the H_O oracle, B responds as follows:
• If the query id_i already appears in list L_1 in some tuple ⟨id_i, w, b, c⟩, then B responds
with H(id_i) = w ∈ G.
• Otherwise, B generates a random coin c ∈ {0, 1} such that Pr[c = 0] = 1/(q_S + 1).
• B picks a random b ∈ Z_p. If c = 0 holds, B computes w ← h · g^b ∈ G. If c = 1
holds, B computes w ← g^b ∈ G.
• B stores the tuple ⟨id_i, w, b, c⟩ in list L_1 and responds to A with H(id_i) = w (see the
sketch below).
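As a concrete illustration of this bookkeeping (our own sketch; the toy group, probabilities, and helper names are assumptions, not the authors' code), the biased-coin programming of H can be written as follows.

import random

Q, P, g = 2039, 1019, 4                 # toy group of prime order P inside Z_Q^*
h = pow(g, random.randrange(1, P), Q)   # CDH input h; B must eventually output h^alpha
q_S = 10                                # bound on the number of signature queries

L1 = {}                                 # the L1-list: id -> (w, b, c)

def h_oracle(block_id):
    """Programmed random oracle H, maintained by the simulator B."""
    if block_id in L1:                                  # repeated query: answer consistently
        return L1[block_id][0]
    c = 0 if random.random() < 1.0 / (q_S + 1) else 1   # biased coin, Pr[c = 0] = 1/(q_S + 1)
    b = random.randrange(1, P)
    if c == 0:
        w = h * pow(g, b, Q) % Q                        # embed the CDH instance: w = h * g^b
    else:
        w = pow(g, b, Q)                                # plain value w = g^b, signable by B
    L1[block_id] = (w, b, c)
    return w

print(h_oracle("file.txt||7") == h_oracle("file.txt||7"))   # True: consistent answers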


Query on oracle Sign_O A requests a signature on some data block m_i under the
challenge key v. Algorithm B responds to this query as follows:
• B runs oracle H_O on m_i to obtain the corresponding tuple ⟨id_i, w, b, c⟩ in list L_1.
If c = 0 holds, B reports failure and terminates.
• Otherwise, B has the value w = g^b. Let σ_i = g^{αb} · g^{γ_i α} = (g^α)^{b + γ_i} ∈ G.
Obviously, σ_i is a valid signature on m_i under public key v.
Output Finally, algorithm A halts, either conceding failure, in which case B does so too,
or returning a forged proof ({μ_j'}_{1≤j≤s}, σ') to B. Algorithm B proceeds only if c = 0;
otherwise B declares failure and halts. Since c = 0, it follows that H(name, id_i) =
h · g^b. Then, the verification of ({μ_j'}_{1≤j≤s}, σ') is given by Eq. (6), and dividing this
verification equation by verification Eq. (5), we obtain

e(\sigma'/\sigma, g) = e\Big( h^{\sum_{(i, v_i) \in Q} v_i} \cdot \prod_{j=1}^{s} u_j^{\Delta\mu_j},\; v \Big). \qquad (7)

Rearranging terms, we obtain

e(\sigma' \sigma^{-1}, g) = e(h, v)^{\sum_{(i, v_i) \in Q} v_i} \cdot e\big( g^{\sum_{j=1}^{s} \alpha_j \Delta\mu_j},\; v \big)
\;\Longrightarrow\;
e\big( \sigma' \cdot \sigma^{-1} \cdot v^{-\sum_{j=1}^{s} \alpha_j \Delta\mu_j},\; g \big) = e(h^{\alpha}, g)^{\sum_{(i, v_i) \in Q} v_i},

where v = g^α. This gives a solution to the CDH problem in G:

h^{\alpha} = \big( \sigma' \cdot \sigma^{-1} \cdot v^{-\sum_{j=1}^{s} \alpha_j \Delta\mu_j} \big)^{1 / \sum_{(i, v_i) \in Q} v_i}. \qquad (8)

Thus, if any adversary can pass Eq. (4) with a forged value σ', then he can solve the
CDH problem in G.
Analysis Finally, we show that B solves the given instance of the CDH problem with
probability at least ε'. First, we analyze the three events needed for A to succeed:
Σ_1: B does not abort as a result of any of A's signature queries.
Σ_2: A generates a valid and nontrivial signature forgery {name, id_i, m_i, σ_i'}.
Σ_3: Event Σ_2 occurs and c = 0, where c is the c-component of the corresponding tuple in list L_1.
Algorithm B succeeds if all of these events occur. The probability Pr[Σ_1 ∧ Σ_2 ∧ Σ_3] is
decomposed as Pr[Σ_1 ∧ Σ_2 ∧ Σ_3] = Pr[Σ_1] · Pr[Σ_2 | Σ_1] · Pr[Σ_3 | Σ_2 ∧ Σ_1]. According
to [14], we know that B's success probability is about ε' = (1 − 1/(q_S + 1))^{q_S} · (1/(q_S + 1)) · ε.
B's running time is the same as A's running time plus the time it takes to respond to q_H
hash queries and q_S signature queries, and the time to transform A's final forgery into
the CDH solution. The output phase requires at most one additional hash computation,
two inversions, one exponentiation, and a few multiplications in Z_p. Hence,
the total running time is at most t + c_G(q_H + q_S + 4) + s ≤ t'
(Fig. 3).
Game 2 Game 2 is the same as Game 1, with one difference: the value that
causes the abort is one of {μ_j'}_{1≤j≤s}.


Fig. 3 Index-hash table, where e_i = H^ι(B_i, V_i, R_i) and ι is an updated key

7 Performance evaluation

7.1 Computation cost

The main cryptographic operations include addition, multiplication, exponentiation,
pairing, and hashing operations. Suppose there are n blocks in data file F and c
random blocks specified in challenge chal. In this situation, we can quantify the
computation cost of each party in our scheme. From the user's perspective, file F is first
split into n blocks, and then a signature is generated for each block. According to Eq.
(2), the corresponding computation cost is about n·Mul_G + n·Hash_G + 2n·Exp_G +
ns·Mul_{Z_p} + ns·Add_{Z_p}, where Mul_G denotes the cost of one multiplication
in group G, Mul_{Z_p} and Add_{Z_p} denote the cost of one multiplication
and one addition in Z_p, respectively, Exp_G denotes the cost of one exponentiation in G,
and Hash_G denotes the cost of one hashing operation into G. On the server side, the
generated response is {{μ_j}_{1≤j≤s}, σ}, and
the computation cost of calculating a proof is about cs·(Mul_{Z_p} + Add_{Z_p}) + c·Exp_G +
c·Mul_G. Similarly, on the TPA side, to check the correctness of proof {{μ_j}_{1≤j≤s}, σ},
the TPA needs to verify it according to Eq. (4), and the computation cost of verifying a
proof is about 2·Pairing + (c + s)·Exp_G + (c + s + 1)·Mul_G, where Pairing denotes
one pairing operation.
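As an illustrative instantiation of these counts (the concrete numbers are our own assumptions, not figures from the paper's experiments): for a 50 MB file with s = 100 sectors of 20 bytes each per block, we have n = 25{,}000 blocks, and with c = 460 challenged blocks the costs become

\text{client: } 25{,}000\,Mul_G + 25{,}000\,Hash_G + 50{,}000\,Exp_G + 2.5\times 10^{6}\,(Mul_{Z_p} + Add_{Z_p}),
\text{server: } 46{,}000\,(Mul_{Z_p} + Add_{Z_p}) + 460\,Exp_G + 460\,Mul_G,
\text{TPA: } 2\,Pairing + 560\,Exp_G + 561\,Mul_G.

The client-side exponentiations clearly dominate, which is why the initialization phase is the target of our optimization.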

7.2 Communication cost

The main communication cost comprises two parts: the auditing request and the
auditing proof. For each auditing request chal = {(i, v_i)}_{s_1 ≤ i ≤ s_c}, the communication
cost is about c(|p| + |n|) bits, where |p| is the length of an element of Z_p and |n| is the
length of a block index. Each auditing proof ({μ_j}_{1≤j≤s}, σ) in our scheme contains
s elements of Z_p and one element of G. Therefore, the communication cost is about
(s + 1)|p| bits.
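As a quick plug-in of this formula (s = 100 and |p| = 160 bits are our assumed values, matching the experimental setting described below), the proof occupies

(s + 1)\,|p| = 101 \times 160 = 16{,}160 \text{ bits} \approx 2\ \text{KB},

independent of the number of challenged blocks c, whereas the request grows linearly with c.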

7.2.1 Performance of signature generation

In this section, we evaluate the performance of our scheme. Our experiments were
carried out on a system with a 2.4-GHz Intel Core 2 processor, 2 GB of RAM,
and a 5400 RPM Western Digital 320 GB Serial ATA drive with an 8 MB buffer.
The algorithms, written in C, were implemented using the pairing-based cryptography
(PBC) library and the GNU multiple precision arithmetic (GMP) library. All results are
averages of 100 trials. We chose a 160-bit prime p as the security parameter. According
to [1], if 1 % of all the blocks are corrupted, the TPA can detect this misbehavior
with probability greater than 99 % by choosing only 460 randomly selected blocks (see
the calculation below). In the following experiments, we assume that the number of
selected blocks is greater than 300 (c ≥ 300). For the sake of comparison, we also used
two other classic schemes [10,11,13,15]. Shacham et al. first introduced the data
fragment structure to reduce the storage overhead of the verification metadata in the
cloud [10,11]. However, their scheme can only be used for static data storage. The
mechanism proposed by Wang et al. mainly focuses on supporting dynamic operations
and batch auditing, and does not consider the storage overhead of the verification
metadata in the cloud [13]. Each of the two schemes is thus one-sided. More importantly,
the two schemes do not consider the cost of generating the verification metadata during
the initialization phase. This problem is solved in our scheme by
reducing the number of group operations.
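The 460-block figure can be reproduced from the standard sampling argument (our restatement; see [1] for the full analysis): if a fraction t of the blocks is corrupted and c blocks are challenged at random, the detection probability is roughly

P_{detect} \approx 1 - (1 - t)^{c}, \qquad\text{so}\qquad c \;\ge\; \frac{\ln(1 - 0.99)}{\ln(1 - 0.01)} \approx 459

challenged blocks suffice for t = 1 % and a 99 % detection target, consistent with the 460 blocks quoted above.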

7.2.2 Performance of metadata generation

Before a file is stored in the cloud, it is first split into n blocks, and each block is
further split into s sectors. Each sector is one element of Z_p, i.e., 20 bytes. Assuming
the size of the file is b bits, there are

n = \lceil b / (s \cdot \log p) \rceil

blocks. Obviously, the smaller the value of s, the greater the value of n. According to this
relationship, we can derive an equation for the computation cost of generating verification
metadata as follows:

T_{total} = C \cdot \Big( \tfrac{1}{s}\, Mul_{G} + \tfrac{1}{s}\, Hash_{G} + \tfrac{2}{s}\, Exp_{G} + Mul_{Z_p} + Add_{Z_p} \Big),
\qquad \text{where } C = \frac{size}{sectorsize} \text{ (i.e., the total number of sectors).}
From this equation, we know that the total time T_total for metadata generation is
inversely proportional to the value of parameter s (a numerical illustration is given at the
end of this subsection). To achieve optimal performance, we need to increase the value
of s as much as possible. However, it is clear that if we increase the value of s, the block
size also increases, which will impact the fine-grained management of the outsourced
data, i.e., dynamic operations. As shown in Fig. 4, considering the data fragment
structure can greatly reduce the total time for generating metadata: as the value of s
increases, the cost of generating metadata decreases. As shown in Fig. 5, the total time
for generating verification metadata increases linearly with the file size, although our
scheme grows more slowly than the other two. Thus, our scheme is better suited to
verifying the correctness of large files in the cloud than the other two. A detailed
comparison of the auditing performance of our scheme and the other two schemes is
given in Table 1. Owing to the smaller number of group operations in the Setup phase,
the cost of our scheme is 49 and 98 times smaller than those of Shacham et al. and
Wang et al., respectively. Obviously, our scheme greatly reduces the user cost during
the initialization phase.

Fig. 4 Cost with varying values of s

Fig. 5 Cost for different file sizes

Table 1 Comparison of auditing performance (data usage: 1 GB)

                        Scheme in [15]    Scheme in [10]    Scheme in [11]    Our scheme
Sampled blocks (c)      300     460       300     460       300     460       300     460
Server comp. time (s)   0.61    0.94      0.84    1.11      0.83    1.03      0.85    1.13
TPA comp. time (s)      0.64    0.95      0.82    1.16      0.81    1.15      0.83    1.16
Communication (KB)      80      243       6.14    8.35      6.12    8.31      6.18    8.37
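The inverse dependence on s is easy to reproduce numerically. The sketch below (ours; the per-operation costs are arbitrary placeholder units, not measured PBC timings) evaluates the formula above for a 50 MB file and several values of s.

SECTOR_BITS = 160                      # one sector = one element of Z_p, |p| = 160 bits

# Assumed relative costs (arbitrary units): a group exponentiation is far more
# expensive than a group multiplication or hash, which in turn dwarf Z_p operations.
MUL_G, HASH_G, EXP_G = 3.0, 5.0, 20.0
MUL_ZP, ADD_ZP = 0.01, 0.001

def t_total(file_bits, s):
    """Metadata-generation cost following Sect. 7.2.2 (per-block cost from Eq. (2))."""
    n = -(-file_bits // (s * SECTOR_BITS))             # number of blocks, ceiling division
    return n * (MUL_G + HASH_G + 2 * EXP_G + s * (MUL_ZP + ADD_ZP))

file_bits = 50 * 10**6 * 8                             # a 50 MB file
for s in (1, 10, 50, 100, 200):
    print(f"s = {s:3d}   T_total = {t_total(file_bits, s):12.1f} units")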

7.2.3 Auditing performance

From Table 1, it is clear that the auditing performance of each of the three schemes
is similar. Since Wang's scheme does not consider the data fragment structure, it has
the best performance of all the checking schemes. The performance of our scheme is
almost the same as that in [14], although ours greatly reduces the cost of the cloud user
during the Setup phase. Additionally, the auditing time increases with the value of
c. Specifically, when c = 300, if there are 100 sectors in a block, the auditing time
is only about 0.83 s for the TPA; when c = 460, the TPA needs 1.16 s to verify the
integrity of the same data. For the same generated response (i.e., ({μ_j}_{1≤j≤s}, σ)), the
communication cost of our scheme is almost the same as that in [14], but our scheme
supports dynamic operations. We note that taking the data fragment structure into
account affects the auditing time, which increases linearly with the value of s.

7.2.4 Batch auditing

As discussed in Sect. 5.2, if the TPA has multiple auditing tasks, he can handle those
tasks concurrently, thus greatly improving the efficiency of verification. In Fig. 6, we
give a comparison of batch auditing under varying numbers of sampled blocks c. To
compare batching efficiency, we conducted a series of batch auditing tests, where the
number of auditing tasks increased from one to approximately 200 in intervals of 10.
The results show that, compared with individual auditing, batch auditing helps to reduce
the TPA's computation cost, with more than 4 and 6 ms of auditing time saved when c
is set to 460 and 300, respectively. In Fig. 7, we give the performance comparison of
batch auditing for our scheme and Wang's batch auditing scheme. Although Wang's
scheme is more efficient than ours, it is insecure. Our scheme uses a weight-based
batch auditing method to overcome this security hole.

Fig. 6 Batch auditing with varying numbers of sampled blocks c. a Detection probability of about 95 %. b Detection probability of about 99 %
Fig. 7 Comparison of batch auditing. a Detection probability of about 95 %. b Detection probability of about 99 %

8 Conclusion

In this paper, we proposed a simple yet efficient auditing scheme for checking the
integrity of data stored in the cloud. We attempted to reduce the cost of the initialization
phase for executing an auditing protocol by exploiting certain desirable attributes of
bilinear groups. Additionally, we considered various other properties of our checking
mechanism, such as supporting dynamic operations and batch auditing. Finally, several
experiments show that our construction is efficient and secure.

Acknowledgements This work was supported by the National Natural Science Foundation of China (No.
61472433) and the National Basic Research Program of China (973 Program, No. 2013CB329604).

References
1. Ateniese G, Burns R, Curtmola R, Herring J, Khan O, Kissner L, Peterson Z, Song D (2011) Remote
data checking using provable data possession. ACM Trans Inf Syst Secur (TISSEC) 14(1):12
2. Ateniese G, Burns R, Curtmola R, Herring J, Kissner L, Peterson Z, Song D (2007) Provable data
possession at untrusted stores. In: Proceedings of the 14th ACM conference on computer and commu-
nications security. ACM, New York, pp 598–609
3. Boneh D, Lynn B, Shacham H (2001) Short signatures from the Weil pairing. In: Advances in
cryptology—ASIACRYPT’01. Springer, New York, pp 514–532
4. Boneh D, Lynn B, Shacham H (2004) Short signatures from the Weil pairing. J Cryptol 17(4):297–319
5. Dodis Y, Vadhan S, Wichs D (2009) Proofs of retrievability via hardness amplification. In: Theory of
cryptography. Springer, New York, pp 109–127
6. Erway C, Küpçü A, Papamanthou C, Tamassia R (2009) Dynamic provable data possession. In: Pro-
ceedings of the 16th ACM conference on computer and communications security (CCS’09). ACM,
New York, pp 213–222. doi:10.1145/1653662.1653688


7. Grobauer B, Walloschek T, Stöcker E (2011) Understanding cloud computing vulnerabilities. Secur
Priv IEEE 9(2):50–57
8. Juels A, Kaliski BS Jr (2007) Pors: proofs of retrievability for large files. In: Proceedings of the 14th
ACM conference on computer and communications security. ACM, New York, pp 584–597
9. Mell P, Grance T (2009) The NIST definition of cloud computing. Natl Inst Stand Technol 53(6):50
10. Shacham H, Waters B (2008) Compact proofs of retrievability. In: Advances in cryptology—
ASIACRYPT’08. Springer, New York, pp 90–107
11. Shacham H, Waters B (2013) Compact proofs of retrievability. J Cryptol 26(3):442–483
12. Shuang T, Lin T, Xiaoling L, Yan J (2014) An efficient method for checking the integrity of data in
the cloud. Commun China 11(9):68–81
13. Wang C, Wang Q, Ren K, Lou W (2010) Privacy-preserving public auditing for data storage security
in cloud computing. In: INFOCOM, 2010 proceedings IEEE. IEEE, pp 1–9
14. Wang Q, Wang C, Li J, Ren K, Lou W (2009) Enabling public verifiability and data dynamics for
storage security in cloud computing. In: Computer security—ESORICS’09. Springer, New York, pp
355–370
15. Wang Q, Wang C, Ren K, Lou W, Li J (2011) Enabling public auditability and data dynamics for
storage security in cloud computing. IEEE Trans Parallel Distrib Syst 22(5):847–859
16. Zhu Y, Hu H, Ahn GJ, Yu M (2012) Cooperative provable data possession for integrity verification in
multicloud storage. IEEE Trans Parallel Distrib Syst 23(12):2231–2244
17. Zhu Y, Wang H, Hu Z, Ahn GJ, Hu H (2011) Zero-knowledge proofs of retrievability. Sci China Inf
Sci 54(8):1608–1617

