AF-Dedup: Secure Data Deduplication Based on Adaptive Dynamic Merkle Hash Forest PoW for Cloud Storage
Abstract— For encrypted data deduplication, proof of ownership (PoW) verifies a client's ownership of an entire file, preventing malicious users from exploiting a single segment of information to gain access to the file. By establishing that two users possess the same file, the cloud service provider (CSP) can maintain a single copy of the file, enabling deduplication. However, existing PoW schemes based on the Merkle hash tree (MHT) cannot guarantee the security of small files. Therefore, we propose a novel data structure called adaptive dynamic Merkle hash forest (ADMHF) for PoW, and present an encrypted data deduplication scheme called AF-Dedup. It reduces the risk of data content exposure resulting from multiple ownership verification attempts in traditional schemes. Specifically, we first construct the file tag as a unique identifier of the file. Second, different encryption schemes are employed depending on the popularity of the data. Then, the corresponding ADMHF is generated for subsequent ownership verifications. Security analysis and simulation experiments show that our scheme significantly enhances the security of small files. In a given situation, for files with only 2 blocks, our scheme achieves the same level of security as the existing scheme for a file with 91 blocks.

Index Terms— ADMHF, bilinear mapping, encrypted data deduplication, proof of ownership

I. INTRODUCTION
Currently, an increasing number of users prefer to upload their local files to cloud storage platforms in order to free up storage space and access their data anytime and anywhere. However, this trend results in a significant amount of duplicated data stored in the cloud, wasting storage space. To address this issue, the mainstream methods include data compression and data deduplication, with this paper focused on the latter. Encrypted data deduplication is a technique where the cloud server stores a single copy of a file and generates an access link for other authorized users. Studies indicate that encrypted data deduplication ratios can range from 1:10 to 1:500, allowing for over 90% savings in storage space for backup file systems [1].

The practice of outsourcing private data to cloud service providers (CSPs) requires users to place unconditional trust in the security of their data. In reality, users are often concerned about data confidentiality. One viable solution involves encrypting the data before uploading it to the CSP. However, traditional encryption schemes use different keys for different users, resulting in the same plaintext being encrypted into different ciphertexts. This makes deduplication a complex and challenging problem [2].

To address the aforementioned challenge, Convergent Encryption (CE) [3] was proposed. However, CE cannot resist dictionary attacks. There are also variants [4] [5] of CE, but they still cannot address the core issue. To enhance the security of CE, several schemes introduce a Trusted Third Party (TTP) [6]. However, TTPs are considered to be fully trusted in these schemes, posing implementation challenges in real-world applications.

Proof of ownership (PoW) plays a vital role in encrypted data deduplication, as it can efficiently prove that a user indeed possesses a certain file [7]. The current method for PoW uses the file hash to differentiate between identical files, commonly referred to as hash-as-proof. However, if an attacker can enumerate the hash value offline, he may gain ownership of the file even if he does not own the file. In response to this, Halevi et al. suggest the Merkle hash tree (MHT) [8] as a means of validating data [9]. Compared to the hash-as-proof scheme, this scheme enables users to efficiently prove ownership of an entire file to the CSP, rather than just a fragment of the file. For a prover that passes MHT-PoW, it can be inferred with high probability that the prover possesses most of the leaves associated with the file.

However, the security of the MHT-PoW scheme is compromised, especially for small files. As demonstrated by our analysis in Section V-C, considering side-channel attacks, each verification attempt poses a potential risk of exposing relevant nodes. With an increasing number of verifications $N_{cr}$, the percentage of exposed nodes $\gamma$ also increases. Once $N_{cr}$ exceeds a certain value, the MHT becomes fully exposed, rendering the PoW mechanism ineffective. We then perform experiments in Section VI-A to assess the correlation between $N_{cr}$ and $\gamma$.

Building on the aforementioned concerns, we propose a secure encrypted data deduplication scheme, namely AF-Dedup. Our contributions are as follows:
• We propose an encrypted data deduplication scheme that does not rely on any TTPs and guarantees semantic security of ciphertexts.
• We propose a PoW method based on ADMHF, thereby preventing the exposure of MHT node information via multiple verifications. For example, when the file is divided into 32 blocks, the percentage of exposed nodes in a single PoW round decreases by a factor of 7.199 compared to the traditional MHT-PoW scheme.
• We utilize blind signature to generate the file tag and enhance system security by employing bilinear mapping for sig-
2 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. XX, NO. XX, XXXX
file data blocks and facilitating user access requests by communicating with the main server.

B. Adversary Model
In the proposed scheme, two possible adversaries are considered: the internal adversary and the external adversary.
Internal adversary: The internal adversary is someone with authorized access to the CSP who may attempt to retrieve users' encrypted data without permission. This adversary may be the cloud service provider or one of its employees, and can access ciphertext without the knowledge or consent of the respective user.
External adversary: In the context of cloud computing, external adversaries are typically defined as malicious users who engage in unauthorized access of private data belonging to other users. These adversaries can employ various tactics to achieve their objectives, such as attacking legitimate clients to obtain partial information about specific copies of data.

C. Design Goals
The design goal of this scheme is that CSP can securely accomplish the deduplication of encrypted data. Therefore, our scheme should satisfy the following properties:
• Security: 1) The uploaded ciphertext is semantically secure. 2) The file tags are secure, including their unforgeability and distinguishability. 3) The ADMHF-PoW process is secure.
• Efficiency: The proposed solution should be efficient in terms of time and storage.

IV. PROPOSED DEDUPLICATION SCHEME
A. Preliminary
1) Bilinear mapping: Let $(G, +)$ and $(G_T, \times)$ be additive and multiplicative groups with the same prime order $p$, respectively, and let $g$ be a generator of $G$. Let $e : G \times G \to G_T$ be a bilinear map which satisfies the following properties.
• Bilinear: $\forall a, b \in Z_p^*$, $\forall P, Q \in G$, $e(aP, bQ) = e(P, Q)^{ab}$.
• Non-degenerate: $\exists P, Q \in G$ such that $e(P, Q) \neq 1$.
• Computable: There is an efficient algorithm to compute $e(P, Q)$ for $P, Q \in G$.

B. Implementation
Our scheme consists of four processes: SystemSet, CheckTagGen, FileUpload, and FileDownload. The overview is shown in Fig. 2.
1) SystemSet:
• Step 1. Two cyclic groups $G_1$ and $G_T$ of prime order $p$ are chosen, and $g$ is a generator of $G_1$. Define the bilinear mapping $e : G_1 \times G_1 \to G_T$.
• Step 2. CSP randomly chooses $g', h, h' \xleftarrow{R} G_1$ and $r \xleftarrow{R} Z_p^*$, and calculates $g_1 = g^r$, $Z = e(g^r, g')$. The system main public key is set as $mpk = g'^r$, and the public parameter is $(G_1, G_T, p, e, g, g_1, g', h, h', Z, H, SH)$.
• Step 3. Each user in the system picks a random number $s \xleftarrow{R} Z_p^*$, and calculates the user's public key as $pk = g^s$ and private key as $sk = (d_1, d_2, d_3) = (g'^r, (g_1^{pk} h)^s, g^s)$.
• Step 4. Input the security parameter $\lambda$ to update the symmetric encryption algorithm.
2) CheckTagGen: When a user $U$ intends to upload a file $F$ to CSP, s/he first executes remote attestation to establish a secure communication channel. Then $U$ tries to acquire a blind signature of $F$ from CSP. The process is shown in Alg. 1.

Algorithm 1 Check Tag Generation
1: User:
2:     Randomly chooses $q \xleftarrow{R} Z_p^*$.
3:     Computes the short hash value of the file $h_F = SH(F)$.
4:     Calculates $F' = h_F^q$.
5: User → CSP: Sends $F'$ for blind signature.
6: CSP: Computes $\alpha' = F'^r$.
7: CSP → User: Sends $\alpha'$ to $U$.
8: User: Calculates $\alpha = \alpha'^{q^{-1}}$.
9: if $e(\alpha, g) = e(h_F, g_1)$ holds then
10:     // The signature $\alpha'$ from CSP is valid.
11:     User:
12:         Computes $K_1 = H(\alpha)$.
13:         Computes the ciphertext of $F$, $C_F = E(K_1, F)$.
14:         Computes the file tag $Tag_F = H(C_F)$.
15:         Stores $Tag_F$ to verify data integrity.
16:     User → CSP: Sends $Tag_F$ as the file tag.
17:     if $Tag_F$ already exists in CSP then
18:         CSP → User: $U$ is a subsequent uploader.
19:         Execute Algorithm 2.2.
20:     else
21:         CSP → User: $U$ is an initial uploader.
22:         Execute Algorithm 2.1.
23:     end if
24: else
25:     Upload termination.
26: end if

3) FileUpload: Based on the result of Alg. 1, $U$ is classified as an initial uploader or a subsequent uploader.
Case 1: For initial uploaders, to achieve a balance between the security and the efficiency of data encryption, we adopt a strategy that classifies the data into two categories: popular data and unpopular data. For popular data with a low level of privacy, it is sufficient to upload $C_F$ as ciphertext. For unpopular data, a two-layered encryption is required. When the number of authorized users reaches the popularity threshold $t$, that is $Count_F = t$, the outer layer encryption of unpopular data is decrypted. Notably, $t$ is a constant determined by the application scenario and security requirements. As $t$ increases, the security of the system improves, but so does the overhead for users to decrypt during the file download phase. Regarding the specific value of $t$ in real-world settings, readers can refer to [18].
After obtaining the ciphertext, the corresponding ADMHF is generated for subsequent verifications. To mitigate the potential vulnerabilities associated with the MHT, we propose using a new verification data structure called the adaptive dynamic Merkle hash forest (ADMHF). It is a collection of $m$ mutually independent MHTs. Moreover, each ADMHF corresponds
to a particular file. The number of trees contained within the ADMHF is determined by the file size, according to the following formula:

$$y = F(x) = \frac{a}{1 + e^{\,b - \frac{c}{x}}} \qquad (1)$$

Equation (1) describes the relationship between $y$, the number of trees in the ADMHF, and $x$, the number of data blocks. The graph of the function appears as an inverted S-curve. Parameter $a$ determines the upper bound of the function value. Parameters $b$ and $c$ affect the critical point and the rate of change of the curve.

To achieve the diversity of MHTs in the ADMHF, we introduce salt to the hash function in the MHT generation process. $Salt = \{s_1, s_2, \cdots, s_m\}$ is a set of randomly generated strings, in which each element $s_i$, $i \in \{1, \cdots, m\}$, corresponds to a unique MHT in the forest. The value of a parent node is obtained by concatenating the values of its two child nodes with a salt value in $Salt$, and then performing a hash operation on the resulting string. The detailed process is shown in Alg. 2.1.

Algorithm 2.1 File Upload: Initial Uploader
1: // $Count_F = 0 < t$, which implies the data is currently unpopular.
2: User:
3:     $L \xleftarrow{R} Z_p^*$ for two-layered encryption.
4:     $C_1 = g^L$, $C_2 = C_F \oplus Z^L$,
5:     $C_3 = (g_1^{pk} h)^L$, $C_4 = (g_1^{h_F} h')^L$.
6:     Computes the ciphertext $C'_F = C_1 || C_2 || C_3 || C_4$.
7:     Computes $K_2 = H(F)$, $\tau_F = K_2 \oplus L$.
8: User → CSP: Sends $\langle C'_F, \tau_F, Tag_{pop} = 1 \rangle$ to CSP.
9: CSP:
10:     Divides $C_F$ into blocks $\{B_i\}$, $i \in \{1, \cdots, Num_t\}$.
11:     Calculates $m$ according to (1).
12:     Generates random salt values $Salt = \{s_1, s_2, \cdots, s_m\}$.
13:     for $i = 1$ to $m$ do
14:         Generates $MHT_i$, adding $s_i$ when calculating the hash value of each node.
15:         Signs the root node $n_{Root_i}$, and obtains $Sig_{F_i} = n_{Root_i}^r$ to verify the integrity of $MHT_i$.
16:         Stores $s_i$ and the root node $n_{Root_i}$.
17:     end for

Here, $Tag_{pop}$ is a variable that indicates the current data popularity. When $Count_F < t$, $F$ is currently unpopular data, that is, $Tag_{pop} = 1$. When $Count_F > t$, $F$ is popular data, and $Tag_{pop} = 0$. When $Count_F = t$, $Tag_{pop} = 2$, which indicates that the file is undergoing a transition from unpopular data to popular data; it is considered popularity transitioning data.

Case 2: For subsequent uploaders, CSP performs PoW on $U$. The ADMHF-PoW process consists of Challenge, Response and Verification. The detailed process is shown in Alg. 2.2.

Algorithm 2.2 File Upload: Subsequent Uploader
1: Challenge:
2: // CSP generates a challenge set for PoW.
3: Randomly selects an $MHT_i$, $i \in \{1, 2, \cdots, m\}$, from the ADMHF.
4: Computes the hash value of the current root node.
5: if $e(Sig_{F_i}, g) = e(n_{Root_i}, g_1)$ then
6:     Generates a challenge set $ch = \{ch_1, ch_2, \cdots, ch_k\}$, $ch_j \in \{1, 2, \cdots, Num_t\}$.
7:     CSP → User: Sends $\langle s_i, ch, \tau_F \rangle$ to $U$.
8: else
9:     The data integrity of $MHT_i$ has been corrupted.
10: end if
11:
12: Response:
13: // $U$ generates a response to prove ownership.
14: Computes $K_2 = H(F)$, $L = \tau_F \oplus K_2$.
15: Encrypts $C_F$ with $L$ to obtain $C'_F$.
16: Divides $C'_F$ into blocks $\{B_i\}$, $i \in \{1, \cdots, Num_t\}$.
17: Adds salt $s_i$ to each of the nodes to obtain $MHT'_i$.
18: Computes the corresponding response $res = \{leaf_i, Bro(leaf_i)\}$, $i \in ch$, for each node indexed by $ch$.
19: User → CSP: Sends $res$ to CSP.
20:
21: Verification:
22: // CSP verifies the correctness of the response.
23: CSP uses the values in $res$ to calculate the root node of $MHT_i$.
24: if the calculated value is identical to the actual value then
25:     Adds $U$ to the owner list, $Count_F \mathrel{+}= 1$.
26:     if $Count_F < t$ then
27:         CSP → User: Returns $Tag_{pop} = 1$.
28:     else if $Count_F = t$ then
29:         CSP → User: Returns $Tag_{pop} = 2$.
30:     else
31:         CSP → User: Returns $Tag_{pop} = 0$.
32:     end if
33:     Execute Algorithm 3.
34: else
35:     The verification fails.
36: end if

Fig. 3: ADMHF-PoW Process

In the Response process, $Route(leaf_i)$ denotes the set of nodes located on the path from the leaf node $leaf_i$ to the root node, and $Bro(leaf_i)$ denotes the set of sibling nodes for each node within $Route(leaf_i)$.

The ADMHF-PoW process is illustrated with the example shown in Fig. 3. In this case, the file is split into 8 blocks. During the challenge process, CSP randomly selects an $MHT_i$ and generates a challenge set $ch = \{3, 7\}$, then sends $\langle s_i, ch, \tau_F \rangle$ to $U$. $U$ calculates the response $res = \{n_{31}, n_{43}, n_{44}, n_{33}, n_{47}, n_{48}\}$ with $s_i$. Then CSP uses the node values in $res$ to calculate the root node and verify the response. If $U$ passes the PoW, it can be assumed that $U$ owns the file. Data deduplication is then performed; the detailed process is shown in Alg. 3.

Algorithm 3 Data Deduplication
1: if $Tag_{pop} = 2$ then
2:     // It is considered as popularity transitioning data.
3:     User:
4:         Decrypts $C'_F$ using $L$.
5:         Then obtains $C_F = (C_2 \cdot e(C_3, d_3)) \oplus (e(d_1, C_1) \cdot e(d_2, C_1))$.
6:     User → CSP: Uploads $\langle C_F, Tag_{pop} = 0 \rangle$ to replace the original tuple $\langle C'_F, \tau_F, Tag_{pop} = 1 \rangle$.
7: else
8:     // Data Deduplication.
9:     $U$ does not need to do anything.
10: end if

Here is the correctness proof for the outer layer decryption:

$$\begin{aligned}
&(C_2 \cdot e(C_3, d_3)) \oplus (e(d_1, C_1) \cdot e(d_2, C_1)) \\
&= ((C_F \oplus e(g^r, g')^L) \cdot e(C_3, d_3)) \oplus (e(d_1, C_1) \cdot e(d_2, C_1)) \\
&= C_F \oplus (e(g^r, g')^L \cdot e((g_1^{pk} h)^L, g^s)) \oplus (e(g'^r, g^L) \cdot e((g_1^{pk} h)^s, g^L)) \\
&= C_F \oplus (e(g^r, g')^L \cdot e((g_1^{pk} h)^L, g^s)) \oplus (e(g^r, g')^L \cdot e((g_1^{pk} h)^L, g^s)) \\
&= C_F \oplus 0 = C_F.
\end{aligned}$$

4) FileDownload: When $U$ intends to download file $F$, he sends a request to CSP. CSP authenticates $U$ and searches the list of file owners to determine whether $U$ has access. If $U$ is found, CSP returns the current data popularity to $U$. Otherwise, the request is rejected. In addition, the user is informed if there is a change in data popularity. The encryption and decryption process is shown in Alg. 4.

Algorithm 4 File Download
1: CSP → User: Sends $\langle C'_F, \tau_F, Tag_{pop} = 1 \rangle$ or $\langle C_F, Tag_{pop} = 0 \rangle$.
2: if $Tag_{pop} = 1$ then
3:     User: Decrypts $C'_F$ using $L$.
4:     User: Obtains $C_F = (C_2 \cdot e(C_3, d_3)) \oplus (e(d_1, C_1) \cdot e(d_2, C_1))$.
5: end if
6: Computes $Tag'_F = H(C_F)$.
7: if $Tag'_F = Tag_F$ then
8:     User: Computes $F = D(K_1, C_F)$ to get the plaintext.
9: else
10:     Data integrity is compromised.
11: end if

C. Computation Complexity Analysis
We analyze the computation complexity of users and CSP in different phases in Table II. Here, $FSize$ represents the size of the file, $Num_t$ is the number of leaf nodes of an MHT, $m$ represents the number of MHTs in the ADMHF, and $k$ is the number of nodes CSP challenges in each round.
If there exists $C_{F_i} = C_F$ or $C'_{F_i} = C'_F$, where $i \in [1, 2^{FSize}]$, $\mathcal{A}$ can obtain the plaintext data. The time complexity of this process is $O(p \cdot 2^{FSize})$, which is computationally infeasible.

Lemma 2. The Discrete Logarithm problem (DL problem): Given $g^a \in G$ where $a \in Z_p^*$, computing $a$ is hard.

Theorem 4. The scheme is resistant to offline brute-force attack.

Proof. By Lemma 2, given $\alpha = h_F^r$, recovering $r$ is hard. Therefore, $\mathcal{A}$ cannot compute the plaintext from the ciphertext even if it performs an offline brute-force attack.

By analyzing the scheme's resistance to online attacks, it can be proved that an adversary who does not know the plaintext cannot gain ownership of a certain file by interacting with the CSP.
2) Online brute-force attack: After eavesdropping on the communication channel, the external adversary $\mathcal{A}$ obtains the file tag $Tag_F$. Using this information, $\mathcal{A}$ performs an online brute-force attack on $F$.
• Step 1. $\mathcal{A}$ uploads $Tag_F$ to CSP. As $Tag_F$ is previously stored, $\mathcal{A}$ is recognized as a subsequent uploader.
• Step 2. CSP generates a challenge $ch$ for $\mathcal{A}$, assuming that $|ch| = 1$.
• Step 3. $\mathcal{A}$ generates a response by enumerating the hash values in $res$. For one hash value, $\mathcal{A}$ lists all its possible values $\{h_i\}$, $|h| = HashLen$, $i \in [1, 2^{HashLen}]$. In one verification round, $\mathcal{A}$ needs to give a total of $\log(Num_t) + 1$ such hash values.
• Step 4. $\mathcal{A}$ sends $res$ to CSP for verification.
• Step 5. CSP returns the verification result.

Theorem 5. If $\mathcal{A}$ knows $Tag_F$ but does not know $F$, the probability that $\mathcal{A}$ wins the Game to obtain the ownership of $F$ is negligible. That is:

$$Pr[\mathcal{A}_{wins}] \leq \varepsilon. \qquad (6)$$

Proof. The probability of $\mathcal{A}$ guessing one hash value is $(\frac{1}{2})^{HashLen}$. $\mathcal{A}$ must guess all $\log(Num_t) + 1$ hash values correctly at the same time to win the game. Hence, the probability that $\mathcal{A}$ wins is $(\frac{1}{2})^{HashLen(\log Num_t + 1)}$. For SHA-256, $HashLen$ is 256 bits. Furthermore, in practice, the value of $|ch|$ is usually greater than 1, and this probability decreases sharply as $|ch|$ increases. Therefore, it is computationally infeasible for $\mathcal{A}$ to win the Game.

C. Security of PoW Process
1) Exposure of Merkle Hash Tree: Typically, PoW is designed to operate reliably, regardless of the number of times it occurs. However, in the MHT-PoW scheme [9], a portion of the MHT nodes becomes exposed with each ownership verification attempt. If the adversary can collect enough response sets $res$, all nodes of the entire MHT can be reconstructed. At this point, no matter what challenge CSP initiates, the adversary can return the correct response, thereby gaining ownership of the entire file. It should be noted that for smaller files, the adversary needs to collect fewer different response sets, making the MHT constructed on them more vulnerable.

Theorem 6. If the adversary collects $Num_t/2$ distinct response sets of the same file, where $Num_t$ represents the number of leaf nodes, then the adversary can reconstruct this MHT.

Proof. Suppose that for file $F$, the adversary has collected $Num_t/2$ distinct response sets. We represent this with a matrix, where each row represents the response set collected by the adversary in one round:

$$\begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{c,1} & a_{c,2} & \cdots & a_{c,n}
\end{pmatrix} \qquad (7)$$

For convenience, we use $n$ to represent the length of $res$ and $c$ to denote $Num_t/2$. Here, we discuss the scenario where only one challenge node is initiated by CSP. In practical applications, there are typically multiple challenge nodes, resulting in the exposure of a greater number of nodes.
During the Verification process, to enable the CSP to correctly generate the root node based on $res$, it is necessary to label whether each node is a left child or a right child. However, adversaries can exploit this information along with the collected response sets to launch attacks. The last column in the matrix corresponds to the second level of the MHT. Since the second level has only two nodes, this column has only two distinct values. Based on this, attackers can reconstruct the second level. By repeating this process, attackers can use the same method to reconstruct all subsequent nodes and thereby reconstruct the entire MHT.

2) Security of ADMHF-PoW: The previous section demonstrated the insecurity of the MHT scheme. In this section, we demonstrate how the ADMHF-PoW scheme significantly enhances the security of the PoW process.

Theorem 7. Compared with the MHT-PoW scheme, the ADMHF-PoW scheme significantly reduces the node exposure rate in each round of verification.

Proof. Assume the MHT is a complete binary tree with $2^i$ leaf nodes. Then the total number of nodes in the MHT is $Num_{sum} = 2^{i+1} - 1$. Assuming $|ch| = 1$ in each verification attempt, the user needs to provide the values of all the sibling nodes on the path from the challenge node in $ch$ to the root node, so $|res| = i + 1$. Since the nodes on that path can be computed from the nodes in $res$ as their child nodes, the number of exposed nodes is $|Expose| = 2i + 1$.
We define $\gamma$ as the percentage of the exposed content of the MHT, as shown in (8).

$$\gamma = \frac{|Expose|}{Num_{sum}} = \frac{2i + 1}{2^{i+1} - 1} \times 100\% \qquad (8)$$

For the ADMHF scheme, CSP generates an ADMHF with $m$ MHTs, where $m$ can be calculated by (1). So the
Fig. 5: Percentage of content exposure. (a) File block number = 500, (b) File block number = 800. (Axes: percentage of exposure (%) versus number of challenge-response rounds. Curves: challenge block number = 10, 15, 20, 25.)

Fig. 6: Actual number of challenge-response resulting in full exposure. (a) Challenge block number = 10, (b) Challenge block number = 20. (Axes: number of ch-res required versus block number. Curves: MHT scheme, ADMHF scheme.)

Fig. 7: Computation overhead for different file sizes. (a) Popular data computation overhead, (b) Unpopular data computation overhead. (Axes: computation overhead (s) versus file size (MB). Series: key generation, data encryption, tag generation, file upload, ADMHF generation.)
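As a small illustration of the exposure counting used in Theorem 6 and Eq. (8) (not a reproduction of the experiments in Fig. 5 and Fig. 6), the following Python sketch enumerates which nodes of a complete binary MHT a single-leaf challenge reveals. The heap-style node numbering (root = 1) and the function names are assumptions made for the example. It follows the counting in the proof of Theorem 7: the challenged leaf, the sibling of every node on its path, and the ancestors that can be recomputed from them.

```python
def exposed_nodes(leaf, depth):
    """Heap-indexed nodes (root = 1) revealed by one challenge on `leaf`
    (0-based) in a complete binary tree with 2**depth leaves."""
    node = (1 << depth) + leaf
    seen = {node}
    while node > 1:
        seen.add(node ^ 1)   # sibling value sent in res
        node //= 2
        seen.add(node)       # parent recomputed from its two children
    return seen

def gamma(depth):
    """Eq. (8) as a fraction: (2i + 1) / (2^(i+1) - 1) with i = depth."""
    return (2 * depth + 1) / (2 ** (depth + 1) - 1)

depth = 5                                # Num_t = 32 leaves
total = 2 ** (depth + 1) - 1             # Num_sum = 63 nodes
one_round = exposed_nodes(0, depth)      # 2 * 5 + 1 = 11 nodes

# One challenge per sibling pair, i.e. Num_t / 2 distinct response sets.
union = set()
for leaf in range(0, 2 ** depth, 2):
    union |= exposed_nodes(leaf, depth)
fully_exposed = len(union) == total
```

Under this counting, `gamma(1)` equals 1: a tree with only two leaves is fully revealed by a single round, which is consistent with the paper's concern about small files.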
TABLE V: Storage overhead comparison
Scheme | Server side | Client side | TTP side
Our scheme | $S_C + N S_{tree} + S_{sk} + S_k$ | $S_T + S_k$ | -
Key-sharing | $S_k + S_T + S_C$ | $S_{sk}$ | $S_C + S_{sk}$
TEE | $S_C + S_s + S_{\xi_F} + S_{A_m}$ | $S_h + bS_{A_\alpha} + S_{td}$ | -

C. Computation Overhead
The computation overhead of the proposed scheme is evaluated by conducting experiments on five distinct file sets, ranging in size from 16MB to 256MB. Measurements are taken at various stages of the file upload process, including key generation, data encryption, file tag generation, file uploading, and ADMHF generation. The experimental results are presented in Fig. 7.
Fig. 7 shows that the computation overhead associated with key and file tag generation is practically negligible. This is because, during the tag generation phase, we compute the short hash value for files of any size, and the subsequent blind signature is applied to this hash value, resulting in higher efficiency. In addition, the computation overhead for ADMHF generation differs little between files of different sizes, which indicates good scalability. For small files, more MHTs need to be generated, while for large files, although the number of MHTs is reduced, the computation overhead of generating a single MHT increases. Finally, the computation overhead required for both the file encryption and upload processes increases with file size.
We compare the latency of the PoW process with the MHT-PoW scheme [9] and the key-sharing scheme [17]. The results are illustrated in Fig. 8(a). In our experiment, we set the number of challenge nodes to 10% of $Num_t$, where $Num_t$ represents the number of leaf nodes in the MHT. As shown in Fig. 8(a), our scheme has lower latency than [17], but slightly higher latency than [9]. Although the efficiency of [9] is higher, it comes at the cost of sacrificing security. Specifically, it can only prove security for a more restrictive set of input distributions and under an assumption about the linear code. On the other hand, [17] has a fixed number of leaf nodes in the construction of the MHT, resulting in
Fig. 8: (a) Comparison of computation overhead in the PoW process. (b) Total computation overhead comparison. (Axes: computation overhead (s) versus file size (MB). Curves in (a): MHT-PoW, key-sharing, our scheme. Curves in (b): our scheme with popular data, our scheme with unpopular data, key-sharing, CloudDedup, TEE.)