Integrity and Confidentiality in Cloud Outsourced Data
Article history:
Received 2 April 2017
Revised 11 October 2018
Accepted 14 March 2019
Available online 4 April 2019

Keywords:
Cloud storage
Data confidentiality
Data integrity

Abstract: Cloud services have become an increasingly popular solution to provide different services to clients. One of the cloud services is database as a service (DBaaS), in which the service provider offers different resources such as software, hardware and network to the clients to be able to manage and administer the database. However, the data and the execution of database queries are under the control of the distrustful service provider. This lack of trust opens up new security issues and serves as the chief motivation of our work. This study presents an overview of different cryptographic algorithms, based on different schemes for outsourced database security and query authentication. We conclude the paper by proposing a new architecture that achieves the confidentiality and integrity of query results of the outsourced database.

© 2019 The Authors. Published by Elsevier B.V. on behalf of Faculty of Engineering, Ain Shams University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

https://doi.org/10.1016/j.asej.2019.03.002
276 M. Rady et al. / Ain Shams Engineering Journal 10 (2019) 275–285
3. Availability: The data in the cloud servers should be accessible to its users. Major threats to availability are denial of service (DoS) attacks, natural disasters, and equipment failures at the service provider's end [5].
4. Access control: The outsourced data should be accessed only by authorized users.
5. Firewall: The CSP must be safeguarded against false accusations that may be made by dishonest owners or users [15].

We detail these security requirements, together with the techniques developed to handle them, in the following sections. Our main contributions in this paper can be outlined as follows:

– We survey and discuss the important security requirements for storing data or databases in the cloud. These requirements include data confidentiality, data integrity, and query processing over encrypted data.
– We provide a comparative summary of the different algorithms and techniques used to resolve these security issues.
– We conclude the survey by proposing the design of a new architecture that achieves the confidentiality and integrity of query results on outsourced data.

The rest of the paper is organized as follows. In Section 2, we address the different cryptographic primitives and definitions used to ensure data security. In Section 3, data confidentiality is discussed. In Section 4, different ways to process queries over encrypted data are outlined. In Sections 5 and 6, different approaches to data integrity, without and with a trusted third party (TTP), are presented, respectively. In Section 7, we provide a comparison of the different algorithms and techniques used to solve the main security issues. In Section 8, the new architecture is proposed. In Section 9, the conclusion of the paper is presented.

2. Cryptographic algorithms

Many cryptographic algorithms are used in different cryptographic protocols to ensure data security, database security, and query authentication. In the following subsections, we present the main and widely used algorithms and techniques for satisfying these security requirements.

2.1. Hash function

Hash functions are used in many encryption algorithms, as well as to index and retrieve items in a database. Examples include MD5, SHA1 and SHA2 [28].

2.2. Digital signature

Data owners generate two keys: a public key, Pk, and a private key, Sk. A digital signature is created by a signing function, sign, which takes as input a message, m, and Sk, and outputs a signature σ, such that sign(Sk, m) → σ. To verify the message, a verification function is used, which takes the signature, σ, and Pk, and outputs y if verification succeeds, such that verify(Pk, m, σ) → y. However, the digital signature is an expensive operation compared to hashing. In some cases, both digital signature and hash operations are used: the message is hashed first and then the digest is signed. Examples include RSA, DSA, and BGLS [7], presented in the following sub-subsections.

2.2.1. RSA (Rivest-Shamir-Adleman signature)
RSA is an asymmetric cryptographic algorithm in which each signer has a public key, Pk = (n, e), and a private key, Sk = (d), where n is a k-bit modulus generated as the product of two random k/2-bit prime numbers, p and q, while Z_n = {0, 1, 2, ..., n - 1} and e, d ∈ Z_n satisfy ed ≡ 1 mod φ(n), where φ(n) = (p - 1)(q - 1) is the number of elements of Z_n that are coprime to n. The message is hashed first as h(m), then the signature is generated as σ = h(m)^d (mod n). To verify the signature, the check σ^e ≡ h(m) (mod n) is used. The signature generation and verification each involve computing one modular exponentiation [7].

2.2.2. DSA (digital signature algorithm)
DSA, like RSA, is an asymmetric cryptographic algorithm in which each signer has two keys, Pk and Sk. The private key Sk is chosen randomly, where 0 < Sk < q. The public key is calculated as Pk = g^Sk mod p, where g is a generator of order q. To sign a message, a random per-message value, k, is generated, where 0 < k < q. Then a value r is calculated as r = (g^k mod p) mod q. The signature is the pair σ = (r, s), computed on the hash of the message, h(m), where s = k^(-1)(h(m) + Sk·r) mod q. The signature verification requires at least two modular exponentiations, checking that r = (g^(h(m)·s') · Pk^(r·s') mod p) mod q, where s' = s^(-1) mod q [28].
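As a concrete illustration of the hash-then-sign RSA scheme described in Section 2.2.1, the following Python sketch signs a message as σ = h(m)^d mod n and verifies it by checking σ^e ≡ h(m) (mod n). The parameters are textbook toy values chosen so the arithmetic is readable, not secure ones, and the helper names are illustrative.

```python
import hashlib

# Toy parameters for illustration only: real RSA uses >= 2048-bit moduli
# and randomly generated primes.
p, q = 61, 53
n = p * q                 # modulus n = 3233
phi = (p - 1) * (q - 1)   # phi(n) = 3120
e = 17                    # public exponent, gcd(e, phi(n)) = 1
d = pow(e, -1, phi)       # private exponent, e*d = 1 (mod phi(n))

def h(message: bytes) -> int:
    # Hash-then-sign: reduce the SHA-256 digest mod n so it fits the toy modulus.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    # sigma = h(m)^d mod n
    return pow(h(message), d, n)

def verify(message: bytes, sigma: int) -> bool:
    # Accept iff sigma^e = h(m) (mod n)
    return pow(sigma, e, n) == h(message)

msg = b"outsourced record"
sigma = sign(msg)
print(verify(msg, sigma))              # True
print(verify(msg, (sigma + 1) % n))    # False: a forged signature is rejected
```

Both `sign` and `verify` perform exactly one modular exponentiation (`pow` with three arguments), matching the cost stated in the text.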
2.2.3. BGLS (an aggregate signature scheme by Boneh et al.)
Let G1, G2 and G be cyclic groups of the same prime order q. G1 and G2 are cyclic additive groups with generators g1 and g2, and G is a cyclic multiplicative group. e is a computable bilinear map e: G1 × G2 → G, which satisfies the following properties:

– Bilinear: for all x ∈ G1, y ∈ G2, and a, b ∈ Z, e(x^a, y^b) = e(x, y)^(ab).
– Non-degenerate: e(g1, g2) ≠ 1.

BGLS requires a hash function that maps binary strings to non-zero points in G1 and G2. Similar to RSA, BGLS is an asymmetric cryptographic algorithm in which each signer has two keys, Pk and Sk, picking a random private key Sk ∈ Z_q and computing the public key Pk = Sk·g1, Pk ∈ G1. To sign a message m, compute the hash of the message H = h(m), where H ∈ G1, and σ = Sk·H. To verify a signature, compute H = h(m), then check whether e(σ, g1) = e(H, Pk) [7].

2.3. Signature aggregation

Signature aggregation is used to reduce the cost of digital signatures and to validate a number of messages with a single signature instead of validating each message individually; the aggregate signature is the same size as a standard signature [9].

An authenticated data structure authenticates the result by providing a verification object, VO, associated with it. Verification usually occurs by using hash functions and digital signature schemes together. In the following, the most common authenticated data structures are abstracted.

2.4.1. Message authentication code (MAC)
The MAC mechanism takes as input a variable-length message and an owner's private key to produce an authenticator (MAC). When the user sends a query, she/he will receive the response in a message with the MAC. The user can authenticate the message by generating the MAC of the message and comparing it to the received MAC. This mechanism assumes that all users possess the same owner's private key [13,30].

2.4.2. Hash-based message authentication code (HMAC)
The HMAC is generated by combining the hash function with the owner's private key: the message is hashed, and the digest is combined with the private key to produce the MAC output. The user authenticates the message in the same way as in the MAC mechanism [13,30].

2.4.3. Public key based homomorphic linear authentication (HLA)
Like a MAC, it is verification metadata that authenticates the data integrity; in addition, it can be aggregated [10].

2.4.4. Merkle hash tree (MHT)
The data is fragmented into blocks. Each block is hashed, and the leaves of the tree are the values of the block hashes. The inner nodes, and the root, are the hash of the concatenation of the hash values of their children, as shown in Fig. 2. The root can also be signed using the owner's private key to ensure more security [2,9,31].

3. Data confidentiality

Data confidentiality is the process of protecting data from illegal access and disclosure by the outsourced server and unauthorized users. This is done by encrypting the data so that only the authorized users can decrypt it.

The authors in [32] added the following encryption properties: 1- Only sensitive data is encrypted, while insensitive data is kept unencrypted; 2- Different parts of the data can be encrypted using different encryption keys; 3- During query execution, only the data relevant to this query is encrypted/decrypted.
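The MAC and HMAC mechanisms of Sections 2.4.1 and 2.4.2 can be sketched with Python's standard hmac module. The key, message, and function names below are illustrative; as in the scheme described above, the owner and the authorized users are assumed to share the same secret key.

```python
import hmac
import hashlib

# Shared secret: the scheme in Section 2.4.1 assumes owner and
# authorized users hold the same private key.
owner_key = b"owner-secret-key"

def make_mac(message: bytes, key: bytes) -> bytes:
    # HMAC = keyed hash of the message (Section 2.4.2)
    return hmac.new(key, message, hashlib.sha256).digest()

# Server side: the query response is returned together with its MAC.
response = b"query result: Sarah,[email protected],1200"
tag = make_mac(response, owner_key)

def authenticate(message: bytes, received_tag: bytes, key: bytes) -> bool:
    # User side: recompute the MAC over the received message and compare.
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(make_mac(message, key), received_tag)

print(authenticate(response, tag, owner_key))              # True
print(authenticate(b"tampered response", tag, owner_key))  # False
```

Any modification of the message (or of the tag) in transit changes the recomputed HMAC, so the comparison fails and the tampering is detected.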
The authors in [23] used a prototype toolset, called "Silverline", to provide data confidentiality on the cloud. The procedure goes as follows: 1 – Identify subsets of data that can be functionally encrypted without breaking application functionality, by using an automated technique that marks data objects using tags and tracks their usage and dependencies. 2 – Discard all data that is involved in computations on the cloud. 3 – Encrypt each subset using symmetric encryption with a different key, accessed by a different set of users. 4 – Store the subsets in the cloud. The keys used to decrypt the data are stored by the data owner. To fetch data from the cloud, the user first contacts the data owner to get the appropriate keys, and then sends the query to the cloud. The input parameters of the query are also sent in an encrypted form so that the cloud can execute the encrypted query and send back the encrypted results, which the user then decrypts. This technique suffers from a lot of pre-computation to ensure data confidentiality, and the data owner's server must be online all the time, since she/he holds the decryption keys.

In [34] the authors proposed a new scheme called Dual Encryption to ensure data confidentiality, which enables cross-examination of the outsourced data. The data owner generates two keys using the DES (cipher block chaining MAC) algorithm, a primary key and a secondary key. Before outsourcing the data, the whole data is encrypted using the primary key, and only a small subset of the data is encrypted using the secondary key. Then, the encrypted data is merged together and stored in the cloud. This scheme suffers from storage overhead, which includes the encrypted data, the dual column, and the subset of the data.

In [35], the authors proposed using biometric encryption to encrypt biometric outsourced data, which includes iris recognition, voice, fingerprint, and face recognition. Biometric encryption is done by generating a random key and then using a binding algorithm to merge the biometric image with this key, producing the biometrically-encrypted key used to encrypt the data. To decrypt the data, using a biometric encryption retrieval algorithm, the biometrically-encrypted key is merged with the biometric image to retrieve the key. This technique is useful only with biometric data.

The authors in [36] proposed a scheme, appropriate in a hybrid cloud, to ensure data confidentiality on the public cloud. First, the data file to be outsourced is fragmented into smaller file chunks, and these chunks are encrypted and then stored in the public cloud at different locations. The file fragmentation occurs on the private cloud, and the chunks are then sent to the public cloud in a different order than obtained from the original file. Every data chunk upload operation is recorded in a Chunk Distribution Dictionary (CDD) that is stored in the private cloud. Every record of the CDD has three fields: 1- the index of the chunk in the original file, 2- the location where the chunk is stored (in the case of using distributed CSPs), and 3- the hash value of the chunk, for integrity check purposes. This scheme is applicable in storing medical or business data, and it suffers from an important challenge: data fragmentation in the private cloud.

In [37] the authors proposed a cryptography technique that integrates both encryption and obfuscation of the data before outsourcing. It begins by checking the data type. If the data is in the form of digits, an obfuscation technique is applied through specific mathematical functions without using a key; if the data is alphabetic or alphanumeric, it is encrypted using symmetric encryption. This integrated technique provides more data security than encryption or obfuscation alone.

4. Query processing over encrypted data

After the data is encrypted and stored in the cloud, the question is how the encrypted data can be processed without decrypting it. If the user decrypts the data each time he/she uses it, this might affect the confidentiality of the data. To solve this problem, different solutions have been proposed in the literature. The authors in [25–27] suggest a security method called Homomorphic Encryption, which enables performing operations on encrypted data without decrypting it. These operations might be multiplication or addition, which fit the processing and retrieval of the encrypted data. For example, if two messages m1 and m2 are encrypted to E(m1) and E(m2), it is possible to compute F(E(m1), E(m2)), where F can be an addition or multiplication function, without decrypting the encrypted messages. However, it is not suitable for many real-world applications because of its computational complexity.

In [17], the authors provide a system that can process a matching query over the encrypted data. The proposed system consists of four phases. In the first phase, pre-processing and outsourcing: the data is encrypted and sent to the CSP to be stored. In the second phase, query pre-processing: the query is pre-processed before being sent to the CSP, meaning each value in the query is encrypted so the query can be matched against the encrypted data in the cloud without the need to decrypt the data. This depends on the encryption key used to encrypt the data itself; this key must be the same as the one used to encrypt the query value. In the third phase, query processing and response: the received query is processed by the CSP. The server searches for the first match for the query condition, scans each attribute to get the matches, and then sends the encrypted results to the query issuer. In the fourth phase, query post-processing and result: the received encrypted result is decrypted by the query issuer.

The authors in [18] proposed encrypting the sensitive attributes of each table in the database, and then storing the encrypted table in the cloud as an EncryptedDataTable (EDT). After that, another table is created from the original table that contains two columns: a data
column, which contains a copy of the sensitive data columns, but kept unencrypted, and a key column, where the encrypted data is kept. This table is called the QuerySearchTable (QST). The records in the QST are re-ordered randomly, so they are not in the same order as in the EDT. Only authorized users are allowed to access the encrypted data. When a search query is issued, the search is done first in the QST to find the key to the EDT; the key is decrypted, then the result from the EDT is decrypted, and the resulting records are returned to the user. This technique returns only those records satisfying the user query, and no additional records are given.

5. Data integrity

Data integrity includes three aspects:

– Correctness: the query issuer is able to validate that the returned results do exist.
– Completeness: the result is complete and no answers have been omitted from it.
– Freshness: the results are based on the last version of the data.

There are two models for storing data in the cloud: database as a service and data as a service.

– In the database as a service model: the integrity guarantee is applied to the query result returned from the CSP. The integrity is provided at four different levels: table (entire relation), column (attribute of the relation), field (individual value), or row (record/tuple of the table). Integrity verification at the table/column level is expensive, since it can be performed only by the query issuer, and all the data corresponding to that table/column must be returned in the result; because the table or column was signed, the signature must be decrypted to detect any unauthorized modification. At the field level, it is too complex to sign each value in the table; therefore, data integrity suffers from a high verification overhead for the query issuer and for the server, as at the table/column level. At the record level, the integrity guarantee covers the entire record containing the query result, as each individual record was signed, and this is the best solution to provide data integrity [6,7].
– In the data as a service model: the integrity guarantee is applied to the data returned from the CSP. This data could be a message, a file, a block of a file, or even a sector of a block of a file [15].

Data integrity can be checked between:

– Two entities: the CSP and the owner/user, which is discussed in this section.
– Three entities: the CSP, the owner/user and a Trusted Third Party (TTP), which will be explained in the next section.

Data integrity between CSP and Owner/User: As shown in Fig. 4, the data owner stores the encrypted data in the CSP, and can insert, update or delete data from the CSP. Then, the data owner authorizes users, who will have the ability to issue a query and get the result from the CSP. The integrity check of the query result is done by the query issuer, who checks whether the answer is correct, complete, and from the last updated version uploaded to the CSP.

Data integrity based on different approaches: Different approaches have been proposed to provide integrity guarantees of query results. These approaches have been categorized into four types: digital signature based schemes, data structure based schemes, deterministic based schemes, and bucket-based index schemes.

5.1. Digital signature based schemes

In [7,24], the authors apply different signature schemes to different outsourced database models. The outsourced database models are: the Unified Client Model (each database is used by only one entity), the Multi-Querier Model (owner and queriers), and the Multi-Owner Model (one database can have multiple owners).

In [6], the message is hashed first using a hash algorithm such as SHA1, then the signature is computed on the hash value. Because of the RSA signature's multiplicatively homomorphic property, multiple signatures generated by a single signer can be aggregated into one condensed signature, whose verification ensures that each signature was generated by the actual signer. Condensed-RSA is appropriate to the Unified-Client as well as the Multi-Querier model. The BGLS signature allows multiple signatures generated by different signers on different messages to be aggregated into a single signature, based on elliptic curves and bilinear maps; this is appropriate to all three models of the outsourced database. DSA signatures allow a certain amount of pre-computation, in which the signature is computed on the hash value of each message individually, and different signatures are verified in aggregate based on the multiplicative homomorphic property of DSA. DSA signature aggregation is appropriate to all three models of the outsourced database.

In [24], the authors proposed a technique to make the condensed RSA and BGLS signatures immutable, in which new signatures are computed from a set of other aggregated signatures to protect them from the adversary, as they would be hidden. There are two extensions of Condensed RSA: 1- interactive and 2- non-interactive. In the interactive approach, the server attaches a tag with the result for the query issuer, who sends a random challenge to the server; the server's valid response, together with the tag, convinces
the query issuer of the server's knowledge of the signature. The non-interactive approach uses a signature of knowledge method, "SKROOTLOG", that is universally verifiable; the server sends the result with a SKROOTLOG proof, and the query issuer verifies by checking this proof. In the extension of aggregated BGLS, the server computes its own signature on the query answer and aggregates it with the aggregated BGLS signature of the owner.

In [19], the authors use a signature aggregation and chaining approach to authenticate the query. Before outsourcing the database, each record is signed and the signature is outsourced too. To answer a query, the server sends the matching records and their signatures, which are aggregated into a single signature using the Condensed-RSA or Aggregated-BGLS scheme. This ensures query correctness. To achieve query completeness, a signature chain per record is proposed, computed as the hash of the record concatenated with the hashes of all its immediate predecessor records (a predecessor being the record with the highest value of an attribute that is less than the value of the given record along this attribute), together with the owner's private key.

5.2. Data structure based schemes

The data structure provides a verification object, VO, with the answers for authenticating the results. It works by using both hash functions and digital signature schemes.

In [2], the authors proposed an authentication scheme that constructs a Merkle Hash Tree (MHT) on individual records. The leaf nodes contain the hash value of each record of the table; the ascendant nodes are the hash of the concatenation of their descendant hashed nodes, and the root, after all nodes are hashed, is signed with the owner's private key. To verify the query result, the individual signatures of the tree roots involved in the query result are aggregated using the aggregate signature scheme.

In [9], the authors proposed different data structures for both static and dynamic scenarios. In the Aggregated Signatures with B+-trees schema, the data owner creates a B+-tree for each attribute, which contains the hashes of all consecutive pairs of records, and then the root is signed. In the static scenario, the server constructs a VO that contains one consecutive pair per query result, plus one record from the left and one from the right. In the dynamic scenario, the left and right neighbors have to compute their signatures as well. This schema's drawback is its very high verification cost. The Merkle B-tree, in contrast, consists of ordinary B+-tree nodes extended with one hash value associated with every pointer entry. The leaf nodes contain the hash values of the records; each index node is computed on the concatenation of the hash values of its children, and the hash of the root is signed using the owner's private key. In the static scenario, to answer a range query the server builds a VO that returns the data in the nodes between the discovered leaves, the hash values of the left and right nodes, and the signed root of the tree. In the dynamic scenario, the Merkle B-tree is very efficient for update operations: to update a record, only the path from the affected leaf node to the root needs to be updated.

In [21,31], a Merkle B-tree (MBT) is created based on the table at the record level, as shown in Table 1; the attribute values are used as index keys in the MBT. To identify and preserve the order of the pointers in internal nodes and the records in leaf nodes of an MBT, a Radix-Path Identifier scheme is proposed, as shown in Fig. 5, which uses numbers based on a certain radix to identify each pointer or record in an MBT depending on its level and position in the MBT. The root of the MBT is updated each time the database is updated, and by sharing the new root signature with the users, freshness can be guaranteed. After the pointers are identified, the authentication data associated with them is stored in a database as a Single Authentication Table (SAT) or Level-based Authentication Tables (LBAT). In the SAT approach, all authentication data is stored as data records, called Authentication Data Records (ADRs), in one table in the database where the corresponding data table is stored. In the LBAT approach, ADRs of different levels are stored in different tables, one table per MBT level except the leaf level, along with a mapping table to indicate which table corresponds to which level.

Table 1
Data table.

ID  Name   Email              Salary
1   Mai    [email protected]      1000
2   Sarah  [email protected]    1200
3   Len    [email protected]      1400
4   John   [email protected]     1700
5   Ahmed  [email protected]    2000
6   Nancy  [email protected]    2500
7   Lina   [email protected]     3000
8   Will   [email protected]     4000

Fig. 5. The data table converted to a Merkle B-Tree with Radix-Path Identifiers.

In [20], a verification metadata (tree) is generated over the database, used by the user to verify query freshness. A signature
certificate is assigned to each root with an expiration time d; after every d units of time, the certificate on the current signature is replaced, whether or not there has been an update. The signature will be changed, and the new signature is stored in the outsourced database.

5.3. Deterministic based schemes

In [20,22], the authors proposed a probabilistic method to check query completeness. Before the database is outsourced, fake records are inserted into it. When a query is issued, the fake records satisfying its conditions are returned with the result. In order to know which set of the inserted records should be returned, a deterministic function is used to define the inserted records, and this definition of the function is sent to the users; the user then verifies whether all the fake records are present within the query result. If at least one fake record is missing, the query result is not complete. Query freshness can also be guaranteed using this method, as the function is determined by a randomly chosen key to define the set of fake records, and the time at which the function should next be used is also decided by a randomly chosen key.

In [34], the authors proposed a scheme called Dual Encryption to ensure query integrity, using two different keys, a primary and a secondary one; the query result is duplicated, and comparing the returned results provides the integrity assurance.

5.4. Bucket-based index schemes

Authentication is performed at the bucket level rather than the record level. A bucket is generated by partitioning the database by equivalent data ranges (values) or by equal numbers of data items (counts), such that all buckets satisfy the security requirements and are efficiency-optimized [38]. A bucket-based index contains a bucket id (BID), a data range with upper and lower bounds (U, L), the number of records in the bucket, and a checksum, as shown in Table 2, using the data table from Table 1. A bucket checksum is a hash digest returned with a result. When the result is received, the user calculates the checksum of the result again and compares it with the received one. If they are equal, this guarantees the authenticity of the query result [33].

6. Data integrity between CSP, owner/user and TTP

The database owner outsources data management to a trusted third party to check the integrity of the data and save their computation resources. As shown in Fig. 6, the data owner stores the encrypted data in the CSP, and can insert, update or delete data from the CSP. Then, the data owner authorizes users. The data owner delegates the integrity check of the query result to the trusted third party server, which checks whether the answer is correct, complete, and from the last updated version uploaded to the CSP.

Data integrity based on different approaches: Different approaches have been proposed to provide integrity guarantees of query results. These approaches have been categorized into three types: hash based schemes, data structure based schemes, and other schemes.

Table 2
Bucket based index.

Fig. 6. Using a TTP to provide mutual trust between the owner/user and the CSP.
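The bucket-based index of Section 5.4 can be sketched in Python using the data table of Table 1. The equal-count partitioning on the Salary attribute and the use of SHA-256 over the concatenated records as the checksum are illustrative assumptions; the cited schemes do not fix a concrete checksum function, and the names below are not taken from the papers.

```python
import hashlib

# Data table from Table 1: (ID, Name, Email, Salary), sorted by Salary.
records = [
    (1, "Mai", "[email protected]", 1000),
    (2, "Sarah", "[email protected]", 1200),
    (3, "Len", "[email protected]", 1400),
    (4, "John", "[email protected]", 1700),
    (5, "Ahmed", "[email protected]", 2000),
    (6, "Nancy", "[email protected]", 2500),
    (7, "Lina", "[email protected]", 3000),
    (8, "Will", "[email protected]", 4000),
]

def checksum(bucket_records):
    # Illustrative checksum: a hash digest over the concatenated records.
    h = hashlib.sha256()
    for r in bucket_records:
        h.update(repr(r).encode())
    return h.hexdigest()

def build_index(records, size):
    # Equal-count partitioning: each index entry is
    # (BID, lower bound, upper bound, record count, checksum).
    index = []
    for bid, i in enumerate(range(0, len(records), size)):
        bucket = records[i:i + size]
        index.append((bid, bucket[0][3], bucket[-1][3],
                      len(bucket), checksum(bucket)))
    return index

index = build_index(records, size=4)

# Verification: the user recomputes the checksum of the returned bucket
# and compares it with the one stored in the index.
returned = records[0:4]            # server's answer for bucket 0
bid, lo, hi, count, stored = index[0]
ok = len(returned) == count and checksum(returned) == stored
print(ok)                          # True

# A dropped record changes the count (and the checksum), so
# incompleteness is detected.
incomplete = returned[:-1]
print(len(incomplete) == count and checksum(incomplete) == stored)  # False
```

The count field catches omitted records, while the checksum catches substituted or modified ones; together they give the bucket-level authenticity guarantee described in the text.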
6.1. Hash based schemes

private key D′. The encrypted message and the signature are stored in the CSP. In this system there are three different ways to check integrity. 1- Integrity check between user and CSP: the CSP uses the receiver's public key on the signature to retrieve the digest D, runs the hash algorithm on the encrypted message to get a digest D, and then compares the two digests; if they are equal, the message is accepted. Otherwise, the user is informed that the message in the cloud has been modified. 2- Integrity check between user and TTP: the TTP takes the digest from the CSP and the message signature, decrypts the message signature with the public key, and finds the message digest. It compares the two digests and then lets the user know whether the CSP can be trusted. 3- Integrity check between TTP and CSP: the CSP takes the digest (D) from the TTP, takes the data signed by the TTP, decrypts it with the public key, and finds the message digest. It compares the two digests, and then the CSP informs the user whether the TTP can be trusted.

6.2. Data structure based schemes

In [10], the authors use an HLA to generate verification metadata over the data and store it with the data in the CSP, used by the user to verify the correctness of the data. The proposed scheme consists of four algorithms: 1- KeyGen, run by the owner to generate the public and secret keys of the system. 2- SigGen, which preprocesses the data file: the owner generates the verification metadata, stores it in the CSP and deletes the local copy. 3- GenProof, where a proof is generated using the data file and its verification metadata as inputs. 4- VerifyProof, which verifies the proof at the TTP and decrypts the result.

The authors in [16] use a Message Authentication Code Chain (MACC) scheme to ensure query authentication. The verification process is implemented in the TTP server. The table structure is modified before outsourcing to support integrity verification as follows: R(A1, ..., An, version, precursor, checksum), where A denotes an attribute, the "version" field contains the update number of a record, the "precursor" field stores the concatenation of the precursor's primary key for each searchable attribute, and the MAC value calculated over a record is saved in the "checksum" field. When the user issues a query, the result received from the CSP is sent to the TTP, which reconstructs each record's MAC value and compares it with the checksum; if they are equal, the record is correct, and the result is sent to the user. If the result set forms a complete chain from the first record boundary to the last record boundary, then the query results are complete. The data freshness is guaranteed as the data

group, there is an encryption key (KE) and an asymmetric key pair (KS, KV) for signing and verification. Broadcast encryption is used to distribute the keys, and the data is organized as follows: (encrypted block, metadata, signature, tag). A block's metadata includes a global id, a data block identification, a version number, KV, and the broadcast messages of KE and KS. A signature is computed from the concatenation of the encrypted block and its metadata, along with a tag computed from the encrypted data block, which is used as proof to verify the data. A version authentication tree (VAT) is constructed for each block group and stored in the CSP along with the data. The tree root is signed with the latest version signing key KS of the block group. The integrity is checked with the signature attached to the block group, to ensure data correctness and completeness. To ensure data freshness, when a data block in a group is updated, the metadata attached to the block is updated; the metadata contains the version number of this block, which is incremented when an update is done. The version authentication tree (VAT) of the block group is also updated. The freshness is verified each time a query issuer requests a query from the cloud, where the VAT root is authenticated using the latest version of the root signing key.

As in [20], an expiration time is assigned to the FIS; whether or not the data is updated, the version number saved in the TTP is updated. When this expiration time is short, the data freshness guarantee is higher, but the network transmission overhead increases. In [15], the authors use a block status table (BST) data structure to reconstruct the file into blocks and outsource it to the CSP. The BST consists of three columns: (1) serial number (SN): an index of the file blocks, (2) block number (BN): a counter used for logical numbering of the file blocks, and (3) key version (KV): indicating the version of the key used to encrypt each block. The BST is implemented as a linked list to simplify the insertion and deletion of table entries. The data owner enforces access control by revoking access rights to the outsourced data and by using broadcast encryption to encrypt the message for a group of users. To access the data, the authorized users send a request to the CSP and receive the data file in an encrypted form.

The authors in [39] propose a new scheme for query correctness and completeness based on an invertible Bloom filter (IBF) that uses a trusted third party server (TTP); this scheme supports the multi-user setting by incorporating multi-party searchable encryption. The IBF is constructed over the whole database; for each attribute column in a table, all distinct attribute values are treated as key-
owner stores freshness summary information (FIS) in the TTP, words to construct an index. For each attribute column value, com-
which includes the primary key and the version of each updated pute a hash. Then, construct MHT and the hashes stored in the leaf
record. nodes. Each attribute value together with its index is encrypted
While in [13], the data owner chooses certain keywords from using symmetric encryption. When the user receives the query
the data and creates an index table to search efficiently using result, he/ she reconstruct the hash of the record to check its cor-
public-key encryption with keyword search (PEKS) scheme. And rectness. If an empty set is returned, the user can verify the query
also generates public/private key pair for the PEKS tokens genera- correctness by checking whether the search request belongs to the
tion and moves the encrypted index table to two index clouds. IBF. Besides, using the property of member enumeration of IBF, the
Then encrypts the data and generates HMAC to the encrypted data data user can check whether all the desired records are returned.
and store them in the CSP. Then, the CSP returns a unique identifier
of the data to the owner and by specifying it, the query issuer can
use it to retrieve, add or delete data. By using TTP, the user issues 6.3. Other schemes
queries to both the index clouds to compare the identifiers to check
the data correctness. Unqualified files in query results can be found In [12], the authors proposed a scheme that can handle multiple
once the HMAC verification is complete. To check query result sessions of various users at the same time by using TTP to audit
completeness, there is a counter for each keyword, because their external knowledge files by batch for higher potency. By
unqualified files can be included in query result and the number implementing this scheme, reduce the computation and the com-
of returned files is correct, but it will be found once the HMAC ver- munication overhead between the owner/user and the CSP. And
ification is complete. to evaluate this mechanism, they use the CLOUDBEES service
Adopting a new data structure in [14], which is version authen- (the first PaaS to support the entire application lifecycle from
tication tree (VAT), constructed by employing the Merkle Hash development to deployment) to host the cloud application and per-
Tree. In this system, the file is divided into blocks and there are forming the multiple users to upload data and performing auditing
groups, each group consists of a number of blocks. For each block service using the TTP and maintain generation proof from CSP.
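The record-level checks of the MAC-chain scheme in [16] described above can be sketched in a few lines. This is an illustrative model only, not the authors' implementation: the record layout, the key handling and the helper names (`record_mac`, `verify_completeness`) are assumptions, with HMAC-SHA256 standing in for the unspecified MAC, and the primary key assumed to be the first attribute.

```python
import hmac
import hashlib

SECRET_KEY = b"owner-secret"  # assumed to be shared by the data owner and the TTP


def record_mac(attributes, version, precursor):
    """Recompute the MAC that the owner stored in the 'checksum' field."""
    msg = "|".join(attributes) + f"|{version}|{precursor}"
    return hmac.new(SECRET_KEY, msg.encode(), hashlib.sha256).hexdigest()


def make_record(attributes, version, precursor):
    """Owner side: a record of R(A1,...,An, version, precursor, checksum)."""
    return {"attributes": attributes, "version": version, "precursor": precursor,
            "checksum": record_mac(attributes, version, precursor)}


def verify_correctness(result_set):
    """TTP side: each record is correct if its recomputed MAC equals 'checksum'."""
    return all(record_mac(r["attributes"], r["version"], r["precursor"])
               == r["checksum"] for r in result_set)


def verify_completeness(result_set, lower_boundary_key):
    """TTP side: the result is complete if each record's 'precursor' equals the
    primary key of the previous record, chained back to the lower boundary."""
    expected = lower_boundary_key
    for r in result_set:
        if r["precursor"] != expected:
            return False  # a record between 'expected' and this one is missing
        expected = r["attributes"][0]  # primary key assumed to be attribute A1
    return True
```

A dropped record breaks the precursor chain, so incompleteness is detected even when every returned record individually carries a valid MAC.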
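The block status table of [15] maps naturally onto a small data structure. The sketch below is a simplified reading of the SN/BN/KV columns described above: the renumbering policy and the class names are assumptions, and a Python list stands in for the linked list the authors use for cheap insertion and deletion.

```python
from dataclasses import dataclass


@dataclass
class BSTEntry:
    sn: int  # serial number: current index of the block in the file
    bn: int  # block number: counter assigned when the block is created
    kv: int  # key version: version of the key that encrypted this block


class BlockStatusTable:
    """Simplified block status table (BST) kept by the data owner."""

    def __init__(self):
        self.entries = []
        self._counter = 0  # source of logical block numbers

    def _renumber(self):
        # Serial numbers always reflect the current block order.
        for i, e in enumerate(self.entries):
            e.sn = i

    def insert_block(self, at, kv=1):
        self._counter += 1
        self.entries.insert(at, BSTEntry(sn=at, bn=self._counter, kv=kv))
        self._renumber()

    def delete_block(self, at):
        del self.entries[at]
        self._renumber()
```

Bumping `kv` on re-encryption is what lets revoked users be locked out: blocks re-encrypted under a newer key version are unreadable with the old key.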
7. A comparative summary of data confidentiality and integrity techniques

As explained in the previous sections, many studies have been discussed or implemented to check outsourced data confidentiality and integrity based on different approaches. Early works [6,7] use digital signatures to ensure data correctness. In addition, the digital signature is used in [19,24] to ensure data correctness and completeness. In [6,11,19], a hash algorithm is used to ensure data correctness. In [6,19], after the data is hashed, it is signed using a digital signature scheme such as RSA. Moreover, the digital signature can also support data confidentiality, as in [17], where the data is encrypted before it is outsourced. Data structures are also used to ensure data integrity and confidentiality. In [2], the MHT is used to ensure data correctness. In [10,13,16], verification data is generated using a MAC or an HLA over the hash value of the data and stored with the data to verify data correctness. In [13], data completeness can also be verified using the PEKS scheme. In [16], data completeness is verified by the MAC chain, and freshness is guaranteed by the information stored in the TTP. In [14], the blocks of a file are collected into groups; for each block group a VAT is created and its root is signed, so data correctness, completeness and freshness can all be guaranteed by the signature. In [15], a BST is implemented to reconstruct the file into blocks; this ensures data correctness and confidentiality. In [21,31], the MBT is used to ensure data integrity: as in the MHT, the root signature can guarantee data correctness and freshness, and if the result set forms a complete signature chain, data completeness is also guaranteed. In [20,22], data integrity is ensured by inserting or deleting fake records in the DB, and data freshness is guaranteed by assigning an expiration time to the function that inserts or deletes the fake records. In [34], a dual encryption scheme is used, which can guarantee data confidentiality, correctness, and completeness. In [39], an IBF is built over the whole database and an MHT is constructed; this guarantees data correctness, and by using the member enumeration property of the IBF, data completeness is guaranteed. In [33,38], data correctness and completeness are guaranteed at the bucket level. Most of the previous works ensure data integrity, and few pay attention to data confidentiality, which requires the data to be encrypted before being outsourced. In [23], a prototype toolset called "Silverline" is used to provide data confidentiality. In [35], biometric encryption is used to encrypt biometric data. In [37], encryption and obfuscation techniques are integrated and applied to the data before outsourcing. In [25-27], homomorphic encryption is used. In [36], both private and public clouds are used to store the data to ensure data confidentiality; this scheme can also guarantee data correctness.

An overview of the different algorithms and techniques to check data confidentiality and integrity based on different schemes is presented in Table 3.

Table 3
Comparative summary.

8. Proposed architecture for secure outsourcing

To achieve data security while the database is outsourced to CSPs, we propose a system that can handle data security problems including data confidentiality, data availability, data privacy, query integrity verification, and query processing over encrypted data. We propose an architecture that integrates different techniques to ensure data security, and we use a trusted third party (TTP) server to verify data integrity, as it reduces the computation and communication complexity and cost on the data owner and user servers.

This architecture consists of two phases, as shown in Fig. 7:

1. Setup phase, which contains the database pre-processing, outsourcing and user authorization.
(a) We assume that the data owner encrypts the sensitive attributes using the symmetric encryption algorithm AES (Advanced Encryption Standard) and gives authorization to users. The AES key is delivered to the authorized users by broadcast encryption and is used to decrypt the result later. By using AES and broadcast encryption, the data confidentiality of the outsourced database can be guaranteed. The encrypted database is stored in the CSP.
(b) The data owner creates a Merkle B-Tree data structure over the encrypted database. Each tree is built on one attribute of one table; each value is hashed using MD5 and stored in the leaf nodes, which contain pointers to those values. The inner nodes contain the concatenation of the hash values of their children nodes, and the root of the tree is then signed using the asymmetric signature algorithm BGLS. The Merkle B-Tree is stored in the TTP as metadata to check the query result integrity.

2. Audit and result phase, which contains query pre-processing, querying and result integrity checking.
(a) The user logs in and sends the original query, which is processed by the query filter. If it contains values that are encrypted in the outsourced database, these values are encrypted using the AES key.
(b) The query is executed in the CSP, and the received result is processed by the query filter: the result is hashed and, using the metadata, the tree root signature is decrypted. The newly computed hash values of the result can then be compared with the leaf values; if they are equal, the integrity check passes.
(c) The result is returned to the query translator, which decrypts it and sends it to the user. The data correctness and completeness of the query result can be guaranteed by this comparison. Data freshness is assured by the root signature: for each update to the outsourced database, the root is re-signed with a new signature, and the owner publishes the new key to the authorized users using broadcast encryption.

Data availability can be achieved by storing several data copies on multiple cloud servers, so that if one server goes down because of different threats, such as denial of service attacks and natural disasters, another server can be used.
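The setup and audit steps above can be sketched with a toy Merkle tree. This is a minimal model under stated assumptions: a binary Merkle tree over one attribute column stands in for the Merkle B-Tree, SHA-256 replaces MD5, and the BGLS signature step is omitted, so only the hash comparison against the trusted root is shown.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(values):
    """Root of a binary Merkle tree over the (encrypted) attribute values.
    Inner nodes hash the concatenation of their children's hashes."""
    level = [h(v) for v in values]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Setup phase: the owner computes the root over the outsourced column and
# stores it (in the real architecture, its BGLS signature) in the TTP.
column = [b"enc(alice)", b"enc(bob)", b"enc(carol)", b"enc(dave)"]
trusted_root = merkle_root(column)


# Audit phase: the query filter recomputes the root from the returned
# values and compares it with the trusted metadata.
def integrity_ok(returned_values) -> bool:
    return merkle_root(returned_values) == trusted_root
```

In the full architecture the TTP keeps one signed tree per attribute and verifies only the hashes along the path from each returned leaf to the root, so the whole column need not be rehashed for every query.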
9. Conclusions and future work

Despite the significant growth of data and database as services (DaaS, DBaaS), there are still challenges to the widespread adoption
of these services. The most significant challenge is data security. There have been a number of contributions to satisfy the security requirements for outsourcing data: data confidentiality, query integrity verification, and query processing over encrypted data.

References

[1] Attas D, Batrafi O. Efficient integrity checking technique for securing client data in cloud computing. Int J Comput Sci Issues (IJCSI) 2011;11.
[2] Ma D, Deng RH, Pang H, Zhou J. Authenticating query results in data publishing. In: International conference on information and communications security, Berlin; 2005.
[3] Arora I, Gupta A. Cloud databases: a paradigm shift in databases. Int J Comput Sci Issues (IJCSI) 2012;9.
[4] Ferrari E. Database-as-a-service: challenges and solutions for privacy and security. In: Services computing conference, IEEE, Asia; 2009.
[5] Mehak F, Masood R, Ghazi Y, Shibli MA, Khan S. Security aspects of database-as-a-service (DBaaS) in cloud computing. Switzerland: Springer International Publishing; 2014.
[6] Hacigumus H, Iyer B, Mehrotra S. Ensuring the integrity of encrypted databases in the database as a service model. In: Data and applications security XVII. US: Springer; 2004. p. 61–74.
[7] Mykletun E, Narasimha M, Tsudik G. Authentication and integrity in outsourced databases. ACM Trans Comput Logic 2006.
[8] Singh S, Maakar SK, Kumar S. A performance analysis of DES and RSA cryptography. Int J Emerg Trends Technol Comput Sci (IJETTCS) 2013.
[9] Li F, Hadjieleftheriou M, Kollios G, Reyzin L. Dynamic authenticated index structures for outsourced databases. In: ACM Management of Data (SIGMOD), USA; 2006.
[10] Wang C, Chow SSM, Wang Q, Ren K, Lou W. Privacy-preserving public auditing for secure cloud storage. IEEE Trans Comput 2013;62.
[11] Shantala C, Kumar A. Integrity check mechanism in cloud using SHA-512 algorithm. Int J Eng Comput Sci 2014;3.
[12] Lakshmi A, Kavitha D. An implementation of public auditing mechanism for secure cloud storage. Int J Adv Res Comput Sci Softw Eng 2014;4.
[13] Fu, Tseng K, Liu YH, Chen RJ. Toward authenticated and complete query results from cloud storages. In: 11th international conference on trust, security and privacy in computing and communications; 2012.
[14] Jin H, Jiang H, Zhou K, Wei R, Lei D, Huang P. Full integrity and freshness for outsourced storage. In: IEEE/ACM international symposium on cluster, cloud and grid computing; 2015.
[15] Barsoum A, Hassan A. Enabling dynamic data and indirect mutual trust for cloud computing storage systems. IEEE Trans Parallel Distrib Syst 2013;24.
[16] Hong J, Wen T, Guo Q, Sheng G. Query integrity verification based on MAC chain in cloud storage. Int J Netw Distrib Comput 2014;2.
[17] Purushothama BR, Amberker BB. Efficient query processing on outsourced encrypted data in cloud with privacy preservation. In: International symposium on cloud and services computing; 2012.
[18] Sharma M, Chaudhary A, Kumar S. Query processing performance and searching over encrypted data by using an efficient algorithm. Int J Comput Appl 2013;62.
[19] Narasimha M, Tsudik G. Authentication of outsourced databases using signature aggregation and chaining. In: International conference on database systems for advanced applications (DASFAA); 2006.
[20] Xie M, Wang H, Yin J, Meng X. Providing freshness guarantees for outsourced databases. In: 11th international conference on extending database technology (EDBT), USA: ACM; 2008.
[21] Wei W, Yu T. Integrity assurance for outsourced databases without DBMS modification. Int Federat Inform Process 2014.
[22] Xie M, Wang H, Yin J, Meng X. Integrity auditing of outsourced data. In: Very Large Data Bases (VLDB), Austria; 2007.
[23] Puttaswamy KPN, Kruegel C, Zhao BY. Silverline: toward data confidentiality in storage intensive cloud applications. In: Symposium on cloud computing (SOCC), Portugal; 2011.
[24] Mykletun E, Narasimha M, Tsudik G. Signature bouquets: immutability for aggregated/condensed signatures. In: European symposium on research in computer security, France; 2004.
[25] Ryan MD. Cloud computing security: the scientific challenge, and a survey of solutions. Elsevier; 2013.
[26] Zhao F, Li C, Liu CF. A cloud computing security solution based on fully homomorphic encryption. In: International conference on advanced communication technology (ICACT); 2014.
[27] Tebaa M, El-Hajji S, El-Ghazi A. Homomorphic encryption applied to the cloud computing security. In: The World Congress on Engineering, vol. I, London, UK; 2012.
[28] Raghuvanshi K, Khurana P, Bindal P. Study and comparative analysis of different hash algorithm. J Eng Comput Appl Sci (JECAS) 2014;3.
[29] Boneh D, Lynn B, Shacham H. Short signatures from the Weil pairing. In: 7th international conference on the theory and application of cryptology and information security: advances in cryptology, London; 2001.
[30] Silverio A, Custodio R, Carlos M, Mello R. Efficient data integrity checking for untrusted database systems. In: 6th international conference on advances in databases, knowledge and data applications; 2014.
[31] Niaz M, Saake G. Merkle hash tree based techniques for data integrity of outsourced data. In: 27th GI-workshop on foundations of databases, Germany; 2015.
[32] Shmueli E, Vaisenberg R, Gudes E, Elovici Y. Implementing a database encryption solution, design and implementation issues. Elsevier; 2014.
[33] Wang J, Du X, Lu J, Lu W. Bucket-based authentication for outsourced databases. J Concurr Comput: Pract Exp 2010.
[34] Wang H, Yin J, Perng C, Yu PS. Dual encryption for query integrity assurance. In: Conference on information and knowledge management; 2008.
[35] Omar MN, Salleh M, Bakhtiari M. Biometric encryption to enhance confidentiality in cloud computing. In: International symposium on biometrics and security technologies (ISBAST); 2014.
[36] Butoi A, Tomai N. Secret sharing scheme for data confidentiality preserving in a public-private hybrid cloud storage approach. In: IEEE/ACM 7th international conference on utility and cloud computing; 2014.
[37] Arockiam L, Monikandan S. Efficient cloud storage confidentiality to ensure data security. In: International conference on computer communication and informatics (ICCCI), India; 2014.
[38] Wang J, Du X. LOB: bucket based index for range queries. In: 9th international conference on web-age information management, China; 2008.
[39] Wang J, Chen X, Li J, Zhao J, Shen J. Towards achieving flexible and verifiable search for outsourced database in cloud computing. Future Gener Comput Syst 2016.

Mai Rady received a B.Sc. degree in Information Technology and Computing in 2011 from Arab Open University, Jeddah, Saudi Arabia. Currently, she is a master's student in the Information Systems Department, Ain Shams University, Cairo, Egypt. Her current research interests include cloud computing, database security, and data mining.

Tamer Abdelkader received a B.Sc. degree in electrical and computer engineering and an M.Sc. degree in 2003 from Ain Shams University, Cairo, Egypt. He received M.Sc. and Ph.D. degrees in 2012 in electrical and computer engineering from the University of Waterloo, Ontario, Canada. He worked at the University of Waterloo as a postdoctoral fellow, and as a network consultant in Egypt. Currently, he is an associate professor at Ain Shams University. He is the author of several publications in IEEE journals and conferences. His current research interests include mobile computing, vehicular networks, energy-efficient protocols, mobile social networks, and cloud computing.

Rasha Ismail worked at Ain Shams University as an associate professor. She is the author of several publications in IEEE journals and conferences.