Confidentiality and Integrity of Query Results For Cloud Databases
2022
Acknowledgements
Abstract
The amount of digital data is increasing at a phenomenal rate, outpacing the local storage capacity of many organizations. Outsourcing data is therefore considered a solution for storing large data sets on efficient cloud servers. The cloud computing environment offers on-demand access to several computing services, providing benefits such as reduced maintenance, lower fundamental costs for different resources, and global access. Database as a Service (DBaaS) is one of its prominent services. Database Service Providers (DSPs) have the infrastructure to host outsourced databases on distributed servers and provide efficient facilities for their users to create, store, update, and access databases at any time and from any place through the Internet. Outsourcing data to cloud servers offers many features, such as flexibility, scalability, and robustness, but in order to provide these, properties such as data confidentiality and data integrity are often sacrificed.
This thesis presents a secure scheme to provide Confidentiality and Integrity of Query results for Cloud Databases (SCIQ-CD). The scheme architecture integrates different cryptographic techniques, namely AES, RSA, and SHA-2, to achieve the desired data security properties. The scheme allows data owners to outsource a database that includes sensitive data to a DSP and to perform dynamic and static operations on the outsourced database. In addition, the scheme uses a trusted third-party server that serves as an intermediate server between the DSP and the users to check data confidentiality and data integrity. The security analysis demonstrates that our construction achieves the desired security properties. The performance analysis shows that our proposed scheme is efficient for practical deployment: it imposes a small overhead for select, range, and update operations, and a reasonable overhead for insert and delete operations.
Table of Contents
Acknowledgements ......................................................................................................................... II
Abstract ......................................................................................................................................... III
Table of Contents .......................................................................................................................... IV
List of Figures ............................................................................................................................... VI
List of Tables ............................................................................................................................... VII
Introduction ..................................................................................................................................... 1
1.1 Motivation .....................................................................................................................1
1.2 Problem definition ..............................................................................................................2
1.3 Challenges ...........................................................................................................................3
1.4 Objectives ...........................................................................................................................4
1.5 Contribution ........................................................................................................................4
1.6 Thesis Outline .....................................................................................................................4
Background and Related Work ....................................................................................................... 5
2.1 Cloud Computing................................................................................................................5
2.1.1 Cloud computing characteristics ................................................................ 6
2.1.2 Cloud computing service models .............................................................................. 6
2.1.3 Cloud computing deployment models ...................................................................... 7
2.2 Cloud Database Service ......................................................................................................8
2.2.1 Cloud Data Models ................................................................................................... 9
2.3 Database as a Service (DBaaS) challenges .......................................................................10
2.4 Database privacy..........................................................................................................11
2.4.1 Data confidentiality................................................................................................. 12
2.4.2 Query processing over encrypted data .................................................................... 13
2.4.3 Data integrity .......................................................................................................... 14
2.5 Conclusion ........................................................................................................................18
SCIQ-CD: Confidentiality and Integrity of Query results for Cloud Databases Scheme ............. 19
3.1 Notations and Abbreviations ........................................................................19
3.2 System preliminaries ...................................................................................................20
3.2.1 AES: Advanced Encryption Standard ..................................................................... 20
3.2.2 SHA256: Hash function h( ) ................................................................................... 20
3.2.3 RSA: Digital Signature ........................................................................................... 20
3.2.4 MBT: Merkle B-Tree ............................................................................................... 20
3.3 System Model ..............................................................................................................24
3.3.1 Database Storage Model ......................................................................................... 24
3.3.2 Assumptions and Attack Models ............................................................................ 25
3.3.3 Security goals .......................................................................................................... 26
3.4 System Design .............................................................................................................26
3.4.1 Setup and Database Preparation: ............................................................................. 26
3.4.2 Database Accessing ................................................................................................. 29
3.4.3 Query Assurance and Processing ............................................................................. 29
Data Operations on the Outsourced Data using SCIQ-CD Scheme ............. 32
4.1 Data Operations on the outsourced data using MSZ ........................................................32
4.1.1 Insert Operation ....................................................................................................... 32
4.1.2 Select operation ....................................................................................... 34
4.1.3 Update operation ..................................................................................... 36
4.1.4 Delete operation ....................................................................................................... 38
4.2 Data Operations on the outsourced data using TBI ..........................................................42
4.2.1 Insert Operation ....................................................................................................... 42
4.2.2 Select Operation ....................................................................................................... 45
4.2.3 Update Operation .................................................................................................... 47
4.2.4 Delete Operation ...................................................................................................... 49
Experimental Evaluation ............................................................................................................... 52
5.1 Scheme implementation ....................................................................................................52
5.2 Experiment setup ..............................................................................................................52
5.3 Performance analysis ...................................................................................................53
Conclusion and Future work ......................................................................................................... 59
References ..................................................................................................................................... 61
List of Figures
List of Tables
Table 3.1: Symbols and abbreviations used in the proposed scheme 19-20
Table 3.2: Employee table 21
Table 3.3: Emp_0 (root authentication table) 24
Table 3.4: Emp_1 (internal nodes authentication table) 24
Table 3.5: Employee (data authentication table) 24
Table 3.6: ET for Employee table 27
Table 3.7: ST for Employee Table 27
Table 3.8: ET Table (leaf nodes authentication table) 28
Table 3.9: Emp_2 Table 28
Chapter (1)
Introduction
This chapter briefly introduces the background of the research topic. The outline of this chapter is as follows: Section 1.1 introduces the motivation of this work, Section 1.2 introduces the problem definition, Section 1.3 discusses the challenges that face storing a database at a CSP, Section 1.4 states the research objectives, Section 1.5 presents the contribution, and Section 1.6 provides the organization of the rest of this thesis.
1.1 Motivation
Cloud computing is a computing paradigm that is shifting the way the IT industry thinks about computing. It provides different computing services remotely rather than locally, and these services are accessed through the Internet [17]. Since the emergence of the Internet in the 1970s, different services have been offered to users, who were able to log in remotely and transfer files via the FTP protocol, but the cloud environment took online services to a new dimension [18]. Cloud computing services are hosted by cloud service providers (CSPs), and users of a CSP pay for these services according to the service type and their usage of the service [1].
Cloud computing services can be classified into three models: 1) Infrastructure as a Service (IaaS), 2) Platform as a Service (PaaS), and 3) Software as a Service (SaaS). In Infrastructure as a Service, users can use servers, storage, and network settings on demand from the CSP and pay only for their usage. In Platform as a Service, users can build their own applications, as the CSP offers all the resources and development tools, such as databases, that are required to build an application, and also maintains and secures the platform. In Software as a Service, the application is hosted as a service offered to users through the Internet, and the users do not have to worry about its maintenance, backups, or security [20, 21].
The benefits of cloud computing for its users are various and significant, as noted in [2, 19]:
1. Avoid big initial investments in hardware and software purchasing,
2. Flexibility and scalability: providing services to the user according to their needs,
3. Availability: different services accessed from anywhere through an internet connection,
4. Reduce maintenance and operational costs,
5. Reduce costs as users only pay for the service per their usage.
Alongside other cloud computing services, data storage is an important service that has been progressing, evolving, and adapting [3]. The reason is that the amount of digital data is increasing at a phenomenal rate, outpacing the storage capacity of many organizations. Therefore, storing data on cloud servers is considered a solution for storing ever larger data sets on efficient distributed servers. By storing the data in the cloud, the fundamental costs of different resources, such as software, hardware, and even the professionals hired to maintain the system, are reduced [5]. Storage as a Service is offered by different cloud service providers that allow users or organizations to store data on remote servers rather than on the organization's own servers. Storage as a Service is offered under the Infrastructure as a Service model or the Platform as a Service model. In the Infrastructure as a Service model, the CSP offers the infrastructure on which users can upload their data, and the user maintains the data management storage system. In the Platform as a Service model, the database used to store the data is offered by the CSP, which maintains the data management storage system, and the user manages the data.
Database outsourcing introduces a new paradigm, called Database as a Service (DBaaS), offered by different Database Service Providers (DSPs) that have the infrastructure to host outsourced databases on distributed servers and provide efficient facilities for their users to create, store, update, and access databases at any time and from any place through an Internet connection. Both relational and non-relational databases can be outsourced, but our main concern in this work is relational databases. Amazon's SimpleDB, Amazon RDS, Google's BigTable, Yahoo's Sherpa, and Microsoft's SQL Azure databases are commonly used databases in the cloud [4].
Database Service Providers (DSPs) offer database cloud services in three different models: 1) Virtual Machine (VM) image, 2) Database as a Service (DBaaS), and 3) managed hosting. In the virtual machine image model, the DSP offers infrastructure on which a database management system (DBMS) can run, and users can upload or purchase a DBMS. In the DBaaS model, the DSP maintains the DBMS, and the user manages the databases supported by the DBMS and pays for storage and resources. In the managed hosting model, the three phases of database implementation (installation, maintenance, and management) are performed by the DSP [15, 29]. In our work we have implemented our scheme in the DBaaS model, as we will explain later.
By storing a database in the cloud using the DBaaS model, data management becomes one of the DSP's tasks, satisfying the essential requirements that are inherent to these environments while the users can concentrate on their main tasks. Some of these requirements are: 1) scalability, by handling increases and decreases in active users and storing growing amounts of data; 2) flexibility, by allowing the storage of large amounts of data; and 3) availability, as users can access the service from anywhere through an Internet connection. The DSP meets these requirements but does not guarantee data security, as cloud servers and networks are often the targets of malicious attacks. Moreover, the cloud server itself might be malicious and may attempt to insert fake records into the database, modify existing records, or even delete them [5]. To guarantee data security, the database may be encrypted before outsourcing, but we still need to ensure that each query result is correct, complete, and based on the latest version of the data. This is the motivation of our work.
provide their own servers, and instead use another provider's servers for cost effectiveness and flexibility.
2. Privacy issues: the CSPs enforce their own policies to ensure the security of the databases stored on their servers. A database stored at a CSP should be accessed only by authorized users; checking users' authorization is done on both the provider side and the user side.
3. Application issues: database monitoring and maintenance should be done by the CSPs to ensure that the cloud is secure and not infected by malicious code uploaded to the cloud by hackers or attackers with the purpose of stealing sensitive information or even damaging the information of certain users.
Among the issues listed above, the main question is how to protect the outsourced database from malicious insiders and outside attackers. The important security issues to be addressed in our work are data confidentiality and data integrity. Data confidentiality can be guaranteed by having the data owner encrypt the database with a certain key before outsourcing it to the DSP, so that only authorized users can decrypt it [6]; unauthorized users and the DSP will not have the key to decrypt the data. Data integrity is applied to the query results retrieved from the DSP and includes three aspects: 1) correctness, the returned results do exist in the outsourced database; 2) completeness, the result is complete and has no missing parts; and 3) freshness, the results are based on the most recent version of the data.
Our main issue in this work is how to achieve the confidentiality and integrity of query results for cloud databases.
1.3 Challenges
With the increasing amount of digital data and the growing speed of threats to outsourced databases, outsourced database security faces several challenges, as follows [22]:
1. Data Quality
By storing a database in the cloud, the database is under the control of the CSP. The data owner and users must ensure that the database is secure and that the returned data is correct, complete, and from the most recent version of the data. Various methodologies and techniques are used to evaluate and assure data quality.
2. Privacy preserving databases
To achieve database privacy, different approaches can be used, such as data anonymization, which modifies the data by removing all information that can directly link data items with users, such as names or ID numbers. This may not be enough to anonymize the data, however, because data mining techniques can allow the removed information to be recovered. Furthermore, when the data undergoes many modifications, the database quality may be affected [24].
3. Intellectual Property Rights
Watermarking techniques are used to protect the content of an organization's data from unauthorized duplication and distribution by enabling provable ownership of the content.
4. Database Survivability
Database systems need to operate and continue their functions even with reduced capabilities: prevent attacks on the data and detect when one has happened, recover corrupted or lost data, and repair failed system functions to re-establish a normal level of operation.
1.4 Objectives
This thesis focuses particularly on the important security issues to be solved when outsourcing a database to a DSP. The most important security issues to be addressed are data confidentiality and data integrity.
1.5 Contribution
In order to achieve the aforementioned objectives, we implement and test a secure scheme to provide Confidentiality and Integrity of Query results for Cloud Databases (SCIQ-CD) on the Microsoft Azure cloud. Data confidentiality is guaranteed by encrypting the sensitive attributes of the database before outsourcing it, so that only authorized users can decrypt them [70]. Data integrity, which includes the three aspects of correctness, completeness, and freshness, is guaranteed by converting the database into an authenticated data structure and storing it within the database at the DSP. In this model, we use a Trusted Third Party (TTP) to provide indirect mutual trust between the data owner/users and the DSP; the TTP checks data confidentiality and verifies query result integrity before sending results to the users. The performance of this solution is assessed in terms of response time for data outsourcing and data retrieval.
1.6 Thesis Outline
Chapter 2 provides the background of our research regarding cloud computing, Database as a Service, the database outsourcing model, and database security issues.
Chapter 3 describes the design of the cloud-based storage scheme (SCIQ-CD).
Chapter 4 describes the data operations performed on the outsourced data using the SCIQ-CD scheme, as implemented on the Microsoft Azure cloud.
Chapter 5 details the experimental design used to measure the performance of the solution.
Chapter 6 presents the conclusions and further ideas to extend this research.
Chapter (2)
Background and Related Work
This chapter presents an overview of some of the significant topics relevant to this thesis. Section
2.1 describes cloud computing, its characteristics and the various services it provides. Section 2.2
gives a detailed description of the cloud database service as one of the key services provided in a
cloud environment. Section 2.3 presents some of the challenges existing in the database as a
service. Section 2.4 addresses database privacy as one of the crucial challenges in the DBaaS.
Section 2.5 concludes the chapter.
The architecture of cloud computing services has three main elements, as shown in Figure 2.1: 1) the user, who uses any hardware or software application from the cloud services as the front end to perform their work; 2) the web service, any software application that is used to perform cloud computing; and 3) the cloud server, which provides the service to the user.
The essential characteristics of cloud computing are the following:
• On-demand self-service: enables users to access computing capabilities, such as servers and applications, automatically as needed, without any interaction between the user and the service provider.
• Broad network access: the computing capabilities are available over the Internet and can be used from heterogeneous user platforms such as laptops, desktops, and mobile phones.
• Resource pooling: the computing resources can be combined and dynamically assigned to serve multiple users based on a multi-tenant model. Examples of resources include storage, memory, and network bandwidth.
• Rapid elasticity: every computing capability can be provisioned rapidly, elastically, and/or automatically to scale out or in according to user needs, meeting changes in demand.
• Measured service: the cloud system automatically controls and optimizes resource usage, providing monitoring, controlling, and reporting, and thereby transparency between the service provider and the user of the service.
Figure 2.2: Cloud Computing Service Models
Infrastructure as a Service (IaaS): the service where users can use fundamental computing resources such as servers, network, storage, or hardware to create and run applications or even operating systems. IaaS requires a virtualization platform on which the users can install and configure virtual machines that run on the cloud servers. The CSP is responsible for controlling and managing the cloud infrastructure, while the user is responsible for managing the virtual machine. Examples of IaaS: Amazon's Elastic Compute Cloud (EC2), Windows Azure Virtual Machines, Google Compute Engine.
Platform as a Service (PaaS): the service where a hardware or software platform is provided so that users can create their applications. A platform could be hardware equipment, software applications, or a programming environment that users can use to create, run, or improve their own applications. The CSP is responsible for controlling and managing the cloud infrastructure, while the user controls and manages only the deployed applications. Examples of PaaS: Google App Engine, AWS (Amazon Web Services), Microsoft Azure.
Software as a Service (SaaS): the service provided by CSPs where users do not have to install, maintain, or support the software applications. The CSP offers the computing capability deployed on a cloud infrastructure, and users access the applications through a web browser or application interface. Examples of SaaS: Google Apps (Gmail, Google Docs), YouTube, Facebook.
Community cloud: the cloud infrastructure is used exclusively by a community of users that have the same requirements and concerns. The infrastructure is managed and owned by the members of the community, or a CSP can be chosen for the management and operational tasks. This model has security benefits similar to a private cloud but is more cost-effective than owning an on-premise private cloud.
Hybrid cloud: a hybrid cloud is formed when two or more distinct cloud models are used together, such as when a private cloud and a community cloud are used together, or when public and private cloud infrastructures are used together. Hybrid clouds are usually used when there are different security requirements for different sets of data. For instance, highly sensitive data can be kept in a private cloud while less sensitive data can be uploaded to a public cloud.
Cloud relational databases typically guarantee the ACID transaction properties:
Atomicity: transactions are all-or-nothing; when an update occurs in a database, either all or none of the update becomes available to anyone beyond the user or application performing the update. This ensures that a mere fragment of the update cannot end up in the database should a problem occur with either the hardware or the software involved.
Consistency: the system stays in a stable state before and after the transaction. If a failure occurs, the system reverts to the previous state.
Isolation: transactions that occur at the same time must be hidden from each other and processed independently without interference. The techniques that achieve isolation are known as synchronization.
Durability: once a transaction has completed and committed its results to the database, the system must guarantee that these results will not be lost if either the system or the storage media fail. The database keeps track of the updates made so that the system can recover from an abnormal termination.
SQL (Structured Query Language) was developed at IBM, building on the relational model of data introduced by Edgar Codd. Since then, SQL has become the standard query language for relational database management systems (RDBMSs). In the relational model, data are organized into relations, each represented by a table consisting of rows and columns. Each column represents an attribute of the data, and the list of columns makes up the header of the table. The body of the table is a set of rows; each row is an entry of data, a tuple of its attributes. To map data to another relation, a primary key is used, which identifies each row in the table. To access a relational database, SQL is used to issue queries, such as the basic CRUD tasks of Creating, Reading, Updating, and Deleting data. SQL supports indexing mechanisms to speed up read operations, views that can join data from multiple tables, and other features for database optimization and maintenance. There are many relational databases available, such as MySQL, Oracle, and SQL Server, and all use SQL [32].
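To make the CRUD tasks concrete, the following minimal sketch issues the four basic SQL statements from Python using the standard sqlite3 module; the table and column names here are illustrative only and are not taken from our scheme.

import sqlite3

conn = sqlite3.connect(":memory:")  # illustrative in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# Create
cur.execute("INSERT INTO employee (id, name, salary) VALUES (?, ?, ?)", (4, "Heba", 3000))
# Read
rows = cur.execute("SELECT id, name FROM employee WHERE salary >= ?", (2500,)).fetchall()
# Update
cur.execute("UPDATE employee SET salary = ? WHERE id = ?", (3500, 4))
# Delete
cur.execute("DELETE FROM employee WHERE id = ?", (4,))
conn.commit()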
The database cloud services are offered by DSPs in three different models: 1) Virtual Machine (VM) image, 2) Database as a Service (DBaaS), and 3) managed hosting. The next subsection gives a description of these models.
3. Scalability:
A system is scalable if it can handle increases and decreases in the number of active users, store the growing amounts of data, and improve throughput when additional resources are added. The challenge arises when a database workload exceeds the capacity of a single server. A DBaaS must therefore support scale-out, where the responsibility for query processing is partitioned among multiple servers to achieve higher throughput.
4. Elasticity:
A system is elastic if it can be scaled up dynamically by adding more nodes, or scaled down by removing nodes, without service disruption, in order to handle changing workloads.
5. Privacy:
When the database is stored in the cloud, the data owner loses control of the data, and data management becomes one of the CSP's tasks. The CSP has the ability to access the data at any time, and it could accidentally or deliberately alter or even delete data. Moreover, the CSP can share the data with third parties for purposes of law and order, even without a warrant. Solutions to privacy include policy and legislation as well as users' choices about how data is stored. A significant barrier to deploying databases in the cloud is the perceived lack of privacy, which in turn reduces the degree of trust users are willing to place in the system. This challenge is the main motivation of our work: how to guarantee database privacy while storing the database in the cloud.
Challenge 4: how to query the CSP without revealing query details?
Since the data is encrypted, if the CSP learns the details of a user's query, it learns the user's possibly sensitive search interests, which can reveal information about the database.
Challenge 5: how to hide query contents from the database owner?
For example, an organization has many user roles, and some users are qualified to search for any value in the database without being willing to reveal the query to anyone, even to the database owner. The question is how to obtain approval for the query without revealing its contents, even to the database owner.
Challenge 6: how to hide query contents while assuring the database owner that the hidden contents are authorized by some certificate authority (CA)?
For example, as in the previous challenge, the query needs approval to be processed without revealing its contents, while the database owner wants some confidence that the querying user is authorized; this can be achieved by having a certificate authority certify the data.
To address the above challenges of storing a database or data in the cloud, users need to be assured that proper data security measures are in place, encompassing data confidentiality and data integrity. In the next subsections, these security measures, and how queries and results are processed, are briefly reviewed according to the related work.
as well as high computational costs, since every parameter has to be encrypted and the results of every SQL operation decrypted through all the encryption layers.
In [42] the authors proposed a cryptographic technique that integrates both encryption and obfuscation of data before outsourcing. It begins by checking the data type: if the data is in the form of digits, an obfuscation technique is applied through specific mathematical functions without using a key; if the data is in the form of alphabets or alphanumerics, it is encrypted using symmetric encryption. This integrated technique provides more data security than encryption or obfuscation alone.
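Since [42] does not specify the exact mathematical functions, the following is only a hedged sketch of the dispatch idea in Python: numeric values pass through a keyless toy obfuscation function, while alphabetic or alphanumeric values are encrypted symmetrically (here with Fernet, an AES-based construction from the cryptography package); the function names are ours.

from cryptography.fernet import Fernet

fernet = Fernet(Fernet.generate_key())  # symmetric key for text values

def protect(value):
    # digits: keyless mathematical obfuscation (toy linear function, for illustration)
    if isinstance(value, (int, float)):
        return 3 * value + 7
    # alphabets/alphanumerics: symmetric encryption
    return fernet.encrypt(str(value).encode())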
The authors in [41] proposed a scheme suited to a hybrid cloud that ensures data confidentiality on the public cloud. First, the data file to be outsourced is fragmented into smaller chunks, and these chunks are encrypted and then stored at different locations in the public cloud. The file fragmentation takes place in the private cloud, and the chunks are sent to the public cloud in a different order than they appear in the original file. Every chunk upload operation is recorded in a Chunk Distribution Dictionary (CDD) stored in the private cloud. Every CDD record has three fields: 1) the index of the chunk in the original file, 2) the location where the chunk is stored (in the case of using distributed CSPs), and 3) the hash value of the chunk, for integrity checking. This scheme is applicable to sensitive medical or business data, but it suffers from an important challenge: the data fragmentation must be performed in the private cloud.
The authors in [39] used a prototype toolset called "SilverLine" to provide data confidentiality in the cloud. First, they identify subsets of data that can be functionally encrypted without breaking application functionality; this is done using an automated technique that marks data objects with tags, tracks their usage and dependencies, and then discards all data that is involved in any computation on the cloud. Second, each subset is encrypted using symmetric encryption with a different key and is accessed by a different set of users. Third, the subsets are stored in the cloud. The keys used to decrypt the data are stored by the data owner. To fetch data from the cloud, the user first contacts the data owner to get the appropriate keys, and then sends the query to the cloud. The input parameters of the query are also sent in encrypted form, so the cloud executes the encrypted query and sends the encrypted results back to the user. Finally, the user decrypts the result. This technique requires many pre-computation steps to ensure data confidentiality, and the data owner's server must be online all the time, since it holds the decryption keys.
The authors in [71] used vertical fragmentation to protect the confidentiality of data: it hides user identities by separating identifier attributes from descriptive attributes. To satisfy privacy constraints, encrypted searching is developed to preserve privacy against adversaries in a cloud computing environment.
Homomorphic encryption allows certain operations to be performed on encrypted data without decrypting it. The operation might be multiplication or addition, which fits the processing and retrieval of encrypted data. For example, if two messages m1 and m2 are encrypted to m1' and m2', it is possible to compute F(m1', m2'), where F can be an addition or multiplication function, without decrypting the encrypted messages. However, it is not suitable for many real-world applications because of its computational complexity.
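As a worked illustration of the homomorphic property (and not of any particular cited scheme), textbook RSA without padding is multiplicatively homomorphic: the product of two ciphertexts decrypts to the product of the plaintexts. The parameters below are tiny and insecure, for demonstration only.

# Toy textbook RSA: Enc(m1) * Enc(m2) mod n decrypts to m1 * m2
p, q, e = 61, 53, 17
n, phi = p * q, (p - 1) * (q - 1)
d = pow(e, -1, phi)  # private exponent (modular inverse, Python 3.8+)

enc = lambda m: pow(m, e, n)
dec = lambda c: pow(c, d, n)

m1, m2 = 6, 7
c = (enc(m1) * enc(m2)) % n      # F(m1', m2') computed entirely on ciphertexts
assert dec(c) == (m1 * m2) % n   # decrypts to 42 without decrypting m1' or m2'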
In [56], the authors provide a system that can process a matching query over encrypted data. The proposed system consists of four phases. In the first phase, pre-processing and outsourcing, the data is encrypted and sent to the CSP to be stored. In the second phase, query pre-processing, the query is pre-processed before being sent to the CSP: each value in the query is encrypted, so the query can be matched against the encrypted data in the cloud without decrypting the data. This relies on the query values being encrypted with the same key that was used to encrypt the data itself. In the third phase, query processing and response, the received query is processed by the CSP: the server searches for matches to the query condition, scans each attribute to collect the matches, and sends the encrypted results to the query issuer. In the fourth phase, query post-processing, the received encrypted result is decrypted by the query issuer.
The authors in [57] proposed a technique that first encrypts the sensitive attributes of each table in the database and stores the encrypted table in the cloud as an EncryptedDataTable (EDT). Second, another table is derived from the original table with two columns: a data column, which contains a copy of the sensitive data values kept non-encrypted, and a key column, which is kept encrypted; this table is called the QuerySearchTable (QST). The records in the QST are re-ordered randomly, so their order differs from that of the EDT. Only authorized users are allowed to access the encrypted data. When a search query is issued, the search is done first in the QST to find the key into the EDT; the key is decrypted, the matching result from the EDT is decrypted, and the resulting records are returned to the user. This technique returns only those records satisfying the user query; no additional records are given.
Data integrity of query results includes three aspects:
1. Correctness: the query issuer is able to validate that the returned results do exist in the outsourced database.
2. Completeness: the result is complete and no answers have been omitted from it.
3. Freshness: the results are based on the latest version of the data.
Integrity can be provided at four different levels: table (entire relation), column (attribute of the relation), field (individual value), or row (record/tuple of the table). Integrity verification is done by assigning a tag to the data, which is returned with the result. At the table/column level, integrity verification is expensive, since it can be performed only by the query issuer and all the data corresponding to that table/column must be returned in the result; because the tag covers the whole table or column, it must be decrypted to detect any unauthorized modification. At the field level, it is too complex to assign a tag to each value in the table, so data integrity suffers from a high verification overhead for the query issuer and for the server, as at the table/column level. At the tuple level, each individual tuple is assigned its own tag, so the integrity guarantee covers exactly the tuples contained in the query result; this is the best solution for providing data integrity [45, 46].
To check query result integrity, different solutions have been proposed in the literature. Query result integrity can be checked between:
Two entities: the DSP and the owner/user. As shown in Figure 2.3, the data owner stores the encrypted database at the DSP and has the ability to insert, update, or delete records in the database. The data owner then authorizes users, who can issue queries and get results from the DSP. The integrity check of the query result is done by the query issuer, who checks whether the answer is correct, complete, and from the last updated version uploaded to the DSP.
Figure 2.3: Query execution between owner/ user and the DSP
Three entities: the DSP, the owner/user, and a Trusted Third Party (TTP). As shown in Figure 2.4, as before, the data owner stores the encrypted database at the DSP, has the ability to insert, update, or delete records, and authorizes users. The data owner delegates the integrity check of query results to the trusted third-party server, which checks whether the answer is correct, complete, and from the last updated version uploaded to the DSP.
Figure 2.4: Using TTP to provide mutual trust between owner/ user and the DSP
The authors in [48] use a signature aggregation and chaining approach to check query result correctness and completeness. Before the database is outsourced, each record is hashed and then signed, and the signatures are outsourced along with the data. To answer a query, the DSP sends the matching records and their signatures, which are aggregated into a single signature; this ensures query correctness. To achieve query completeness, they propose a signature chain per record, computed from the hash of the record concatenated with the hashes of all its immediate predecessor records (a predecessor being the record with the highest value of an attribute that is less than the given record's value for that attribute), together with the owner's private key.
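A minimal sketch of the chaining idea, under our own naming (the predecessor selection and the RSA signing step of [48] are omitted): the digest that gets signed binds a record to the hashes of its immediate predecessors, so a server that silently drops a matching record breaks the chain.

import hashlib

def chain_digest(record: bytes, predecessors: list) -> bytes:
    # hash of the record concatenated with the hashes of its
    # immediate predecessor records; the owner then signs this digest
    h = hashlib.sha256(record)
    for pred in predecessors:  # one predecessor per searchable attribute
        h.update(hashlib.sha256(pred).digest())
    return h.digest()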
In [62] the authors propose an efficient signature scheme that supports integrity verification of outsourced databases with multi-user modification. Each user has the ability to update the data and sign it with his/her private key; the corresponding public keys can then be used to verify the query results even if the outsourced data has been updated and signed by different users.
The authors in [49] proposed an authentication scheme that constructs a Merkle Hash Tree (MHT) over each table at the record level. The leaf nodes contain the hash value of each record of the table, the ascendant nodes are hashes of the concatenation of their descendant nodes, and the root is signed with the owner's private key. To verify query result correctness, the individual signatures of the tree roots involved in the query result are aggregated using an aggregated signature scheme.
In [47, 50], the authors construct a Merkle B-Tree (MBT), which is similar to the MHT of [49], except that the internal nodes contain pointers to the leaf nodes and the attribute values are used as index keys in the MBT. To identify the order of the pointers in internal nodes and of the records in leaf nodes, a Radix-Path Identifier scheme is used, which assigns numbers based on a certain radix to identify each pointer or record in the MBT depending on its level and position. The root of the MBT is updated each time the database is updated, and by sharing the new root signature with the users, freshness can be guaranteed. After the pointers are identified, the authentication data associated with them is stored in the database as tables.
To check query completeness, the authors in [51, 52] proposed a probabilistic method: before the database is outsourced, fake records are inserted into it. When a query is issued, the fake records satisfying its conditions are returned with the result. To know which of the inserted records should be returned, a deterministic function defines the inserted records, and this definition is sent to the users; the user then verifies whether all the expected fake records are present in the query result. If at least one fake record is missing, the query result is not complete. This method can also verify query freshness: the user generates a signature certificate when updating the database, the signature changes with each update and the new signature is stored within the outsourced database, and an expiration time δ is assigned to the certificate, so that every δ units of time the certificate replaces the current signature whether or not there has been an update.
The authors in [53] use a Message Authentication Code Chain (MACC) scheme together with a trusted third-party server (TTP) to verify query correctness and completeness. Before the database is outsourced, the table structure is modified as follows: R(A1, ..., An, version, precursor, checksum), where A denotes an attribute, "version" contains the update count of a record, "precursor" stores the concatenation of the precursor's primary key for each searchable attribute, and the MAC value of the record is calculated and saved in the "checksum" field. When the user issues a query to the CSP, the retrieved result is sent to the TTP, which reconstructs each record's MAC value and compares it with the checksum; if they are equal, the record is correct, and the result is sent to the user. If the result set forms a complete chain from the first record boundary to the last record boundary, the query results are complete.
In [55], query authentication is performed at the bucket level rather than the record level. A bucket is generated by partitioning the database into equivalent data ranges (values) or into equal numbers of records (count). The bucket-based index contains a bucket id (BID), a data range with upper and lower bounds (U, L), the number of records in the bucket, and a checksum. The bucket checksum is a hash digest returned with the result. When the result is received, the user recalculates the checksum of the result and compares it with the received one; if they are equal, the authenticity of the query result is guaranteed [54].
In [61], query authentication is performed at a partial column level rather than the record level, as the user can request the cloud to return only some of the attributes of a table according to a search condition. The data owner encrypts each record separately, and then, for each record, encrypts each of its attributes separately. This scheme supports dynamic operations, including data insertion, deletion, and update.
The authors in [63] propose a scheme for query correctness and completeness based on an invertible Bloom filter (IBF) and a trusted third-party server (TTP); the scheme supports a multi-user setting by incorporating multi-party searchable encryption. The IBF is constructed over the whole database: for each attribute column in a table, all distinct attribute values are treated as keywords to construct an index, and a hash is computed for each attribute column value. An MHT is then constructed with the hashes stored in the leaf nodes. Each attribute value together with its index is encrypted using symmetric encryption. When the user receives the query result, he/she reconstructs the hash of each tuple to check its correctness. If an empty set is returned, the user can verify query correctness by checking whether the search request belongs to the IBF. In addition, using the member enumeration property of the IBF, the data user can check whether all the desired tuples have been returned.
The authors in [72] proposed a deterministic private-verification integrity checking scheme that uses an RSA-based accumulator as the verification value, stored by the data owner. This scheme limits the computational and storage overhead for both the CSP and the data owner; it prevents data deletion, replacement, and data leakage attacks and detects replay attacks, in which the CSP uses an old challenge response to answer a new challenge that matches it.
The authors in [73] proposed a data auditing system that integrates Merkle trees and blockchain methods and uses a third-party auditor (TPA) to verify data integrity. The core idea is to record each verification result in a blockchain as a transaction; the verification result is time-stamped, which lets users check that the verification was performed at the prescribed time.
2.5 Conclusion
Despite the significant growth of Database as a Service, there are still some obstacles to the widespread adoption of this service, the most significant being data security. There have been a number of contributions addressing the security requirements for outsourcing data: 1) data confidentiality, 2) query integrity verification, and 3) how to process queries over encrypted data.
Chapter (3)
SCIQ-CD: Confidentiality and Integrity of
Query results for Cloud Databases Scheme
This chapter presents the design and implementation of a secure scheme that provides the Confidentiality and Integrity of Query results for Cloud Databases (SCIQ-CD). Section 3.1 describes the notation and abbreviations used in the scheme design, as shown in Table 3.1. Section 3.2 presents some known cryptographic primitives that are exploited to construct our scheme. Section 3.3 presents the model adopted for cloud data storage and for processing queries on the outsourced encrypted database. Section 3.4 presents the system design: the setup and database preparation before outsourcing, how the database is accessed by the user, and finally how queries are authenticated and processed for users under the database outsourcing model.
Table 3.1: Symbols and abbreviations used in the proposed scheme
Abbreviation/Symbol   Definition
DB     Database
ET     Encrypted Table
ST     Search Table
X      Attribute value
Y      New attribute value
EY     New encrypted value
r_i    Record in a table, r_i = (X_1, X_2, X_3, ..., X_n)
id_i   Record id of record r_i
EK     Encryption key (AES)
DK     Decryption key (AES)
SK     Data owner's secret key, used to sign the MBT root (RSA)
PK     Data owner's public key, used to verify the signature (RSA)
MBT    Merkle B-Tree
t_i    Root signature from the root table
A digital signature is created by a signing function that takes as input a data value X and the secret key SK and generates a signature σ: sign(SK, X) → σ. The data value is then verified with a verification function that takes the signature and the public key PK: verify(PK, X, σ) [46, 65].
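A minimal sketch of sign(SK, X) → σ and verify(PK, X, σ) in Python, using RSA with SHA-256 from the cryptography package; the key size and padding choice are our assumptions, not mandated by the scheme.

from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes
from cryptography.exceptions import InvalidSignature

sk = rsa.generate_private_key(public_exponent=65537, key_size=2048)  # SK
pk = sk.public_key()                                                 # PK

X = b"attribute value"
sigma = sk.sign(X, padding.PKCS1v15(), hashes.SHA256())  # sign(SK, X) -> sigma

try:
    pk.verify(sigma, X, padding.PKCS1v15(), hashes.SHA256())  # verify(PK, X, sigma)
except InvalidSignature:
    print("signature rejected")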
Leaf nodes contain the hash values of the data records: d = H(r_i).
Internal nodes contain the keys of their child nodes; the hash value associated with an internal node is computed over the concatenation of the hash values of its children: h = H(d_1 || ... || d_fn), where fn is the fan-out per node.
The fan-out of a node is defined as the number of children of an internal node in the tree. A tree with a high fan-out has a low height, log n to the base F, where n is the total number of index records and F is the average fan-out.
After computing all the hash values, the hash of the root is signed using SK (the data owner's secret key).
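A short sketch of this hash computation with SHA-256 as H; the byte strings and flat lists below stand in for real MBT records and nodes.

import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# leaf level: d = H(r_i)
records = [b"record-1", b"record-2", b"record-3"]
leaf_digests = [H(r) for r in records]

# internal node: h = H(d_1 || ... || d_fn), over the concatenated child hashes
node_hash = H(b"".join(leaf_digests))
# the root hash is computed the same way and is then signed with SK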
Using the MBT reduces the number of input and output operations required to find an element in the tree. To store the MBT at the DSP, we use two different techniques: MBT-Serialization (MSZ) and the MBT-Base Identifier (TBI).
The serialization process is done using an object-to-byte-array method, and the deserialization process is the opposite, using a byte-array-to-object method.
The serialized MBT (a byte array) is stored in a table called MBT_Serialization, which consists of four columns (ID, Table_Name, TreeSerialization, and Version_No); this table is stored within the encrypted database at the DSP.
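The text does not name the serializer used, so the following is an illustrative round trip with Python's pickle; the resulting byte array is what would populate the TreeSerialization column of the MBT_Serialization table.

import pickle

tree = {"root": "a1b2c3", "leaves": ["d4e5", "f6a7"]}  # stand-in for an MBT object

blob = pickle.dumps(tree)      # serialization: object -> byte array
restored = pickle.loads(blob)  # deserialization: byte array -> object
assert restored == tree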
Here i is the index of a pointer or record, ranging from 0 to f, and the TBI of the root is always equal to its index. All calculations are based on the ternary number system. To clarify how the TBIs of the employee table tree shown in Figure 3.2 are obtained, consider two examples. For key 4 in its leaf node, i is 0, since 4 is the first key in the leaf node, TBI_parent is 11, and the base is 3; by the equation TBI = TBI_parent * base + i, the TBI of key 4 is 110.
For key 8 in its leaf node, i is 1, since 8 is the second key in the leaf node, TBI_parent is 22, and the base is 3, so the TBI of key 8 is 221.
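Both worked examples follow directly from TBI = TBI_parent * base + i with the arithmetic carried out in base 3; the sketch below reproduces them (to_ternary is our helper, not part of the scheme).

def to_ternary(x: int) -> str:
    digits = ""
    while True:
        x, r = divmod(x, 3)
        digits = str(r) + digits
        if x == 0:
            return digits

def tbi(parent_ternary: str, i: int, base: int = 3) -> str:
    parent = int(parent_ternary, base)    # parent TBI, given in ternary
    return to_ternary(parent * base + i)  # TBI_parent * base + i

assert tbi("11", 0) == "110"  # key 4, the first key in its leaf node
assert tbi("22", 1) == "221"  # key 8, the second key in its leaf node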
After identifying the pointers and keys of all nodes in the tree with numbers, each level of the MBT is stored in an individual table, except the leaf level, which extends the ET by adding two columns, TBI and Hash, to store the authenticated data records, as shown in Table 3.3, Table 3.4, and Table 3.5. Each record in the employee table represents a key from a leaf node of the MBT, and each record in the Emp_1 and Emp_0 tables represents a pointer from an internal node.
The authentication tables have three columns, named ID, TBI, and Hash. The ID values in the Emp_1 and Emp_0 tables are the keys from the MBT, with an extra pointer -1 (the NULL pointer) added because the number of pointers in a node is always one more than the number of keys; -1 is assigned to the ID column of the leftmost pointer of each internal node. The Hash column holds the hashes associated with the pointers in internal nodes and with the keys in leaf nodes.
In the Emp_0 table, the hash value of the extra pointer -1 is signed using the RSA secret key SK.
The TBIs shown in Figure 3.2 are in the ternary system; to store them in the authentication tables, they are converted to the decimal system.
Table 3.3: Emp_0 (root authentication table)
ID   TBI   Hash
-1   0     8z9cxqrUMfP+Q
3    1     2efae55
5    2     634c277

Table 3.4: Emp_1 (internal nodes authentication table)
ID   TBI   Hash
-1   0     3788957
2    1     ace31427
-1   3     70df12da
4    4     09f55fb6
-1   6     c287feb8
6    7     1f05b76
7    8     5b56746
know where and how to store the data structure, and when and how to return the authentication data to users for integrity verification.
Trusted third party (TTP): an external server that provides mutual trust among all system components. The TTP is trusted by all of them and has the capability to check query result authentication.
Attack models: by outsourcing the database, the DSP becomes a target of malicious attacks. The DSP itself might be malicious and may attempt to insert fake records into the database, modify existing records, or even delete them. The DSP may also hide data loss or ignore record update requests. Further, it could return old data to users instead of executing queries over the latest updated version of the data.
The data owner takes the following steps, as shown in Figure 3.4:
1) The data owner generates an initial secret key K.
2) For each table containing sensitive data, such as the example shown in Table 3.2, create two tables (a preparation sketch follows the list below):
a. An encrypted table (ET): the same as the original table, except that the sensitive data is encrypted using EK, as shown in Table 3.6.
b. A search table (ST): contains only two columns; one holds the original sensitive attribute values, kept non-encrypted, while the corresponding record id from the original table is encrypted using EK, as shown in Table 3.7.
Table 3.6: ET for Employee table
ID   Name   Job                     Salary        Department_No
4    Heba   Engineer                ZTQMyUcQ==    1
5    Huda   Marketing Specialist    YgDXY9tDRh    3
6    Alaa   Marketing Specialist    YNP9FvsKKO    3
7    John   Production Technician   6e1IXUW2t3w   7

Table 3.7: ST for Employee table
Salary   Encrypted record id
3000     DbYFdPNFlUi+S5/WnB5q1w==
2500     EZty2KI/wgM1X6Rtjnsd7A==
2500     +zrbDk0HW9AyDXysLTriBg==
2000     kLZcEa/K/pok8MVCyC6mng==
2000     ZpLNMmhzHKLjvkUgB/VZAw==
3) For the employee table, the MBT is built over 50,000 data records. The tree fan-out is 32 and the height is 4; the height follows from inserting the digests of the data records into an MBT with this fan-out.
a. Each record is hashed using SHA-256.
b. Each tree leaf node consists of a key and a value: the value contains the hash digest of the record, and the key is the record id.
c. The internal nodes contain the pointers to their child nodes.
d. The root is signed using SK.
4) To store the tree, as mentioned before, we use two techniques: 1) MBT-Serialization (MSZ), which converts the tree into a sequence of bytes and stores it in a table within the DB at the DSP, and 2) the MBT-Base Identifier (TBI), which converts the tree into authentication tables and stores them within the DB at the DSP.
5) Compute the hash digest of each record in the ET for the employee table, and update the ET by adding the hash digest, as shown in Table 3.8.
Table 3.8: ET Table (leaf nodes authentication table)
6) Update the ET by adding the TBI column, to be used in query assurance with the TBI technique, as shown in Table 3.9.
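The sketch below illustrates steps 2 and 5 for a single employee record: the sensitive attribute (the salary) is encrypted for the ET, the plain value together with an encrypted record id forms the ST row, and a SHA-256 digest of the record is attached. Fernet (an AES-based construction) stands in for encryption under EK, and the record encoding is our assumption.

import hashlib
from cryptography.fernet import Fernet

ek = Fernet(Fernet.generate_key())  # stand-in for the owner's encryption key EK

def prepare_row(record_id: int, name: str, job: str, salary: int, dept: int):
    enc_salary = ek.encrypt(str(salary).encode())  # ET: sensitive attribute encrypted
    digest = hashlib.sha256(
        f"{record_id}|{name}|{job}|{salary}|{dept}".encode()
    ).hexdigest()                                  # step 5: record hash stored in the ET
    et_row = (record_id, name, job, enc_salary, dept, digest)
    st_row = (salary, ek.encrypt(str(record_id).encode()))  # ST: plain value, encrypted id
    return et_row, st_row

et_row, st_row = prepare_row(4, "Heba", "Engineer", 3000, 1)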
TTP Side:
The DK is stored at the TTP and is used to decrypt the result before sending it to the user. A Metadata table, used to check data integrity, is also stored on the TTP server; it consists of five columns (ID, Table_name, PK, SK, Version_No).
As shown in Figure 3.5, the user issues a query. First, the original query is passed to the TTP, which encrypts any value that needs to be encrypted so that the query can be run over the encrypted database. Second, the TTP forwards the encrypted query to the DSP. Third, the DSP sends the encrypted result and the authenticated data records to the TTP. The TTP checks data integrity using the authenticated data and the metadata stored at the TTP; if integrity is ensured, it decrypts the result and forwards the decrypted result to the user.
Figure 3.6: Query assurance and processing using MSZ
1- Before being processed by the DSP, the original query is first passed to the TTP, which has two parts: a Query Translator and a Query Filter. The TTP forwards the query to the DSP to get the result, and checks the result's confidentiality and integrity.
a- Query Translator:
1. If the original query contains an encrypted value, the TTP processes the query on the ST instead of the ET at the DSP.
2. When the requested values are found, the DSP sends the values and record ids to the Query Translator, which decrypts the record id for each value using DK, uses the decrypted ids to get the records from the DSP, and passes them to the Query Filter to check the integrity and decrypt the result.
3. After the query result is verified by the Query Filter, it is decrypted using DK and passed to the user.
b- Query Filter (sketched below):
1. Receives the encrypted result and the MBT serialization from the DSP.
2. De-serializes the tree and compares the root signature with the signature value stored in the SK field of the Metadata table; if both are identical, data freshness is ensured, and the root signature is then decrypted using the PK from the Metadata table.
3. Hashes each record included in the result from each table using SHA-2 and compares the newly computed hash values with the hash values from the tree. If both sets are identical, data correctness and completeness are ensured.
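The following hedged sketch summarizes the Query Filter's checks under MSZ. The MBT interface (root_hash, leaf_hashes) is a hypothetical stand-in, and the freshness check is expressed as an RSA verification of the root hash against the stored signature, which matches the intent of the comparison described above.

import hashlib, pickle
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives import hashes
from cryptography.exceptions import InvalidSignature

def query_filter(tree_blob, stored_root_sig, owner_pk, result_records):
    tree = pickle.loads(tree_blob)  # steps 1-2: de-serialize the MBT

    # freshness: the stored root signature must verify under the owner's PK
    try:
        owner_pk.verify(stored_root_sig, tree.root_hash,
                        padding.PKCS1v15(), hashes.SHA256())
    except InvalidSignature:
        return False

    # step 3, correctness and completeness: re-hash each returned record
    # and compare against the tree's leaf hashes
    leaves = set(tree.leaf_hashes())
    return all(hashlib.sha256(r).digest() in leaves for r in result_records)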
Figure 3.7: Query assurance and processing using TBI
The same procedure applies as for MSZ; the difference is in the Query Filter.
Query Filter (sketched below):
1. Receives the data records, which contain the encrypted result, together with their TBIs from the DSP.
2. Computes the TBI of the parent pointer from the TBI of each record.
3. To ensure data correctness and completeness, all the authenticated data of records or pointers in the same node, which share the same parent TBI, can be located using any of their TBIs. The root hash is then computed, followed by the root signature, using the SK from the Metadata table.
4. Compares the newly computed root signature with the stored SK value; if both are identical, data freshness is ensured, and the root signature is then decrypted using the PK from the Metadata table.
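A minimal sketch of step 2 (and the sibling enumeration used in step 3), assuming the fan-out equals the base: inverting TBI = TBI_parent * base + i recovers the parent identifier, which all records and pointers in the same node share.

def parent_tbi(tbi: int, base: int = 3) -> int:
    # invert tbi = parent * base + i to recover the parent's TBI
    return tbi // base

def sibling_tbis(tbi: int, base: int = 3):
    # all records/pointers in the same node share the same parent TBI
    parent = parent_tbi(tbi, base)
    return [parent * base + i for i in range(base)]

assert parent_tbi(12) == 4  # key 4: TBI 110 in ternary is 12 decimal, parent 11 is 4
print(sibling_tbis(12))     # [12, 13, 14]: the slots of that leaf node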
Chapter (4)
Data Operations on the Outsourced Data
using SCIQ-CD Scheme
This chapter illustrates the details of handling queries, including static operations such as select and range, and dynamic operations such as insert, update, and delete. Dynamic operations are performed at the record level and only by the system admin, who has write privileges; static operations can be performed by the system admin and authorized users. Because integrity is checked using the MBT, query handling differs between the two techniques, MSZ and TBI. In the following examples, we present how the system works for each of the basic query operations. Assume that all queries run on the employee table.
4.1 Data Operations on the outsourced data using MBT-Serialization
4.1.1 Insert Operation
This insertion process includes the new data record and the updated tree serialization. We have
implemented query rewrite algorithms for the TTP to serialize and de-serialize the MBT to check
the data integrity. We assume that the new record always contains sensitive data.
This process involves 3 network round trips between the user, the TTP and the DSP.
1) The TTP encrypts any sensitive attribute value of the new record from the user's insertion
query, hashes the record, and encrypts the record id.
2) The TTP retrieves the tree serialization from the DSP, de-serializes it, inserts the new record's
hash value, signs the root with the owner's SK, and serializes the tree.
3) In one transaction at the DSP, the SQL statements for inserting the new record are sent; they
insert the new record into the ST and the ET, and update the tree serialization.
Based on the examples in Figure 4.1 for the two different cases, the following examples show the
generated insert and update SQL statements that are used.
Example 1.1.1: Case 1: insert the value without a split or merge on the tree.
Insert a new record into the employee table; assume this record's id is 7.
INSERT INTO employee VALUES ('Ayman', 'engineer', 5000, 2)
Insert node 7, and the update queries for the authentication data on the DSP:
**insert the new record
INSERT INTO [Employee_ST] ([ID], [Salary]) VALUES ('IJKJNbbhhnfOL,MNU', 5000);
INSERT INTO [Employee_ET] ([ID], [Name], [Job], [Salary], [Department_No], [hash])
VALUES (7, 'Ayman', 'engineer', 'cdbnciducn nvkjdllvvfo', 2, 'b569052rik');
Example 1.1.2: Case 2: insert the value, which causes a split or merge on the tree.
Insert a new record into the employee table; assume this record's id is 8.
INSERT INTO employee VALUES ('Heba', 'engineer', 3000, 2)
Insert node 8, and the update queries for the authentication data on the DSP:
**insert the new record
INSERT INTO [Employee_ST] ([ID], [Salary]) VALUES ('jhbcjhcgdbkufhn', 3000);
INSERT INTO [Employee_ET] ([ID], [Name], [Job], [Salary], [Department_No], [hash])
VALUES (8, 'Heba', 'engineer', 'cdlicjdlicjdiocdnckdjchjdkcjn', 2, 'bijidjp526jndj');
In Figure 4.2, Algorithm 1 shows the pseudo-code explaining the general process of the record
insertion query.
Algorithm 1: Record insertion procedure in the proposed scheme
Require: r_i(X_1, X_2, X_3, ..., X_n): new record, ST: search table, ET: encrypted table, id_i: record
id, EK: encryption key, t_i: root signature of table tree, d_i: hash digest of record, SK: secret key.
12: DSP: store the encrypted record and id_i in the encrypted table, and the non-encrypted X_3
in the ST at the DSP
13: DSP: update the tree serialization in the MBT-Serialization table and increment the
VERSION_NO
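As a complement to the pseudo-code, the following self-contained Python sketch captures the TTP-side insertion flow; the MiniMBT class is a deliberately simplified stand-in (a sorted list of leaf digests hashed together) rather than the real Merkle B-Tree, and sign stands for signing with the owner's SK.

import bisect
import hashlib

class MiniMBT:
    # Toy stand-in for the MBT: sorted leaf digests whose "root hash" is
    # the hash of their concatenation. A real MBT hashes per node, but
    # the insert / re-sign / re-serialize flow is the same.
    def __init__(self):
        self.ids, self.digests = [], []

    def insert_leaf(self, rec_id: int, digest: bytes):
        pos = bisect.bisect(self.ids, rec_id)
        self.ids.insert(pos, rec_id)
        self.digests.insert(pos, digest)

    def root_hash(self) -> bytes:
        return hashlib.sha256(b"".join(self.digests)).digest()

def insert_record(tree: MiniMBT, rec_id: int, enc_record: bytes, sign):
    # TTP side of Algorithm 1: hash the (already encrypted) record,
    # insert the digest into the tree, then re-sign the new root; the
    # new signature goes to the Metadata table and VERSION_NO is bumped.
    digest = hashlib.sha256(enc_record).digest()
    tree.insert_leaf(rec_id, digest)
    return sign(tree.root_hash())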
4.1.2 Select Operation
If the selection value is encrypted, the selection process involves 4 network round trips between
the user, the TTP and the DSP.
1) Retrieve the encrypted record ids that are relevant to the selected value from the ST table to
the TTP.
2) The TTP decrypts the record ids and searches the DSP with the decrypted record ids.
3) Retrieve the selected records that are relevant to the selected value, together with the table
tree serialization, to the TTP, which de-serializes it and computes the record hash values.
4) Compare the root signature with the SK stored in the TTP metadata; if equal, decrypt the
root signature, get the hash value of each record, compare it with the computed hash value,
and send the results to the user.
If the selection value is non-encrypted, the selection process involves 2 network round trips
between the user, the TTP and the DSP.
1) Retrieve the selected records that are relevant to the selected value, together with the table
tree serialization, to the TTP, which de-serializes it and computes the record hash values.
2) Compare the root signature with the SK stored in the TTP metadata; if equal, decrypt the
root signature, get the hash value of each record, compare it with the computed hash value,
and send the results to the user.
The range select query is handled differently from the select query. In a range select query, the
two boundary keys of the range must be identified, and then the authentication data for those
boundaries is retrieved to check the data integrity. For example, in Example 1.2.1, to find the
employees with ids in the range 25 to 35, we need to find the range's boundaries, which are 24
and 36. We have implemented rewrite algorithms for the trusted third party to identify these
boundary keys and to retrieve their hash values.
Example 1.2.1: Write a select query with a range condition (employee id between 25 and 35)
SELECT * FROM employee WHERE id BETWEEN 25 AND 35
**To find the left boundary key
DECLARE @startKey AS int;
SELECT TOP 1 @startKey = id FROM [Employee_ET] WHERE id < 25 ORDER BY id DESC;
**To find the right boundary key
DECLARE @endKey AS int;
SELECT TOP 1 @endKey = id FROM [Employee_ET] WHERE id > 35 ORDER BY id ASC;
**retrieve the hash values for the left boundary and the right boundary for all the tree levels
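The boundary-key logic amounts to the following Python sketch; it assumes the table's ids are available as a sorted list and that keys exist on both sides of the range (a real implementation needs sentinel keys at the table ends).

import bisect

def range_boundaries(sorted_ids, lo, hi):
    # Largest key strictly below the range start (24 for [25, 35]) and
    # smallest key strictly above the range end (36 for [25, 35]);
    # proving that these two keys bound the result shows completeness.
    left = sorted_ids[bisect.bisect_left(sorted_ids, lo) - 1]
    right = sorted_ids[bisect.bisect_right(sorted_ids, hi)]
    return left, right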
In Figure 4.3, Algorithm 2 shows the pseudo-code explaining the general process of the selection
query.
Algorithm 2: Selection procedure in the proposed scheme
Require: X: attribute value, ST: search table, ET: encrypted table, id_i: record id, DK: decryption
key, r_i: record in table, t_i: root signature from root table, d_i: hash digest of record.
3: DSP: search in the ST
4: Return the id relevant to X to the TTP
5: TTP: decrypt the id using DK
6: Return the decrypted id to the DSP
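Steps 3 to 6, the ST-based lookup, can be sketched as follows; the deterministic enc/dec pair is a toy stand-in for the scheme's AES encryption, used only to keep the example self-contained and runnable.

# Toy deterministic "encryption" stand-in for AES; illustration only.
def enc(x) -> str: return f"enc({x})"
def dec(c: str) -> str: return c[4:-1]

def select_by_encrypted_value(st_rows, et_rows, value):
    # st_rows: list of (encrypted_value, encrypted_record_id) pairs
    # et_rows: dict mapping record id -> encrypted record
    enc_value = enc(value)                                   # TTP encrypts the literal
    enc_ids = [eid for v, eid in st_rows if v == enc_value]  # DSP searches the ST
    ids = [int(dec(eid)) for eid in enc_ids]                 # TTP decrypts the ids
    return [et_rows[i] for i in ids]                         # DSP returns the records

# Example: one matching row with id 7
st = [(enc(5000), enc(7))]
et = {7: b"<encrypted employee record>"}
assert select_by_encrypted_value(st, et, 5000) == [b"<encrypted employee record>"]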
4.1.3 Update Operation
An example of the record modification query is shown in Example 1.3.1 from Figure 4.4.
The record modification procedure in the proposed scheme can be applied to a single value, as
shown before, or to a set of values together.
Example 1.3.2: Write an update query to update a set of records based on a non-encrypted
selection value
UPDATE employee SET JOB = 'Electrical engineer'
WHERE JOB = 'engineer'
When multiple records in one table are updated at one time, the hierarchical structure of the MBT
means that the tree itself needs to be updated only once.
In Figure 4.5, Algorithm 3 shows the pseudo-code explaining the general process of the record
modification query.
Algorithm 3: Record modification procedure in the proposed scheme from X to Y
Require: X: attribute value, Y: new attribute value, ST: search table, ET: encrypted table, id_i:
record id, t_i: root signature from root table, d_i: hash digest of record, SK: secret key.
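Reusing the MiniMBT stand-in from the insert operation, the TTP-side modification step could look like the following; as before, sign represents signing with the owner's SK, and the flat structure is a simplification of the scheme's actual tree.

import hashlib

def update_record(tree, rec_id: int, new_enc_record: bytes, sign):
    # Record modification from X to Y: after the old record is verified,
    # replace its leaf digest with the hash of the re-encrypted record
    # and re-sign the root for the Metadata table.
    pos = tree.ids.index(rec_id)
    tree.digests[pos] = hashlib.sha256(new_enc_record).digest()
    return sign(tree.root_hash())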
4.1.4 Delete Operation
The record deletion procedure in the proposed scheme can be applied to a single record or to
multiple records, as shown in Examples 1.4.1 and 1.4.2.
We have implemented query rewrite algorithms for the trusted third party to generate the delete
SQL statements that remove the authentication data; these are sent within the user's delete query
to the DSP. The deletion process involves 3 network round trips between the user, the TTP and
the DSP.
1) Retrieve from the DSP the encrypted id relevant to the record to be deleted, together with
the tree serialization.
2) The TTP de-serializes the tree and compares the root signature with the signature stored in
the TTP metadata; if equal, it decrypts the root signature and deletes the hash value of that
record from the tree (some nodes may need to be merged together). Then it signs the root,
updates the Metadata table with the new signature, and increments the Version_NO.
3) In one transaction at the DSP, the SQL statements for deleting the record are sent; they
delete the record from the ST and the ET, and update the tree serialization (a sketch of the
TTP-side step follows).
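In terms of the same MiniMBT stand-in, the TTP-side deletion step could be sketched as follows (node merging has no analogue in the flat toy structure):

def delete_record(tree, rec_id: int, sign):
    # Remove the record's leaf digest, then re-sign the root; the new
    # signature replaces the old one in the Metadata table and the
    # Version_NO is incremented for freshness.
    pos = tree.ids.index(rec_id)
    del tree.ids[pos]
    del tree.digests[pos]
    return sign(tree.root_hash())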
Based on the examples in Figure 4.5 for the two different cases, the following examples show the
generated delete and update SQL statements that are used.
Example 1.4.1: Case 1: delete a record without a split or merge on the tree.
Delete node 6, and the update queries for the authentication data on the DSP:
**Delete the record
DELETE FROM [Employee_ET] WHERE [id] = 6;
DELETE FROM [Employee_ST] WHERE [id] = 'DteXTR0NG8z9cxqrUMfP+Q';
Example 1.4.2: Case 2: delete a record, which causes a split or merge on the tree.
Delete node 12, and the update queries for the authentication data
In Figure 4.7, Algorithm 4 shows the pseudo-code explaining the general process of the record
deletion query.
Algorithm 4: Record deletion procedure in the proposed scheme
Require: X: attribute value, ST: search table, ET: encrypted table, id_i: record id, r_i: record in
table, t_i: root signature from root table, SK: secret key.
1: INPUT X
2: Return the tree serialization to the TTP
4.2 Data Operations on the outsourced data using TBI
4.2.1 Insert Operation
Inserting a new record into a table is a complicated process: inserting a record may cause a change
to the MBT structure, since inserting a new leaf node may cause the parent nodes to split. An
example of inserting new leaf nodes covers two cases: b) insert a new leaf node without splitting
the parent, and c) insert a leaf node that causes the parent node to split, as shown in Figure 4.8.
In the first case, a TBI is available for the new record between sibling nodes in the MBT.
In the second case, the TBI values of some records must be changed before inserting the new
record, because the MBT structure in this example allows only two entries per leaf node.
This insertion process includes not only the new data but also the new authentication data
records. We have implemented query rewrite algorithms for the trusted third party to generate
the insert SQL statements for the authentication data, which are sent within the user's insert
query to the DSP. We assume that the new record always contains sensitive data.
This process involves 3 network round trips between the user, the TTP and the DSP.
1) The TTP encrypts any sensitive attribute value of the new record from the user's insertion
query.
2) Retrieve the verification object from the DSP, which contains the pointers and hashes within
the tree range to the left or right of the inserted value.
3) In one transaction at the DSP, the SQL statements for inserting the new record are sent;
they insert the new record into the ST and the ET, hash the record, compute the TBI, and
update the authenticated data records.
Based on the examples in Figure 4.8 for the two different cases, the following examples show the
generated insert and update SQL statements that are used.
Example 2.1.1: Case 1: insert the value without a split or merge on the tree.
Insert node 8, and the update queries for the authentication data on the DSP:
**insert the new record
INSERT INTO [Employee_ST] ([ID], [Salary]) VALUES ('DteXTR0NG8z9cxqrUMfP+Q==', 3000);
INSERT INTO [Employee_ET] ([ID], [Name], [Job], [Salary], [Department_No], [TBId], [hash])
VALUES (8, 'Mai', 'engineer', 'lkeo9ijeijxqrUMfP+Q==o', 3, 23, 'b569052rik');
Example 2.1.2: Case 2: insert the value, which causes a split or merge on the tree.
INSERT INTO [Employee_ET] ([ID], [Name], [Job], [Salary], [Department_No], [TBId], [hash])
VALUES (7, 'Lucy', 'doctor', 'lrkgfgmpogkgml[', 2, 23, 'b56dkejdndfldmcc9052rik');
**because of the node split, a new node is inserted into the MBT, so we need to update the ripd
UPDATE [Employee_ET] SET [ripd] = [ripd] + 1 WHERE [ripd] >= 4 AND [ripd] <= 6;
**update the child nodes ripd
UPDATE [Employee_ET] SET [ripd] = [ripd] + 4 WHERE [ripd] >= 25 AND [ripd] <= 35;
In Figure 4.9, Algorithm 5 shows the pseudo-code explaining the general process of the record
insertion query.
Algorithm 5: Record insertion procedure in the proposed scheme
Require: r_i(X_1, X_2, X_3, ..., X_n): new record, ST: search table, ET: encrypted table, id_i: record
id, EK: encryption key, t_i: root signature from root table, d_i: hash digest of record, SK: secret
key, TBI: radix path identifier.
Figure 4.9: The record insertion procedure in the proposed scheme
4.2.2 Select Operation
The selection procedure in the proposed scheme depends on whether the selected value is
encrypted or non-encrypted, as shown in Algorithm 2.
Different examples of the selection procedure are shown in Examples 2.2.1 and 2.2.2: in
Example 2.2.1 the query returns one data record based on a selection value, while in Example
2.2.2 the query returns multiple records within a range.
If the selection value is non-encrypted, the selection process involves 2 network round trips
between the user, the TTP and the DSP.
1) Retrieve the selected records that are relevant to the selected value, together with the
verification object, to the TTP.
2) The TTP checks the data integrity and sends the results to the user.
If the selection value is encrypted, the selection process involves 4 network round trips between
the user, the TTP and the DSP.
1) Retrieve the encrypted record ids that are relevant to the selected value from the ST table
to the TTP.
2) The TTP decrypts the record ids and searches the DSP with the decrypted record ids.
3) Retrieve the selected records that are relevant to the selected value, together with the
verification object, to the TTP.
4) The TTP checks the data integrity and sends the results to the user.
To retrieve the authenticated data records to the TTP, we have implemented query rewrite
algorithms for the trusted third party to generate the select SQL statements that fetch the
authentication data relevant to the selected record, as shown in Examples 2.2.1 and 2.2.2.
**retrieve the verification object for all the tree levels
SELECT [ripd], [hash] FROM [Employee_ET] WHERE ripd >= (@recordRipd / 3) * 3 AND ripd < (@recordRipd / 3) * 3 + 3;
SELECT [ripd], [hash] FROM [emp_1] WHERE ripd >= (@recordRipd / 9) * 3 AND ripd < (@recordRipd / 9) * 3 + 3;
SELECT [ripd], [hash] FROM [emp_0] WHERE ripd >= (@recordRipd / 27) * 3 AND ripd < (@recordRipd / 27) * 3 + 3;
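The window arithmetic behind these per-level WHERE clauses can be expressed compactly in Python; the fan-out of 3 matches the example, and integer division mirrors SQL's behavior on integer operands.

FANOUT = 3  # matches the example; real deployments use larger fan-outs

def vo_windows(record_ripd: int, height: int):
    # One fan-out-aligned sibling window per level, leaf level first;
    # each window is the [lo, hi) range used in that level's SELECT.
    windows, r = [], record_ripd
    for _ in range(height):
        lo = (r // FANOUT) * FANOUT
        windows.append((lo, lo + FANOUT))
        r //= FANOUT                      # ripd of the parent pointer
    return windows

# e.g. vo_windows(13, 3) returns [(12, 15), (3, 6), (0, 3)]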
The range select query is handled differently from the select query. In a range select query, the
two boundary keys of the range must be identified, and then the authentication data for those
boundaries is retrieved to check the data integrity. In Example 2.2.2, to find the employees with
ids in the range 25 to 35, we need to find the range's boundaries, which are 24 and 36. To identify
these boundary keys and to retrieve the verification object, we have implemented query rewrite
algorithms for the trusted third party to execute the range select query on the DSP. Then the TTP
checks the data integrity and sends the results to the user.
Example 2.2.2: Write a select query with a range condition (employee id between 25 and 35)
SELECT * FROM employee WHERE id BETWEEN 25 AND 35
**To find the left boundary key and the left boundary ripd
DECLARE @startKey AS int, @startRipd AS int;
SELECT TOP 1 @startKey = id, @startRipd = ripd FROM [Employee_ET] WHERE id < 25 ORDER BY id DESC;
**To find the right boundary key and the right boundary ripd
DECLARE @endKey AS int, @endRipd AS int;
SELECT TOP 1 @endKey = id, @endRipd = ripd FROM [Employee_ET] WHERE id > 35 ORDER BY id ASC;
**retrieve the verification object for the left boundary for all the tree levels
SELECT [ripd], [hash] FROM [Employee_ET] WHERE ripd >= (@startRipd / 3) * 3 AND ripd < (@startRipd / 3) * 3 + 3;
**retrieve the verification object for the right boundary for all the tree levels
SELECT [ripd], [hash] FROM [Employee_ET] WHERE ripd >= (@endRipd / 3) * 3 AND ripd < (@endRipd / 3) * 3 + 3;
SELECT [ripd], [hash] FROM [emp_1] WHERE ripd >= (@endRipd / 9) * 3 AND ripd < (@endRipd / 9) * 3 + 3;
SELECT [ripd], [hash] FROM [emp_0] WHERE ripd >= (@endRipd / 27) * 3 AND ripd < (@endRipd / 27) * 3 + 3;
In Figure 4.10, Algorithm 6 shows the pseudo-code explaining the general process of the selection
query.
1: INPUT X
2: IF X is encrypted in the DSP
3: DSP: search in the ST
4: Return the id relevant to X to the TTP
5: TTP: decrypt the id
6: Return the decrypted id to the DSP
4.2.3 Update Operation
This process involves 3 network round trips between the user, the TTP and the DSP.
1) The TTP encrypts any sensitive attribute value of the record from the user's modification
query.
2) Retrieve the verification object from the DSP, which contains the pointers and hashes within
the tree range to the left or right of the modification value from the update query.
3) In one transaction at the DSP, the SQL statements are sent to update the existing record;
they update the record in the ST and ET tables, the new record hash, and the root signature.
An example of the record modification query is shown in Example 2.3.1 from Figure 4.4.
The record modification procedure in the proposed scheme can be applied to a single value, as
shown before, or to a set of values together.
In Figure 4.11, Algorithm 7 shows the pseudo-code explaining the general process of the record
modification query.
Algorithm 7: Record modification procedure in the proposed scheme from X to Y
Require: X: attribute value, Y: new attribute value, EY: new encrypted value, ST: search table,
ET: encrypted table, id_i: record id, EK: encryption key, t_i: root signature from root table, d_i:
hash digest of record, nd_i: new hash of record, SK: secret key, TBI: radix path identifier.
1: INPUT X
2: Request the verification object
4.2.4 Delete Operation
This process involves 2 network round trips between the user, the TTP and the DSP.
1) Retrieve from the DSP the encrypted id relevant to the record to be deleted, together with
the verification object, which contains the pointers and hashes within the tree range to the
left or right of the deletion value.
2) In one transaction at the DSP, the SQL statements for deleting the record are sent; they
delete the record from the ST and the ET, compute the TBI, and update the authenticated
data records.
Based on the examples in Figure 4.6 for the two different cases, the following examples show the
generated delete and update SQL statements that are used.
Example 2.4.1: Case 1: delete a record without a split or merge on the tree.
Delete node 6, and the update queries for the authentication data on the DSP:
**Delete the record
DELETE FROM [Employee_ET] WHERE [id] = 6;
DELETE FROM [Employee_ST] WHERE [id] = 'DteXTR0NG8z9cxqrUMfP+Q';
Example 2.4.2: Case 2: delete a record, which causes a split or merge on the tree.
Delete node 12, and the update queries for the authentication data
**delete the authentication record for the deleted node from the parent node
DELETE FROM [emp_1] WHERE [id] = 12;
**update the authentication record in the parent node where the nodes are merged
UPDATE [emp_1] SET [id] = 10, [ripd] = 4, [hash] = 'sxktow,nifmcSJNMdf' WHERE [id] = 7;
In Figure 4.12, Algorithm 8 shows the pseudo-code explaining the general process of the record
deletion query.
Algorithm 8: Record deletion procedure in the proposed scheme
Require: X: attribute value, ST: search table, ET: encrypted table, id_i: record id, r_i: record in
table, t_i: root signature from root table, SK: secret key, TBI: radix path identifier.
Chapter (5)
Experimental Evaluation
5.3 Performance analysis
We evaluate the performance overhead against the MBT-Serialization scheme [8] by measuring
the average latencies of query results for the different operations: insert, select, range condition,
update, and delete, with confidentiality and integrity verification support.
In our scheme, the query latency is dominated by the computation cost on the DSP side and the
TTP side: the computation cost on the DSP side is the execution of encrypted queries and the
retrieval of authentication data, and on the TTP side it is rewriting the query, verifying the data
integrity, and decrypting the result.
In the MBT-Serialization scheme, the query latencies are likewise dominated by the computation
cost on the DSP side and the TTP side: the computation cost on the DSP side is the execution of
encrypted queries, and on the TTP side it is de-serializing the tree, rewriting the query, verifying
the data integrity, and decrypting the result.
The communication cost in both schemes includes the execution of additional queries to return
the result.
Data operations: We run experiments to explore how the latency overhead changes as the number
of records to be retrieved increases.
[Figure: MBT height versus fan-out; the height decreases as the fan-out increases, from 8 at fan-out 4 down to 3 at fan-out 256.]
The overhead decreases as the fan-out increases, because the number of update statements
required to update the authentication data decreases as the fan-out grows.
The number of update statements varies with the fan-out of the MBT: suppose the fan-out is 32;
then the tree height is 4, and the number of update statements is 4 times the number of records
to be updated.
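The relationship between fan-out, tree height and update-statement count can be checked with a few lines of Python; the logarithmic height formula is our reading of the reported figures, stated here as an assumption rather than as the thesis's own formula.

import math

def mbt_height(num_records: int, fanout: int) -> int:
    # Number of levels needed for num_records leaves at the given fan-out.
    return max(1, math.ceil(math.log(num_records, fanout)))

def update_statement_count(num_updates: int, num_records: int, fanout: int) -> int:
    # One authentication-data update per tree level per updated record.
    return mbt_height(num_records, fanout) * num_updates

# e.g. one million records at fan-out 32 gives height 4, so updating
# 500 records issues about 2000 authentication-data update statements.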
[Figure: number of update statements versus fan-out; the count decreases from about 7000 at small fan-outs to about 1500 at fan-out 256.]
To evaluate the performance of inserting a new record into a table with integrity protection, we
assume that the inserted record is always at the end of the table. In our work, the experiment
executes an insert query for a single record into the employee table: we generate insertion queries
to insert the record into the encrypted table and the search table, and additionally insert the
authentication data by generating a set of insert and update statements, which are sent to the
DSP to be executed.
The average overhead of our scheme for inserting a new record is 1.4 s, while that of the
MBT-Serialization scheme is 1.6 s. The response time varies because the inserted record may
require merging some nodes together. The major performance overhead comes from the number
of update statements to be executed on the TTP side.
[Figure: insert latencies (response time in s) for MBT-TBI versus MBT-Serialization.]
[Figure: verification object (VO) size in KB versus fan-out; the VO size decreases as the fan-out increases.]
Select Query: We run experiments to evaluate the performance of select queries over a table with
integrity protection, for two different select cases: selecting a single record or multiple records
according to a specific value within one query, as shown in Examples 2.2.1 and 2.2.2, with an
MBT fan-out of 32. There are two sub-cases: the selection value is either encrypted or
non-encrypted. Figure 5.5 shows that the latency overhead increases as the number of records to
be returned increases. MBT-TBI denotes our scheme, and MBT-Serialization denotes the other
scheme. The overhead when the selected value is non-encrypted is lower than for an encrypted
value. Overall, the overhead of our scheme is lower than that of the MBT-Serialization scheme.
[Figure 5.5: select latencies (response time in s) versus number of rows for MBT-TBI (encrypted and non-encrypted) and MBT-Serialization.]
[Figure: latencies (response time in s) versus number of rows for MBT-TBI (encrypted and non-encrypted) and MBT-Serialization.]
Update Query: We evaluate the performance overhead caused by update queries. The data to be
updated is first retrieved to the TTP to verify its integrity; we then generate update queries for
both the data and the authentication data and send them to the DSP to be executed. Figure 5.7
shows the overhead against the number of rows to be updated. The results show that when we
update only a few records, the overhead is high in both schemes because of the additional round
trip to verify the data integrity, but as the number of rows to be updated increases, the per-row
overhead decreases. In both schemes, updating multiple records in the MBT is much cheaper
than updating one record at a time, since the leaf nodes and the inner nodes are updated just
once. As shown in Figure 5.7, the overhead of our scheme is lower than that of the
MBT-Serialization scheme.
[Figure 5.7: update latencies (response time in s) versus number of rows for MBT-TBI (encrypted and non-encrypted) and MBT-Serialization.]
Delete Query: We evaluate the performance overhead caused by delete queries for two different
delete cases: deleting a single record or multiple records from a table with integrity protection.
The data to be deleted is first retrieved to verify its integrity; we then generate delete and update
queries for both the data and the authentication data and send them to the DSP to be executed.
In both schemes, as the number of records to be deleted increases, the overhead increases because
of the updates performed on the MBT. Overall, however, the average overhead of our scheme for
the delete query is lower than that of the MBT-Serialization scheme, as shown in Figure 5.8.
[Figure 5.8: delete response time in s versus number of rows.]
Chapter (6)
Conclusion and Future work
In this thesis, two security schemes are presented to provide confidentiality and integrity of query
results for databases processed over a cloud service, using a trusted third party (TTP) that enables
the mutual trust needed to store a database at the database service provider (DSP). We have
implemented query rewrite algorithms that generate, from the user's query, the SQL statements
to run against the data tables at the DSP. In both schemes, we use a Merkle B-Tree that provides
integrity assurance for the query result.
In the first work, we presented the MBT-Serialization scheme, in which the MBT is implemented
through a tool that serializes the MBT into a sequence of bits, so that it can be stored within the
encrypted database at the DSP, and de-serializes it again. An extra table added to the database
contains the tree serialization, which is used to check the integrity and confidentiality of the
query result.
In the second work, we presented the MBT-Base Identifier scheme, in which the MBT is serialized
using a Base Identifier approach: each level of the MBT is stored in an authentication table, and
these authentication tables are then stored within the encrypted database at the DSP.
We have explored the efficiency of the different operations, namely insert, update, delete, select
and range condition, retrieving the result from the DSP through the TTP to check the data
confidentiality and authenticating the result using the authentication data stored within the
encrypted database. Both schemes succeed in achieving the security goals, but the MBT-Base
Identifier scheme performs better, imposing a smaller overhead for the different operations.
For future work, we plan to explore more involved types of queries with confidentiality and
integrity assurance. Moreover, we plan to extend our work to Merkle B-Trees within
non-relational databases.
List of publications
1. Mai Rady, Tamer Abdelkader, and Rasha Ismail; “Integrity and Confidentiality in Cloud
Outsourced Data”; Ain Shams Engineering Journal, Volume 10, Issue 2, June 2019, Pages 275-
285, Elsevier.
2. Mai Rady, Tamer Abdelkader, and Rasha Ismail; “SCIQ-CD: A Secure Scheme to Provide
Confidentiality and Integrity of Query results for Cloud Databases”; In 14th International
Computer Engineering Conference (ICENCO), December 2018.
3. Mai Rady, Tamer Abdelkader, and Rasha Ismail; “Securing Query Results for Cloud
Databases”; In International Journal of Intelligent Computing and Information Sciences (IJICIS),
Volume 21, Issue 1, May 2021.
References
1. D. Attas, O. Batrafi, Efficient integrity checking technique for securing client data in cloud computing,
International Journal of Computer Science Issues (IJCSI), VOL. 11, 2011.
2. B. Joshi, K. Rani, Mitigating Data Segregation and Privacy Issues in Cloud Computing, International
Conference on Communication and Networks pp 175-182, 2017.
3. Cloud storage for cloud computing, 2009. [Online]. [Accessed: October 2017]. Available:
https://www.ogf.org/Resources/documents/CloudStorageForCloudComputing.pdf
4. I. Arora and A. Gupta, Cloud Databases: A Paradigm Shift in Databases, International Journal of Computer
Science Issues (IJCSI), VOL. 9, 2012.
5. D. Ma, R. H. Deng, H. Pang, and J. Zhou, Authenticating Query Results in Data Publishing, the International
Conference on Information and Communications Security, Berlin, 2005.
6. E. Shmueli, R. Vaisenberg, E. Gudes, and Y. Elovici, “Implementing a database encryption solution, design
and implementation issues”, Elsevier, 2014.
7. R. Elmasri and S. B. Navathe, Fundamentals of Database Systems, 2nd Edition, 1994.
8. K. S. Slabeva, T. Wozniak, Cloud Basics: An Introduction to Cloud Computing, Grid and Cloud Computing,
p 47-61, 2010.
9. Cloud Computing Defined. [Online]. [Accessed: October 2017]. Available:
http://www.cloudcomputingdefined.com/
10. D. Attas, O. Batrafi, Efficient integrity checking technique for securing client data in cloud computing,
International Journal of Computer Science Issues (IJCSI), VOL. 11, 2011.
11. Rana M Pir, “Data integrity verification in cloud storage without using trusted third party auditor,
International Journal of Engineering Development and Research (IJEDR), 2014.
12. V. Mateljan, D. Cisic, and D. Ogrizovic. Cloud Database-as-a-Service (DaaS), the 33rd International
Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO ’10,
p 1185 –1188. Rijeka, 2010.
13. S. Jain, M. Alam, Comparative Study of Traditional Database and Cloud Computing Database, International
Journal of Advanced Research in Computer Science, 2017.
14. J.Wu, L. Ping, X. Ge, Y.Wang, and J. Fu. Cloud Storage as the Infrastructure of Cloud Computing,
International Conference on Intelligent Computing and Cognitive Informatics, ICICCI ’10, p 380 –383, 2010.
15. D. Panthri, C. Singh, A. Vats, Database Management System as a Cloud Service, International Journal of
Innovative Research in Technology (IJIRT), February 2017.
16. C. Curino, E. C. Jones, R. Popa, N. Malviya, Relational Cloud: A Database-as-a-Service for the Cloud, 5th
Biennial Conference on Innovative Data Systems Research (CIDR), 2011.
17. Brian Hayes. ‘Cloud computing’, ACM, p 9-11, 2008. [Online]. [Accessed: October 2017]. Available:
http://doi.acm.org/10.1145/1364782.1364786.
18. Martin, R., Introduction to Internet Services, Rutgers University – Department of Computer Science, 2003.
[Online]. [Accessed: October 2017]. Available:
http://www.cs.rutgers.edu/~rmartin/teaching/spring03/cs553/presentations/intro.pdf.
19. Motahari Nezhad, H., Stephenson, B. and Singhal, S, Outsourcing Business to Cloud Computing Services:
Opportunities and Challenges, IEEE Internet Computing, Special Issue on Cloud Computing, 2009.
20. R. Rajan, S. Shanmugapriyaa, Evolution of Cloud Storage as Cloud Computing Infrastructure Service, IOSR
Journal of Computer Engineering (IOSRJCE), June 2012.
21. IaaS, PaaS and SaaS – IBM Cloud service models. [Online]. [Accessed: October 2017]. Available:
https://www.ibm.com/cloud-computing/learn-more/iaas-paas-saas/
22. R Elmasri, S Navathe, Challenges of Database Security, Fundamentals of Database Systems - Additional
Database Topics: Security and Distribution - Database Security, 2017.
23. Y An, Z Zaaba and N Samsudin, Reviews on Security Issues and Challenges in Cloud Computing,
International Engineering Research and Innovation Symposium (IRIS), 2016.
24. E Bertino, J Byun, and N Li, Privacy-Preserving Database Systems, Springer, 2005.
25. K Peffers, T Tuunanen, M ROTHENBERGER, and S Chatterjee, A Design Science Research Methodology
for Information Systems Research, Journal of Management Information Systems, p 45-77, 2014.
26. P. Mell, P., Grance, T.,The NIST Definition of Cloud Computing, National Institute of Standards and
Technology, 2011.
27. Zhang, X., Wuwong, N., Li, H., Zhang, X., Information Security Risk Management Framework for the Cloud
Computing Environments, 10th IEEE International Conference on Computer and Information Technology,
2010.
28. Q Zhang, L Cheng, and R Boutaba, Cloud computing: state-of-the-art and research challenges, Journal of
Internet Services and Applications, 2010.
29. SQL in Microsoft Azure. [Online]. [Accessed: December 2017]. Available:
http://www.ingrammicrocloud.com/2015/04/09/microsoft-azure-sql-paas-vs-iaas/
30. Database Systems: A Practical Approach to Design, Implementation, and Management, By (author) Thomas
Connolly.
31. Database Concepts and Standards. [Online]. [Accessed: September 2017]. Available:
https://www.service-architecture.com/articles/database/index.html
32. E. F. Codd , The relational model for database management: version 2, 1990
33. DBA Guide to Databases on VMware, 2011.
34. A Inomata, T Morikawa, MIkebe, and Sk Rahman, Proposal and Evaluation of a Dynamic Resource
Allocation Method based on the Load of VMs on IaaS, 2011.
35. Shweta Dinesh Bijwe and Prof. P. L. Ramteke, Database in Cloud Computing – Database-as-a Service
(DBaas) with its Challenges, 2015.
36. Chapter 16: Quality Attributes. [Online]. [Accessed: December 2017]. Available:
https://msdn.microsoft.com/en-us/library/ee658094.aspx
37. F. Mehak, R. Masood, Y. Ghazi, M. A. Shibli and S. Khan, “Security Aspects of Database-as-a-Service
(DBaaS) in Cloud Computing”, In Proc. of Springer International Publishing Switzerland, 2014.
38. C Bezemer, and A Zaidman, Challenges of Reengineering into Multi-Tenant SaaS Applications, 1st
Workshop on Engineering SOA and the Web (ESW), 2010.
39. K. P. N. Puttaswamy, C. Kruegel and B. Y. Zhao, Silverline: Toward Data Confidentiality in Storage
Intensive Cloud Applications, Symposium on Cloud Computing (SOCC), Portugal, 2011.
40. E.Shmueli ,R.Vaisenberg , E.Gudes, and Y.Elovici, Implementing a database encryption solution, design and
implementation issues, Elsevier, 2014.
41. A. Butoi, N.Tomai, Secret sharing scheme for data confidentiality preserving in a public-private hybrid cloud
storage approach, IEEE/ACM 7th International Conference on Utility and Cloud Computing, 2014.
42. L. Arockiam, S. Monikandan, Efficient Cloud Storage Confidentiality to Ensure Data Security, International
Conference on Computer Communication and Informatics (ICCCI), India, 2014.
43. H. Wang, J. Yin, C. Perng, and P S. Yu, Dual Encryption for Query Integrity Assurance, Conference on
Information and Knowledge Management, 2008.
44. L Ferretti, F Pierazzi, M Colajanni, and M Marchetti, Security and Confidentiality Solutions for Public Cloud
Database Services, The Seventh International Conference on Emerging Security Information, Systems and
Technologies SECURWARE 2013 .
45. H. Hacigumus, B. Iyer, and S. Mehrotram, Ensuring the Integrity of Encrypted Databases in the Database-
As-A-Service Model, pages 61-74. Data and Applications Security XVII, Springer US, 2004.
46. E. Mykletun, M. Narasimha and G. Tsudik, Authentication and Integrity in Outsourced Databases, ACM
Transactions on Computational logic, 2006.
47. W. Wei and T. Yu, Integrity Assurance for Outsourced Databases without DBMS Modification, International
Federation for Information Processing, 2014.
48. M. Narasimha and G. Tsudik, Authentication of outsourced databases using signature aggregation and
chaining, International Conference on Database Systems for Advanced Applications (DASFAA), 2006.
49. D. Ma, R. H. Deng, H. Pang, and J. Zhou, Authenticating Query Results in Data Publishing, The International
Conference on Information and Communications Security, Berlin, 2005.
50. M.Niaz, and G.Saake, Merkle Hash Tree based Techniques for Data Integrity of Outsourced Data, The 27th
GI-Workshop on Foundations of Databases, Germany, 2015.
51. M. Xie, H. Wang, J. Yin, and X. Meng, Integrity auditing of outsourced data, Very Large Data Bases
(VLDB), Austria, 2007.
52. M. Xie, H.Wang, J. Yin, and X. Meng, Providing Freshness Guarantees for Outsourced Databases, The 11th
International Conference on Extending Database Technology: Advances in Database Technology (EDBT),
ACM, USA, 2008.
53. J. Hong, T. Wen, Q. Guo, G. Sheng, Query Integrity Verification based-on MAC Chain in Cloud Storage,
International Journal of Networked and Distributed Computing, VOL. 2, 2014.
54. J. Wang, X du, J. Lu, and W. Lul, Bucket-based authentication for outsourced databases, Journal Concurrency
and Computation: Practical and Experience, 2010.
55. J. Wang, X. Du, LOB: Bucket based index for range queries, 9th International Conference on Web-Age
Information Management, China, 2008.
56. Purushothama B R and B. B. Amberker, Efficient Query Processing on Outsourced Encrypted Data in Cloud
with Privacy Preservation, International Symposium on Cloud and Services Computing, 2012.
57. M. Sharma, A. Chaudhary and S. Kumar, Query Processing Performance and Searching over Encrypted Data
by using an Efficient Algorithm, International Journal of Computer Applications, VOL. 62, 2013.
58. Mark D. Ryan, Cloud computing security: The scientific challenge, and a survey of solutions, Elsevier, 2013.
59. F. Zhao, C. Li, and C. F. Liu, A cloud computing security solution based on fully homomorphic encryption,
International Conference on Advanced Communication Technology (ICACT), 2014.
60. M. TEBAA, S. EL-HAJJI, A. EL-GHAZI, Homomorphic Encryption Applied to the Cloud Computing
Security, The World Congress on Engineering, VOL. I, London, U.K 2012.
61. T. Xiang, X. Li, F. Chen, Y. Yang, and S. Zhang, Achieving verifiable, dynamic and efficient auditing for
outsourced database in cloud, J. Parallel and Distributed Computing, Elsevier, 2017.
62. W. Song, B. Wang, Q. Wang, Z. Peng, and W. Lou, Tell me the truth: Practically public authentication for
outsourced databases with multi-user modification, Information Sciences, Elsevier, 2016.
63. J. Wang, X. Chen, J. Li, J. Zhao, J. Shen, Towards achieving flexible and verifiable search for outsourced
database in cloud computing, Future Generation Computer Systems, 2016.
64. K.Raghuvanshi, P.Khurana and P.Bindal, Study and Comparative Analysis of Different Hash Algorithm,
Journal of Engineering Computers and Applied Sciences (JECAS), VOL. 3, 2014 .
65. S. Singh, S.K. Maakar and Dr. S. Kumar, A Performance Analysis of DES and RSA Cryptography,
International Journal of Emerging Trends and Technology in Computer Science (IJETTCS), 2013.
66. F. Li, M. Hadjieleftheriou, G. Kollios, and L. Reyzin, “Dynamic Authenticated Index Structures for
Outsourced Databases”, In Proc. of ACM Management of Data (SIGMOD), USA, 2006.
67. “Advanced Encryption Standard”, NIST. FIPS PUB 197, 2001.
68. A Nadeem, Dr M Y Javed, “A Performance Comparison of Data Encryption Algorithms”, In Proc. of IEEE
Information and Communication Technologies, pp. 84-89, 2006.
69. Serialization: C#. [Online]. [Accessed: October 2018]. Available: https://msdn.microsoft.com/en-
us/library/mt656716.aspx
70. M Rady, T abdelkader, R Ismail, SCIQ-CD: A Secure Scheme to Provide Confidentiality and Integrity of
Query results for Cloud Databases, 2018.
71. S. Vimercati, S. Foresti, G. Livraga, S. Paraboschi and P. Samarati, Confidentiality Protection in Large
Databases, Springer, 2018.
72. W. I. Khedr, H. M. Khater and E. R. Mohamed, Cryptographic Accumulator-Based Scheme for Critical
Data Integrity Verification in Cloud Storage, IEEE Access, vol. 7, pp. 65635-65651, 2019.
73. A. P. Mohan, M. Asfak and A. Gladston, Merkle Tree and Block Chain Based Cloud Data Auditing,
International Journal of Cloud Applications and Computing, Vol. 10, 2020.
ملخص الرسالة
تتزايد كمية البيانات الرقمية بمعدل هائل ،مما يفوق قدرة التخزين المحلية للعديد من المنظمات .لذلك ،تعتبر االستعانة
بمصادر خارجية للبيانات حلا لتخزين المزيد من البيانات في خوادم سحابية فعالة .توفر بيئة الحوسبة السحابية الوصول
عند الطلب إلى العديد من خدمات الحوسبة التي توفر مزايا مثل انخفاض معدل الصيانة ،وخفض التكاليف األساسية
على الموارد المختلفة ،والوصول العالمي ،وما إلى ذلك .قاعدة البيانات كخدمة )( DBAASهي إحدى خدمات البيئة
السحابية البارزة .مزودو خدمة قواعد البيانات ) ( DSPSليهم البنية التحتية الستضافة قواعد البيانات الخارجية على
الخوادم الموزعة وتوفير مرافق فعالة لمستخدميهم إلنشاء قواعد البيانات وتخزينها وتحديثها والوصول إليها في أي
وقت ومن أي مكان عبر اإلنترنت .تتمتع االستعانة بمصادر خارجية للبيانات في الخوادم السحابية بالعديد من الميزات
مثل المرونة وقابلية التوسع والقوة ،ولكن من أجل توفير هذه الميزات ،غالباا ما يتم التضحية بسرية البيانات وتكامل
البيانات.
()SCIQ- CD تقدم هذه األطروحة مخط ا
طا آم انا لتوفير سرية وسلمة نتائج االستعلم لقواعد البيانات السحابية
حيث تدمج بنية المخطط بين تقنيات تشفير مختلفة مثل AES :و RSAو SHA2لتحقيق خصائص أمان البيانات.
يسمح النظام لمالك البيانات باالستعانة بمصادر خارجية لتخزين قاعدة البيانات الخاصة بهم ،والتي تتضمن بيانات
حساسة إلى ، DSPوتنفيذ عمليات ديناميكية وثابتة على قاعدة البيانات الخارجية .باإلضافة إلى ذلك ،يستخدم النظام
خادم طرف ثالث موثوق به يعمل كخادم وسيط بين DSPوالمستخدمين للتحقق من سرية البيانات وتكامل البيانات.
يوضح تحليل األمان أن بنائنا يمكن أن يحقق الخصائص األمنية المطلوبة .يوضح تحليل األداء أن مخططنا المقترح
فعال للنشر العملي.
Faculty of Computers and Information
Information Systems Department
Ain Shams University
A thesis submitted for the Master's degree in Computers and Information, Information Systems Department
Researcher:
Mai Mostafa Adly Rady
B.Sc. in Information Systems
Faculty of Computers and Information
Ain Shams University
Under the supervision of
Prof. Rasha Ismail
2022