Data Security in Cloud Computing
Preface
It has been about a decade since the first introduction of cloud computing concepts
and business models. Despite the technology maturing and cloud evolving from a
‘buzzword’ into an integral component utilised by almost all organisations linked to
the Internet, one thing has not changed: the concerns about data security in cloud com-
puting. Compared to traditional information technology paradigms, cloud computing
is complex, multi-tiered and large scale. Cloud computing inherits the problems of
the traditional paradigms, but due to the above-mentioned characteristics it also has
challenges entirely of its own. Hence, it is not surprising that several practical problems remain unsolved.
We would like to propose that cloud computing’s security problems can only be
solved if we explore these challenges from a fundamental viewpoint: the data cen-
tric viewpoint. After all, whether it is traditional Information Technology or cloud
computing, everything revolves around data. The systems are in place to process
data, the processes or algorithms define how the data is processed and the poli-
cies and regulations are in place to make sure data is processed in a consistent and
secure manner. A data-centric view of cloud security looks at everything as eventu-
ally affecting the security of data residing in the cloud. That said, another challenge
exists. A researcher new to this field may also become overwhelmed by the plethora
of scientific approaches proposed to handle the challenges of securing data, while
maintaining a practical level of privacy. This book (audaciously) attempts to address
this problem, and presents a selection of exciting research work by experts in data security in cloud computing.
A useful classification of data security issues is one based on the state of the data affected by the issue: data-at-rest, data-in-use or data-in-transit. In practice, however, the boundary between these states is not always distinct, and many data security issues lie at a boundary and affect two states of the data. The organization of this book is based on this classification and
reflects the fuzzy boundary between the states. Chapter 1 presents an overview of data
security issues in cloud computing and sets the tone for the book. Chapters 2–4 discuss
data-at-rest, Chapters 4–6 discuss data-in-use, while Chapters 6–8 discuss issues in
data-in-transit. Some issues, such as data leakage, Governance, Risk Management and
Compliance, data provenance and security visualization, affect data in all states and
cannot be categorized under this classification. These are presented towards the end
in Chapters 9–13.
Chapter 1
A data-centric view of cloud security

Abstract
Cloud computing offers a massive pool of resources and services that cloud users can utilize for storing and processing their data. Users can flexibly control and reduce their operational expenditure, as resources provisioned from the cloud can be dynamically resized to match their demand and, in particular, their budgets. The user, however, has to consider unanticipated and expensive costs from threats associated with attacks aimed at the user's data in the cloud. In this chapter, we discuss the primary causes of the new attack vectors that create a multitude of data security issues in clouds. We also discuss specific data security challenges in clouds and provide a classification that makes them easier to understand.
1.1 Introduction
Cloud computing has changed the way we view capital and operational expenses in
information technology (IT) and has changed the way infrastructures are designed
in the current computing age. Cloud computing has also dramatically reduced start-up costs for new companies and has influenced how we store and process data. Organizations may choose cloud computing because of a number of key business objectives, such as reduced operational costs, reduced complexity, immediate access to resources, easy scaling up and down and a lower barrier to innovation. The eventual effect of using
clouds for IT needs, however, is that an organization’s data leaves its premises and is
offloaded to the cloud.
Across cloud computing environments, data is the fundamental asset that we need to secure. Seen from a data-centric point of view, everything else, including the cloud computing infrastructure, policies, processes and regulations, is peripheral and exists to support the storage, movement, processing, logging and handling of data. The cloud computing infrastructure used for the storage, movement, processing, logging and handling of data does not belong to the data owner who has
subscribed to the cloud services. This separation of data owners and their data raises
new questions for data security which require novel solutions. These solutions should
provide answers to user concerns such as the trustworthiness of the cloud service
providers, accountability of the cloud service provider, security of data in distributed
environments and across devices, protection against insider attacks in cloud and so
on. Cloud computing also introduces a shared environment for users and their data.
The physical resources are virtualized and shared among many users. Thus, a user’s
application may be using the same physical hardware as many others. The opaqueness of the cloud environment means the user does not know with whom the resources are shared. This shared environment and its opaqueness pose new challenges of their own, such as data isolation, data remanence, provable data deletion, data leakage and data integrity, among others. Another characteristic of cloud computing is its globally distributed servers. Cloud service providers may migrate resources
from one location to another and redistribute them over multiple global data centres
to achieve internal goals such as resource consolidation, better resource utilization
and high network performance. This allows clouds to utilize their resources more
efficiently and also provides users with faster network access and higher reliability.
This geo-distribution, however, introduces security issues of its own such as the
complex interplay of the data privacy laws of multiple territories, issues with the
provenance, tracing and auditing of data that is distributed globally etc.
It is important to understand the threats and security issues that emerge when
data is offloaded from a traditional perimeter-bound IT infrastructure to a cloud or
multiple clouds. In this chapter, we attempt to study the underlying causes of many
of these attacks. Moreover, as these issues do not emerge in isolation, we provide a
simple classification of data security issues in cloud computing on the basis of the
state of data. The remainder of this chapter is organized as follows: In Section 1.2,
we provide some cloud computing definitions that will be useful in understanding the
chapter. In Section 1.3, we discuss three underlying causes of the new attack vectors
in clouds and in Section 1.4 we discuss and classify the main data security issues in
cloud computing. Finally, we conclude the chapter in Section 1.5.
Figure 1.1 Entities in the cloud ecosystem: the cloud service provider, the data owner, cloud service partners and cloud users
● Cloud users access and use data stored on the cloud through the capabilities provided by the CSP. Cloud users may or may not own the data stored on the cloud. For example, let's say A uploads data to the cloud and also uses this data, and A further gives B access to use the data. In this case, A is the data owner, whereas both A and B are cloud users.
In the rest of this chapter, we will generally distinguish data owners from cloud users;
however, in places where the functions of cloud users and data owners are similar, we
will use the term cloud users.
There are various views and definitions of cloud computing, which define the concept with varying degrees of success. One of the most popular is from the National Institute of Standards and Technology (NIST), which defines cloud computing as 'a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction' [1]. According to the NIST definition, cloud computing services can be classified based on the service model they use to create the services. There are three cloud computing service models, as shown in Figure 1.2.
Infrastructure as a Service (IaaS): The services provided in this model allow the cloud user to interact directly with the hardware resources. The consumer is provided with the capability to provision computing power, storage and network resources. The consumer also has the responsibility of supplying the software to run on the hardware resources, which can include operating systems and application software. As a result, although the user does not manage the underlying cloud resources, it has control over
operating systems security and application security while having limited control over
network security [1].
Platform as a Service (PaaS): In the PaaS model, the user is provided with
a development environment with tools, services and libraries. The user can create
cloud services using the provided environment while bounded by the limitations of the
environment. In this service model, the user has control over the applications/services
which it creates but not the underlying hardware or software.
Software as a Service (SaaS): The SaaS model provides a cloud user with the software it may need. It frees the user from resource maintenance to a large extent while providing the required functionality. This model offers the least amount of control to the user. It may allow the software to be customized to fit the user's needs, but gives no control over the software, the platform or the infrastructure.
The ISO standard ISO/IEC 17788 identifies these service models as cloud
capabilities and defines seven cloud service categories as follows:
1. CaaS: Communications as a Service
2. CompaaS: Compute as a Service
3. DSaaS: Data Storage as a Service
4. IaaS: Infrastructure as a Service
5. NaaS: Network as a Service
6. PaaS: Platform as a Service
7. SaaS: Software as a Service
1.3.1 Cryptography
Cryptography has been a cornerstone of data security in traditional IT and will con-
tinue to be in the cloud computing paradigm as well. Cryptography addresses two
basic security constructs of data confidentiality and data integrity. Within cryptogra-
phy, encryption has been an important tool along with access control to secure data
on both sides of the perimeter. Encryption has been applied to data travelling on a network, by way of public key infrastructure (PKI), as well as to data stored on secondary storage devices and in databases. One of the main challenges in using encryption to protect confidentiality has been key management. As the number of authorized users grows, so does the number of keys, and with the growing number of keys, key distribution and user revocation become more complex. Key management remains a challenge
in cloud computing. As mentioned previously, however, clouds introduce new attack vectors and therefore new security challenges. To achieve data confidentiality, data is
kept encrypted on the cloud. If, however, the encrypted data is decrypted before being
processed, it will make the data vulnerable to attacks. Therefore, in the context of
cloud computing, there is a need for techniques that enable operations over encrypted
data. Such methods preserve the confidentiality of data in the face of insider attacks and help increase a user's trust in the cloud. As cloud computing has enabled new forms of data collection and sharing, it has also created new challenges in the access control of data. Previously established in-house access control methods are no longer sufficient in an environment where the perimeter has blurred beyond recognition. This has created a need for functional encryption [4] and new forms of access control based on it.
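To make the idea of computing on encrypted data concrete, the toy sketch below (not drawn from this chapter, and not secure) relies on the multiplicative homomorphism of textbook RSA: multiplying two ciphertexts yields a valid ciphertext of the product, so an untrusted party can perform the operation without ever seeing the plaintexts. All parameters are tiny illustrative values.

```python
# Toy illustration only: textbook RSA is multiplicatively homomorphic, so a
# product can be computed on ciphertexts without decrypting the operands.
p, q = 61, 53                  # tiny demonstration primes (insecure)
n = p * q                      # modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

def enc(m): return pow(m, e, n)
def dec(c): return pow(c, d, n)

m1, m2 = 7, 6
c1, c2 = enc(m1), enc(m2)

# The untrusted party works on ciphertexts only.
c_prod = (c1 * c2) % n

assert dec(c_prod) == (m1 * m2) % n   # decrypts to 42
```

Fully homomorphic schemes, discussed later in this book, extend this idea to both addition and multiplication, but at a much higher computational cost.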
Traditionally, data integrity concerns have been confined to data on the network,
both inside and outside the perimeter. In cloud computing, however, the challenge of
protecting the integrity of data takes a whole new form. Data owners may be concerned
about unauthorized modifications, deletions and corruption of data, both malicious
and accidental [3], because they have no physical control over their data. In addition,
as the cloud is untrusted, there may be concerns about the validity of computations
performed in the cloud. There is, therefore, a need for techniques that can validate and
verify the computations performed in the cloud remotely. This becomes more chal-
lenging when the volume of data is large, as is generally the case with the cloud. Another important factor with data integrity in the cloud is that validation and verification should be performed at run-time, as modifications and deletions occur, and the data owner should be informed immediately. This is a challenge that owes its existence entirely to cloud computing.
While cryptography will play a huge role in solving many of the data security
concerns, cryptographic techniques are generally time consuming. Therefore, any
new cryptographic scheme or technique to solve these above-mentioned challenges
needs to take the induced latency into account, as this will affect the availability of data and services hosted in the cloud.
to larger scale problems [14,15]. One example was the 2015 invalidation of the International Safe Harbor Privacy Principles by the European Court of Justice, after a Facebook user lodged a complaint about Facebook storing his data in other regions without his knowledge [16]. If we look deeper into cross-boundary issues, there are two
inherent sub-problems [13], which are as follows:
1.3.3.1 Gaps and misalignments
Different countries’ laws are structured differently, with different definitions and
coverage of citizen rights. For example, the term ‘sensitive data’ is defined differently
across several countries, and is even left undefined in some. At the same time,
the actions and consequences of breaches of principles differ from country to country.
1.3.3.2 Different expectations on technology and control
Some territories, such as the European Union, enforce the 'right to be forgotten', although it has been widely argued that its requirements are technically impossible to achieve in full. Other countries do not have a strong focus on individual rights but
focus on surveillance for citizen safety. Both ends of this spectrum are valid, and a
cloud service provider would need to navigate such legal minefields if they were to
have successful businesses across the world.
Technologists typically struggle to know in advance or stay current with the ever-
changing legal developments across all countries, and the overheads involved for
staying current with these developments could be reduced if there was a global or
regional alignment [13]. The European Union's General Data Protection Regulation (GDPR) [17,18], which takes effect in May 2018, is one such example. However, one can also expect additional complexities among the states bound by the GDPR.
Since the beginning of public cloud proliferation, provider liabilities have hinged
on service-level agreements and contracts and are mainly influenced by the local
laws of the respective countries. It is commonplace to see companies encouraging
users to sign biased contracts, letting go of most rights, and hosting specific data
sets in countries with the most ‘relaxed’ laws around data privacy and sovereignty
[12,15,19,20]. Conversely, most countries require sensitive data such as electronic healthcare records or banking data to remain on-premises or within the country. This requirement creates a tension for a cloud service provider attempting to expand into overseas markets [15]. Can technologies that assure privacy preservation and data provenance reduce the mistrust of these cloud service providers? Also, if a cloud service provider were to liquidate, what would happen to the data? Would there be a proper process or legal safeguard for the data owners and stakeholders? In our opinion, these
open challenges are not just interesting for legal professionals but also for researchers
involved in the data security aspect of cloud computing.
Figure 1.4 Classification of data security issues by the state of the data across its lifecycle, from data generation to data destruction: data-at-rest (e.g., secure data sharing, access control, data persistence, CSP accountability), data-in-use (e.g., secure computation, verifiable computing, machine isolation) and data-in-transit (e.g., data relocation, data outsourcing, perimeter security), together with cross-cutting issues (information leakage, data provenance and logging, and governance, risk management and compliance)
also for data in clouds. Most data security issues can be divided into three categories
depending upon when the issue arises during the data's lifetime. In this section, we go
over each of the three states of data and the security issues that arise therein. Some
security issues, however, are cross-cutting and can affect and arise during any or all of
the states. We discuss these in the cross-cutting issues sub-section. Figure 1.4 shows
a pictorial representation of the classification presented in this section.
well as instance storage connected to each instance. Although specialized storage ser-
vices such as Amazon S3, Windows Azure and Rackspace cloud storage etc. provide
data owners with highly scalable, customizable and reliable storage, instance storage
can be unreliable and is wiped when the instance terminates. In both cases, however, data resides on the cloud, far removed from the user's machine. This loss of physical control over the data is compounded by the low trust between data owners and CSPs, as discussed in Section 1.3. This gives rise to a host of data security issues.
One of the primary challenges in the cloud is: how can data owners store data securely on the untrustworthy cloud and share it with trusted users? Traditional approaches either struggle with key management (symmetric-key approaches) or do not provide enough flexibility when it comes to securing the data and tracking its usage by multiple users (traditional PKI approaches). In [21], the authors have tried
to solve this problem by using a progressive elliptic curve encryption scheme which
allows data to be encrypted multiple times. A similar multi-encryption approach has
been used in [22]. Reference [23] has further explored the problem of delegation of
data access privileges. The challenge of secure data sharing becomes more complex
when the data owner also wants to control access to the data in a fine-grained manner. The users of the data may have different roles and attributes and may access data in different contexts. This gives rise to the need for role-based, attribute-based or context-based access [24–27]. Data persistence has also emerged as an important issue in cloud computing. Cloud service providers need to be able to securely delete data once it is no longer required. Failure to do so may expose data to co-located users and to anyone who can lay their hands on discarded storage devices.
Cryptographic solutions such as Fade [28] and FadeVersion [29] help data owners in
obtaining verifiable data deletion whenever required. A related data-at-rest problem
is how to verify that the data has not been modified maliciously or accidentally by the
CSP. Verifying the integrity of data is generally not a problem with small data, but the volume and remote location of data in the cloud make this a challenge. This remote verification also needs to be done in the untrustworthy cloud environment itself, as downloading the data to perform verification is not feasible [30,31]. Another CSP accountability problem arises when the cloud is asked to replicate data: how can the data owner make sure that the CSP genuinely holds multiple copies of the data, particularly when the data owner is being charged for the copies [32]?
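As a rough illustration of remote integrity checking without downloading the data, the sketch below has the data owner precompute nonce-keyed digests before upload and later challenge the CSP to recompute them. This is a toy scheme with a finite number of single-use challenges, not one of the cited techniques; the schemes in [30,31] support unbounded and, in some cases, publicly verifiable audits.

```python
import hashlib, os

def make_challenges(data: bytes, n: int = 5):
    # Before uploading, the owner precomputes (nonce, expected digest) pairs
    # and keeps them locally; each pair supports one audit of the remote copy.
    challenges = []
    for _ in range(n):
        nonce = os.urandom(16)
        challenges.append((nonce, hashlib.sha256(nonce + data).hexdigest()))
    return challenges

def csp_respond(stored_data: bytes, nonce: bytes) -> str:
    # The CSP can only answer correctly if it still holds the intact data.
    return hashlib.sha256(nonce + stored_data).hexdigest()

data = b"example outsourced file contents"
audits = make_challenges(data)

nonce, expected = audits.pop()
assert csp_respond(data, nonce) == expected          # data intact
assert csp_respond(data + b"x", nonce) != expected   # tampering detected
```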
machine, the user will incur the extra cost of data transfer to and from the cloud.
Ideally, a user would want to perform computations on the data, in the cloud, but in a
secure manner such that the data is not exposed to attackers. Solutions to this prob-
lem of secure computation in cloud are either non-cryptographic or cryptographic in
nature. Non-cryptographic solutions usually entail encoding and distributing the data
over multiple service providers, as in [33,34]. Multi-party computation (MPC) and fully homomorphic encryption (FHE) have long been regarded as the leading cryptographic candidate technologies for secure computation. MPC protocols distribute the computation of a function among multiple parties so that the function is computed securely. FHE schemes, on the other hand, enable operations directly on encrypted data. Both of these technologies, however, are inflexible and cannot
be adapted to the vast range of operations that are necessary in everyday compu-
tation such as searching, index-building, programme execution etc. This has led to
a variety of innovative solutions which solve the secure computation problem in a
specific, application-defined domain. References [35–37] deal with the problem of
secure search in the cloud, whereas [38,39] discuss privacy-preserving indexes. The
work in [40] discusses secure image-processing operations over 2D images, whereas
[41] describes real-time search over encrypted video data.
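As a minimal illustration of the MPC idea mentioned above, the sketch below uses additive secret sharing: each value is split into random shares held by different parties, each party adds its shares locally, and only the recombined sum is revealed. This is a toy example for addition only; the modulus and share count are arbitrary choices, and practical MPC protocols are considerably more involved.

```python
import random

P = 2**61 - 1  # public modulus for the shares (an arbitrary choice here)

def share(secret, n=3):
    # Additive secret sharing: n-1 random shares plus one balancing share.
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def add_shares(shares_a, shares_b):
    # Each party adds its own shares locally; no party ever sees a secret.
    return [(a + b) % P for a, b in zip(shares_a, shares_b)]

def reconstruct(shares):
    return sum(shares) % P

a, b = 1234, 5678
c_shares = add_shares(share(a), share(b))
assert reconstruct(c_shares) == (a + b) % P
```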
A problem related to secure computation is verifiable computation. In a low-
trust environment, verifiable computing provides the assurance that data has been
processed as expected by the user or data owner. A simple solution proposed in [42]
uses two clouds: a trusted cloud for the verifiable aspects of computing and an untrusted
cloud for regular computing needs. Other solutions such as the one in [43] approach
this as a problem of verification of the consistency of the database. Cryptographic
solutions to this problem are also discussed in [44,45]. Verifiable accounting or meter-
ing of resources used is also a challenging issue in cloud computing and one which
is of interest to both cloud service providers and cloud users. Verifiable accounting
can provide assurance to both the parties that the billing for the usage of resources is
appropriate and in accordance with an agreed-upon policy [46].
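One simple, non-cryptographic way to gain confidence in outsourced computation is sampling-based re-execution: the client recomputes a random subset of the returned results locally and compares. The sketch below is only a generic illustration (the computed function and sampling rate are hypothetical), not the mechanism of any of the works cited above, which provide stronger guarantees without full re-execution.

```python
import random

def outsourced_square(x):
    # Stand-in for a computation delegated to the untrusted cloud.
    return x * x

def spot_check(inputs, cloud_results, sample_rate=0.1):
    # Re-execute a random sample locally; any mismatch is evidence of a
    # faulty or dishonest computation.
    k = max(1, int(len(inputs) * sample_rate))
    for i in random.sample(range(len(inputs)), k):
        if outsourced_square(inputs[i]) != cloud_results[i]:
            return False
    return True

xs = list(range(1000))
results = [x * x for x in xs]        # pretend these came back from the cloud
print(spot_check(xs, results))       # True when the sampled results agree
```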
Virtualization is one of the basic technologies on which cloud computing is built.
Virtualization allows the creation of multiple separate virtual machines (VM) on
the same physical hardware, thus increasing resource utilization from a CSP’s point
of view. These VMs, however, when running concurrently, use the same physical
resources such as the CPU, main memory, secondary storage and network devices.
Not only can this introduce non-malicious performance fluctuations, it also creates a new attack vector. Attackers can try to co-locate themselves on the
same physical resources as the machine they want to attack [47]. This creates a need
for better isolation techniques that can allow VMs to operate in a space where they
are unaffected by the co-located VMs.
to the CSP’s cloud storage, it may be moving from the cloud storage to local storage.
Data may also move between multiple clouds, and it may also move within a cloud.
All these movements of data are covered under data in transit.
A CSP may move data internally between its datacentres (data relocation) to
achieve internal goals such as resource consolidation, better resource utilization and
high network performance as well as to provide lower latency access to its users.
The service provider, however, also needs to honour any SLAs and local regulations there may be regarding the movement and location of the data owner's data [16]. The data owner, on the other hand, may also need new techniques that can
help it verify the movement and transit of data once uploaded to the cloud. A CSP may
also delegate data to its partners or other CSPs. This may be done because of a lack of resources in the cloud, as a cost-saving measure or for more efficient provisioning.
The cloud partners, however, may or may not have the same level of security for the
data as negotiated by the data owner with the original provider. This results in a major
security challenge that we call data outsourcing, which calls for techniques that can
track data as it moves in and out of the cloud.
As data travels from the data owner to the cloud and on to the cloud user, it raises
another important issue in the data-in-transit category: perimeter security. As mentioned previously, the perimeter in the cloud has blurred beyond recognition and,
hence, perimeter security has become an even greater challenge and requires new and
innovative solutions. One such promising perimeter security technique is software-
defined perimeter (SDP) [48].
important records or logs of data may be partially collected or absent [10]. Without complete and accurate data provenance, provenance records can amount to inadmissible evidence in any kind of auditing. Reconstructing data provenance from existing logs [52] in a way that assures its completeness and accuracy is still an ongoing research problem. Overheads and scalability issues of
data provenance need to be addressed too as the size of provenance data itself can
grow rapidly. This has the potential to degrade performance of machines storing or
collecting data provenance [53]. Maintaining security, privacy and trustworthiness of
data provenance is also an open problem [54].
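A common building block for tamper-evident provenance logging is hash chaining, in which each record commits to the hash of its predecessor so that later modification or deletion of any entry breaks the chain. The sketch below is a generic illustration of that idea, not the mechanism of any of the systems cited in this chapter; the field names are hypothetical.

```python
import hashlib, json, time

def append_entry(log, event):
    # Each entry commits to the previous entry's hash; altering or dropping
    # an earlier record later becomes detectable.
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {"ts": time.time(), "event": event, "prev": prev_hash}
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()).hexdigest()
    log.append(entry)
    return log

def verify_chain(log):
    prev = "0" * 64
    for e in log:
        body = {k: e[k] for k in ("ts", "event", "prev")}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if e["prev"] != prev or e["hash"] != expected:
            return False
        prev = e["hash"]
    return True

log = []
append_entry(log, "file X read by VM 42")
append_entry(log, "file X copied to bucket Y")
print(verify_chain(log))   # True while the chain is intact
```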
Finally, data governance, risk and compliance management is another key cross-
cutting challenge for a data-centric view of cloud computing. The relative volatility
and flow of data across and within clouds have increased the complexity for CSPs, data owners and cloud users [14]. Several researchers have attempted to automate the alignment of data governance regulation and policy compliance for cloud environments hosting physical and virtual hosts, within and across countries [55]. Provenance and auditing enable a certain level of accountability, but continued research into reducing reliance on human input is required for truly automated data governance, risk management and compliance [19].
1.5 Conclusion
Data security remains one of the top concerns of data owners when moving operations
to the cloud. Even after years of academic and industry research, there are still open
issues that need solutions. Most of these cloud-specific issues arise due to the new
attack vectors. In this chapter, we have identified a disappearing perimeter, a new
type of insider and contrasting business objectives of the cloud service provider and
the cloud user as three primary causes of the new attack vectors. We also discussed
how the new attack vectors affect cryptography, data provenance and privacy and
security laws which have traditionally been used as security tools and policies for
data security. We have also provided the reader with a small survey of the open issues
and classified them under a simple taxonomy. We have attempted in this chapter to
provide the reader with a clear understanding of the open data security issues in cloud
computing and hope this will help in stimulating their interest in this exciting field
of research.
References
[1] National Institute of Standards and Technology. The NIST definition of cloud
computing; 2011. http://www.nist.gov/itl/cloud/upload/cloud-def-v15.pdf
[March 2017].
[2] Chandrasekaran, K. Essentials of Cloud Computing. Boca Raton, FL, USA:
CRC Press, 2014.
[3] Yu, S., W. Lou, and K. Ren. “Data security in cloud computing”, in Handbook
on Securing Cyber-Physical Critical Infrastructure (1st ed.), chapter 15, edited
by Sajal K. Das, Krishna Kant and Nan Zhang, Morgan Kaufmann, Boston,
MA, USA, 2012.
[4] Boneh, D., A. Sahai, and B. Waters. “Functional encryption: Definitions and
challenges”, in Theory of Cryptography Conference. Berlin: Springer, 2011.
[5] Tan, A. Y. S., R. K. L. Ko, G. Holmes, and B. Rogers. “Provenance for cloud
data accountability”, in R. Ko, & K.-K. R. Choo (Eds.), The Cloud Security
Ecosystem: Technical, Legal, Business and Management Issues (pp. 171–185).
Berlin, Heidelberg: Elsevier Inc., 2015.
[6] Gehani, A., and D. Tariq. “SPADE: Support for provenance auditing in distributed envi-
ronments”, in Proceedings of the 13th International Middleware Conference.
New York: Springer-Verlag New York, Inc., 2012.
[7] Suen, C. H., R. K. L. Ko, Y. S. Tan, P. Jagadpramana, and B. S. Lee. “S2logger:
End-to-end data tracking mechanism for cloud data provenance”, in Trust,
Security and Privacy in Computing and Communications (TrustCom), 2013
12th IEEE International Conference on. Melbourne, VIC, Australia: IEEE,
2013.
[8] Ko, R. K. L., P. Jagadpramana, and B. S. Lee. “Flogger: A file-centric logger
for monitoring file access and transfers with cloud computing environments”,
in Third IEEE International Workshop on Security in e-Science and e-Research
(ISSR’11), in conjunction with IEEE TrustCom’11, Changsha, China, 2011.
[9] Ko, R. K. L., and M. A. Will. “Progger: An efficient, tamper-evident
Kernel-space logger for cloud data provenance tracking”, in Cloud Computing
(CLOUD), 2014 IEEE Seventh International Conference on. Anchorage, AK,
USA: IEEE, 2014.
[10] Tan, Y. S., R. K. L. Ko, and G. Holmes. “Security and data accountability
in distributed systems: a provenance survey”, in 2013 IEEE 10th Interna-
tional Conference on High Performance Computing and Communications &
2013 IEEE International Conference on Embedded and Ubiquitous Computing
(HPCC_EUC) (pp. 1571–1578). Zhangjiajie, China: IEEE.
[11] Taha, M. M. B., S. Chaisiri, and R. K. L. Ko. “Trusted tamper-evident
data provenance”, in Trustcom/BigDataSE/ISPA, 2015 IEEE. Vol. 1. Helsinki,
Finland: IEEE, 2015.
[12] Ko, R. K. L. “Cloud computing in plain English”, ACM Crossroads 16.3
(2010): 5–6. ACM.
[13] Scoon, C., and R. K. L. Ko. “The data privacy matrix project: Towards a
global alignment of data privacy laws”, Trustcom/BigDataSE/ISPA, 2016
IEEE. Tianjin, China: IEEE, 2016.
[14] Ko, R. K., P. Jagadpramana, M. Mowbray, et al. (2011, July). “TrustCloud:
A framework for accountability and trust in cloud computing”, in Services
(SERVICES), 2011 IEEE World Congress on (pp. 584–588). Washington, DC,
USA: IEEE.
[15] Ko, R. K. L., G. Russello, R. Nelson, et al. “Stratus: Towards returning data
control to cloud users”, in International Conference on Algorithms and Archi-
tectures for Parallel Processing, pp. 57–70. Cham: Springer International
Publishing, 2015.
Computer and Communications Security (New York, NY, USA), pp. 213–222,
New York, NY, USA: ACM, 2009.
[31] Wang, C., Q. Wang, K. Ren, and W. Lou. “Privacy-preserving public auditing
for data storage security in cloud computing,” in InfoCom2010, San Diego,
CA, USA: IEEE, March 2010.
[32] Mukundan, R., S. Madria, and M. Linderman. “Efficient integrity verification
of replicated data in cloud using homomorphic encryption.” Distributed and
Parallel Databases 32.4 (2014): 507–534.
[33] Will, M. A., R. K. L. Ko, and I. H. Witten. “Bin Encoding: A user-centric
secure full-text searching scheme for the cloud”, Trustcom/BigDataSE/ISPA,
2015 IEEE. Vol. 1. Helsinki, Finland: IEEE, 2015.
[34] Will, M. A., R. K. L. Ko, and I. H. Witten. “Privacy preserving computation
by fragmenting individual bits and distributing gates”, Trustcom/BigDataSE/ISPA,
2016 IEEE. Tianjin, China: IEEE, 2016.
[35] Wang, C., N. Cao, J. Li, K. Ren, and W. Lou. “Secure ranked keyword search
over encrypted cloud data”, Distributed Computing Systems (ICDCS), 2010
IEEE 30th International Conference on. Genova, Italy: IEEE, 2010.
[36] Cao, N., C. Wang, M. Li, K. Ren, and W. Lou. “Privacy-preserving multi-
keyword ranked search over encrypted cloud data”, IEEE Transactions on
Parallel and Distributed Systems 25.1 (2014): 222–233.
[37] Ren, S. Q., and K. M. M. Aung. “PPDS: Privacy preserved data sharing scheme
for cloud storage”, International Journal of Advancements in Computing
Technology 4 (2012): 493–499.
[38] Hu, H., J. Xu, C. Ren, and B. Choi. “Processing private queries over untrusted
data cloud through privacy homomorphism”, Data Engineering (ICDE), 2011
IEEE 27th International Conference on. Hannover, Germany: IEEE, 2011.
[39] Squicciarini, A., S. Sundareswaran, D. Lin. “Preventing information leakage
from indexing in the cloud”, Cloud Computing (CLOUD), 2010 IEEE Third
International Conference on. Miami, FL, USA: IEEE, 2010.
[40] Mohanty, M., M. R. Asghar, G. Russello. “2DCrypt: Privacy-preserving image
scaling and cropping in the cloud”, IEEE Transactions on Information Forensics
and Security (TIFS), 2016.
[41] Liu, J. K., M. H. Au, W. Susilo, K. Liang, R. Lu, and B. Srinivasan. “Secure
sharing and searching for real-time video data in mobile cloud”, IEEE Network
29.2 (2015): 46–50.
[42] Bugiel, S., S. Nürnberger, A. R. Sadeghi, T. Schneider (2011, October). “Twin
clouds: Secure cloud computing with low latency”, in IFIP International Con-
ference on Communications and Multimedia Security (pp. 32–44). Berlin:
Springer.
[43] Jana, S., V. Shmatikov (2011, June). “EVE: Verifying correct execution of
cloud-hosted web applications”, in HotCloud.
[44] Gennaro, R., C. Gentry, and B. Parno. “Non-interactive verifiable computing: out-
sourcing computation to untrusted workers”, Proceedings of the 30th Annual
Conference on Advances in Cryptology, August 15–19, 2010.
[45] Yang, S., A. R. Butt, Y. Charlie Hu, and S. P. Midkiff. “Trust but verify: Monitoring
remotely executing programs for progress and correctness”, Proceedings of
the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel
Programming, June 15–17, 2005.
[46] Sekar, V., P. Maniatis. “Verifiable resource accounting for cloud computing ser-
vices”, Proceedings of the Third ACM Workshop on Cloud Computing Security
Workshop. New York, NY, USA: ACM, 2011.
[47] Ristenpart, T., E. Tromer, H. Shacham, S. Savage (2009, November). “Hey,
you, get off of my cloud: exploring information leakage in third-party
compute clouds”, in Proceedings of the 16th ACM Conference on Computer
and Communications Security (pp. 199–212). Chicago, IL, USA: ACM.
[48] Software Defined Perimeter Working Group, “SDP specification 1.0”,
April 2014. Available: https://cloudsecurityalliance.org/download/sdp-
specification-v1-0/. [retrieved Feb 21, 2017].
[49] Mulazzani, M., S. Schrittwieser, M. Leithner, M. Huber, E. R. Weippl (2011,
August). “Dark clouds on the horizon: Using cloud storage as attack vector
and online slack space”, in USENIX Security Symposium (pp. 65–76).
[50] Harnik, D., B. Pinkas, A. Shulman-Peleg. “Side channels in cloud services:
Deduplication in cloud storage”, IEEE Security & Privacy 8.6 (2010): 40–47.
[51] Squicciarini, A., S. Sundareswaran, D. Lin. “Preventing information leakage
from indexing in the cloud”, Cloud Computing (CLOUD), 2010 IEEE Third
International Conference on. IEEE, 2010.
[52] Magliacane, S. “Reconstructing provenance”, The Semantic Web–ISWC 2012
(2012): 399–406.
[53] Zhao, D., C. Shou, T. Malik, I. Raicu. “Distributed data provenance for large-
scale data-intensive computing”, Cluster Computing (CLUSTER), 2013 IEEE
International Conference on. IEEE, 2013.
[54] Hasan, R., R. Sion, M. Winslett. “Introducing secure provenance: problems
and challenges”, Proceedings of the 2007 ACM Workshop on Storage Security
and Survivability. Alexandria, VA, USA: ACM, 2007.
[55] Papanikolaou, N., S. Pearson, M. Casassa Mont, R. K. L. Ko. “A toolkit for
automating compliance in cloud computing services”, International Journal
of Cloud Computing 3, no. 1 (2014): 45–68.
Chapter 2
Nomad: a framework for ensuring data
confidentiality in mission-critical cloud-based
applications
Mamadou H. Diallo1, Michael August1, Roger Hallman1,
Megan Kline1, Henry Au1, and Scott M. Slayback1
Abstract
Due to their low cost and simplicity of use, public cloud services are gaining popularity
among both public and private sector organisations. However, there are many threats
to the cloud, including data breaches, data loss, account hijacking, denial of service,
and malicious insiders. One of the solutions for addressing these threats is the use of
secure computing techniques such as homomorphic encryption and secure multiparty
computation, which allow for processing of encrypted data stored in untrusted cloud
environments without ever having the decryption key. The performance of these tech-
niques is a limiting factor in the adoption of cloud-based applications. Both public and
private sector organisations with strong requirements for data security and privacy
are reluctant to push their data to the cloud. In particular, mission-critical defense
applications used by governments do not tolerate any leakage of sensitive data. In this
chapter, we present Nomad, a framework for developing mission-critical cloud-based
applications. The framework comprises: (1) a homomorphic encryption-based
service for processing encrypted data directly within the untrusted cloud infrastruc-
ture, and (2) a client service for encrypting and decrypting data within the trusted
environment, and storing and retrieving these data to and from the cloud. In order
to accelerate the expensive homomorphic encryption operations, we equipped both
services with a Graphics Processing Unit (GPU)-based parallelisation mechanism.
To evaluate the Nomad framework, we developed CallForFire, a Geographic Infor-
mation System (GIS)-based mission-critical defense application that can be deployed
in the cloud. CallForFire enables secure computation of enemy target locations and
selection of firing assets. Due to the nature of the mission, this application requires
guaranteed security. The experimental results show that the performance of homo-
morphic encryption can be enhanced by using a GPU-based acceleration mechanism.
1 US Department of Defense, SPAWAR Systems Center Pacific (SSC Pacific), San Diego, CA, USA
2.1 Introduction
Public cloud services are gaining popularity among both public and private sector
organisations due to their low cost and ease of use. According to the U.S. National
Institute of Standards and Technology, the cloud provides a pool of compute resources
which can be dynamically provisioned over the Internet, and which is both scalable and
measurable [1]. Public cloud service providers such as Google, Rackspace, Heroku,
and Amazon Web Services provide various software services over the Internet at low
cost, including services such as content management, accounting, virtualisation, and
customer relationship management.
Since computers have become highly commoditised in recent years, it can make
economic sense to outsource computation to clusters of distributed untrusted compute
units whose ownership and locations are unknown. The cloud provides an abstraction
for this compute model. Public clouds achieve cost efficiencies via economies of scale
and cost sharing among many customers, and therefore provide a means for organ-
isations to achieve high throughput computation at minimal cost without requiring
significant amounts of upfront capital investment in data center infrastructure. As a
result, organisations have been increasingly embracing the cloud as a platform for
outsourcing the hosting of web-based services. Virtual machines form the fundamen-
tal unit of the cloud abstraction, as they can be provisioned and destroyed completely
within software without labour-intensive changes to hardware configurations. This
fluid computational model enables software systems to be rapidly provisioned, con-
figured, deployed, and scaled in the cloud without the purchase of expensive dedicated
hardware.
The Cloud Security Alliance [2] has identified multiple security threats to the
cloud. These security threats discourage organisations, such as financial institutions,
healthcare organisations, federal and state governments, and defense agencies, from
using the public cloud to store their sensitive data. These threats include data breaches,
data loss, account hijacking, denial of service, and malicious insiders, among oth-
ers. Cloud security and privacy issues have been confirmed by different surveys [3].
For instance, virtual machine escape vulnerabilities, such as VENOM [4], enable a
guest virtual machine to access and execute code on the host machine. Additionally,
cross-VM side-channel attacks can take advantage of co-resident virtual machines
within a cloud infrastructure. One such example of this type of attack, known as a
cache attack, has been successfully demonstrated on public cloud service providers
[5–11]. As security researchers continue to explore and identify threats within public
cloud service offerings, more information about cloud vulnerabilities will be exposed,
thereby leading to the development of exploitation tools. For example, Nimbostra-
tus [12] is a toolset developed for fingerprinting and exploiting poorly configured
Amazon Web Services deployments. Since there is no guarantee that data stored and
processed in public clouds will remain confidential, organisations have a need for
secure computation mechanisms to guarantee the security and privacy of such data.
Fully homomorphic encryption is one such mechanism.
Starting with the first construction of fully homomorphic encryption in 2009
[13], the cryptographic research community has been actively developing new
homomorphic cryptography schemes, which would enable computation on encrypted
data in the cloud. A number of schemes have been developed, including lattice-based
[13], and ring learning with errors (RLWE)-based approaches [14]. As a result, sig-
nificant progress has been achieved [15]. However, homomorphic encryption schemes
have not matured enough to be used as the mainstream data encryption mechanism
due to the fact that the homomorphic operations are computationally intensive. This
computational complexity limits the practicality of current homomorphic encryption
schemes.
Since secure computation techniques such as homomorphic encryption are cur-
rently impractical, public cloud service providers enable customers to perform their
own encryption before storing their sensitive data in the cloud (e.g., end-to-end
encrypted cloud storage provided by https://mega.nz), thereby precluding the cus-
tomer from performing any computation on their data when stored in the cloud. In
order to perform computations on the encrypted data stored in the cloud, it must be
decrypted first, which can lead to data leakage.
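As a rough sketch of the customer-side encryption described above, the client can encrypt with a key it never shares, so the cloud stores only ciphertext; the trade-off is that any computation first requires downloading and decrypting the data. The example below uses the `cryptography` package's Fernet recipe purely for illustration; the dictionary standing in for the cloud store is hypothetical.

```python
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()   # generated and kept on the trusted client
f = Fernet(key)

cloud_store = {}              # stand-in for an untrusted cloud storage bucket
cloud_store["record-1"] = f.encrypt(b"patient-id: 4711, balance: 250")

# The cloud sees only ciphertext and cannot compute on it; to update or query
# the record, the client must fetch, decrypt, process locally and re-encrypt.
plaintext = f.decrypt(cloud_store["record-1"])
print(plaintext)
```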
To address these limitations of current cloud service providers, we propose
Nomad, a framework for building mission-critical cloud-based applications. Nomad
consists of a distributed architecture for storage and processing of encrypted data,
which is deployed across trusted and untrusted cloud environments. The Nomad
framework leverages cloud-based Application Programming Interfaces (APIs) to cre-
ate a secure storage service hosted on virtual machines. This storage service achieves
data confidentiality via a homomorphic encryption-based data storage system. To
accelerate the homomorphic encryption operations within the storage system, Nomad
uses a GPU-based parallelisation technique. Nomad also provides a monitoring sys-
tem, which enables the system administrator to track the resource usage and state
of virtual machines running in the cloud. The monitoring system informs the system
administrator when the virtual machines are not meeting required performance objec-
tives, thereby enabling the system administrator to migrate the cloud-based services
to another cloud. The framework provides a means of outsourcing sensitive computa-
tions to an unsecure cloud environment, without concern for the loss of confidentiality
of the data. The primary benefit of this approach is that it enables organisations and
individuals to take advantage of the cost efficiencies of the cloud without having to
worry about the security (i.e., data confidentiality) ramifications of using the cloud.
Within the Nomad framework, the data is encrypted and decrypted on the trusted
client side. The encrypted data is homomorphically processed on the untrusted server
without ever decrypting the ciphertext representation of the data. The assumption is
that the untrusted environment has more computation power than the trusted envi-
ronment. In our current implementation of Nomad, we use HElib, an open source
homomorphic encryption (HE) library [16]. HElib provides low-level cryptographic
operations and is computationally intensive.
We developed an end-to-end application, called CallForFire, to analyse the fea-
sibility of the Nomad framework. CallForFire is a GIS-based defense application
for target acquisition and direction of fires against acquired targets. The applica-
tion includes a map for visualising observer and target locations. We performed
experiments to evaluate the performance of CallForFire.
Contributions. The following are the main contributions of this work:
● We designed and implemented a fully homomorphic encryption-based key/value
storage system. This storage system provides a means for efficient storage,
retrieval, and processing of encrypted data stored in the cloud.
● We built a framework, called Nomad, around the storage system, which facili-
tates the development of mission-critical applications that can be deployed in the
cloud. The framework simplifies the development of secure applications that are
deployed in the cloud.
● We used GPU parallelisation to accelerate HElib operations on both the trusted
client and the untrusted server.
● We implemented an application, called CallForFire, using the Nomad framework.
CallForFire is an interactive GIS application that demonstrates the feasibility of
the Nomad framework.
● We performed experiments to analyse the performance of the CallForFire
application. For specific use cases and interactive applications such as Call-
ForFire, Homomorphic Encryption is feasible, but it may still be too slow for
non-interactive applications.
Public/Private Key Database, and then sends the encryption command along with the
integer value to be encrypted over to the HE Processing Engine so that it can encrypt
the value. Once the plaintext value is encrypted, the ciphertext result is sent back to
the Client Management Engine so that it can be stored in the cloud using the Cloud
Storage Service.
applications making use of the Nomad framework use the UI API to store data in the
Cloud Storage Service. The Client Management Service is responsible for managing
all of the keys, data, and operations for the client application. The application server is
assumed to be running on virtual machines in the cloud alongside the Cloud Storage
Service itself. The application data is stored as ciphertext in the cloud. All operations
that the application performs on the data are performed on the ciphertext directly,
as the application never has access to the underlying plaintext data. After operations
have been performed on the ciphertext data in the cloud, the results are returned to the
trusted client application for display and consumption by the end-user. Upon first use
of the system, the user initialises the client and generates a public/private key pair. In
practice, key generation would be done by a trusted third party.
Storage of encrypted data. A description of how the data is encrypted and
stored in the cloud storage system follows. Note: Assume that each user has a single
public/private key pair for encryption and decryption of their data.
1. System initialisation: In order to use the system, the user must first send a
request to the Client Management Engine to generate a public/private key pair
(< IDuser , PK, SK >). The public key and private key are denoted by PK and
SK, respectively. The Client Management Engine sends a request to the HE Key
Manager to generate the key pair and store it in the Public/Private Key DB. The
Client Management Engine sends the User ID and Public Key (< IDuser , PK >)
to the Cloud Management Engine for later usage. The Cloud Management Engine
calls on the HE Key Manager to store the User ID and Public Key in the Public
Key DB.
2. The user initiates a request to store their Valueplaintext in the Cloud Storage Service.
3. The Client Management Engine submits a request to the HE Processing Engine
to encrypt the plaintext value to get the ciphertext
(Enc(Valueplaintext , PK) = Valueciphertext ).
4. The Client Management Engine then submits a request to the server-side Cloud
Management Engine to store the key/value pair
(< Keydata , Valueciphertext , IDuser >). The Keydata can be represented by a hash function computed by the Client Management Engine (a sketch follows this list). Note that this key (Keydata ) is
used for indexing the data within the cloud storage, not to be confused with the
public key (PK) used for encryption or the private key (SK) used for decryption.
5. The Cloud Management Engine receives the storage request and calls on the
HE Processing Engine to store the data (< Keydata , Valueciphertext , IDuser >) in the
Ciphertext DB.
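The hash-based derivation of Keydata mentioned in step 4 could, for example, look like the sketch below. The inputs to the hash (the user ID and a logical item name) are an assumption for illustration, since the chapter does not specify how the key is computed.

```python
import hashlib

def make_data_key(id_user: str, item_name: str) -> str:
    # Hypothetical derivation: a deterministic index built from the user ID and
    # a logical item name, so the client can recompute Key_data at retrieval
    # time without keeping a separate mapping.
    return hashlib.sha256(f"{id_user}/{item_name}".encode()).hexdigest()

print(make_data_key("alice", "grid-easting"))
```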
Retrieval of encrypted data. The following describes how the encrypted data is
retrieved from the cloud storage system and decrypted.
1. The user initiates a request to retrieve their Valueplaintext from the Cloud Storage
Service.
2. The Client Management Engine submits a request to the server-side Cloud Man-
agement Engine to retrieve the value associated with the user ID and the key
which is unique to the data (< Keydata , IDuser >).
3. The Cloud Management Engine receives the retrieval request and calls on the HE
Processing Engine to retrieve the data, Valueciphertext , from the Ciphertext DB.
4. The Ciphertext DB responds to the query for the data associated with the tuple
< Keydata , IDuser > and returns the Valueciphertext to the HE Processing Engine.
5. The HE Processing Engine returns the encrypted data to the Cloud Management
Engine.
6. The Cloud Management Engine returns the encrypted data to the Client
Management Engine.
7. The Client Management Engine submits a request to the HE Key Manager to
retrieve the Private Key, SK, associated with the user.
8. The HE Key Manager queries the Public/Private Key DB to retrieve the Private
Key assigned to the user.
9. Upon receiving the Private Key from the Public/Private Key DB, the HE Key
Manager returns the Private Key to the Client Management Engine.
10. The Client Management Engine sends < Valueciphertext , SK > to the HE Process-
ing Engine along with a request to decrypt the Valueciphertext .
11. The HE Processing Engine takes the Valueciphertext and returns the Valueplaintext to
the Client Management Engine (Dec(Valueciphertext , SK) = Valueplaintext ).
12. The Client Management Engine returns the Valueplaintext to the user.
Operations on encrypted data. Following is a description of the process taken
to perform an operation on the encrypted data in the cloud. Nomad supports the
following binary arithmetic operations on integers in the encrypted domain: addition,
subtraction, multiplication, and negation.
Note: This section assumes that the data has already been stored in the cloud
storage system—refer to the Data Storage section above for details on how this is
accomplished.
1. The user requests that a binary operation, OP, be performed on two integers (e.g.,
data1 and data2).
2. The Client Management Engine generates the request to perform the operation
on the two integers and sends it to the Cloud Management Engine
(< IDuser , Keydata1 , Keydata2 , OP >).
3. The Cloud Management Engine parses the request to identify the operation, the
operands, and the user who submitted the request.
4. The Cloud Management Engine retrieves the two encrypted integers from the
Ciphertext Database using their associated data IDs (Keydata1 and Keydata2 ).
5. The Cloud Management Engine retrieves the public key associated with the user’s
ID (IDuser ) from the Public/Private Key Database.
6. The Cloud Management Engine calls on the HE Processing Engine to perform
the operation on the ciphertext data
(OP(PK, Dataciphertext1 , Dataciphertext2 ) = Resultciphertext ).
7. The HE Processing Engine makes use of the GPU Device as necessary to
accelerate the operation on the two integer values.
8. The Cloud Management Engine then returns the Resultciphertext to the Client
Management Engine.
9. The Client Management Engine retrieves the user’s private key (SK) by requesting
it from the HE Key Manager.
10. Using the private key, SK, the Client Management Engine calls on the
HE Processing Engine to decrypt the Resultciphertext
(Dec(Keyprivate , Resultciphertext )=Resultplaintext ).
11. The Client Management Engine then sends the Resultplaintext to the user.
application, the security parameter λ is on the order of |λ| = 10^2, s is the secret key, e_i is generated noise, and χ is assumed to be Gaussian.
The definitions needed for the encryption scheme are as follows. Let R = R(λ) be a ring. HElib uses R = Z[x]/(f(x)), where f(x) = x^d + 1 and d = d(λ) is a power of 2 [25]. The scheme also uses a dimension n = n(λ), taken to be 1, an odd modulus q = q(λ), a distribution χ = χ(λ) over R, and an integer N = N(λ) which is set to be larger than (2n + 1) log(q). Computations are based on the plaintext space R_2, although it is possible to use larger spaces.
The operations defined by the BGV scheme are as follows:
● Encryption. Given a message m ∈ R_2, form the message vector m = (m, 0, 0, . . . , 0) and sample a random vector r ∈ R_2^N, where N = (2n + 1)·⌈log(q)⌉ for the odd modulus q. The output ciphertext is c = m + A^T·r ∈ R_q^(n+1), where A is the matrix of public keys.
● Decryption. Given a ciphertext c and the secret key s, compute m = ((⟨c, s⟩ mod q) mod 2), where ⟨c, s⟩ mod q is the noisy encoding of the message under s.
● Arithmetic operations. This scheme supports a number of operations includ-
ing element-wise addition, subtraction, and multiplication; scalar arithmetic; and
automorphic rotations that support efficient polynomial arithmetic (e.g., shift-
register encoding). These operations take ciphertext as their input and produce
ciphertext as their output. They produce accurate results, but the computation
overhead is very high. Our aim is to reduce this computation cost to make
homomorphic encryption feasible for widespread use.
2.3.2 HElib
This section describes the platform, modules, and algorithms that serve as the founda-
tion of HElib. Figure 2.2 shows the HElib block diagram. The structure of the platform
is based on two major components: the underlying Math layer and the operational
Cryptography layer.
The Math layer is implemented using the NTL math library [26], which consists
of modules to support the creation of the mathematical structures along with defini-
tions for operations on these structures. These modules include PAlgebra, NumbTh,
timing, bluestein, CModulus, PAlgebraMod, and IndexSet/IndexMap. These
are used to construct the mathematical structures needed for computation, includ-
ing the structure of Z∗m from PAlgebra, miscellaneous utilities from NumbTh, a
way to facilitate timing within the system from timing, and the fast computation
of arbitrary length FFTs from bluestein. CModulus enables modular arithmetic
over polynomials, PAlgebraMod defines component-wise plaintext-slot arithmetic,
and IndexSet/IndexMap provides indexing utilities that are needed for the complex
matrix operations that are involved. The second component of the Math layer consists
of a module to enable the Double-Chinese Remainder Theorem (Double-CRT) repre-
sentation of polynomials. The Double-CRT representation of polynomials provides a
way to efficiently perform arithmetic on large polynomials by encoding polynomials
into vectors in such a way that it allows for component-wise operations. See [16] for
additional details.
The second layer of the HElib platform is the Cryptography layer. The modules
included in this layer are KeySwitching, which is used for storing the matrices needed
for key-switching, and FHE, which consists of all of the cryptographic operations such
as Key Generation, Encryption, and Decryption. The Ctxt module is used to perform
all of the operations on the ciphertext, and the EncryptedArray is needed to support
routing of the plaintext slots for operations. The final module that is used in HElib is
FHEContext, which is used to keep track of all the parameters throughout the system.
HElib parameters. HElib uses the following parameters for configuration:
● k: The security parameter (default = 80) is analogous to the security parameter λ and is typically on the order of 10^2.
● m, p, r: define the native plaintext space Z[X]/(Φ_m(X), p^r), where m is defined by the security parameter k.
● d: degree of the field extension (default = 1). The degree of the field extension
defines how polynomials will factor in the field. The default value of d = 1
indicates that the polynomials will split into linear (degree 1) factors.
● R: number of rounds (default = 1).
● c: number of columns in the key-switching matrices (default = 2).
● L: number of levels in the modulus chain (default is heuristically determined).
● s: minimum number of plaintext slots (default = 0).
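A minimal initialisation sketch using these parameters is shown below. The calls follow the example programs distributed with early versions of HElib; the exact header names and function signatures are assumptions and may differ in other releases.

#include "FHE.h"            // FHEcontext, FHESecKey, FHEPubKey (old HElib layout)

int main() {
    long k = 80;            // security parameter (default 80)
    long p = 2, r = 1;      // plaintext space Z[X]/(Phi_m(X), p^r)
    long d = 1;             // degree of the field extension
    long c = 2;             // columns in the key-switching matrices
    long s = 0;             // minimum number of plaintext slots
    long L = 16;            // levels in the modulus chain (normally chosen heuristically from R)

    long m = FindM(k, L, c, p, d, s, 0);   // HElib derives m from the other parameters
    FHEcontext context(m, p, r);
    buildModChain(context, L, c);          // build the modulus chain

    FHESecKey secretKey(context);
    const FHEPubKey& publicKey = secretKey;
    secretKey.GenSecKey(64);               // secret key with Hamming weight 64
    addSome1DMatrices(secretKey);          // key-switching matrices
    (void)publicKey;                       // the public key is used for encryption later
    return 0;
}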
FO. Let also HVT_easting be the HVT easting, HVT_northing the HVT northing, and θ the HVT bearing. Then, the HVT's location is calculated as follows:
where the easting and northing distances are measured in meters (m) and θ is measured in angular mils (to the nearest integer) from 0 mils North.
It will be assumed that the positions of the FOs and FU s will be known to the
FDC. Multiple FOs may call for fire support on the same HVT . When this happens,
the nearest available FU to the HVT will be directed to fire on the HVT . The distance
between the FU and the HVT is calculated as follows:
Note that the actual distance between the FU and HVT is found by taking the
square root of the above equation. HElib currently does not support the square root
operation, but distances can still be compared using the above formula.
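The following plaintext C++ sketch illustrates the geometry that CallForFire evaluates homomorphically. The conventions used (6,400 angular mils per full circle, with the bearing measured clockwise from grid north) are assumptions made for this illustration rather than a restatement of the chapter's equations; the squared-distance comparison mirrors the remark above that HElib does not support the square root.

#include <cmath>
#include <cstdio>

const double kPi = 3.14159265358979323846;

struct Position { double easting, northing; };     // metres

// HVT location from the FO position plus the estimated distance and bearing.
Position HvtLocation(const Position& fo, double distance_m, int bearing_mils) {
    const double radians = bearing_mils * (2.0 * kPi / 6400.0);
    return { fo.easting  + distance_m * std::sin(radians),
             fo.northing + distance_m * std::cos(radians) };
}

// Squared distance only: comparing FUs does not need the square root.
double SquaredDistance(const Position& fu, const Position& hvt) {
    const double de = fu.easting  - hvt.easting;
    const double dn = fu.northing - hvt.northing;
    return de * de + dn * dn;
}

int main() {
    Position fo{ 500000.0, 4649000.0 };
    Position hvt = HvtLocation(fo, 1200.0, 1600);   // 1,600 mils = due east
    Position fu{ 501500.0, 4648500.0 };
    std::printf("HVT at (%.1f, %.1f), d^2 to FU = %.1f m^2\n",
                hvt.easting, hvt.northing, SquaredDistance(fu, hvt));
}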
1. The FO detects an HVT in the field, estimates its distance and bearing, and enters
the data into the FO client application.
2. The FO client application uses the FDC public key to homomorphically encrypt
the FO’s location (easting and northing) and the HVT ’s distance and bearing, and
sends them to the FDC.
3. The FDC outsources the computation of the HVT ’s location to the Nomad cloud
service by sending the homomorphically encrypted FO’s location, the HVT ’s
bearing and distance, and locations of the available FUs over to the cloud.
4. The cloud homomorphically computes the HVT ’s absolute location and selects
the nearest FU to direct fire on the HVT.
5. The cloud sends the HVT ’s location and FU selection back to the FDC.
6. The FDC decrypts the HVT ’s location and the FU selection, then makes the final
decision to initiate the firing operation.
7. The FDC encrypts the HVT ’s location using the FU public key and sends the
firing command to the selected FU.
8. The selected FU decrypts the HVT ’s location and directs fire on the HVT.
2.6 Implementation
The Nomad framework is designed to be modular and extensible, using Thrift [31] as
the underlying client/server framework. We chose Thrift, which is based on the remote
procedure call (RPC) protocol, in order to abstract away the details of the communica-
tion between the client and server. The Cloud Storage Service uses Google’s LevelDB
[32] for its underlying database, but the framework can easily be adapted to use other
NoSQL databases such as Apache Accumulo [33]. The current implementation of
Nomad uses the Xen hypervisor [34], which is open source and used by major public
cloud service providers today. The GPU-based acceleration mechanism is integrated
with HElib and uses the Nvidia CUDA [27] parallel computing platform and pro-
gramming model, as described in Section 2.4. For the Client Management Service
implementation, we used the CppCMS [35] web development framework to integrate
the various C++ libraries, including HElib, Thrift, LevelDB, and CUDA. Nomad’s
modular design allows developers to extend the framework, including using different
hypervisors for virtual machine management, and choosing different key/value stores
for back-end storage.
We implemented the CallForFire application using the Nomad framework. Open-
Layers [36], an open source JavaScript library, is used for loading, displaying, and
rendering map data in the browser. Since the map data needs to be completely
encrypted within the Cloud Storage Service, standard GIS databases such as Ora-
cle Spatial and Graph or PostGIS could not be used for this purpose. Traditional GIS
systems perform the computations on the server side, which means that the plaintext
data must be visible to the server when using these systems, thereby precluding their
use in the CallForFire application. Instead, LevelDB is used for storing the encrypted
GIS data. LevelDB provides an ordered mapping from keys to values; the keys and values are arbitrary byte arrays, and values are compressed automatically using the Snappy compression library [37]. Snappy offers fast, reasonably effective compression, which is advantageous given the large size of the ciphertexts. For performing operations in different
geographical coordinate systems within CallForFire, we leveraged the GeographicLib
[38] library.
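Returning to the storage layer, the sketch below shows how an encrypted record might be written to and read back from LevelDB. The LevelDB calls are the library's standard C++ API; the key naming convention ("ciphertext/<user>/<dataId>") is a hypothetical layout for illustration, not Nomad's actual schema.

#include <cassert>
#include <string>
#include "leveldb/db.h"

int main() {
    leveldb::Options options;
    options.create_if_missing = true;           // Snappy compression is enabled by default
    leveldb::DB* db = nullptr;
    leveldb::Status st = leveldb::DB::Open(options, "/tmp/nomad-ciphertext-db", &db);
    assert(st.ok());

    // Values are arbitrary byte arrays, so a serialised HElib ciphertext can be
    // stored directly; a placeholder string stands in for it here.
    std::string key = "ciphertext/fdc-1/hvt-location-42";
    std::string serialisedCiphertext = "<serialised HElib ciphertext bytes>";
    st = db->Put(leveldb::WriteOptions(), key, serialisedCiphertext);
    assert(st.ok());

    std::string fetched;
    st = db->Get(leveldb::ReadOptions(), key, &fetched);
    assert(st.ok() && fetched == serialisedCiphertext);

    delete db;
    return 0;
}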
Figure 2.3 is a screenshot of the actual CallForFire GUI in a web browser. It
shows an example scenario with the following players: one FDC, four FOs, five FUs, and three HVTs. In this scenario, the FDC has computed the locations of the three HVTs using the information given by the FOs. It has also selected the nearest FU for each HVT, as indicated by the lines between them on the map.
2.7 Experiments
We performed two different sets of experiments to analyse the feasibility of the overall
Nomad approach. The first set of experiments focused on the optimisation of the HElib
library using the GPU-based parallelisation approach. The second set of experiments
analysed the feasibility of the CallForFire application. HElib uses the following main
parameters: R (number of rounds), p (plaintext base), r (lifting), d (degree of the
field extension), c (number of columns in the key-switching matrices), k (security
parameter), L (number of levels in the modulus chain), s (minimum number of slots),
and m (modulus). HElib is initialised by configuring the parameters R, p, r, d, c, k,
and s. The parameters L and m are automatically generated by HElib based on these
other parameters.
parallelisation. The timing results reported are the averages (i.e., mean value) taken
over 50 runs of this simple workload. Table 2.2 details the comparison between the
CPU HElib and GPU HElib implementations. We used the TAU Parallel Performance
System [39] to perform the code profiling. It can be seen that almost all of the HElib
modules benefited from the GPU implementation. These results highlight the speedup
that can be gained when these methods are ported to the GPU.
Tables 2.3 and 2.4 compare the execution time of the CPU HElib implementation
to the GPU HElib implementation. It should be noted that the workload execution
times reported do not include the GPU initialisation time. From these tables, it can
be seen that implementing the BluesteinFFT() function using the GPU results in a
speedup of 2.1× compared to the CPU implementation.
Configuration                                CUDA overhead    Workload execution    Total
CPU BluesteinInit(), CPU BluesteinFFT()                  0                 3,970    3,970
GPU BluesteinInit(), CPU BluesteinFFT()                 56                 3,836    3,892
CPU BluesteinInit(), GPU BluesteinFFT()                 46                 2,119    2,166
GPU BluesteinInit(), GPU BluesteinFFT()                 43                 2,033    2,077
Table 2.5 describes the GPU overhead associated with the respective GPU implementations of BluesteinInit() and BluesteinFFT(). The GPU overhead, together with the one-time GPU initialisation, determines the minimum CPU execution time saving necessary to make utilising the GPU worthwhile. This cost needs to be taken into account when porting code to the GPU.
HElib GPU parallelisation. With BluesteinInit() and BluesteinFFT() ported to
the GPU, combinations of the CPU and GPU implementations were profiled. Table 2.6
shows these results in detail. It should be noted that the CUDA overhead is a com-
bination of the GPU initialisation and GPU kernel overhead as described previously.
This table shows how much time savings is gained from each of the two functions
when using combinations of the GPU and CPU for their implementation. For exam-
ple, implementing the BluesteinFFT() function in the GPU yields significantly higher
speedup than implementing the BluesteinInit() function in the GPU. This is because
the BluesteinFFT() function is called many more times than the BluesteinInit() func-
tion. From this, one can infer that the time savings is proportional to the number of
times each function is called.
Table 2.7 Comparison of workload execution time when varying the number
of threads per block using GPU BluesteinInit/FFT implementation
Multiple threads can be assigned to process the data stored in each memory
block within the GPU. This has the potential of speeding up data processing within
the GPU. The number of threads per block can be customised. In Table 2.7, various
numbers of threads per block were executed and profiled. The results presented in
the table reveal that there is only a slight decrease in execution time when increasing
the number of threads per block. This is consistent with NVIDIA's CUDA GPU Occupancy Calculator, which indicates no benefit from increasing the number of threads per block when utilising Compute Capability 3.0 and 64 registers per thread. Also note that the average
overhead for the GPU kernel execution is 116 μs per 14 KB processed by the GPU.
locations. We also measured the time it took to store and retrieve ten encrypted
locations from the storage. In the firing asset selection, we measured the time it took
to compute the distance between ten FUs and ten HVTs pairwise. We repeated both
experiments 100 times and computed the averages. Table 2.8 summarises the results
of these experiments and gives a comparison between the performance of individual
and batched operations. When performing operations in batched mode, an input
array with multiple elements is passed in to the storage system. The homomorphic
encryption operations can then be performed on all of the elements of the array within
the same operation. With individual operations, one data element (i.e., an integer) is
placed into the input array, which is then passed to the storage system. Based on the
results of these experiments, it is best to use batch mode when possible, which can
reduce the overhead significantly.
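As an illustration of batched mode, the following sketch (continuing the earlier HElib initialisation sketch) packs one integer per plaintext slot so that a single homomorphic addition acts on the whole input array at once. The API names again follow the early HElib example programs and should be treated as assumptions; later releases moved and renamed several of them.

#include <vector>
#include "FHE.h"
#include "EncryptedArray.h"

void BatchedAdd(const FHEcontext& context, const FHEPubKey& publicKey,
                const FHESecKey& secretKey) {
    NTL::ZZX G = context.alMod.getFactorsOverZZ()[0];
    EncryptedArray ea(context, G);

    std::vector<long> a(ea.size(), 0), b(ea.size(), 0);
    for (long i = 0; i < ea.size(); ++i) { a[i] = i % 2; b[i] = (i + 1) % 2; }

    Ctxt ca(publicKey), cb(publicKey);
    ea.encrypt(ca, publicKey, a);       // one ciphertext holds the whole input array
    ea.encrypt(cb, publicKey, b);

    ca += cb;                           // slot-wise addition in a single operation

    std::vector<long> sum;
    ea.decrypt(ca, secretKey, sum);     // sum[i] == a[i] + b[i] (mod p^r)
}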
Transmission and storage overhead. For the transmission and storage overhead,
we measured the time it took for the FO to encrypt and transmit the location infor-
mation to the FDC, and for the FDC to store the information in its database. We
considered scenarios for 100 FOs and calculated the averages. The time it takes to
transmit an encrypted location and store it in the database is about 22 times longer
than when the location is not encrypted. For the storage space overhead, the average
space used to store a location using FHE is 8.96 megabytes, whereas the average for
a location without using FHE is 17.6 bytes. This significant storage space overhead
is a limitation common to all lattice-based homomorphic encryption schemes.
2.9 Conclusion
In this chapter, we presented Nomad, a framework for developing secure and privacy-
preserving applications in the cloud. In particular, Nomad can be used to build
mission-critical cloud-based applications with ensured data confidentiality. Nomad
consists of a fully homomorphic encryption-based key/value storage system that pro-
vides a means for efficient storage, retrieval, and processing of encrypted data stored
in the cloud. The storage system was developed using HElib, an open source homo-
morphic encryption library. HElib implements the BGV homomorphic encryption
scheme, which is based on the RLWE technique. We employed a GPU-based approach
to parallelise some of the underlying HElib algorithms to accelerate the HElib oper-
ations. We performed experiments to analyse the performance of the GPU-based
acceleration. These experiments showed a 2.1× speedup of certain HElib operations.
The results of this approach are promising in that portions of HElib may be accelerated,
thereby leading to more uses of homomorphic encryption in practical applications.
To analyse the feasibility of the Nomad framework, we implemented CallForFire,
a cloud-based mission-critical defense application. CallForFire takes advantage of
Nomad’s Cloud Storage Service to encrypt and compute enemy target locations in
the battlefield. CallForFire is very secure, as the data stored in the system is never
decrypted within the cloud storage service, which is where all the computations take
place. CallForFire uses very limited operations on integers and is interactive, which
made it suitable for development using the Nomad framework. Despite the fact that
the homomorphic operations are slow, the performance of the CallForFire application
highlights the feasibility of such a system in a real-life scenario.
While the overall performance of HElib may still be impractical for many appli-
cations, certain interactive applications, such as CallForFire, can still make use of
HElib in a limited context to enhance data confidentiality. Further development of
secure computing techniques that protect data-in-processing, such as homomorphic
encryption, will likely accelerate the adoption of cloud computing by organisations
with sensitive data.
References
[1] Grance T, Mell P. The NIST Definition of Cloud Computing. Recom-
mendations of the National Institute of Standards and Technology. 2011
September.
[2] Cloud Security Alliance. The Notorious Nine: Cloud Computing Top Threats in 2013. Top Threats Working Group. 2013.
[3] GCN: Like it or not, cloud computing is here to stay; 2011. Available
from: https://gcn.com/microsites/2011/cloud-computing-download/cloud-
computing-application-development.aspx. [Online; accessed 20-Dec-2016].
Abstract
With the evolution of cloud computing, organizations are outsourcing the storage and rendering of volumes (i.e., 3D data) to cloud servers. Data confidentiality at the third-
party cloud provider, however, is one of the main challenges. Although state-of-the-art
non-homomorphic encryption schemes can protect confidentiality by encrypting the
volume, they do not allow rendering operations on the encrypted volumes. In this chap-
ter, we address this challenge by proposing 3DCrypt—a modified Paillier cryptosys-
tem scheme for multiuser settings that allows cloud datacenters to render the encrypted
volume. The rendering technique we consider in this work is the pre-classification
volume ray-casting. 3DCrypt is such that multiple users can render volumes with-
out sharing any encryption keys. 3DCrypt’s storage and computational overheads are
approximately 66.3 MB and 27 s, respectively, when rendering is performed on a
256 × 256 × 256 volume for a 256 × 256 image space. We have also proved that
3DCrypt is INDistinguishable under Chosen Plaintext Attack (IND-CPA) secure.
3.1 Introduction
Cloud computing is an attractive paradigm for accessing virtually unlimited storage
and computational resources. With its pay-as-you-go model, clients can access fast
and reliable hardware paying only for the resources they use, without the risk of
large upfront investments. Nowadays, it is very common to build applications for
multimedia content hosted in infrastructures run by third-party cloud providers. To
this end, cloud datacenters are also increasingly used by organizations for rendering
of images [1–3]. In these cloud-based rendering schemes, the data is typically stored
1 Center for Cyber Security, New York University, United Arab Emirates
2 Department of Computer Science, The University of Auckland, New Zealand
and private cloud servers without disclosing both shape and color of the object to any
of them.
Our contributions are summarized as follows:
● We provide a full-fledged multiuser scheme for the encrypted domain volume
rendering.
● We can hide both color and shape of the object from the cloud. The color and
shape are protected from the public cloud server by encrypting both the color and
opacity information and by performing rendering in an encrypted domain. The
color is protected from the private cloud server by performing rendering on the
encrypted color information. The shape is protected from the private cloud server
by hiding pixel positions of the image space.
● 3DCrypt is such that even if the private and public cloud servers collude, they can, at most, learn the shape of the object. The cloud servers can never learn the secret color from the gathered information. Therefore, we provide stronger security against collusion attacks than the state-of-the-art Shamir's secret sharing-based schemes.
Our preliminary analysis performed on a single machine shows that 3DCrypt
requires significant overhead. According to our analysis, the computation cost at
the user end, however, can be affordable. Security analysis shows that 3DCrypt is
INDistinguishable under Chosen Plaintext Attack (IND-CPA) secure.
The rest of this chapter is organized as follows. Section 3.2 reviews existing
approaches for encrypted domain rendering and provides an overview of pre-
classification volume ray-casting. In Section 3.3, we describe our system model
and threat model. In Section 3.4, we provide an overview of 3DCrypt. Section 3.5
describes how we integrate the Paillier cryptosystem into the pre-classification volume
ray-casting. Section 3.6 explains construction details and Section 3.7 provides secu-
rity analysis. In Section 3.8, we report results and performance analysis. Section 3.9
concludes our work.
volume ray-casting scheme that uses block-based permutation (which creates sub-
volumes from the volume and permutes the sub-volumes) and adjustment of transfer
function to hide both the volume and rendering tasks from the rendering server. Their work, however, does not fully achieve privacy, due to the loss of information from the adjusted transfer table and the use of permutation.
In the literature, there are alternative schemes that address privacy issues when outsourcing volume rendering and visualization tasks to a third-party server. Koller et al.
[10] protected the high-resolution data from an untrusted user by allowing only the
user to view the low-resolution results during interactive exploration. Similarly, Das-
gupta and Kosara [11] proposed to minimize the possible disclosure by combining
nonsensitive information with high-sensitivity information. The major issue with
such schemes is that they leak sensitive information. To minimize information loss,
we consider supporting the encrypted domain rendering.
3.2.2 3D images
A 3D image has three dimensions: width, height, and depth. In contrast to 2D images (which have only width and height), a 3D image better represents the real world, which is also 3D in nature. To this end, a pixel in a 3D image is represented by four values: the R, G, B colors and the opacity.
We are increasingly using 3D images in our day-to-day life in various ways,
such as in cinema (3D movies), Google map, and medical imaging (3D MRI, 3D
ultrasound, etc.). Typically, a 3D image is rendered either from a stack of 2D images
or from a 3D volume data. The former rendering scheme, which is known as surface
rendering, produces inferior image quality than the later rendering scheme, which is
known as volume rendering. In this chapter, we consider volume rendering. There
are a number of ways how a 3D image can be rendered from a given volume data.
Volume ray-casting is the most popular among them. We discuss volume ray-casting
in the following section.
Figure 3.1 Pre-classification volume ray-casting: (a) a general overview and (b)
the rendering pipeline
where N (s) is the set of eight neighboring voxels of s, Cv is the color value, and
0 ≤ Iv ≤ 1 is the interpolating factor of the voxel v ∈ N (s). Likewise, we can
and
A = Σ_{i=1}^{c} O_i                                              (3.3)
where O_i = A_{s_i} · Π_{j=i+1}^{c} (1 − A_{s_j}) and A_s is the opacity of s. Therefore, for a constant O_i, this operation also requires only additions and scalar multiplications.
● After interpolation, the position and direction of a projected ray, which can disclose the coordinate of the pixel that cast it, are not required; rather, a ray
must be distinguished from other rays as sample points along this ray need to be
identified during the color/opacity composition.
server, delete/modify existing ones, and manage access control policies (such
as read/write access rights). In our scenario, the Volume Outsourcer is part of a
volume capturing hospital.
● Public Cloud Server: A Public Cloud Server is part of the infrastructure provided
by a cloud service provider, such as Amazon S3,1 for storing and rendering of
volumes. It stores (encrypted) volumes and access policies used to regulate access
to the volume and the rendered image. It performs most of the rendering on stored
volumes and produces the partially rendered data.
● Private Cloud Server: The Private Cloud Server sits between the Public Cloud
Server and the rendering requester. It can be part of the infrastructure, either
provided by a private cloud service provider or maintained by an organization as
a proxy server. The Private Cloud Server receives partially rendered data from
the Public Cloud Server and performs remaining rendering tasks on the volume.
It then sends the rendered image to the rendering requester. Note that the Pri-
vate Cloud Server does not store data, and it only performs minimal rendering
operations on partially rendered data received from the Public Cloud Server.
● Image User: This entity is authorized by the Volume Outsourcer to render a
volume stored in the Public Cloud Server. In a multiuser setting, an Image User
can (i) render an image (in encrypted domain) that will be accessible by Image
Users (including herself) or (ii) access images rendered by other Image Users. In
both cases, Image Users do not need to share any keying material.
● Key Management Authority (KMA): The KMA generates and revokes keys for entities involved in the system. For each user (be it a Volume Outsourcer or an Image User), it generates a key pair containing the user-side key and the server-side key.
The server-side key is securely transmitted to the Public Cloud Server, whereas
the user-side key is sent either to the user or to the Private Cloud Server depending
on whether the user is a Volume Outsourcer or Image User. Whenever required
(say, for example, in key lost or stolen cases), the KMA revokes the keys from
the system with the support of the Public Cloud Server.
Threat Model: We assume that the KMA is a fully trusted entity under the
control of the Volume Outsourcer’s organization. Typically, the KMA can be directly
controlled by the Volume Outsourcer. Since the KMA deals with a small amount of
data, it can be easily managed and secured. We also assume that the KMA securely
communicates the key sets to the Public Cloud Server and the Image User. This can
be achieved by establishing a secure channel. Except for managing keys, the KMA is not involved in any operations. Therefore, it can be kept offline most of the time. This also minimizes the chances of it being compromised by an external attack.
We consider an honest-but-curious Public Cloud Server. That is, the Public Cloud
Server is trusted to honestly perform rendering operations as requested by the Image
User. However, it is not trusted to guarantee data confidentiality. The adversary can
be either an outsider or even an insider, such as unfaithful employees. Furthermore,
1 https://aws.amazon.com/s3/
we assume that there are mechanisms to deal with data integrity and availability of
the Public Cloud Server.
In 3DCrypt, the Private Cloud Server is an honest-and-semi-trusted entity. The
Private Cloud Server is also expected to honestly perform its part of rendering oper-
ations. The Private Cloud Server is semi-trusted in the sense that it cannot analyze
more than what can perceptually be learnt from the plaintext volume. We assume that the Private Cloud Server has a conflict of interest with the Public Cloud Server; that is, the two cloud servers are assumed not to collude.
Figure 3.4 The architecture of 3DCrypt, a cloud-based secure volume storage and
rendering system (taken from [8])
volume. To achieve this, the Volume Outsourcer interacts with the client module Store
Requester. The Volume Outsourcer provides plaintext volume and access policies
(Step i). The Store Requester performs the first-stage encryption on the input volume
using the user-side key and then sends the encrypted volume along with associated
access policies to the Store Keeper module of the Public Cloud Server (Step ii). On the
Public Cloud Server-end, the Store Keeper performs the second stage of encryption
using the server-side key corresponding to the user and stores the encrypted volume
in a Volume Store (Step iii.a). The Store Keeper also stores access policies of the
volume in the Policy Store (Step iii.b).
Once an Image User wants the Public Cloud Server to render a volume, his or her client module, the Access Requester, receives the rendering request in the form of projected rays (Step 1). The module Access Requester, in turn, forwards the
request to the Access Request Processor module of the Public Cloud Server (Step 2).
In the request, the Access Requester sends rendering parameters (such as direction of
projected rays) and user credentials (which can be anonymized) to the Access Request
Processor. The Access Request Processor first performs a user authorization check to determine whether the user is authorized to perform the requested operations. For this purpose, the Access Manager is asked to check the access policies (Step 3).
The Access Manager fetches access policies from the Policy Store (Step 4). Next,
it matches access policies against the access request. Then, the access response is
sent back to the Access Request Processor (Step 5). If the user is authorized to
perform the requested operation, the Volume Renderer is invoked with rendering
parameters (Step 6). The requested volume is retrieved from the Volume Store (Step
7). Then, most of the rendering tasks, which do not require multiplication of opacities, are performed in the encrypted domain, and the partially rendered data is sent to the Access Request Processor (Step 8). The Access Request Processor performs the first round of decryption on the rendered data, hides voxel positions (e.g., by permuting the coordinates of the voxels), and sends the data, the protected pixel positions, and the user identity to the Private Cloud Server (Step 9). The Private Cloud Server performs the second round of decryption and obtains the partially rendered opacities in plaintext. The plaintext opacities and encrypted colors are sent to the Rendering Finisher module, which performs the rest of the rendering tasks, involving multiplication of opacities. Since the opacities are in plaintext, the multiplication of opacities with colors is reduced
to a scalar multiplication. The Private Cloud Server then sends the opacity disclosed
(but shape-protected as voxel positions are unknown) and color encrypted rendered
image to the Access Requester (Step 10). The Access Requester decrypts the colors
and shows the rendered image to the Image User (Step 11).
Figure 3.5 Encryption and decryption processes using 3DCrypt (taken from [8])
tasks: (i) pre-ray-projection rendering and (ii) encryption of the output of the pre-ray-projection rendering using the user-side keys K^C_Ui and K^A_Ui. As discussed in Section 3.2.3, the pre-ray-projection rendering, consisting of the gradient/normal estimation, classification, and shading rendering components, maps the physical property v_ijk of the ijk-th data voxel to its corresponding color C and opacity A. After this step, an input volume V can be represented as V′, where the ijk-th voxel of V′ contains the color and opacity derived from the physical property (typically represented as a floating-point number) of the ijk-th voxel of V.
For a user i, the colors and opacities of V′ are encrypted using K^C_Ui and K^A_Ui, respectively. The encryption outputs E*_i(C) and E*_i(A) are sent to the Store Keeper as an encrypted volume E*_i(V′). The Store Keeper then uses the server-side keys K^C_Si and K^A_Si to re-encrypt E*_i(C) and E*_i(A) and stores the re-encrypted volume E(V′) in the Volume Store.
In the encryption process, we adopt two main optimizations to decrease the stor-
age overhead. First, we use the optimization of the modified Paillier cryptosystem.
Second, we encrypt three color components—R, G, and B—in a single big num-
ber rather than encrypting them independently. We discuss both the optimizations
below.
Optimization of the modified Paillier cryptosystem. The modified Paillier cryptosystem, explained in Section 3.6, is represented as E(C) = (e_1, e_2), where e_1 = g^r and e_2 = g^{rx}·(1 + Cn), and C is the plaintext color. Likewise, we encrypt the opacity A. Note that e_1 is independent of the plaintext color. By using a different e_1 for each color (the typical case), we need 2k bits (where k is a security parameter) to store e_1 for every color. We propose to optimize this space requirement by using one e_1 for encrypting t colors, requiring only 2k bits to store e_1 for all t colors. This optimization is achieved by using the same random number r for all t colors.
Encrypting color components. As discussed in Section 3.2.3, the three color components (i.e., R, G, and B) undergo the same rendering operations. Each of them requires 8 bits in plaintext, but is represented by 2k bits (the minimum recommended value of k is 1,024) when encrypted independently. We can reduce this overhead by representing the three color components as a single big number and encrypting this number instead of encrypting the color components independently. This encrypted number then undergoes rendering in place of the individual color components. After decryption, we must, however, be able to recover the rendered color components from the rendered big number. One way to create a big number from the color components is to multiply the m-th color component by 10^{3·m·(d+f)} (where d + f is the total number of rounding places used during rounding operations in rendering) and add all the products.
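The following toy sketch illustrates this packing in plaintext. The parameters are deliberately small (d + f = 2), and the sketch assumes each rendered component stays below the spacing 10^{3(d+f)}; otherwise neighbouring components would overlap and could not be recovered.

#include <cstdint>
#include <cstdio>

constexpr int kRoundingPlaces = 2;                      // d + f in the text
constexpr std::int64_t kSpacing = 1000000;              // 10^(3*(d+f)) for d + f = 2

std::int64_t Pack(int r, int g, int b) {                // components m = 0, 1, 2
    return r + g * kSpacing + b * kSpacing * kSpacing;
}

void Unpack(std::int64_t packed, std::int64_t rgb[3]) {
    for (int m = 0; m < 3; ++m) { rgb[m] = packed % kSpacing; packed /= kSpacing; }
}

int main() {
    // The same scalar operation applied to the packed value acts on all three
    // components at once, which is why one encryption of the big number suffices.
    std::int64_t p1 = Pack(10, 20, 30);
    std::int64_t p2 = Pack(1, 2, 3);
    std::int64_t blended = 3 * p1 + 2 * p2;             // e.g. interpolation weights 3 and 2
    std::int64_t rgb[3];
    Unpack(blended, rgb);
    std::printf("R=%lld G=%lld B=%lld\n", (long long)rgb[0], (long long)rgb[1], (long long)rgb[2]);
    // Prints R=32 G=64 B=96, i.e. 3*component1 + 2*component2 for each channel.
}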
We use the same sampling technique as used by the conventional ray-casting. The
interpolation, however, is performed on the encrypted color E(C) and opacity E(A).
As discussed in Section 3.2.3, interpolation can be performed as additions and scalar
multiplications when the interpolating factor I_ijk is available in plaintext. We therefore disclose I_ijk to the Public Cloud Server. Since the floating-point I_ijk cannot be used with encrypted numbers, the Public Cloud Server converts I_ijk to an integer by first rounding off I_ijk to d decimal places and then multiplying the rounded value by 10^d.
3DCrypt allows the Public Cloud Server to run additions and scalar multiplications over encrypted numbers, as shown in (3.4) and (3.5), respectively:
E(C_1) ∗ E(C_2) = E(C_1 + C_2)                                   (3.4)
and
E(C)^{I_ijk} = E(I_ijk ∗ C)                                       (3.5)
Likewise, we can compute the opacity in an encrypted manner. The interpolated color E(C_s) and opacity E(A_s) for each sample point s are first-stage-decrypted using the Image User j's server-side keys K^C_Sj and K^A_Sj, respectively. The first-stage-decrypted color E*_j(C_s) and opacity E*_j(A_s) are then sent to the Private Cloud Server along with the proxy ray with which the sampling point s is associated.
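The homomorphic properties (3.4) and (3.5) can be checked numerically with a toy version of the modified (BCP-style) Paillier encryption E(C) = (g^r, g^{rx}·(1 + Cn)). The sketch below uses deliberately tiny parameters rather than the recommended 1,024-bit k, draws a fresh r per ciphertext (ignoring the shared-r optimisation), and folds the user-side and server-side keys into a single secret x, so it only demonstrates the algebra, not the full construction of Section 3.6.

#include <cassert>
#include <cstdint>
#include <utility>

using u64 = std::uint64_t;
using u128 = unsigned __int128;
using i128 = __int128;

u64 mulmod(u64 a, u64 b, u64 m) { return (u64)((u128)a * b % m); }
u64 powmod(u64 b, u64 e, u64 m) {
    u64 r = 1 % m;
    for (b %= m; e; e >>= 1, b = mulmod(b, b, m)) if (e & 1) r = mulmod(r, b, m);
    return r;
}
u64 invmod(u64 a, u64 m) {                         // extended Euclid, gcd(a, m) = 1
    i128 t = 0, newt = 1, r = m, newr = a;
    while (newr) { i128 qq = r / newr;
        t -= qq * newt; std::swap(t, newt);
        r -= qq * newr; std::swap(r, newr); }
    return (u64)((t % (i128)m + m) % m);
}

int main() {
    const u64 p = 1009, q = 1013, n = p * q, n2 = n * n;   // arithmetic is modulo n^2
    const u64 g = 5678, x = 123457, h = powmod(g, x, n2);  // h = g^x is public, x is secret

    auto enc = [&](u64 C, u64 r) {                          // E(C) = (g^r, h^r * (1 + C*n))
        return std::make_pair(powmod(g, r, n2),
                              mulmod(powmod(h, r, n2), (1 + C * n) % n2, n2));
    };
    auto dec = [&](std::pair<u64, u64> ct) {                // e2 * (e1^x)^-1 = 1 + C*n (mod n^2)
        u64 v = mulmod(ct.second, invmod(powmod(ct.first, x, n2), n2), n2);
        return (v - 1) / n;                                 // recover C, valid for C < n
    };

    auto c1 = enc(37, 9001), c2 = enc(5, 4242);
    // (3.4): component-wise product of ciphertexts encrypts the sum of plaintexts.
    auto sum = std::make_pair(mulmod(c1.first, c2.first, n2),
                              mulmod(c1.second, c2.second, n2));
    assert(dec(sum) == 42);
    // (3.5): raising a ciphertext to a plaintext scalar encrypts the scaled plaintext.
    u64 k = 7;
    auto scaled = std::make_pair(powmod(c1.first, k, n2), powmod(c1.second, k, n2));
    assert(dec(scaled) == 7 * 37);
    return 0;
}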
3.5.3 Composition
In this step, the Private Cloud Server accumulates the colors and opacities of all the sampling points along a proxy ray. Since this step involves multiplication of opacities (an operation that is not homomorphic under the Paillier cryptosystem), the Private Cloud Server performs the second round of decryption on E*_j(A_s) using the user j's user-side key for opacity, K^A_Uj. The multiplied opacity O_s, which will be multiplied with the encrypted color, is however a floating-point number. As discussed above, we can convert this float to an integer by first rounding it off to f places and then multiplying the rounded value by 10^f. Then, we can perform encrypted-domain color composition using (3.4) and (3.5). Note that since the available interpolated colors are in encrypted form, the composited color E*_j(C) also remains encrypted. Furthermore, in the absence of the voxel positions of the image space, the composited plaintext opacity A does not reveal the shape of the rendered image.
The plaintext rendered opacity A and the encrypted rendered color E*_j(C) are sent to the Image User, who decrypts E*_j(C) using K^C_Uj and views the plaintext rendered image.
corresponding to the user j and computes the first-stage-decrypted data E*_j(D) = (ê_1, ê_2), where
Definition 3.2 (Pseudorandom Function). A function f : {0, 1}* × {0, 1}* → {0, 1}* is pseudorandom if for all PPT adversaries A, there exists a negligible function negl such that:
where s ∈ {0, 1}^n is chosen uniformly at random and F is a function chosen uniformly at random from the set of functions mapping n-bit strings to n-bit strings.
Our proof relies on the assumption that the Decisional Diffie–Hellman (DDH) problem is hard in a group G, i.e., it is hard for an adversary to distinguish between the group elements g^{αβ} and g^γ given g^α and g^β.
Definition 3.3 (DDH Assumption). The DDH problem is hard relative to a group G if for all PPT adversaries A, there exists a negligible function negl such that:
Theorem 3.1. If the DDH problem is hard relative to G, then the proposed Paillier-
based proxy encryption scheme (let us call it PPE) is IND-CPA secure against the
server S, i.e., for all PPT adversaries A there exists a negligible function negl such
that:
Succ^A_{PPE,S}(k) = Pr[ b = b′ :
    (Params, MSK) ← Init(1^k),
    (K_Ui, K_Si) ← KeyGen(MSK, i),
    (d_0, d_1) ← A^{ClientEnc(·, K_Ui)}(K_Si),
    b ←_R {0, 1},
    E*_i(d_b) = ClientEnc(d_b, K_Ui),
    b′ ← A^{ClientEnc(·, K_Ui)}(E*_i(d_b), K_Si)
] < 1/2 + negl(k)                                             (3.6)
Proof. Let us consider the following PPT adversary A′ who attempts to solve the DDH problem using A as a subroutine. Note that for the proof technique, we take inspiration from the one presented in [17]. Recall that A′ is given G, n, g, g_1, g_2, g_3 as input, where g_1 = g^α, g_2 = g^β, and g_3 is either g^{αβ} or g^γ for some uniformly chosen random α, β, γ ∈ Z_n. A′ does the following:
● A′ sends n, g to A as the public parameters. Next, it randomly chooses x_{i2} ∈ Z_n for the user i and computes g^{x_{i1}} = g_1 · g^{−x_{i2}}. It then sends (i, x_{i2}) to A and keeps all (i, x_{i2}, g^{x_{i1}}).
● Whenever A requires oracle access to ClientEnc(·), it passes the data d to A′. A′ randomly chooses r ∈ Z_n and returns (g^r, g^{r x_{i1}} · (1 + dn)).
● At some point, A outputs d_0 and d_1. A′ randomly chooses a bit b and sends (g_2, g_2^{−x_{i2}} g_3 · (1 + d_b n)) to A.
● A outputs b′. If b = b′, A′ outputs 1, and 0 otherwise.
We can distinguish two cases:
Case 1. If g_3 = g^γ, we know that g^γ is a random group element of G because γ is chosen at random. Hence g_2^{−x_{i2}} g_3 · (1 + d_b n) is also a random element of G and gives no information about d_b. That is, the distribution of g_2^{−x_{i2}} g_3 · (1 + d_b n) is uniform regardless of the value of d_b. Further, g_2 does not leak information about d_b. So, the adversary A must distinguish d_0 and d_1 without additional information. The probability that A successfully outputs b′ = b is exactly 1/2 when b is chosen uniformly at random. A′ outputs 1 if and only if A outputs b′ = b. Thus, we have:
Pr[A′(G, n, g, g^α, g^β, g^γ) = 1] = 1/2
Case 2. If g_3 = g^{αβ}, then because g_2 = g^β and
g_2^{−x_{i2}} g_3 · (1 + d_b n) = g^{−β x_{i2}} g^{αβ} · (1 + d_b n)
                               = g^{β(α − x_{i2})} · (1 + d_b n)
                               = g^{β x_{i1}} · (1 + d_b n),
the pair (g_2, g_2^{−x_{i2}} g_3 · (1 + d_b n)) is a proper ciphertext encrypted under PPE. So, we have:
Pr[A′(G, n, g, g^α, g^β, g^{αβ}) = 1] = Succ^A_{PPE,S}(k)
Figure 3.6 Secure rendering for the Head image. Part figures (a)–(c) illustrate the
rendered image available to the Image User, the Public Cloud Server,
and the Private Cloud Server, respectively (taken from [8])
Results. Figure 3.6 illustrates how 3DCrypt provides perceptual security in the
cloud. An image available to the Public Cloud Server is all black since the Public
Cloud Server does not know the color and opacity of the pixels. The image available
to the Private Cloud Server, however, contains opacity information, which can disclose
shape of the image as voxel positions are disclosed to the Private Cloud Server.
Performance Analysis. In 3DCrypt, processing by the Volume Outsourcer and
the encryption by the Public Cloud Server are one-time operations, which could be
performed offline. The overheads of these operations, however, are directly propor-
tional to the volume size. The overhead for a volume is equal to the product of a
voxel’s overhead with the total number of voxels in the volume (i.e., the dimension of
the volume). In our implementation, we need approximately 4,064 bits more space to
store the encrypted color and the opacity of a voxel (as two encryptions of 1,024 bits
key size are required for encrypting 32 bits RGBA values). Thus, we require approx-
imately 8.6 GB of space to store a 256 × 256 × 256 volume in encrypted domain
(size of this volume in plaintext is approximately 67 MB). Similarly, for encrypt-
ing color and opacity of a voxel, the Volume Outsourcer requires approximately 540
ms. The Public Cloud Server requires approximately 294 ms more computation with
respect to the conventional plaintext domain pre-classification volume ray-casting
implemented on the same machine. Thus, the Volume Outsourcer and the Public
Cloud Server require approximately 2.52 and 1.37 h, respectively, for encrypting the
256 × 256 × 256 volume.
The rendering by the cloud servers and the decryption by the Image User are
performed at runtime, according to the ray projected by the Image User. The over-
head of performing these operations affects visualization latency, which is discussed
below.
In 3DCrypt, the overhead of transferring and performing the last round rendering
operations in the Private Cloud Server is equal to the product of the number of sample
points with the overhead of a sample point. The total number of sample points is
equal to the sum of the sample points along all the projected rays and the number of
sample points along a ray is implementation dependent. For rendering and decrypting
(the first round) the color and opacity of a sample point, the Public Cloud Server
requires approximately 290 ms of extra computation. For rendering and decrypting
(the second round) opacity of a sample point, the Private Cloud Server requires
approximately 265 ms of extra computation (with respect to the conventional plaintext
domain pre-classification volume ray-casting).
In our implementation, for rendering and decrypting the 256 × 256 × 256 volume data for a 256 × 256 image space, the Public Cloud Server and the Private Cloud Server require approximately 16.5 and 15.2 extra minutes, respectively.
Note that for this data and image space, the data overhead at the Private Cloud Server
is approximately 1.75 GB.
The overhead of transferring and decrypting the color-encrypted rendered image
to the Image User is equal to the product of the number of pixels in the image space
(which is equal to the number of projected rays) with the overhead for a single pixel.
In 3DCrypt, the Private Cloud Server must send approximately 2,024 bits more data
per pixel to the Image User. Therefore, for rendering a 256 × 256 image, the Image User must download 66.3 MB more data than with conventional plaintext domain rendering. In addition, the Image User needs approximately 408 ms of computation to decrypt and recover the rendered color of a pixel. Therefore, before viewing the 256 × 256 image, the Image User must spend approximately 27 extra seconds.
3.9 Conclusions
Cloud-based volume rendering presents a data confidentiality issue that can lead to privacy loss. In this chapter, we addressed this issue by encrypting the volume using the modified Paillier cryptosystem such that pre-classification volume ray-casting can be performed at the cloud server in the encrypted domain. Our proposal, 3DCrypt, provides several improvements over state-of-the-art techniques. First, we are able to hide both the color and the shape of the rendered object from a cloud server. Second, we provide better security against collusion attacks than the state-of-the-art Shamir's secret sharing-based scheme. Third, users do not need to share keys for rendering volumes stored in the cloud (therefore, maintenance of per-volume keys is not required).
To make 3DCrypt more practical, our future work can focus on decreasing per-
formance overheads at both the cloud and the user ends. Furthermore, it would also be
interesting to investigate whether we can extend 3DCrypt for the encrypted domain
post-classification volume ray-casting.
References
[1] E. Cuervo, A. Wolman, L. P. Cox et al., “Kahawai: High-quality mobile gaming
using GPU offload,” in Proceedings of the 13th Annual International Confer-
ence on Mobile Systems, Applications, and Services, Florence, Italy, 2015,
pp. 121–135.
[2] KDDI Inc., “Medical real-time 3d imaging solution,” Online Report, 2012,
http://www.kddia.com/en/sites/default/files/file/KDDI_America_Newsletter_
August_2012.pdf.
[3] Intel Inc., “Experimental cloud-based ray tracing using intel mic architecture
for highly parallel visual processing,” Online Report, 2011, https://software.
intel.com/sites/default/files/m/d/4/1/d/8/Cloud-based_Ray_Tracing_0211.pdf.
[4] M. Baharon, Q. Shi, D. Llewellyn-Jones, and M. Merabti, “Secure render-
ing process in cloud computing,” in Eleventh Annual Conference on Privacy,
Security and Trust, Tarragona, Spain, 2013, pp. 82–87.
[5] M. Mohanty, M. R. Asghar, and G. Russello, “2DCrypt: Image scaling and
cropping in encrypted domains,” IEEE Transactions on Information Forensics
and Security, vol. 11, no. 11, pp. 2542–2555, 2016.
[6] M. Mohanty, P. K. Atrey, and W. T. Ooi, “Secure cloud-based medical data
visualization,” in Proceedings of the 20th ACM International Conference on
Multimedia, Nara, Japan, 2012, pp. 1105–1108.
[7] J.-K. Chou and C.-K. Yang, “Obfuscated volume rendering,” The Visual
Computer, vol. 32, no. 12, pp. 1593–1604, Dec. 2016.
[8] M. Mohanty, M. R. Asghar, and G. Russello, “3DCrypt: Privacy-preserving
pre-classification volume ray-casting of 3D images in the cloud,” in Inter-
national Conference on Security and Cryptography (SECRYPT), Lisbon,
Portugal, 2016.
[9] M. Mohanty, W. T. Ooi, and P. K. Atrey, “Secure cloud-based volume
ray-casting,” in Proceedings of the 5th IEEE Conference on Cloud Computing
Technology and Science, Bristol, UK, 2013.
[10] D. Koller, M. Turitzin, M. Levoy et al., “Protected interactive 3D graphics
via remote rendering,” in ACM SIGGRAPH, Los Angeles, USA: ACM, 2004,
pp. 695–703.
[11] A. Dasgupta and R. Kosara, “Adaptive privacy-preserving visualization
using parallel coordinates,” Visualization and Computer Graphics, IEEE
Transactions on, vol. 17, no. 12, pp. 2241–2248, 2011.
[12] M. Levoy, “Display of surfaces from volume data,” IEEE Computer Graphics
and Applications, vol. 8, pp. 29–37, 1988.
[13] Sinha System, “Cloud based medical image management and visualiza-
tion platform,” Online Report, 2012, http://www.shina-sys.com/assets/
brochures/3Di.pdf.
[14] E. Bresson, D. Catalano, and D. Pointcheval, “A simple public-key cryptosys-
tem with a double trapdoor decryption mechanism and its applications,” in
Advances in Cryptology—ASIACRYPT 2003, ser. Lecture Notes in Computer
Science. Springer Berlin Heidelberg, 2003, vol. 2894, pp. 37–54.
[15] G. Ateniese, K. Fu, M. Green, and S. Hohenberger, “Improved proxy
re-encryption schemes with applications to secure distributed storage,” ACM
Transactions on Information and System Security, vol. 9, pp. 1–30, February
2006.
[16] E. Ayday, J. L. Raisaro, J.-P. Hubaux, and J. Rougemont, “Protecting and
evaluating genomic privacy in medical tests and personalized medicine,” in
Proceedings of the 12th ACM Workshop on Privacy in the Electronic Society,
Berlin, Germany, 2013, pp. 95–106.
[17] C. Dong, G. Russello, and N. Dulay, “Shared and searchable encrypted data
for untrusted servers,” Journal of Computer Security, vol. 19, pp. 367–397,
August 2011.
Chapter 4
Multiprocessor system-on-chip for processing
data in cloud computing
Arnab Kumar Biswas1 , S. K. Nandy2 , and
Ranjani Narayan3
Abstract
Cloud computing enables cloud customers to obtain shared processing resources and
data on demand. Cloud providers configure computing resources to provide different
services to users and enterprises. These cloud providers satisfy the need for high-performance computing by bringing more processing elements (PEs) inside a chip (known as a Multiprocessor System-on-Chip (MPSoC)) instead of increasing the operating frequency. An MPSoC
usually employs Network-on-Chip (NoC) as the scalable on-chip communication
medium. An MPSoC can contain multiple Trusted Execution Environments (TEEs)
and Rich Execution Environments (REEs). Security critical applications run in TEEs
and normal applications run in REEs. Due to sharing of resources (for example, NoC)
in cloud computing, applications running in two TEEs may need to communicate over
an REE that is running applications of a malicious user (attacker). This scenario can
cause an unauthorized access attack if the attacker launches a router attack inside the NoC. Apart from this attack, an attacker can also launch a misrouting attack using a router attack, causing various types of ill effects. To deal with these security concerns, we discuss
in detail different hardware-based security mechanisms. These mechanisms mainly
employ monitoring to detect a router attack and possibly a malicious router location.
The hardware-based mechanisms can provide much-needed protection to users’ data
in a cloud computing MPSoC platform. Apart from the threat model with practical examples, a detailed hardware description of each security mechanism is given in this chapter for ease of understanding.
4.1 Introduction
1 Hardware and Embedded Systems Lab, School of Computer Science and Engineering, Nanyang Technological University, Singapore
2 Indian Institute of Science, Bengaluru, India
3 Morphing Machines, Bengaluru, India
(SecaaS) to cloud customers, taking many different forms. The Cloud Security Alliance (formed by many of these vendors) provides guidelines for implementing SecaaS offerings [21]. The field of cloud security requires more research and development to ensure proper security for cloud users. Recently, Viaccess-Orca has proposed a
TEE architecture called data center TEE (dcTEE) for cloud computing systems [22].
The dcTEE architecture is proposed to equip a data center with TEEs like mobile
systems but taking full advantage of cloud’s elasticity and scalability. We assume
a similar architecture where each MPSoC contains multiple TEEs and the cloud
platform (consisting of these MPSoCs) also contains distributed TEEs. We restrict
our discussion in this chapter to a single MPSoC node and consider the router attack vulnerability, which can compromise the security of a chip through an attack launched at the network abstraction layer itself.
An MPSoC can contain multiple secure (TEE) and nonsecure (REE) regions
to provide both security and high performance. On the one hand, a secure region
(TEE) ensures that sensitive data is always stored, processed, and protected with a
level of trust. Non-secure regions (REEs), on the other hand, can be configured and
customized to meet certain performance requirements. Specifications about TEE and
REE are available in [23,24]. Figure 4.1 shows an MPSoC with two TEEs and one
REE. We assume that application mapping to a set of PEs is done according to the
existing literature on TEE and REE (like [22]) following the specifications in [23,24].
In case of Figure 4.1, we assume that one TEE application is mapped to the PEs (2,0),
(3,0), (2,1), and (3,1) and the other TEE application is mapped to the PE (0,0). One
REE application is mapped to the PEs (1,0), (0,1), and (1,1). Note that the MPSoC
is used to support multi-tenant public cloud environment and different users can run
their applications on the same MPSoC. Also note that all regions inside an MPSoC
share the same NoC for communication. An attacker can launch an unauthorized access attack using a router attack in an REE if two TEEs communicate over that REE [25,26]. Apart from this unauthorized access attack, an attacker can also launch a misrouting attack using a router attack. In this chapter, we discuss in detail router attacks, which can cause various ill effects, and the hardware-based security mechanisms that defend against them. These security mechanisms mainly employ monitoring to detect and locate a malicious router.
Various aspects of MPSoC cloud platform security are considered in the literature. In [27], the authors have separated the whole system into two logical domains, viz. secure and non-secure. They have provided a list of attacks that can affect an NoC-enabled MPSoC. TrustZone technology is used in [28] to implement a TEE in ARM MPSoCs with a single secure region. This solution is only applicable to AMBA3 AXI/APB bus-based communication. In [29], the authors have presented a data protection unit in every network interface (NI) that can protect from software attacks or, more specifically, unauthorized
access to memory locations by the PEs. The authors of [30] have proposed Quality
of Security Service (QoSS) using layered NoC architecture to detect attacks based
on different security rules. An authenticated encryption-based security framework in
configuration information results in routing table attack. The attack can be launched
by an insider who is responsible for loading the routing tables in the MPSoC cloud
platform. Routing tables can be maliciously reconfigured at runtime by an application that generates malicious routing-table update packets. Although an authentication mechanism can be implemented for the loading process, an insider attack can still break it, creating a single point of failure. The mechanisms presented in this chapter work at the NoC level and can also operate as a second level of protection if an authentication mechanism is present for loading.
● Suboptimal routing and increased delay: Routing table attack can result in
suboptimal routing which affects the real-time applications by routing the packets
over unnecessarily long routes and thus affecting the quality-of-service.
● Congestion and link overload: The attack can lead to artificial congestion if
packets are forwarded to only certain portions of the network depending on the
modified routing table. Large volume of traffic through a limited capacity link
can overwhelm the link making it unusable.
● Deletion of PE: Wrong entries in routing tables can delete a PE in an MPSoC. All
packets can be diverted before reaching that particular destination PE. Similarly
the PE may not be able to send any packet to any other PE, because the connected
router does not route any packet from this PE anymore. This is a serious case of
denial-of-service.
● Overwhelming critical PE: If packets are routed toward a certain critical PE at a rate beyond its handling capacity, then this PE can be made unreachable to any legitimate PE. Here, a legitimate PE means a PE that is not malicious and wants to communicate with this critical PE.
● Deadlock and livelock: Improper routing tables can cause a packet to loop around
and never reach its destination. It causes deadlock and livelock because no other
packet can use that route.
● Illegal access to data: If the attacker can get hold of a node in nonsecure region,
then he/she can route all data from secure region to that node and thus gain
unauthorized access to secret information.
Figure 4.1 An example of router attack in a 2 × 4 mesh NoC with two secure
regions
Table 4.1 (a) Port ID assignments to different ports, (b) Normal X-coordinate and
Y-coordinate routing tables at Router (1,0), and (c) Malicious
X-coordinate and Y-coordinate routing tables for unauthorized access at
Router (1,0)
Figure 4.2 Circular path creation by four malicious routers. Each X and Y
coordinate routing table represents a router at that location.
from a secure region (say PE (3,0)) and going to another secure region (say PE (0,0))
can be diverted to the eject port of this router (1,0). The packet becomes available to
the nonsecure PE (1,0) and that means the attack is successful. The diversion happens
irrespective of the actual destination of the packet. This is the case of unauthorized
access attack. Routing logic implementation can also be maliciously changed using
hardware Trojan-based attacks but that is outside the scope of this chapter.
Figure 4.2 shows an example of misrouting attack—more specifically creation
of a circular path using four routers. Each X and Y table pair represents a router in
that location in the figure. If the routing tables of all ports apart from the south port of the top-right corner router of Figure 4.2 are modified, all packets entering the router are diverted to the south port (port ID 2). There are no packets that can enter through
the south input port because the bottom right corner router is diverting all packets to
its West port. Hence, the circular path is created as shown in Figure 4.2.
Figure 4.3 2 × 4 mesh NoC with runtime monitors in every row. The letter C inside a circle represents a PE.
Figure 4.4 Runtime monitor block diagram with connections to secure and
nonsecure region routers
module checks the ack packet’s destination address. If the ack packet is destined for
the current monitor, it is passed to the count controller module through the arbiter and
crossbar module. Otherwise, the ack packet is sent to the next monitor through the
corresponding arbiter and crossbar module. Runtime monitor has as many counters
as there are routers outside this secure region in that row. In the example shown in
Figure 4.3, there are two counters. The first packet arriving from the secure region
destined to a particular router starts the corresponding counter. In every clock cycle,
this counter is incremented by 1. It will not be reset or restarted for subsequent pack-
ets while awaiting ack packet from the destination. Source addresses of all packets
traveling from secure to nonsecure region are changed in the runtime monitor with the
nearest secure region router address. This ensures that all corresponding ack packets
arrive at the concerned runtime monitor.
Every counter will count up to a max-count that can be expressed as:
Max-count (in cycles) = [Tmin + f(τ, I)] × 2NH + TPr + Tmon (4.2)
Here Tmin is the minimum time required for a packet to leave a router after entering it.
The function f (τ , I ) represents a term corresponding to the network traffic condition.
This function depends on traffic distribution τ and injection rate I . Max-count value is
decided based on the total packet latency (packet going and ack coming) correspond-
ing to the farthest destination. NH is the hop count of the furthest destination. TPr and
Tmon are PE processing time and delay in monitor, respectively. The max-count value
ensures that, in an attack free MPSoC cloud platform, an ack packet indeed reaches
the runtime monitor within the max-count number of cycles. Max-count of counters
does not have any influence on the performance of normal packet flow in the NoC. It
only influences the malicious router detection performance, i.e., the time required to
detect the location of a malicious router.
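As a concrete illustration, the sketch below computes a max-count value per (4.2) and shows how a per-destination counter could flag a missing ack. It is behavioural C rather than the hardware described in this chapter, and all names (monitor_params_t, tick_and_check, the f_traffic value standing in for f(τ, I)) are illustrative assumptions.
#include <stdint.h>
#include <stdbool.h>

/* Illustrative timing parameters, all expressed in clock cycles. */
typedef struct {
    uint32_t t_min;      /* minimum router traversal time              */
    uint32_t f_traffic;  /* caller-supplied estimate of f(tau, I)      */
    uint32_t n_h;        /* hop count of the farthest destination      */
    uint32_t t_pr;       /* PE processing time                         */
    uint32_t t_mon;      /* delay inside the monitor                   */
} monitor_params_t;

/* Equation (4.2): max-count = [Tmin + f(tau, I)] * 2*NH + TPr + Tmon */
static uint32_t max_count(const monitor_params_t *p)
{
    return (p->t_min + p->f_traffic) * 2u * p->n_h + p->t_pr + p->t_mon;
}

/* Per-destination counter: started by the first packet, incremented every
 * cycle, and declared expired when it exceeds max-count without an ack. */
typedef struct {
    bool     running;
    uint32_t value;
} ack_counter_t;

static bool tick_and_check(ack_counter_t *c, uint32_t limit)
{
    if (!c->running)
        return false;
    c->value++;
    return c->value > limit;   /* true => count expired: attack detected */
}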
Runtime monitor has an ack packet corroborator module. It checks the destination
address of ack packets received from the nonsecure region. If an ack packet is destined
for the current monitor, it is passed on to the count controller module through the
arbiter and crossbar module. Otherwise, the ack packet is sent to the next monitor
through the corresponding arbiter and crossbar module. A counter is stopped by
the count controller module if the corresponding ack packet is received within the
stipulated max-count value. If not, the counter module sends a count-expired signal
to the decision logic. Decision logic changes the state of the monitor from normal
to “attack detected.” It stops all packet movement from secure region to nonsecure
region. Further, it signals the packet composer to form monitoring packets. The source address field of these packets carries the address of the nearest secure region router.
As an example, in this chapter, we consider YX deterministic routing, which routes
packets in the horizontal (X) direction after traversing the vertical (Y) direction.
Hence, a packet traveling through the monitor has its destination in the same row as
the monitor. A malicious router could be any of the routers lying in the same row
as the destination router. In order to determine the exact location of the malicious
router, the packet composer sends monitoring packets starting from the router closest
to the monitor to the farthest router in the row (till the destination router). Payload
of a monitoring packet does not contain any meaningful data. These are only used
to generate ack packets in the destination PE. Counter-based technique mentioned
above is used to monitor these ack packets also (generated by monitoring packets).
This helps the monitor to exactly locate the malicious router. Note that all the PEs
located before the malicious router are able to return ack packets. The malicious router
diverts the packet and hence no ack packet arrives at the monitor and the location
of that malicious router is identified. After detection of a malicious router, runtime
monitor sends the router address to the auditor. This is achieved by forming a packet
with the malicious router address in the payload field. The packet travels through the
monitor chain and reaches the auditor at the end.
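The localisation step described above can be sketched as follows, again as illustrative C rather than the actual hardware: routers in the destination's row are probed from the one nearest the monitor outward, and the first router whose monitoring packet yields no ack within max-count is reported. send_monitoring_packet and ack_received_within are hypothetical placeholders for the packet composer and counter logic.
#include <stdbool.h>

/* Hypothetical placeholders for the packet composer and counter machinery. */
void send_monitoring_packet(int router_index);
bool ack_received_within(int router_index, unsigned max_count);

/* Probe the routers of the row one by one, from the router closest to the
 * monitor up to the destination router. All routers before the malicious one
 * return acks; the first router without an ack is reported as malicious
 * (-1 means every probed router answered). */
int locate_malicious_router(int destination_index, unsigned max_count)
{
    for (int r = 0; r <= destination_index; r++) {
        send_monitoring_packet(r);
        if (!ack_received_within(r, max_count))
            return r;
    }
    return -1;
}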
The auditor collects contents of routing table of the attacked router, so that the
attack can be read off-line and analyzed. Figure 4.5 shows the block diagram of the
auditor. First, it receives the malicious router address from the corresponding monitor.
Then the auditor’s Decision module sends an enable signal to the corresponding router.
The routing table uploader module of the attacked router, after receiving this signal,
uploads the routing table to the auditor.
Now there is a possibility that the attacker might send a false ack packet with a legitimate destination address. To mitigate this problem of source authentication in the nonsecure region, the router address is stored in a nonvolatile memory in the router and cannot be changed by the attacker. This source address is automatically added to a packet when the packet is ejected from the PE. So, even if the attacker wants to send an ack
Figure 4.5 Block diagram of the auditor: routing table holders for each row of routers, a decision module and an enabler module, with enable signals to the routing table up-loaders, restart and control signals to the restart monitors and switching modules, and monitoring completion signals from the restart monitors
packet, it will not contain a legitimate source router address. An ack packet’s source
address is used to reset the corresponding counter. If the ack packet does not contain
legitimate source address, the corresponding counter will expire and router attack
will be detected.
Figure: 2 × 4 mesh NoC with a restart monitor in each row and switching modules between the secure and nonsecure regions (RM: Restart-Monitor, Sw: Switching module)
and requires less area. Most of the modules and their operations in the restart monitor are the same as in the runtime monitor.
After completion of monitoring all routers in the row outside this secure region,
the restart monitor sends a monitoring completion signal to the auditor. Auditor then
disables the monitors and sends control signal to switching modules so that the routers
of secure and nonsecure region get connected for start of normal operation. Since the
MPSoC cloud platform is not available for normal operation when restart monitor is
monitoring the routers, there is no unauthorized access to data during this period. The
only drawback of this technique is that the delay in starting normal operation of the
MPSoC cloud platform is proportional to the number of routers in the NoC.
Instead of counting up to max-count as in the case of the runtime monitor, each
counter in the restart monitor is set to count up to a specific value. This value is based
on the topological distance of the corresponding router from the restart monitor. The
count can hence be expressed as:
Count (in cycles) = Tmin × 2NHi + TPr + Tmon (4.3)
Here Tmin is the minimum time required for a packet to leave a router after entering
it. Count value is decided based on the total packet latency (packet going and ack
coming) corresponding to a destination. NHi is the hop count of the i-th destination.
Here i ∈ {1, . . . , n}, with i = 1 for the closest destination. TPr and Tmon are PE processing time and delay in the monitor, respectively. The restart monitor can send monitoring packets
one after another without waiting for ack to return—this reduces the total time required
to monitor all routers. Ack packets arrive in the same order in which monitoring
packets were sent. Whenever there is no ack packet from a router, the corresponding
router is detected as malicious and reported to the auditor.
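A small sketch of (4.3) and of the in-order ack check that the restart monitor relies on is given below; the function and parameter names are assumptions made for illustration, not the chapter's hardware implementation.
#include <stdint.h>

/* Equation (4.3): Count_i = Tmin * 2*NHi + TPr + Tmon, where NHi is the hop
 * count of the i-th destination and i = 1 is the closest router.          */
static uint32_t restart_count(uint32_t t_min, uint32_t n_h_i,
                              uint32_t t_pr, uint32_t t_mon)
{
    return t_min * 2u * n_h_i + t_pr + t_mon;
}

/* Because monitoring packets are issued back-to-back, acks are expected in
 * the same order; the index of the first missing ack identifies the
 * malicious router (-1 means every router answered).                      */
int first_missing_ack(const int ack_received[], int n_routers)
{
    for (int i = 0; i < n_routers; i++)
        if (!ack_received[i])
            return i;
    return -1;
}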
Figure: Ejection address checker (EAC), consisting of a comparison module placed between the routing logic and the eject port toward the PE
Table 4.2 Comparison between runtime monitor, restart monitor, and EAC: delay added (in clock cycles) of 1, 0, and 1, respectively
which clearly shows that it is a hardware module which is not configurable and hence
is protected from attacks similar to those directed toward routing tables. In case of
unauthorized access attack, packets destined for different routers are diverted to the
eject port for capturing. The EAC consists of a comparison module to compare the
destination address of a packet toward the eject port with the router’s address from
NVM. If both addresses are same, it allows the packet to pass to the PE. If the addresses
are different, comparison module signals the packet composer to form a packet to be
sent to auditor. Also the state is changed to “attack detected.” Packet composer takes
the source address from NVM, auditor destination address, and a particular payload
to indicate the attack detection. The packet is sent for routing toward the auditor. Here
it is assumed that a packet can be routed to the auditor through the NoC itself.
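The EAC's behaviour can be summarised by the following sketch, with C standing in for the actual hardware module; the packet structure, the nvm_router_address parameter and the state names are assumed for illustration.
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint16_t dest_addr;   /* destination address carried in the header */
    /* ... payload and other header fields ... */
} packet_t;

typedef enum { STATE_NORMAL, STATE_ATTACK_DETECTED } eac_state_t;

/* Compare the destination of a packet heading for the eject port against the
 * router address held in non-volatile memory. A mismatch means the packet
 * was diverted here, so the state changes and an alert packet would be
 * composed for the auditor instead of passing the packet to the PE. */
bool eac_check(const packet_t *pkt, uint16_t nvm_router_address,
               eac_state_t *state)
{
    if (pkt->dest_addr == nvm_router_address)
        return true;                     /* legitimate: forward to the PE   */

    *state = STATE_ATTACK_DETECTED;
    /* compose_alert_packet(nvm_router_address, auditor_address, pkt);      */
    return false;                        /* diverted packet: do not deliver */
}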
Table 4.2 compares runtime monitor, restart monitor, and EAC.
Figure 4.9 Dataflow during normal operation. (a) Secure region to nonsecure region dataflow at monitor, (b) dataflow at Router (1,0) eject port, and (c) dataflow at actual destination Router (0,0)
Figure 4.10 Dataflow after attack without runtime monitor in place. (a) Secure region to nonsecure region dataflow at monitor, (b) dataflow at router (1,0) eject port, and (c) dataflow at actual destination router (0,0)
It can be clearly seen from Figure 4.10 that traffic destined for router (0,0) is diverted to router (1,0), as shown in Figure 4.10(b) and (c), respectively. So, the attack is possible and an attacker can gain unauthorized access to all data coming from or going to a secure region of the MPSoC cloud platform. Constant dataflows (i.e., horizontal lines) in Figures 4.9 and 4.10 depict the absence of runtime monitors and hence of any dataflow blocking. Figures 4.9(b), 4.10(c), and 4.11(c) indicate the absence of dataflow.
The NoC of Figure 4.3 is used to simulate the operation of the runtime monitor. This NoC is similar to that of Figure 4.1 except for the presence of the monitors in each row. Detection of the attacked router by the runtime monitor is shown in Figure 4.11. From the start of packet transmission to the expiration of the corresponding counter, normal packet
Figure 4.11 Dataflow after attack with runtime monitor in place. (a) Secure region to nonsecure region dataflow at monitor, (b) dataflow at router (1,0) eject port, and (c) dataflow at actual destination router (0,0)
Figure 4.12 Relative areas of router including EAC, runtime monitor, and restart monitor compared to a router (router: 100%, router with ejection address checker: 103.4%, runtime monitor: 26.6%, restart monitor: 22%)
Figure 4.13 The whole MPSoC cloud platform structure including restart monitors. C in circle stands for PE.
The nonsecure region is divided into sections. Here, one section consists of a row
of routers. A local monitoring module inside every nonsecure region router detects
the misrouting attack. An intermediate manager (IM) coordinates these modules in a
section of routers. All the IMs are managed by the auditor.
Figure 4.14 Block diagram of local monitoring module. Reg stands for register in the figure.
the concerned port. The reason is that sending data to a malicious router is a waste of valuable data. The address generator module generates the malicious router address taking into consideration which input port has detected it. The malicious router is the one that has sent a wrong packet, as detected by the header checking module. The address generator module forms a combined address message consisting of the current router address and the generated malicious router address. This message is sent to the auditor (Aud) whenever requested by the IM. The header checking module operates in parallel with other operations at an input port, such as checking the routing table, so its operation does not add any latency to packets. The same is true for the malicious bit register setter. It is therefore evident that the local monitoring module in the routers does not affect the MPSoC cloud platform performance.
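A possible rendering of the address generator's behaviour is sketched below, assuming a 2-D mesh with X/Y coordinates; the port and address types are illustrative and not taken from the chapter's implementation.
#include <stdint.h>

typedef enum { PORT_NORTH, PORT_SOUTH, PORT_EAST, PORT_WEST } port_t;

typedef struct { uint8_t x, y; } addr_t;

typedef struct {
    addr_t current;    /* address of the router that detected the attack */
    addr_t malicious;  /* address of the neighbour that misrouted        */
} alert_msg_t;

/* The malicious router is the neighbour attached to the input port on which
 * the header checking module flagged a wrong packet. In a 2-D mesh the
 * neighbour's address differs from the current one by one hop in the
 * direction of that port (coordinate convention assumed here). */
alert_msg_t make_alert(addr_t current, port_t detecting_port)
{
    alert_msg_t m = { current, current };
    switch (detecting_port) {
    case PORT_NORTH: m.malicious.y += 1; break;
    case PORT_SOUTH: m.malicious.y -= 1; break;
    case PORT_EAST:  m.malicious.x += 1; break;
    case PORT_WEST:  m.malicious.x -= 1; break;
    }
    return m;
}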
Figure: Block diagram of the intermediate manager (IM), containing malicious bit registers and decision logic with connections to the auditor
This process can be automated in future to reload correct routing tables without
administrator intervention.
The interested reader can find further details of the techniques discussed in this chapter in [25,37]. We have also presented an earlier version of the work in [26].
Acknowledgments
This work was supported by the PhD scholarship from the MHRD, Government of
India. We want to thank Prof Rajaraman of SERC, IISc for reviewing this chapter and
suggesting some changes.
References
ser. ASPLOS XVII. New York, NY: ACM, 2012, pp. 437–450. [Online].
Available: http://doi.acm.org/10.1145/2150976.2151022
[36] J. Szefer and R. B. Lee, Hardware-Enhanced Security for Cloud Computing.
New York, NY: Springer New York, 2014, pp. 57–76. [Online]. Available:
http://dx.doi.org/10.1007/978-1-4614-9278-8_3
[37] A. K. Biswas, “Securing multiprocessor systems-on-chip,” PhD dissertation,
Department of Electronic Systems Engineering, Indian Institute of Science,
Bangalore, August 2016. [Online]. Available: http://etd.ncsi.iisc.ernet.
in/handle/2005/2554
[38] Bluespec, 2013. [Online]. Available: http://www.bluespec.com
[39] Faraday Memaker, 2014. [Online]. Available: http://freelibrary.faraday-
tech.com/ips/65library.html
Chapter 5
Distributing encoded data for private
processing in the cloud
Mark A. Will∗ and Ryan K. L. Ko∗
Abstract
Traditional cryptography techniques require our data to be unencrypted in order to be processed correctly. This means that at some stage, on a system we have no control over, our data will be processed in plain text. Solutions that allow the computation
of arbitrary operations over data securely in the cloud are currently impractical. The
holy grail of cryptography, fully homomorphic encryption, still requires minutes
to compute a single operation. To provide a practical solution, this chapter proposes
taking a different approach to the problem of securely processing data. This is achieved
by each cloud service receiving an encoded part of the data, which is not enough
to decode the plain-text value. The security strength is shifted from a computation problem to the sheer number of possible options. Given that the greater threat to data stored in the cloud is from insiders, this is the primary attack vector that the presented schemes, Bin Encoding and FRagmenting Individual Bits (FRIBs), aim to protect against.
5.1 Introduction
Private information in the cloud is at constant risk of attack. Recent information
regarding the Yahoo hack reveals that a billion accounts were stolen [1]. Furthermore,
insider attacks from cloud employees and administrators1 are a threat, and arguably
the bigger threat to customer data [3–5]. A survey by Kaspersky and B2B International
revealed that 73% of companies have had internal information security incidents, and
stated that the single largest cause of confidential data loss is by insiders (42%) [5].
Therefore, a current research challenge is to allow a user to control their own security,
providing the ability to protect data from both insider and outsider attacks while
maintaining functionality.
∗
Cyber Security Lab, University of Waikato, New Zealand
1
An engineer at Google abused his privileged administrator rights to spy on teenagers using the GTalk service. Google was made aware of this only after the teenagers' parents reported the administrator [2].
Given how impractical solutions are currently, we have taken a different approach
to the concept of processing data securely. We propose encoding and distributing data
to different cloud service providers. This chapter provides a summary of two previ-
ously published works: (1) Bin Encoding [6] and (2) FRagmenting Individual Bits
(FRIBs) [7]. Both schemes presented were designed to meet the following design
goals.
● No single server can reveal the full data. To protect privacy, each server should
not be able to decode any value.
● Full cloud service. The schemes should be easy to implement on current cloud
infrastructure, for example, Amazon AWS [8] or Microsoft Azure [9], and not
require any special hardware or equipment.
● Practical performance. Should be usable, allowing today’s users of the cloud to
be protected, while still getting computational functionality.
● Accurate results. For arbitrary secure processing, the correct result should be
returned 100% of the time, where approximate string searching is best effort
because of the nature of the problem.
The threat models against which we evaluate our methods are based on the fol-
lowing assumptions: (1) the communication channel between each distributed server
and the client is secure; (2) each server encrypts user data before storing it to disk;
(3) each server has no knowledge on other servers used to store data and (4) data is
stored across multiple cloud service providers.
Based on these assumptions, there are two types of attacker to evaluate against: a
malicious insider from the cloud service and a malicious user/outsider. Both present
similar threats; however, a malicious insider has an advantage because he or she already has access to one cloud service provider's system. If a malicious insider manages to bypass all internal security, for example, access policies and permissions, then they can discover parts of the distributed data. They then become the same as any other malicious user, as they can try to break into all the other cloud service providers. This reduces to two attack vectors: breaking the data with one set of distributed data and getting all the data from each system.
5.2.1 Encoding
One of the oldest encryption techniques known is the Caesar cipher, named after
Julius Caesar who used it to protect messages of high importance, such as military
instructions. This is a type of substitution cipher, which maps plain-text values to
their cipher-text counterparts at a shifted index in the alphabet. Decoding requires
the shift number, which is the secret key. However in the case of a random one-to-one
mapping as used by Mary, Queen of Scots,2 the fixed mapping would be the secret.
2
Mary’s secret messages were broken via frequency analysis, which led to her execution in 1587 [10].
Hence this attack vector for string securing is covered in [6] and briefly mentioned in Section 5.5.
An everyday case where encoding has offered data privacy to people is those who
can speak another language. For example, if you live in an English-speaking country
and understand Mandarin Chinese, anything you type or speak will be foreign to most.
Therefore, notifications on your phone’s lock screen have a degree of protection if
they are in another language. A substitution cipher follows this principle, where
it can be thought of mapping one language to another. But where a language can
be understood by many, secured data should only be understood by the intended
parties.
to delegate the task of building indexes and managing search to the service provider.
Current systems require the index to be downloaded, updated, and re-uploaded. More-
over, they do not support phrase or proximity searching, and at best are robust only to
simple (single-character) typographical and spelling errors. Little research has been
devoted to addressing these shortcomings.
Applications requiring multiple operations must therefore use FHE. FHE was
only proven plausible by Gentry as late as 2009 [24], many years after PHE.
Wang et al. [25] showed performance results of a revised FHE scheme by Gentry
and Halevi [26] in 2015 for the recrypt function. Central processing unit (CPU) and
graphics processing unit (GPU) implementations took 17.8 and 1.32 s, respectively,
using a small dimension size of 2,048 [25]. A medium dimension size of 8,192 took
96.3 and 8.4 s for the same function [25].
Currently hardware implementations [27] of FHE schemes cannot give practical
processing times, so it will be difficult to make this technology usable in the real world.
Combined with the fact that quantum computing is making huge advancements [28,
29], having data protected by traditional encryption schemes (for example, Rivest
et al. [12], Diffie–Hellman [30], and elliptic curves [31]) may not be as feasible in
the future as it is today. Lattice-based encryption [24] could be a solution; however,
it will result in even larger key sizes than current impractical FHE schemes.
5.2.2 Distribution
Cloud providers distribute their services for features like lower latency, load-
balancing, and redundancy [32,33]. But rarely do they use distribution to provide
better security and data protection. Currently proposed distributed processing
schemes run parts of a program on different servers, so that no server has a full picture of what is being processed. These, however, do not fully protect data privacy as
values are in plain text while being processed on each server.
For example, one technique currently used is to distribute columns of tables in a
database over many servers [34]. Then in the event of a server being compromised,
only some of the data is lost. Another example is MultiParty Computation (MPC) [35],
where multiple parties securely compute a function over some secret input [36–39].
Each server has its own private inputs into the function, but these are often in plain text. Other limitations of MPC include that correctness can tolerate only one corrupted server, and that the encrypted circuits can require large network transfers.
Some PHE schemes have threshold variants which allow decryption to be split
across many servers [40] and have primarily been used for voting schemes [13,16,41].
This provides extra protection to the decryption key, as each server only possesses a
part of it. Now if a server is compromised, no relevant data is lost. However, the more servers are compromised and the more parts of the decryption key are leaked, the easier it becomes for a malicious entity to break the key. Sections 5.3 and 5.4 follow
this idea where data is distributed such that if some servers are compromised, it is still
computationally difficult to get any real data. Choosing servers and cloud services
to host this distributed data now becomes very important. There needs to be a range
of cloud service providers to prevent insider attacks, and a range of underlying software, such as operating systems, to protect against a zero-day attack compromising
all servers.
To explain this idea further, distributing data is the act of splitting up some piece
of information into smaller pieces. For example when complete, a puzzle is an image.
But when broken up it can be hard to tell what the image should be. If we save each
piece of information to a different location, then figuring out what the final image is
from one piece is difficult.
If we turn off the lights, the puzzle becomes even more difficult because the image
gives us no information to help solve the puzzle. This is equivalent to distributing
some information which has been encoded using a known algorithm (in this case, the
puzzle cutter). Without all the pieces we can’t decode the information, even though
we know the algorithm used.
However, if we do not know the algorithm used to encode the data, we have to try each possible way of joining the pieces together, and only once we get the correct order will the light turn back on. But the light may also turn on for false positives, because it is possible to decode something into completely the wrong thing while it appears to look correct.
This example puzzle has 20! = 2.432902008 × 10¹⁸ combinations. Even computing a million combinations a second, it would still require thousands of years to generate all possibilities. This gives distribution the computationally intensive property that is required for something to be considered encryption. Therefore, in this chapter, the definition of secure encoding with distribution is equivalent to encryption.
5.3.1 Overview
Privacy preserving string searching is the first scheme presented to verify the idea of
secure encoding with distribution. The scheme encodes documents and queries using
‘Bin Encoding’. This uses a lossy encoding technique – a simple trapdoor – that maps
characters individually to bins. There are several bins, and multiple characters map
to the same one. Hence the original string cannot be obtained from its encoding. For
example, here is a mapping with three bins A, B and C:
{a, b, c, d, e, f , g, h, i} ⇒ A
{j, k, l, m, n, o, p, q, r} ⇒ B
{s, t, u, v, w, x, y, z} ⇒ C
(This example is for illustration; in practice, we envisage many more bins for a
single index to reduce the number of false positives when searching.) Relative to this
mapping, the encoded values for hello and world are AABBB and CBBBA, respectively,
which can be obtained using Algorithm 1. Apart from world, another possibility for
CBBBA is slope (among others). However, these possibilities can only be generated by
someone who knows the bin mapping. Given the encoded value but not the mapping,
there are countless possibilities for CBBBA, such as hello (even though the above
bins map it to AABBB). The user’s data is protected by hiding in many possible bin
combinations (>10²⁰).
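A minimal sketch of the encoding step with the three-bin mapping above is given below; it is an illustrative stand-in with assumed function names, not a reproduction of the chapter's Algorithm 1.
#include <stddef.h>
#include <ctype.h>

/* Encode a word with the illustrative 3-bin mapping from the text:
 * {a..i} -> 'A', {j..r} -> 'B', {s..z} -> 'C'. Characters outside a-z are
 * passed through; a real deployment would use many more bins and a secret,
 * randomly generated mapping. */
void bin_encode(const char *in, char *out, size_t out_len)
{
    size_t i;
    for (i = 0; in[i] != '\0' && i + 1 < out_len; i++) {
        char c = (char)tolower((unsigned char)in[i]);
        if (c >= 'a' && c <= 'i')      out[i] = 'A';
        else if (c >= 'j' && c <= 'r') out[i] = 'B';
        else if (c >= 's' && c <= 'z') out[i] = 'C';
        else                           out[i] = c;
    }
    out[i] = '\0';
}
/* bin_encode("hello", ...) yields "AABBB"; bin_encode("world", ...) yields "CBBBA". */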
Figure 5.1 shows a typical use case. Bob is saving a file in the cloud, and would
like it to be encrypted, while retaining the ability to search its contents at a later
date. Before transmitting the file, his device encodes it; the computational expense is
trivial. Bob separately encrypts the file using a secret key and sends both encrypted
and encoded versions to the cloud service. Filenames are encrypted, while smaller
documents can be padded. Note this example is not distributed, and this will be
introduced in Section 5.3.6.
although in practice it is likely that fixed sized grams such as bigrams or trigrams
will be used.
Figure 5.2 Distributed index environments: Environment A holds the bin mappings for indexes I, II and III and the encrypted files but no index; Environments B, C and D each hold a single index (I, II and III, respectively) with no mapping information and no files
combining search results from each index, more accurate results can be returned. For
example if one index gives a match on part of the document but another does not
return the same match, this is likely to be a false positive.
Environment A in Figure 5.2 is responsible for distributing the query to the
indexes and for combining the results to return to the user. When the user sends
a query or edits a document, the number of bins for encoding can be large (e.g.
13). When Environment A receives encoded values, it uses a randomly generated
mapping, different for each user, to further encode the already encoded values for
each environment into a smaller number of bins. Although the user’s mapping is never
sent to the cloud, each index has a different mapping. Environment A also stores the
encrypted files received from the user (although they could be placed elsewhere).
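This further encoding at Environment A can be sketched as a simple composition of mappings; the 'A'..'M' labelling of the client's 13 bins and the table representation are assumptions made for illustration only.
/* Environment A re-encodes the client's 13-bin symbols into a smaller bin
 * alphabet, using a different randomly generated table for each index.
 * per_index_table[i] gives the smaller bin assigned to client bin i. */
char reencode_symbol(char client_bin, const char per_index_table[13])
{
    int i = client_bin - 'A';            /* client bins assumed labelled A..M */
    if (i < 0 || i >= 13)
        return client_bin;               /* pass through non-bin characters   */
    return per_index_table[i];
}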
lead to varying results. However the important aspect is that privacy is returned to
the user and could allow for a quick filter on the client side to move it up the list.
5.3.8 Summary
Overall the distributed index provides better security than a single index and allows for
accurate results when using a small number of bins. Even if one server is compromised,
the index is still protected. This is because the lossy encoding is computationally intensive to reverse, similar to encryption.
5.4.1 Overview
The FRIBs scheme has been designed to distribute each individual bit across many ser-
vice providers, while still allowing Negative-AND (NAND) operations to be computed.
We likened our proposed idea to the New Zealand terminology of ‘fribs’, which are
small pieces of unwanted wool removed after shearing. If we say a ‘bit’ is the woollen
fleece, then it cannot be recreated without all the fribs and wool. Distributing the bit
fragments can be seen as exporting the fribs and wool to different locations, known
as fragment servers. Once exported, the bit fragments can be processed securely by
building functions from NAND gates.
The fragmentation is similar to encoding into bins, but is lossless as accurate
results must be obtained. Therefore when the fribs are combined, the bit value can be
Table 5.3 A bit value encoded into two fribs with two states
Value F0 F1
Low 0 0
Low 1 0
Low 0 1
High 1 1

Table 5.4 A bit value encoded into two fribs with three states
Value F0 F1
Low 0 0
Low 1 0
Low 0 1
Low 2 0
Low 0 2
Low 2 2
High 1 1
High 2 1
High 1 2
decoded. This follows the same principle of a threshold cryptosystem [40], which has
N entities, but only requires t entities to decrypt a value (where t < N ). Therefore if
t entities are compromised, then the encrypted data is no longer protected.
Given a value {0, 1} or {low, high}, the AND function is used to encode/fragment
the bit.3 An example is shown in Table 5.3 where a value is encoded into two fribs.
A potential problem with this example is that 50% of the fribs are 0. Assuming an equal probability (50:50) between high and low bits before encoding, each server can solve ≈ 1/3 of the bit values (using the fact that one-half are low, and two-thirds
of the fribs are 0 for low values). Depending on requirements, this could be seen
as too much information leakage, even though complete values (for example, 32-bit
integers or 8-bit characters) are still unknown.
One technique to reduce the number of 0 fribs is to introduce more frib states.
Table 5.4 gives an example of three states for two servers, resulting in one-third of the
fribs equalling 0. The fragmentation used in Table 5.4 is F0 ∧ F1 , where the value 2
is low unless the other frib is high. Now each server can only solve ≈ 1/4 of the bit
values. However the easier solution is to increase the number of fribs, and only allow
one frib to be 0 for an encoding, as shown in Table 5.5. This results in only 1/(N + 1)
3
Note that in this chapter we will only use the AND operation for fragmentation and only use two states for
easier explanations and examples. Future work will demonstrate how FRIBs can be extended to support
varying fragmentation algorithms, providing even more security.
Table 5.5 A bit value encoded into five fribs where at most one frib is 0
Value F0 F1 F2 F3 F4
Low 0 1 1 1 1
Low 1 0 1 1 1
Low 1 1 0 1 1
Low 1 1 1 0 1
Low 1 1 1 1 0
High 1 1 1 1 1
fribs equalling 0, where N is the number of fribs, and a server only having knowledge
of 1/(2N ) of the bit values.
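A sketch of this fragmentation rule, following the Table 5.5 pattern, is given below; rand() is only a stand-in for a cryptographically strong source of randomness, and the function names are illustrative rather than the scheme's implementation.
#include <stdlib.h>

/* Fragment a single bit into n fribs in the Table 5.5 style: a high bit
 * becomes all 1s, a low bit becomes all 1s except one randomly chosen frib
 * set to 0, so at most one frib per encoding is 0. */
void fragment_bit(int bit, int fribs[], int n)
{
    for (int i = 0; i < n; i++)
        fribs[i] = 1;
    if (bit == 0)
        fribs[rand() % n] = 0;
}

/* Decoding ANDs the fribs together: the bit is high only if every frib is 1. */
int decode_bit(const int fribs[], int n)
{
    for (int i = 0; i < n; i++)
        if (fribs[i] == 0)
            return 0;
    return 1;
}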
F0 F1 Result F0 F1 Result
A 0 ⊕ 0 0 A 0 0 1
⊕ ⊕ ⊕ (5.1)
B 1 ⊕ 1 0 B 1 1 0
1 ⊕ 1 0 1 1 ?
of NAND operations, as demonstrated in (5.2) and (5.3) where the same frib values
give different results.
F0 F1 Result F0 F1 Result
A 11 ∧ 1 1 A 1111 ∧ 11 0
(5.2)
B 11 ∧ 1 1 B 11 ∧ 1111 0
1111 ∧ 11 0 111111 ∧ 111111 1
F0 F1 Result F0 F1 Result
A 11 ∧ 11 0 A 111 ∧ 111 1
(5.3)
B 1 ∧ 1 1 B 111 ∧ 111 1
111 ∧ 111 1 111111 ∧ 111111 0
Since anything NANDed with 0 results in 1, FRIBs uses 0 to maintain order. This
is why the fragmentation is currently done with the AND function, so that if a server
has a 0, it knows the bit value is low. Equations (5.4) and (5.5) give the same examples
as in (5.2) and (5.3), but now maintain order. Equation (5.4) now gives the right frib value of 11011011 instead of 111111, which represents 11 ∧ (11 ∧ 11). But the left frib value of 110110011 has an extra 0, giving (11 ∧ 11) ∧ 11 as the order is different
to the right side. The second example in (5.5) is more straightforward, as both sides
are the same.
F0 F1 Result F0 F1 Result
A 11 ∧ 1 1 A 11011 ∧ 11 0
(5.4)
B 11 ∧ 1 1 B 11 ∧ 11011 0
11011 ∧ 11 0 110110011 ∧ 11011011 1
F0 F1 Result F0 F1 Result
A 11 ∧ 11 0 A 1101 ∧ 1101 1
(5.5)
B 1 ∧ 1 1 B 1101 ∧ 1101 1
1101 ∧ 1101 1 1101001101 ∧ 1101001101 0
5.4.2.3 Reduction
Concatenating states will affect the size of the result states, as they will get larger
after each operation. Therefore at a predefined point, the states need to be reduced.
Reducing the size of a frib requires information about all fribs. A separate server, known as the reduction server, is used; all N servers send their fribs to it during the reduction step. Once this server has received each frib for a given value, it uses a lookup table to retrieve the reduced fribs for each server. However, if each frib were sent to and returned from the reduction server in the current format, then some of the
data can be decoded. The reduction servers have no knowledge of the program being
run over the data, meaning any bits they can decode may still be worthless.
5.4.3 Addition
The addition of two 32-bit integers can be achieved with 31 full-adders and a single
half-adder. A full-adder comprised of NAND gates can be seen in Figure 5.3. To get the
best performance for our proposed scheme, we must reduce the number of network
Figure 5.3 A full-adder built from NAND gates N1–N9, with inputs Ai, Bi and Ci and outputs Oi and Ci+1
requests required by combining many reduction requests into a single request. First
we compute all values for N4 , which for worst-case, where both Ai and Bi are 1, gives
10111001101. The frib therefore can grow up to 10 bits during this step. We can then
combine all 32 fribs for N4 into a single network payload and send them to be reduced
to single bits.
How the carry bits are reduced can vary depending on implementation, however
we will allow the fragments to grow as large as needed for this step. If a limitation is
applied, more reduction requests will be needed. Because the first bit does not have an input carry value, the N9 input is N1 and N1 (which equates to !N1). The other carry bits involve gates N1, N4, N5 and N9, where the result from N9 is connected to the next bit's N5 gate. Given that N4 will be a single bit, and that the worst-case value for N1 is 11, each carry step will at most add 5 bits to the frib (11010). We only need a single 0 between each operation because we know the order is continuous. For example, if the carry output for the second bit is 1101011011, we know the order of operations is (110(10(110(11)))).
This results in a worst-case frib size of 155 bits (16 × 10-bit values). We then
send these carry-bit fribs to be reduced, meaning we now have single bit values for
all N4 and N9 gates, allowing us to compute all N8 gates with a maximum frib size
of 10 bits again. This only totals three reduction requests.
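For reference, the conventional nine-NAND-gate full adder, in a numbering consistent with the gate labels of Figure 5.3, is sketched below on plain bits; in FRIBs each NAND would instead operate on fribs using the concatenation rules described earlier.
/* NAND truth function on single bits. */
static int nand(int a, int b) { return !(a && b); }

/* One full adder built only from NAND gates, using the conventional
 * nine-gate construction. On plain bits this is ordinary logic; in FRIBs
 * each gate would operate on fribs and periodically be reduced. */
void full_adder(int a, int b, int cin, int *sum, int *cout)
{
    int n1 = nand(a, b);
    int n2 = nand(a, n1);
    int n3 = nand(b, n1);
    int n4 = nand(n2, n3);      /* a XOR b               */
    int n5 = nand(n4, cin);
    int n6 = nand(n4, n5);
    int n7 = nand(cin, n5);
    int n8 = nand(n6, n7);      /* sum = a XOR b XOR cin */
    int n9 = nand(n5, n1);      /* carry out             */
    *sum  = n8;
    *cout = n9;
}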
5.4.4 Multiplication
Binary multiplication can be thought of as a series of AND operations, all added
together. Equation (5.6) shows an example of multiplying 5 and 11 on an 8-bit
machine. For each bit in 11, we AND it with each bit in 5, giving 8 values. Adding
each value together gives 55.
00000101
× 00001011
00000101
0000101
000000
00101 (5.6)
0000
000
00
+ 0
00110111
To make the additions more efficient, we add together the biggest and next biggest
values, then the next pairing, down to the smallest and second smallest. This is shown
in (5.7).
00000101 0000 000000 00
+ 0000101 + 000 + 00101 + 0 (5.7)
00001111 0000 001010 00
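The shift-and-add idea of (5.6) can be sketched as follows on plain integers; in the actual scheme each addition would be carried out with the NAND-based adders of Section 5.4.3, and the function name is assumed for illustration.
#include <stdint.h>

/* Shift-and-add multiplication as in (5.6): each bit of b selects (ANDs) a
 * shifted copy of a, and the partial products are summed. Pairing the
 * additions as in (5.7) would halve the depth; here they are summed in order
 * for clarity. */
uint64_t multiply(uint32_t a, uint32_t b)
{
    uint64_t result = 0;
    for (int i = 0; i < 32; i++) {
        uint64_t partial = ((b >> i) & 1u) ? ((uint64_t)a << i) : 0;
        result += partial;   /* in FRIBs this addition uses the NAND adders */
    }
    return result;
}
/* multiply(5, 11) == 55, matching the worked example in (5.6). */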
5.4.5 Conditional
Supporting an operation to compare two values can dramatically affect the security of
a secure processing scheme. For example, if a group of cipher values only encrypted the set {0, 1}, then the ability to calculate whether two cipher values are equal would result in two subgroups of cipher values, where one subgroup must encrypt either a 0 or a 1 and the other subgroup must encrypt the opposite. However, because our proposed
scheme has the bits fragmented across many servers, all the servers must compute over
the same instruction set. This prevents a compromised server trying to compare all the
fribs it has, as the other fragment servers would need to be doing the same malicious
action. Therefore our scheme has the ability to support conditional operations, which
can be implemented to return the result in either a secure or nonsecure manner.
5.4.5.1 Secure results
Returning results securely means the result is a fragmented bit, of which no single fragment server has knowledge. This can make some programs difficult
to implement as the result of the comparison is not known. Two examples are given
in Algorithms 2 and 3, for an equal and greater than or equal if statement. For both
examples, we have to increment c without knowing the result of the comparison.
5.4.5.2 Nonsecure results
Instead of returning a fragmented bit, this approach returns the whole bit by using a
different lookup table than for a standard operation. This allows each server to know
the result of the conditional statement, making programs easier to design and in some
cases compute faster.
1: if a = b then
2: c←c+1
3:
4: function ifEqual(a, b)
5: m←a−b
6: inout ← 0
7: carry ← 0
8: for i ← 0 to 32 do
9: tmp ← m[i] + inout + carry
10: inout ← tmp & 1
11: carry ← tmp >> 1
12: return !(inout | carry)
13: c ← c + (1 × ifEqual(a, b))
1: if a >= b then
2: c←c+1
3:
4: function ifGreaterEqual(a, b)
5: sign_neq ← a[31] ˆ b[31]
6: c←a−b
7: return (!sign_neq & !c[31]) | (sign_neq & !a[31])
8: c ← c + (1 × ifGreaterEqual(a, b))
when choosing server locations, as the bandwidth between data centres varies. The
cloud service providers used for our experiment/evaluation were Amazon Web Ser-
vices, Microsoft Azure and Google Cloud Platform. All instances were running with
the cheapest tier option and based in the United States.
The server configuration was a single reduction server and two fragment servers.
The reduction server was in California with Amazon, a fragment server was also
in California but with Microsoft and the final fragment server was in Iowa hosted
by Google. We used a proof-of-concept addition algorithm with a 27-bit maximum
fragment size which required 9 reductions, and averaged the time for 100 addi-
tion operations with 32-bit unsigned integers. The latency at the time of testing was
3.106 ms for Azure–Amazon, and 37.414 ms for Google–Amazon.
Our results produced an average of 346 ms for each addition operation. This is
directly proportional to the largest latency time, where 37.414 × 9 ≈ 346 − (some
small computation times). Therefore if all the fragment servers could be within
10 ms round trip from the reduction server, then addition times could be 99.274 ms.
The latency figures of Azure to Amazon could result in 37.228 ms. Allowing for a
larger fragment size would also increase the performance. For example, if only five reductions are required for an addition, then we can nearly halve the completion time.
Currently, multiplication is reliant on addition, and in Section 5.4.4 we showed that
for 32-bit integers, five addition steps are required. Therefore, we can look at the
addition results to solve for the multiplication results. These performance numbers
are much faster than FHE schemes described in Section 5.2.1 and show the potential
of distributing encoded data.
Table 5.8 Frequency (%) of letters in the English language [48]
A B C D E F G H I J K L M
7.61 1.54 3.11 3.95 12.62 2.34 1.95 5.51 7.34 0.15 0.65 4.11 2.54
N O P Q R S T U V W X Y Z
7.11 7.65 2.03 0.10 6.15 6.50 9.33 2.72 0.99 1.89 0.19 1.72 0.09
different values increases. Looking at the bottom two entries in Table 5.7, helloworld
and purringcat, these are actually hello world and purring cat, respectively. Therefore
given an encoded string of two five-letter words, there are still words of length 10
which give a potential match.
Analysing the frequency of a bin occurring gives an estimation of the letters
mapped to it. For example, given the letter frequencies in Table 5.8 [48], if a bin occurs
at relatively small frequency, then it is more likely to contain letters that also have a
smaller frequency. This also gives a reduction in bin combinations for a malicious
user to try, because certain combinations do not fall within the estimated frequencies
calculated. Figure 5.4 shows the difference between the estimated frequency obtained
from counting bins in the index and the actual frequency of the letters in each bin.
These results show that with enough encoded documents indexed, it is possible to
predict within ±2.5% of the actual letter frequency. They also show that a smaller
number of bins for the index is harder to estimate. Figure 5.5 shows the distribution of
100 million unique random bins for a 3-bin configuration, giving a total of 300 million
bins, each containing 8–9 letters. The average summed frequency for a bin is around
33.3%, even though the English letter frequencies vary. Therefore when generating the
bin mapping, we can check if the bin frequencies are within the majority of all possible
Figure 5.4 Difference between the estimated and the actual letter frequency of each bin (% of bins generated against frequency % difference, for 13-bin and 3-bin configurations)
Figure 5.5 Distribution of the sum of letter frequencies in a bin (% of bins against the summed letter frequency of a bin, for a 3-bin configuration)
bin combinations. For example, using three bins the scheme might only accept bins with frequencies between 20% and 46%. This means that even if a malicious user knows the frequency of each bin to ±2.5%, there are still over 20% of all possible bin combinations to try. Note that this experiment contained only bins of roughly equal size (8–9 letters); an implementation with variation in the number of letters per bin would have more possible bin combinations.
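The acceptance test described here might be sketched as follows; the letter frequencies are those of Table 5.8, while the function name and the mapping representation are assumptions made for illustration.
#include <stdbool.h>

/* English letter frequencies in percent (Table 5.8, letters A..Z). */
static const double LETTER_FREQ[26] = {
    7.61, 1.54, 3.11, 3.95, 12.62, 2.34, 1.95, 5.51, 7.34, 0.15, 0.65, 4.11,
    2.54, 7.11, 7.65, 2.03, 0.10, 6.15, 6.50, 9.33, 2.72, 0.99, 1.89, 0.19,
    1.72, 0.09
};

/* Accept a candidate bin mapping only if every bin's summed letter frequency
 * lies inside the given window (e.g. 20%-46% for three bins), so observed bin
 * frequencies reveal little about which letters a bin contains. mapping[i]
 * gives the bin index (0..n_bins-1) of letter 'a'+i; n_bins <= 26 assumed. */
bool mapping_acceptable(const int mapping[26], int n_bins,
                        double min_pct, double max_pct)
{
    double sum[26] = {0};
    for (int i = 0; i < 26; i++)
        sum[mapping[i]] += LETTER_FREQ[i];
    for (int b = 0; b < n_bins; b++)
        if (sum[b] < min_pct || sum[b] > max_pct)
            return false;
    return true;
}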
Given a Bin Encoded index, it is currently possible to rebuild the encoded version
of the document. However because there are no spaces, breaking the index with a
dictionary attack becomes harder than breaking the individual queries. Because FRIBs
does not expose character patterns, it is harder again, showing how computationally
difficult distributed encoding is to break with only one set of the data.
to name a few examples. However, some exploits are out of a cloud service's control,
for example, a zero-day vulnerability in their operating system. To reduce this risk, a
mixture of Linux and Microsoft servers can be used, such that any one vulnerability
cannot exploit every server. When a user/business is choosing the service providers,
they should also seek information regarding security measures in place. A common
approach is looking at a list of standards the service is compliant with.
International standards are now emerging for organisations using or providing
cloud services, with ISO/IEC 27018:2014 (with ISO/IEC 27002 as one of its norma-
tive references) being the first International code of practice that focuses on protection
of Personally Identifiable Information (PII) in the cloud. This increases the security of
their service, while providing more trust to their users. The cryptography recommen-
dations/requirements described in ISO/IEC 27018:2014 are the objectives specified
in clause 10 of ISO/IEC 27002. Examples are provided for use of cryptography in
the cloud, but at a minimum a cloud should implement controls for confidentiality
(data encrypted when stored or transmitted) and authentication. However there is no
mention of true secure processing like homomorphic encryption. To try and protect
data being processed, access controls are recommended. This makes it more difficult
for rogue employees or outside attackers to gain access to data in-flight. Therefore
conforming with ISO/IEC 27018:2014 will reduce the chance of a breach, and applying our scheme will enhance the security already provided.
End users also have control over their security, as the more distributed servers
used, the smaller the risk of their data being compromised. Ten servers will give more security than five, but the running costs increase. Therefore, evaluating against this attack vector is implementation dependent. Bin Encoding and FRIBs can protect data from rogue employees and malicious users who break into a few systems, but the cloud service providers need to try to protect the data they store as well.
Figure: Encryption and proof times (in seconds) against the size of the public key (bits)
With the simplicity offering superior performance and additional features, distributed
encoding for secure processing offers an exciting alternative to homomorphic encryp-
tion and traditional MPC with garbled circuits. Distribution allows Bin Encoding to remove false positives in string searching without compromising security, and gives FRIBs the ability to protect and process data. Future work aims to further improve the strength
of both schemes and provide more analysis on their security. Combining the schemes
is also an option for more advanced searching and returning accurate results quickly.
Bin Encoding can narrow the number of potential matches, before FRIBs performs
an accurate search.
These schemes provide secure functionality in a fraction of the time compared
to the holy grail of cryptography (i.e. FHE), reducing processing time from hours
to seconds. By allowing for varying performance and security, users can now take
control of their data in the cloud.
Acknowledgements
This research is supported by STRATUS (Security Technologies Returning Account-
ability, Trust and User-Centric Services in the Cloud) (https://stratus.org.nz), a science
investment project funded by the New Zealand Ministry of Business, Innovation and
Employment (MBIE).
References
[45] Kocher P, Jaffe J, Jun B. Differential power analysis. In: Advances in Cryp-
tology – CRYPTO’99. Santa Barbara, California, USA: Springer; 1999,
pp. 388–397.
[46] Gandolfi K, Mourtel C, Olivier F. Electromagnetic analysis: Concrete results.
In: Cryptographic Hardware and Embedded Systems – CHES 2001. Paris,
France: Springer; 2001, pp. 251–261.
[47] Ko RK. Cloud computing in plain English. ACM Crossroads 2010;16(3):5–6.
[48] Solso RL, King JF. Frequency and versatility of letters in the English language.
Behavior Research Methods & Instrumentation 1976;8(3):283–286.
[49] Board IA. Ethics and the Internet; 1989. Online. https://www.ietf.org/rfc/
rfc1087.txt (Accessed 04/12/14).
[50] Python NGram 3.3 Documentation. Online. https://pythonhosted.org/ngram/
(Accessed 04/02/15).
[51] Streetman BG, Banerjee S. Solid State Electronic Devices, vol. 5.
Prentice Hall, New Jersey; 2000. Online. http://trove.nla.gov.au/work/
8960772?selectedversion=NBD22063993.
[52] Boneh D. Twenty years of attacks on the RSA cryptosystem. Notices of the
AMS 1999;46(2):203–213.
[53] Zhang Y, Juels A, Reiter MK, Ristenpart T. Cross-VM side channels and their
use to extract private keys. In: Proceedings of the 2012 ACM Conference on
Computer and Communications Security. Raleigh, NC, USA: ACM; 2012,
pp. 305–316.
[54] 109582 English words; 1991. Online. http://www-01.sil.org/linguistics/
wordlists/english/wordlist/wordsEn.txt (Accessed 04/12/14).
[55] FAST Search Server 2010 for SharePoint. Online. http://technet.microsoft.
com/en-us/library/gg604780.aspx (Accessed 07/01/15).
[56] Pering T, Agarwal Y, Gupta R, Want R. Coolspots: Reducing the power
consumption of wireless mobile devices with multiple radio interfaces. In: Pro-
ceedings of the 4th International Conference on Mobile Systems, Applications
and Services. Uppsala, Sweden: ACM; 2006, pp. 220–232.
Chapter 6
Data protection and mobility management
for cloud
Dat Dang1 , Doan Hoang1 , and Priyadarsi Nanda1
Abstract
Cloud computing has become an alternative IT infrastructure where users, infrastruc-
ture providers, and service providers all share and deploy resources for their business
processes and applications. In order to deliver cloud services cost effectively, users’
data is stored in a cloud where applications are able to perform requests from clients
efficiently. As data is transferred to the cloud, data owners are concerned about the
loss of control of their data and cloud service providers (CSPs) are concerned about
their ability to protect data when it is moved about both within and out of its own
environment. Many security and protection mechanisms have been proposed to pro-
tect cloud data by employing various policies, encryption techniques, and monitoring
and auditing approaches. However, data is still exposed to potential disclosures and
attacks if it is moved and located at another cloud where there is no equivalent security
measure at visited sites.
In a realistic cloud scenario with a hierarchical service chain, the handling of data in a cloud can be delegated by a CSP to a subprovider or another provider. However,
CSPs do not often deploy the same protection schemes. Movement of user’s data is an
important issue in cloud, and it has to be addressed to ensure the data is protected in an
integrated manner regardless of its location in the environment. The user is concerned
whether its data is located in locations covered by the service level agreement, and
data operations are protected from unauthorized users. When user’s data is moved to
data centers located at locations different from its home, it is necessary to keep track of
its locations and data operations. This chapter discusses data protection and mobility
management issues in cloud environment and in particular the implementation of a
trust-oriented data protection framework.
Keywords
Data mobility, data protection, cloud mobility, data location
1
University of Technology Sydney, School of Computing and Communications, Australia
6.1 Introduction
Cloud computing has been introduced as a new computing paradigm deployed in many
application areas ranging from entertainment to business. It offers an effective solution
for provisioning services on demand over the Internet by virtue of its capability of
pooling and virtualizing computing resources dynamically. Clients can leverage cloud
to store their documents online, share their information, consume, or operate their
services with simple usage, fast access, and low cost on a remote server rather than
physically local resources [1]. In order to deploy cloud services cost effectively, users’
data is stored in a cloud where applications are able to perform requests from clients
efficiently. As data is transferred to the cloud, data owners are concerned about the
loss of control of their data, and cloud service providers (CSPs) are concerned about
their ability to protect data effectively when it is moved about both within and out
of its own environment. CSPs have provided security mechanisms to deal with data
protection issues by employing various policy and encryption approaches; however,
data is still exposed to potential disclosures and attacks if it is moved and located
at another cloud where there is no equivalent security measure at visited sites [2].
Consequently, data mobility issues have to be addressed in any data security models
for cloud.
In a realistic cloud scenario with hierarchical service chains, handling of data in a cloud can be delegated by a provider to a subprovider, and on to another. However, CSPs do not
often deploy the same protection schemes [2]. Movement of user’s data is an important
issue in cloud and it has to be addressed to ensure the data is protected in an integrated
manner regardless of its location in the environment. When user’s data is moved to
data centers located at locations different from its home, it is necessary to keep track
of its locations. This requirement should be taken care of by a user-provider service
level agreement (SLA). The reporting service can be achieved through an agreement
that enables a monitoring service from the original cloud. When a violation against the
established SLA occurs, the monitoring component will be able to detect it through
the corresponding CSP and trigger protection services on the original cloud, which
can immediately analyze and audit the data. Moreover, data locations need to be
maintained at the original cloud and encoded within the data itself, in case it loses the
connection with its monitoring service. Under such circumstances, the location data
can be used subsequently to trace previous locations and data operations or trigger
original mobility services (MSs) if data is moved back to its original cloud. For the
users, they are concerned whether their data is located in locations covered by the
SLA, and data operations are protected from unauthorized users. Mechanisms should
be available to report information concerning the data, such as the data location, data
operation, or violation, to their owner if and when necessary.
Protecting data in the outsourced cloud requires more than just encryption [3]
which merely provides data confidentiality, antitampering, and antidisclosure. The
key to mitigate users’ concern and promote a broader adoption of cloud computing is
the establishment of a trustworthy relationship between CSPs and users. For users to
trust the CSPs, users’ data firstly should be protected with confidentiality maintained,
and no one should be able to disclose data-sensitive information except the authorized
users. Second, any actions or behaviors on the data should be enforced and recorded as attestation to avoid false accusations of data violation. Once a breach of the SLAs subscribed between CSPs and users occurs, the attestation can be used as proof that the CSP violated the agreed-upon service level and, consequently, appropriate compensation may be offered to the users. Finally, the data should be able
to preserve itself independently in a heterogeneous cloud environment with diverse
protection frameworks.
This chapter discusses the above concerns from a novel perspective, offering a
data mobility management (DMM) model with enhanced transparency and security.
In this model, an active data framework that includes a location data structure and a location registration database (LRD) is introduced to deal with mobility; protocols between clouds for data movement are examined; procedures for establishing a clone supervisor at a visited cloud for monitoring purposes are investigated; and a MS to handle requests for moving the data among clouds is deployed. The data mobility model focuses on two aspects: DMM and active protection. The DMM deals with physical location changes of users' data in various cloud environments and ensures that these locations are registered with the LRD and recorded within the data structure itself. The destination
of a move operation is also verified to ensure that the new physical location is within
the preestablished SLA. The active protection aspect deals with extending data pro-
tection when the data is moved to various locations. The active data unit [referred to
as an active data cube (ADCu)] with a new location-recordable structure allows the
data itself to keep record of data operations, their request parameters, and locations.
Furthermore, the LRD is also used to store changes of data location and other informa-
tion allowing data owners to track their data if necessary. The protocols are designed
to deal with data movement from one cloud to another. The process of moving data
involves a verification process and a MS at the original cloud. A clone supervisor is
established at the visited cloud with the same capability as the original supervisor for
monitoring operations on the data. The MS queries the LRD to register the location of the data when it is first generated and to update its location whenever it is moved to a new location. The chapter presents a comprehensive analysis of data mobility scenarios
in various cloud environments. It presents the design and the implementation of the
proposed DMM model as well as the evaluation of the implemented model.
The rest of the chapter is organized as follows. Section 6.2 discusses data mobility
scenarios, components, and a framework for mobility in cloud. Section 6.3 reviews
security mechanisms for data-in-transit. Section 6.4 presents the design, implemen-
tation, and evaluation of a trust-oriented data protection and mobility management
model. The conclusion is in the Section 6.5.
Table 6.1 Terms and descriptions of components in the data mobility model
information among CSPs because of the lack of models to ensure data protection and
data auditing.
Clearly, mobility management is one of the most important challenges in mobile
IP [5]. When a mobile node roams away from its home network, it registers its current location with its home agent on the home network. Similarly, when a data unit moves from its original cloud, mechanisms should be provided for the data to inform its original cloud of its current locations and, if necessary, the data owner of its status. The data itself, however, cannot execute these actions. For cloud data mobility, we leverage the mobile-network idea of a location register by using a LRD located at the original cloud for updating or retrieving data locations, and a recordable data structure for recording the cloud host location in the data itself whenever there is a data request operation. Moving data to a new cloud environment, however, has to involve both the
data and its supervisor (in cellular networks only the mobile phone is involved). In
this model, a clone supervisor is established for monitoring the moved data and data
operations. A verification procedure will process data requests at the original cloud
for access permissions; however, the mobility management model may also delegate
the verification and access control to the visited cloud. With the deployment of the
supervisor, data protection can be achieved despite the fact that data is located at
visited cloud side.
(Figure: moving the TDFS from the original cloud to a new cloud. Steps: (1) request to move the TDFS; (2) request to establish the new supervisor; (3) inform message for establishing the new supervisor; (4) move the TDFS.)
● Data location management: The supervisor also reports data locations to the original cloud in a timely manner and updates the LRD, since the data itself cannot send location information to the original cloud, although it is able to record its current location for tracing operations. The supervisor analyzes the network location in order to obtain the current network address and sends it back to the original cloud; this address is set as the default source address in the report message.
The new supervisor will be responsible for monitoring the TDFS at the new cloud
as well as communicating with the original cloud as the new cloud does not provide
the same data protection services.
If the request originated from an entity in the original cloud, the destination
address of the new cloud where data is moved to has to be provided with the request.
When the destination address is identified, the original cloud can communicate
with the new cloud. In both scenarios, security procedures are performed by the
access control component, the Cloud-based Privacy-aware Role Based Access Control
(CPRBAC), and the auditing component, the Active Auditing Control (AAC) together
with the associated supervisor. Once the conditions for moving the requested data are satisfied, the original cloud sends a request in order to establish the clone supervisor at the new cloud. By doing this, the link between the supervisor and its data is
still kept when data is stored at another location. Details of the procedure (Figure 6.2)
are as follows:
The service request from the new cloud or from within the original cloud is
sent to the original cloud. This request is authorized by the access control and the
auditing control components of the original cloud. If it is a valid request having correct
required parameters, the supervisor is triggered to analyze data operations. As it is
a “move” request, the supervisor has to communicate with mobility component for
establishing a clone supervisor at the new cloud. Invalid requests against predefined
policies will be triggered by the data protection unit (APDu) and captured by the
auditing component. Violation triggers in the auditing may also be triggered to assist
the security administrator to execute some certain security prevention operations.
The mobility component at the original cloud sends a request to the visited cloud
for the permission to install a new service. The new cloud also verifies and evaluates
the request to see if it can create the service. If an agreement is in place to create a
new supervisor between two clouds, a confirmed message will be sent from the new
cloud to the original cloud. At this step, necessary information for creating the clone
supervisor will be supplied to the new cloud. Once the new supervisor is created, the
mobility component invokes instructions to move the data. Hence, the new location
of TDFS is also updated at LRD.
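To make the control flow concrete, the following Python sketch outlines this first data-moving case under the assumption of four cooperating services; the object names (cprbac, aac, mobility, lrd) and their methods are illustrative stand-ins for the CPRBAC, AAC, MS and LRD components described above, not the actual implementation.

# Illustrative sketch of the first data-moving case (original cloud to new cloud).
# All service objects and method names are hypothetical stand-ins.
def move_tdfs_to_new_cloud(request, cprbac, aac, mobility, lrd):
    # 1. Verify the request against the CPRBAC policies and audit it.
    if not cprbac.verify(request.user_id, request.tdfs_id, "move"):
        aac.record_violation(request)      # captured by the auditing component
        return "rejected"
    aac.record(request)                    # attestation of the authorized operation
    # 2. Ask the new cloud to establish a clone supervisor.
    if not mobility.request_clone_supervisor(request.destination):
        return "new cloud refused to establish the supervisor"
    # 3. Move the TDFS and register its new location in the LRD.
    mobility.move_tdfs(request.tdfs_id, request.destination)
    lrd.update_location(request.tdfs_id, request.destination)
    return "moved"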
In the second case, data at an old cloud is moved to a new cloud and original cloud
will process request verifications. The procedure is similar but the verification request
needs to be forwarded from the old cloud to the original cloud for authorization.
After analyzing the request, the original cloud sends a request to the new cloud for
establishing a new supervisor at the new cloud. Once the new service is established, the original cloud informs the old cloud to move the TDFS; otherwise, the original cloud sends a message instructing the old cloud to terminate the request. Figure 6.3 depicts
procedures of the second data-moving case.
In the third case, data at an old cloud is moved back to the original cloud; and
in the last case, data at a cloud is moved to local storage (offline) with or without
permissions. In these cases, the procedure for establishing a supervisor is not performed, but the moving request still has to be authorized at the original cloud before permission is granted.
(Figure 6.2: at the original cloud, the moving request (1a from the new cloud or 1b from within the original cloud) arrives at the cloud interface and is verified by the CPRBAC and AAC components (2a), which trigger the supervisor (2b); the supervisor asks the mobility management and trusted security management components to establish the clone supervisor (2c), the mobility management component exchanges the establish-supervisor request and inform messages with the new cloud (3a, 3b), and the TDFS is then moved.)
Figure 6.3 General procedure in establishing supervisor from old cloud to new
cloud
Figure 6.4 Details of processing moving request from old cloud to original cloud
For the last case, the regular verification procedure is still executed at the original cloud to obtain the move permission. However, an exception will be raised when the original
cloud detects a data move without a request or it cannot communicate with destina-
tion address provided in the request to establish supervisor. In this case, the current
supervisor has to report the current TDFS location to the original cloud for updating
the LRD as well as triggering the data to update this address in its core component
before moving. From this point, the TDFS has to record data access locations in its
location list whenever users access the data even in the offline mode. The moving
operation occurs when TDFS is stored at the original cloud or at visited cloud. Figure
6.4 depicts procedures of the last moving case.
and data mobility based on location register database. In this section, we will present
a review of these solutions for data mobility and data protection.
The users can create a request to read, modify, or even transfer the data via mobile applications. When a request is raised, a message from a trace service notifies the data owner that his/her private data has been accessed. This approach only considers the case where the data is accessed by the data owner, and there is no data movement among clouds. In particular, a request to send data is simply performed by creating a copy of the data; in other words, the data sent by the user will not be traced by the service.
(Figure: structure of the ADCu. The shell contains the verifier & identifier, the logger, the probe and the communicator; the core contains the executable segment, the header and the data blocks.)
for performance consideration. Once the data operation finishes, the logger leverages
the communicator in the shell to upload the log records to ADCu’s external supervisor.
However, a log record marked with an emergency tag will be immediately triggered
by the probe, which then notifies the communicator to raise an exception. Time Stamp
uses the Network Time Protocol to take into account the fact that cloud resources may
be distributed across different time zones. Each ADCu’s log information is transpar-
ent to its data owner. When the log records are stored in cloud, they are encrypted
using the RSA encryption to avoid possible information leakage. Only the data owner
has the corresponding key to disclose those records. In addition, the log information of data usage is sent out rather than stored inside the ADCu in order to maintain its light-weight nature, since accumulating log information would raise the cost of storage, dissemination, or replication of the ADCu.
Each ADCu has a corresponding supervisor deployed in the same domain, which
takes charge of monitoring external data operations (such as move and copy) that
cannot be detected by the internal probe inside the ADCu, and communicating with
its ADCu. If the ADCu cannot establish a proper network connection or cannot contact
its supervisor, it would switch to the termination state to avoid an offline attack.
Once the verification and identification procedure succeeds, the shell delegates
control to the data core. The core of ADCu is wrapped by an ES, a header, and
data blocks. We leverage a dynamic delegation approach in the ES to call back the shell to trigger the core and execute data operations. The header refers to a manifest
specifying the basic information of supplementary data residing at the beginning of
data blocks.
As the ADCu can only protect and manage the contents within, it cannot be triggered if adversaries attempt operations such as moving or deleting the whole data cube. The supervisor is thus designed to monitor operating-system-level data manipulations that cannot be detected by the ADCu itself. An atomic data security model, called the active data protection unit (ADPu), is needed to support the active protection functions of the ADCu within the protection framework. The ADPu is an atomic component combining two entities: a supervisor and an ADCu.
environment. It supports the management of data and data locations as well as ensuring data protection at different clouds. The model is designed to achieve the following objectives:
We focus on providing a data mobility solution for cloud data moving among clouds
while ensuring data protection, auditing relevant data locations and accessed data
operations. The new features of our proposed model include an active data framework
with appropriate data structure and a LRD to deal with mobility; protocols between
clouds for data movement; procedures for establishing a clone supervisor at the visited
cloud for data protection purposes. In particular, there will be a MS agent responsible
for updating and retrieving data location from the LRD. Figure 6.6 depicts the model
and its four core components: (1) the data core protection component, (2) the mobility
management component, (3) the trusted security management component, and (4) the
cloud interface.
● The data core protection component: This component is designed to enable active
surveillance, smart analysis, self-protection, and self-defense capabilities directly
on to the data itself. The data is transformed and encapsulated in a TDFS that sup-
ports confidentiality, integrity, and intrusion tolerance [6]. To support mobility, a
special encrypted element namely recordable mobility data (RMD) is designed to
record user’s operations when they access the data. Only the data owner who has
the decryption key can trace previous data operations. The management and coor-
dination of cloud data for each tenant is processed by the supervisor, whose active
service instance is activated when the corresponding tenant subscribes to a cloud
storage service. Several supervisor service instances can be deployed to deal with
a large number of requests from diverse virtual servers or machines (VMs). An
atomic ADPu contains an active data cube (the TDFS) and its supervisor.
● The mobility management component: This component includes the MS and the
LRD. It aims to store and manage information about the supervisor and the TDFS
at original cloud. The component centers around the location registration proce-
dure when the TDFS is moved by maintaining connections with its responsible
supervisor.
The MS is responsible for creating queries to the LRD. When the data is created, the
supervisor invokes the MS to update the information about the TDFS in the LRD. In
addition, the MS also supports the establishment of the new supervisor at the visited
cloud.
The LRD stores the TDFS information related to data location, data operations,
and data owner for data-status-monitoring purposes. The visitor location register database (VLRD) is located at the visited cloud and is structured similarly to the LRD, with additional fields for the location of the visited cloud. When a TDFS
is subscribed to a cloud, it needs to register and is allocated a supervisor that is
responsible for the data welfare including monitoring and raising an alarm if illegal
data operations are detected. Therefore, whenever a TDFS moves out of its original
cloud, the supervisor will invoke a query to extract information from the database
necessary for the establishment of the clone supervisor at the new cloud. Apart from
the data, there are also tables holding system data, including information about servers
that are allowed to connect to the system. In the design phase, we designed database tables to achieve the following functions:
New data subscription: This function allows users to register information about their data, such as the data owner and profile data.
Updating changes of location: This function supports the location update proce-
dure when there are requests from the MS to update data locations. Despite the fact
that data is stored at a visited cloud, its location is still updated if there are requests
to move the data to a new cloud.
Retrieving VLRD lists: This function allows the LRD to locate the VLRD that
holds the current location information of the TDFS so that the location management
can be utilized.
Providing system data: This function allows the MS to access system data infor-
mation about LRD and VLRD. As a result, the CSPs are able to identify cloud hosts
and the location register database.
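As a rough illustration of how these functions might map onto tables, the following sqlite3 sketch creates a minimal LRD and system-data schema and implements the subscription and location-update functions. The table and column names are assumptions based on the fields mentioned in this section, not the authors' actual schema.

import sqlite3

conn = sqlite3.connect("lrd.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS lrd (
    tdfs_id        TEXT,
    supervisor_id  INTEGER,
    operation      TEXT,
    address        TEXT,
    time_stamp     TEXT,
    data_owner     TEXT
);
-- system data: cloud hosts that are allowed to connect to the system
CREATE TABLE IF NOT EXISTS system_data (
    host_address   TEXT,
    description    TEXT
);
""")

def register_new_data(tdfs_id, supervisor_id, owner, address, ts):
    # New data subscription: record the owner, profile and initial location.
    conn.execute("INSERT INTO lrd VALUES (?, ?, 'create', ?, ?, ?)",
                 (tdfs_id, supervisor_id, address, ts, owner))
    conn.commit()

def update_location(tdfs_id, new_address, ts):
    # Updating changes of location when the MS reports a move.
    conn.execute("UPDATE lrd SET address = ?, time_stamp = ?, operation = 'move' "
                 "WHERE tdfs_id = ?", (new_address, ts, tdfs_id))
    conn.commit()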
For TDFS location registration at a visited cloud, the VLRD is used for storing
TDFS locations when the TDFS moves from its original cloud to a visited cloud. The
VLRD is the other location register used to retrieve information about data location
when data is stored at the visited cloud. Because a TDFS can be moved anywhere in
clouds of different infrastructures, mobility management is essential. When a
TDFS moves about, location registration for tracking and tracing purposes is always
needed. Therefore, if visited locations of TDFS are stored and managed as a dis-
tributed database system, the MS at visited cloud can query directly the VLRD rather
than the home LRD.
When a request to access the data at the visited cloud is permitted, the MS will
insert or update the VLRD. If the TDFS location is not stored in the VLRD, the request
is forwarded to original cloud where the MS queries the LRD. When it is authorized,
a new record will be added to the VLRD. When a TDFS visits a new cloud from
the old cloud, the registration process in the new VLRD is as follows: (1) the MS
sends a request to the new visited cloud in order to register its information in the new
VLRD; (2) the new VLRD informs the MS’s LRD of the TDFS’s current location,
the address of the new cloud; (3) the MS’s LRD sends an acknowledgment including
TDFS’s profile; (4) the new VLRD informs the TDFS of the successful registration;
and (5) the LRD sends a deregistration message to the old VLRD and the old VLRD
acknowledges the deregistration.
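The five-step registration exchange can be summarised as message passing between the MS, the new VLRD, the home LRD and the old VLRD. The Python outline below is only a schematic of the steps listed above; the class and method names are invented for illustration.

# Schematic of the five-step VLRD registration exchange (names are illustrative).
def register_at_new_cloud(new_vlrd, home_lrd, old_vlrd, tdfs_id, new_address):
    new_vlrd.register(tdfs_id, new_address)          # (1) MS registers at the new VLRD
    home_lrd.notify_location(tdfs_id, new_address)   # (2) new VLRD informs the home LRD
    profile = home_lrd.acknowledge(tdfs_id)          # (3) LRD acknowledges with the TDFS profile
    new_vlrd.confirm_registration(tdfs_id, profile)  # (4) new VLRD informs the TDFS of success
    old_vlrd.deregister(tdfs_id)                     # (5) the old VLRD entry is deregistered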
recording audit data created as attestations, the CSPs can report the evidence of data violations to their users. The users are more inclined to adopt the cloud solution for their businesses as they can establish a more acceptable SLA with their subscribed CSPs in a firm, trustworthy relationship. The auditability can be achieved by the AAC.
● Cloud interface: The cloud interface provides data service interfaces to access active data in cloud storage systems. It forwards requests, together with their parameters, to the security management component to verify access permissions.
reducing the risk of use by adversaries. The data core protection block employs an
active security approach whereby the data is made active against any invocation,
whereas the data security control block and data operation and management block
support this active approach by providing data auditing and other measures including
secure access control, data replication and mobility.
The management and coordination of cloud data for each tenant are processed
by the supervisor, an active service instance that is activated when the corresponding
tenant subscribes to cloud storage services.
In this framework, two services are proposed for data security control: the
CPRBAC service and the AAC service. The CPRBAC service defines and describes
the security boundary on data operations in cloud. Resource requests that are not
allowed in the policy repository will be rejected. The AAC is introduced to execute
and audit users’ requests in a secure and consistent manner. Users’ requests must be
actively audited under a distributed transaction management session.
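A much-simplified illustration of this division of labour is sketched below: a linear scan over a policy repository stands in for CPRBAC policy matching, and an append-only log stands in for AAC auditing. The policy structure is an assumption made for the example.

# Toy CPRBAC-style policy check with AAC-style audit logging (illustrative only).
policies = [
    {"role": "owner",  "data_id": "TDFS-001", "operations": {"read", "write", "move"}},
    {"role": "tenant", "data_id": "TDFS-001", "operations": {"read"}},
]
audit_log = []

def authorize(role, data_id, operation):
    # Linear scan over the policy repository; requests not allowed are rejected.
    for p in policies:
        if p["role"] == role and p["data_id"] == data_id and operation in p["operations"]:
            audit_log.append((role, data_id, operation, "allowed"))
            return True
    audit_log.append((role, data_id, operation, "rejected"))
    return False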
6.4.3 Implementation
6.4.3.1 Data structure design
Data is structured utilizing the active data-centric framework [6]. A special structure
called recordable mobility data (RMD) is designed to record information associated
with users’ access request. The information includes Subject_ID, Data_ID, Operation,
Time Stamp, and Cloud location. This information is transparent to its data owner; in other words, it is invisible to users' data operations. The stored information is
encrypted using the RSA encryption to avoid possible information leakage. Only the
data owner has the corresponding key to disclose them for tracing previous operations.
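As an illustration only, the sketch below builds such a record and encrypts it with the owner's RSA public key using the Python cryptography package; the library choice and field values are assumptions, and a production system would typically use hybrid encryption for larger records.

import json
from datetime import datetime, timezone
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# The data owner's key pair; only the owner holds the private key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# An RMD record with the fields listed above (values are illustrative).
record = json.dumps({
    "Subject_ID": "user-42",
    "Data_ID": "TDFS-001",
    "Operation": "read",
    "TimeStamp": datetime.now(timezone.utc).isoformat(),
    "CloudLocation": "10.0.0.5",
}).encode()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Only the holder of the private key can recover the record for tracing.
ciphertext = public_key.encrypt(record, oaep)
plaintext = private_key.decrypt(ciphertext, oaep)
assert plaintext == record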
The new TDFS structure is shown in Figures 6.8 and 6.9.
(Figures 6.8 and 6.9 show the TDFS structure and its associated entities: TDFS (TDFS_ID varchar(10), Description varchar(30), File_size numeric(10, 2), Data Owner User_ID integer(10), Supervisor Supervisor_ID integer(10)); Data Owner (User_ID integer(10), Username varchar(20), Password varchar(20), Role integer(10)); Supervisor (Supervisor_ID integer(10), Description varchar(30)); Location Registration (Id integer(10), Address varchar(20), TimeStamp date, Operation varchar(10), TDFS_ID varchar(10)).)
needed for tracking and tracing purposes. Therefore, if the visited locations of a TDFS are stored and managed as a distributed database system, the MS at the visited cloud queries the VLRD directly rather than the LRD at the original cloud. A record in the VLRD comprises seven fields: TDFS_ID, Supervisor_ID, Operation, Source Address, Current Address, Time Stamp, and Data Owner. The VLRD's data structure inherits from the LRD with one additional field, Current Address. A set of current addresses is denoted CIP = {cip1, cip2, …, cipi, …, cipn}, and the VLRD table is represented as the set L = {tni, spi, ipi, cipi, opi, tsi, owi}.
6.4.3.3 Data mobility management workflows
When a customer subscribes to a cloud service, the CSP will assign roles associated
with the data for users, allowing them to access a virtual user directory and workspace.
An initial set of empty active data cubes will be created according to the regular data
types. After assigning roles, the supervisor will be invoked to send a request to the
MS for data location registration. The request, containing parameters such as UserID, DataID, Location, and Time Stamp, will be processed to update the database. Finally,
the user will receive a data location registration acknowledgment message via the MS.
Figure 6.11 shows the workflow for a new data location registration.
When a user needs to execute data operations such as read, insert, write, and move, he/she will send a request to the cloud interface including a set of parameters such as UserID, DataID, Operation, Location, and Time Stamp.
(Workflow figures: Figure 6.11 shows the new data location registration workflow, in which a request passes through the cloud interface, the CPRBAC service, the supervisor and the MS, the data location is registered in the location database and the user is informed; a second workflow shows the verification of data operation requests at the original cloud, with the supervisor service recording the data location in the LRD and the user being notified; a third workflow shows a data location query and update involving the LRD, the VLRD at the visited cloud and a location message service that informs the user of the data locations.)
6.4.4.2 Evaluation
In our previous ADC framework, a data-moving operation was simply performed
between two hosts in the private cloud environment. One host creates requests to move the data, whereas the other provides various data access scenarios to the ADCu in the same cloud. In this section, we deploy various data-moving cases among clouds to address not only data mobility issues but also privacy and security. Requests will be created from both inside and outside the original cloud. Through an authorization and authentication scheme based on the CPRBAC service and the LRD at the original cloud, any data-moving operations will be verified with regard to the user's permissions and data locations. The implementation of the TDFS is based on our previous work [4], with new fields added to support data mobility.
We assume that data owners do not release sensitive information to unauthorized
parties including secret keys that were used to generate signature and encrypt data
and personal privacy. We assume that the LRD and CPRBAC services are trustworthy
and behave correctly.
1) tLookup service is the time the client spends looking up the server's RMI interface and sending the request.
2) tVerification and mobility is the security and MS latency; the CPRBAC verification and location registration are processed during this period.
3) tData operation is the data transfer time between the original cloud and the visited
cloud.
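If these three phases are executed sequentially, the total time for a moving request can be written as T_move = t_Lookup service + t_Verification and mobility + t_Data operation; for the TDFS results reported below, for example, verification and mobility account for 353.93 ms of the 1765.25 ms total, with the remainder split between the lookup service and the data transfer.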
From the results, it is clear that the processing time for a TDFS is slightly longer
than that for a normal file (1765.25 ms in comparison with 1755.4 ms of the latter).
The main components of the whole moving process are the lookup service time and
the moving data time. The verification and MS time, the main process of the model,
however, only constitutes a small amount of time (353.93 of 1765.25 ms for TDFS
and 351.98 of 1755.4 ms for normal file, respectively). The comparative results are
illustrated graphically in Figures 6.14 and 6.15. The x-axis represents the file size and
the y-axis represents the execution time.
From our experiments, we found that the verification time and the lookup service time of the model are approximately the same for both the TDFS and normal files. Therefore, the source of the extra delay must be the transfer time. Hence, we ran the same request with different data sizes. The results show that the data transfer time is indeed the significant source of latency.
This means the data protection and MS do not introduce significant overheads when the data size is increased. Overall, the overhead is considered a small price to pay for security and data protection. With our proposed model, the supervisor can trigger the TDFS into an active state for self-protection and self-defense and record the addresses from which it is accessed, whereas these operations cannot be performed on normal files.
Figure 6.16 shows alerting messages for move requests as received by the user via
the mobile phone in the move process. The message includes essential information
about the moving process such as file name, moving addresses, time. It is demon-
strated that the supervisor can detect data move operation and invoke the MS to send
the alerting message to user’s mobile device immediately when there is a request to
move the data at both the original cloud and visited cloud.
Data-moving case tests among different clouds
In the following experiments, we deploy the model for different data-moving cases for
ADCu described in Section 6.2. Requests were created from both inside and outside
the cloud for different data-moving cases. Our primary goal is to demonstrate the
efficiency, security and transparency of the proposed model. In addition, we recorded
and compared the verification and MS service duration of requests from inside and
outside the cloud to identify the source of latency. The results of outside requests
within different data-moving cases are compared in Figures 6.17 and 6.18.
As can be seen from Figure 6.17, the verification and MS costs of the second case
and the third case are substantially higher in comparison with that of the first case (the
average times are 837.6 ms, 705.5 ms, and 353.9 ms, respectively) whereas there is
only a small difference in the costs of the last two cases. This can be explained by the
fact that for the last two cases, the move process requires two stages: sending requests
to the old cloud and relaying these requests to the original cloud. All requests have
to be forwarded to the original cloud which is responsible for verifying the requests
and executing the MS even when requests are created from original cloud (the third
data-moving case). We also recorded the response time of verification and MS by
requests from inside cloud to compare with that of requests from another cloud. The
requests from inside the cloud can be created from two sources. One is from inside
the original cloud to move the data to a new cloud and the other is from a visited cloud
to move the data to a new cloud or back to the original cloud. The data transfer costs are similar for both kinds of request within moving cases on the same data, and the verification and MS cost is also the same for an inside request at the visited cloud for the second and third data-moving cases. Thus, we only record the average response times for these two kinds of request (251.4 ms and 612.6 ms, respectively).
Similarly, the average response times of the second and third data-moving case are
about double that of the first data-moving case due to the relay process.
The overhead for data retrieval from a TDFS involves verification and identifi-
cation, data loading, and network communication cost. To evaluate the costs of the
verification and MSs, we run a number of tests. A total of 50 working threads are
allocated to generate user requests in parallel. We initialized six sets of tests in which the number of requests ranged from 50 to 500, and each set of tests was run five times. Figure 6.19 illustrates the time cost of the verification and MSs for the different data-moving cases and requests; the x-axis represents the number of requests, and the y-axis represents the execution time. As can be observed, the time cost of the verification and MSs grows with the number of requests. The average time of inside
requests for the first data-moving case (47920.1 ms) is similar to that for the outside
requests (48074.55 ms). This is repeated similarly for the second data-moving case
and the third data-moving case. However, the average time of requests for the second
data-moving case and the third data-moving case (115285.8 ms) is approximately twice that for the first data-moving case (47997.34 ms). This can be explained by the fact that the requests have to be forwarded back to the original cloud for executing the verification and MSs. Generally, the time cost grows linearly with the increase of traffic
requests. This indicates that the DMM model can efficiently avoid an exponential
increase in response time when handling multiple requests. Replicas of the services
to geographically separate hosts may be used to achieve load balancing and improve
performance.
Figure 6.20 presents the data-moving operations for the different data-moving cases. All data-moving cases are triggered by the supervisor: the first triggers the MS and the TDFS when data is moved from the original cloud to the visited cloud; the second when the data at the visited cloud is moved to a new cloud; the third when the data at the visited cloud is moved back to the original cloud; and the last when data is moved out of the cloud. In all these test cases, the supervisor was able to detect the move and trigger the MS and the TDFS. Meanwhile, our mobile
device received the alert message immediately. A message including source address
and destination address actively notified the data owner when data was moved to a
new cloud.
Performance analysis: Our model does not rely on complex algorithms or heavy encryption requirements. The performance of our model in terms of computation overhead is as follows:
Verification service: The computation overhead of this service includes access request generation, policy matching, verification of the access request against the targeted policy, verification token calculation, and access response generation. Among these operations, policy matching has complexity O(N); the verification complexity depends on the number of context variables that must be analyzed; the remaining operations are O(1).
Mobility service: The MS creates requests to query the database, which adds only negligible overhead as it incurs neither initialization costs nor database join operations. The MS is only in charge of triggering the task; fetching the data locations is handled by the database instance.
6.5.1 Discussion
The previous section focused on our DMM model and the trust-oriented data protection
framework. This section discusses several assumptions and measures that can be
improved and extended in the future.
When an ADCu moves from its original cloud to another cloud, the responsibility
to protect the data depends on the SLAs between the data owner and its original
cloud provider as well as between the original cloud provider and the visited cloud
provider. The mechanisms to deal with security and protection can be complex and
vary depending on the agreements and assumptions. If we assume that the original
cloud is mainly responsible for its registered data, then the tasks of authentication and
authorization at a foreign cloud have to be done at the original cloud. This implies
that the clone supervisor established at the visited cloud has little responsibility, as it will pass any requests concerning its data to the original cloud to deal with. However,
if we assume that the new cloud will be mainly responsible for its visited data, the
supervisor has to be generated with adequate capabilities. In this work, we assume a
light-weight supervisor as it passes requests to the original cloud.
The assumptions also affect the performance of the moving process. As in our
cases, requests to move data always have to be relayed back to the original cloud and, hence, the MS processing time and the verification time can be doubled or tripled depending on whether the move involves two or three clouds. If the authentication and
authorization processes are delegated to the visited cloud, these processing times can
be reduced.
The way the active ADCu communicates with other entities can be selectively
designed. The ADCu, when moved to a visited cloud, may or may not be able to communicate with its owner. Even if it can, the owner may not wish to be disturbed unnecessarily, and some acceptable communication mechanisms should be developed. It is important that the ADCu be able to protect itself and to communicate
with some reliable external entity if the protection model is to be credible.
Clearly, those assumptions can be relaxed and the basic model can be extended
to deal with various situations and provide more comprehensive data mobility and
protection management.
6.5.2 Conclusion
This chapter discusses data protection and mobility management in cloud environ-
ments. It also presents an active framework for data protection and an extended
trust-oriented framework for data mobility and protection for handling secure data
mobility in a cloud environment that involves data moving within and among the orig-
inal cloud and visited clouds. It also proposed a novel LRD capable of tracing and tracking data locations. Furthermore, a new TDFS structure with a recordable structure was designed to actively capture the locations of requests. More importantly, the proposed clone supervisor established at the visited cloud is able to deploy an equivalent data protection scheme on both cloud sides to achieve intrusion tolerance. The experimental outcomes demonstrate the feasibility and efficiency of the model, and the system behaves reliably in terms of processing time.
References
[1] L. Schubert and K. Jeffery, “Advances in clouds,” Report of the Cloud
Computing Expert Working Group. European Commission, 2012.
[2] I. Foster, Z. Yong, I. Raicu, and L. Shiyong, “Cloud Computing and Grid Com-
puting 360-Degree Compared,” in GCE’08 Grid Computing Environments
Workshop, 2008, pp. 1–10.
[3] A. Juels and A. Oprea, “New approaches to security and availability for cloud
data,” Commun. ACM, vol. 56, pp. 64–73, 2013.
[4] L. Chen and D. B. Hoang, “Active data-centric framework for data protection in
cloud environment,” in ACIS 2012: Location, location, location: Proceedings
of the 23rd Australasian Conference on Information Systems, 2012, pp. 1–11.
[5] L. Jae-Woo, “Mobility Management Using Frequently Visited Location
Database,” in Multimedia and Ubiquitous Engineering, 2007. MUE’07.
International Conference on, 2007, pp. 159–163.
[6] L. Chen and D. B. Hoang, “Active data-centric framework for data protection in
cloud environment,” in ACIS 2012: Location, location, location: Proceedings
of the 23rd Australasian Conference on Information Systems 2012, 2012, pp.
1–11.
[7] A. Albeshri, C. Boyd, and J. G. Nieto, “GeoProof: Proofs of Geographic Loca-
tion for Cloud Computing Environment,” in 32nd International Conference on
Distributed Computing Systems Workshops (ICDCSW), 2012, pp. 506–514.
[8] K. Benson, R. Dowsley, and H. Shacham, “Do you know where your cloud
files are?,” presented at the Proceedings of the 3rd ACM workshop on Cloud
computing security workshop, Chicago, Illinois, USA, 2011.
[9] Z. N. J. Peterson, M. Gondree, and R. Beverly, “A position paper on data
sovereignty: the importance of geolocating data in the cloud,” presented at the
Proceedings of the 3rd USENIX conference on Hot topics in cloud computing,
Portland, OR, 2011.
[10] T. Ries, V. Fusenig, C. Vilbois, and T. Engel, “Verification of Data Location in
Cloud Networking,” in Fourth IEEE International Conference on Utility and
Cloud Computing (UCC), 2011, pp. 439–444.
[11] S. Betge-Brezetz, G. B. Kamga, M. P. Dupont, and A. Guesmi, “Privacy
Control in Cloud VM File Systems,” in Cloud Computing Technology and
Science (CloudCom), 2013 IEEE 5th International Conference on, 2013,
pp. 276–280.
[12] A. Noman and C. Adams, “DLAS: Data Location Assurance Service for
cloud computing environments,” in Tenth Annual International Conference
on Privacy, Security and Trust (PST), 2012, pp. 225–228.
[13] E. Androulaki, C. Soriente, L. Malisa, and S. Capkun, “Enforcing Loca-
tion and Time-Based Access Control on Cloud-Stored Data,” in IEEE 34th
International Conference on Distributed Computing Systems (ICDCS), 2014,
pp. 637–648.
[14] A. Squicciarini, S. Sundareswaran, and D. Lin, “Preventing Information Leak-
age from Indexing in the Cloud,” in Cloud Computing (CLOUD), 2010 IEEE
3rd International Conference on, 2010, pp. 188–195.
[15] S. Sundareswaran, A. Squicciarini, D. Lin, and H. Shuo, “Promoting Dis-
tributed Accountability in the Cloud,” in IEEE International Conference on
Cloud Computing (CLOUD), 2011, pp. 113–120.
[16] S. Sundareswaran, A. C. Squicciarini, and D. Lin, “Ensuring Distributed
Accountability for Data Sharing in the Cloud,” IEEE Transactions on Depend-
able and Secure Computing, vol. 9, pp. 556–568, 2012.
[17] S. Kamara and K. Lauter, “Cryptographic Cloud Storage,” in Financial Cryp-
tography and Data Security. vol. 6054, R. Sion, R. Curtmola, S. Dietrich,
A. Kiayias, J. Miret, K. Sako, and F. Sebé, Eds., ed: Springer Berlin Heidelberg,
2010, pp. 136–149.
[18] N. Virvilis, S. Dritsas, and D. Gritzalis, “Secure Cloud Storage: Available
Infrastructures and Architectures Review and Evaluation,” in Trust, Privacy
and Security in Digital Business. vol. 6863, S. Furnell, C. Lambrinoudakis,
and G. Pernul, Eds., ed: Springer Berlin Heidelberg, 2011, pp. 74–85.
[19] Y. Shucheng, W. Cong, R. Kui, and L. Wenjing, “Achieving Secure, Scalable,
and Fine-grained Data Access Control in Cloud Computing,” in Proceedings
of IEEE INFOCOM, 2010, pp. 1–9.
[20] A. Juels and J. Burton S. Kaliski, “Pors: proofs of retrievability for large files,”
presented at the Proceedings of the 14th ACM conference on Computer and
communications security, Alexandria, Virginia, USA, 2007.
[21] K. D. Bowers, A. Juels, and A. Oprea, “Proofs of retrievability: theory and
implementation,” presented at the Proceedings of the 2009 ACM workshop on
Cloud computing security, Chicago, Illinois, USA, 2009.
[22] H. Shacham and B. Waters, “Compact Proofs of Retrievability,” presented
at the Proceedings of the 14th International Conference on the Theory and
Application of Cryptology and Information Security: Advances in Cryptology,
Melbourne, Australia, 2008.
[23] M. Kallahalla, E. Riedel, R. Swaminathan, Q. Wang, and K. Fu, “Plutus:
Scalable Secure File Sharing on Untrusted Storage,” presented at the Proceed-
ings of the 2nd USENIX Conference on File and Storage Technologies, San
Francisco, CA, 2003.
[24] E.-J. Goh, H. Shacham, N. Modadugu, and D. Boneh, “SiRiUS: Securing
Remote Untrusted Storage,” NDSS, vol. 3, pp. 131–145, 2003.
[25] S. Wang, D. Agrawal, and A. E. Abbadi, “A comprehensive framework
for secure query processing on relational data in the cloud,” presented at
the Proceedings of the 8th VLDB international conference on Secure data
management, Seattle, WA, 2011.
[26] M. W. Storer, K. M. Greenan, E. L. Miller, and K. Voruganti, “POTSHARDS:
secure long-term storage without encryption,” presented at the 2007 USENIX
AnnualTechnical Conference on Proceedings of the USENIXAnnualTechnical
Conference, Santa Clara, CA, 2007.
[27] L. Chen and D. B. Hoang, “Addressing Data and User Mobility Challenges
in the Cloud,” in IEEE Sixth International Conference on Cloud Computing
(CLOUD), 2013, pp. 549–556.
[28] M. C. Mont, S. Pearson, and P. Bramhall, “Towards accountable manage-
ment of identity and privacy: sticky policies and enforceable tracing services,”
in Database and Expert Systems Applications, 2003. Proceedings. 14th
International Workshop on, 2003, pp. 377–382.
[29] P. Maniatis, D. Akhawe, K. Fall, E. Shi, S. McCamant, and D. Song, “Do you
know where your data are?: secure data capsules for deployable data protec-
tion,” presented at the Proceedings of the 13th USENIX conference on Hot
topics in operating systems, Napa, California, 2011.
Abstract
7.1 Introduction
Traditionally, the security perimeter in network infrastructure has defined the bound-
ary between the private and the public networks. Everything inside the perimeter is
trusted, and everything outside the perimeter is untrusted. This perimeter deploys
various security tools and technologies to protect users’ data and applications from
different types of attacks from the untrusted zone. Although this seems straightforward, the evolution of networks, network technologies and new attack vectors has meant that protecting network infrastructure has become increasingly hard. According to the SafeNet 2014 Survey [1], although 74% of IT decision-makers trust their organization's perimeter security, almost half (44%) of them admitted that their perimeter had been breached or that they did not know whether it had been breached. With cyber-attacks on
the rise, the cost of IT security is becoming an increasingly heavy burden for every
organization. According to the latest Gartner report [2], the worldwide spending on
information security was $75.4 billion in 2015, which will jump to $101 billion in
2018 and soar to $170 billion by 2020.
The modern network has become very sophisticated and expansive as more and
more potential entry points are added into the network. With the wide use of Bring-
Your-Own-Device (BYOD) and cloud computing, the traditional network perimeter
is unable to protect against malicious software and unauthorized users accessing the
private network and data [3]. In 2014, the Cloud Security Alliance (CSA) launched the
software-defined perimeter (SDP) project, with the goal of defending against network-
based attacks by reducing the perimeter surface [4]. This approach was previously used only by US government agencies such as the Department of Defense. SDP only allows TCP connections from preauthorized users and devices, based on the "need-to-know" model [5].
At the time of writing this chapter, very little material about SDP was available. This chapter therefore attempts to provide a clear understanding of SDP and to put it in
the context of cloud-based perimeter security. We go over the current state of the art
in perimeter security and try to explain SDP and its architecture to our readers in a
lucid manner. We will use the SDP specification 1.0 [5] as our main source of infor-
mation. The chapter is structured as follows: Section 7.2 introduces the idea of SDP.
It also describes some similar products and software-defined systems for the cloud.
Section 7.3 gives a detailed description of the SDP and how its different components
work together. We describe the SDP architecture, configurations and workflow before
explaining the communication that takes place between various components of SDP.
In Section 7.4, we discuss why SDP is able to provide high levels of perimeter security, before concluding in Section 7.5.
(Figure: a traditional network perimeter built around the internal network from components such as border routers, IDS, IPS, VPN and PKI.)
A network perimeter separates an internal network from the outside network. It may also provide a set of firewall functions that can block external access while allowing internal users to connect to the outside network. Overall, a network perimeter enables users to manage their network
and have better control of its security. Usually, the perimeter network is bounded
by firewalls, but the firewall is not the only component in the perimeter. A series
of systems may be applied to define a network perimeter, such as border routers,
firewalls, intrusion detection systems, intrusion prevention systems, virtual private
network (VPN), public key infrastructure (PKI). We discuss these systems below.
7.2.1 Firewalls
A firewall is a network security system which is used to monitor and control traffic
in a network. The first generation of firewalls was based on packet filtering, which
inspects the network addresses and port numbers to determine if the packet is allowed
or blocked. The user can set up rules based on network addresses, protocols and
port numbers. When packets go through the firewall, they are tested against filtering
rules. If the packets do not match allowing rules, they are dropped. The second
generation of firewalls was called “stateful” firewalls, which maintain the state of
all packets passing through them. With such stateful information about packets, the
firewalls could assess whether a new packet is the start of a new connection or a
part of an existing connection [6]. The stateful firewalls can speed up the packet
filtering process. When a packet has been inspected as part of an existing connection
based on a state table, it will be allowed without further analysis. If the packet has not
been recognized as an existing connection, it will be inspected as a new connection by
firewall security rules. Application layer firewalls are the third generation of firewalls
having the ability to understand certain applications and protocols as they operate and
can be used to detect the misuse of certain protocols such as HTTP, FTP and DNS [7].
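As a toy illustration of the first two firewall generations (not a description of any particular product), the following Python sketch combines stateless rule matching with a simple state table for connection tracking.

# Toy packet filter: first-generation rule matching plus a second-generation state table.
rules = [
    {"proto": "tcp", "dst_port": 443, "action": "allow"},
    {"proto": "tcp", "dst_port": 23,  "action": "drop"},
]
state_table = set()   # established connections: (src, sport, dst, dport)

def filter_packet(pkt):
    conn = (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"])
    if conn in state_table:                  # already part of an existing connection
        return "allow"
    for rule in rules:                       # otherwise test against the filtering rules
        if pkt["proto"] == rule["proto"] and pkt["dport"] == rule["dst_port"]:
            if rule["action"] == "allow":
                state_table.add(conn)        # remember the new connection
            return rule["action"]
    return "drop"                            # default deny

print(filter_packet({"proto": "tcp", "src": "10.0.0.2", "sport": 51000,
                     "dst": "93.184.216.34", "dport": 443}))   # allow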
Proxy servers can also be used as firewalls. A proxy server acts as an intermediary
for traffic between two specific networks [8]. Both sides of the network need to build
connection through the proxy so that the proxy firewall can allow or block traffic
based on different security rules.
Although firewalls can be used as an efficient security control for network security
infrastructures, some experts feel it has become outdated as the networks have become
more complicated [9]. There are many entry points and many access control roles
based on different users in the modern network. Thus, the modern network needs other, newer techniques and methods working together to make it secure.
(Figure: a typical VPN deployment connecting regional offices and remote users over the Internet.)
If an attacker compromises an authenticated user's machine, the VPN will give the attacker an encrypted channel into the network perimeter. VPN is unable to protect
against such attacks which are on the rise now [10]. Furthermore, the capital costs
associated with VPN gateways setup for individual servers are considerably high [4].
integrated through a common information model that stores the necessary network
information [13]. The other main component in a DEN is a set of policies. In a dis-
tributed network, a network manager needs policies to control all resources. In general,
a policy defines which resources or services can be accessed by a corresponding user.
With DEN, an application on the network can provide different resource access privi-
leges to authorized users automatically. Any DEN-enabled application in the network
can know the detailed information about the network from the central directory. When
a user wants to access an application, the application should check the privilege of the
user and provide suitable service according to the role of the user. The abstraction of
a user through profiles helps DEN to provide correct access to users even when they
change locations or positions within the company. Microsoft Windows NT operating
system is an example of the application of DEN.
7.2.5.2 BeyondCorp
BeyondCorp is a model proposed by Google, which aims to improve the security of
the traditional network architecture [14]. BeyondCorp breaks the traditional security
perimeter that only allows internal users to access specific services from an internal
network. Instead, in BeyondCorp, the access solely depends upon the device and
the user credentials, irrespective of the location of the user [14]. The BeyondCorp
system does not differentiate between local network and the internet and applies the
same authentication and authorization processes for every user. This ensures that no
firewalls or VPN-like systems are needed. BeyondCorp uses the concept of a managed
device, wherein every device is explicitly managed by the enterprise. Only managed
devices can access services. Along with managed devices with verified identities, a
single sign-on system is used for all the users using those devices.
change the location of the perimeter; in fact, the location of the applications, servers
and data might not even be known to provide an effective perimeter. Thus, techniques
used for traditional perimeter security are not enough to protect application in today’s
network infrastructure due to their dependency on physical network devices.
The principles behind SDP are not entirely new. Top security organizations have
previously implemented similar network architectures based on authentication and
authorization before the application data exchange [4]. SDP, however, is the first
approach to bring those ideas in public domain. The objective of SDP is to give
application owners the ability to define and deploy perimeter functionality according
to the requirements. With SDP, every server or application is initially hidden and
invisible to users who want access [4]. Users need to be authenticated and provide
identity in order to be authorized to access protected services. On the other hand,
servers’ services also need to be authenticated before users are able to access them.
This means that SDP maintains the benefits of the need-to-know model and does not require a remote access gateway. In SDP, endpoint users need to authenticate and to
be authorized before the real data exchange. Once both sides have been allowed to
connect to each other, encrypted connections are created between requesting hosts
and the application servers.
(Figure: the three basic SDP components: the SDP controller, the initiating SDP host and the accepting SDP host.)
of as clients, whereas AHs accept connections and thus are akin to servers. Before
a connection can be made, an IH communicates with the SDP controller to request
the list of the available AHs that it wishes to connect with. The SDP controllers may
request device and user information from a database for identification and authenti-
cation before providing any information to the IH. The controller communicates with
AHs periodically to update them with the authorized IHs. AHs reject all connections
from all hosts except controllers and IHs which have already been authenticated by
controllers. SDP can be viewed as an advanced access control system which brings a lot of benefits. SDP's preauthentication mechanism enables servers to remain invisible to unauthenticated users and systems. This characteristic isolates servers, which helps defend against server exploitation. SDP's end-to-end control contributes
to preventing connection hijacking. In contrast to a blanket access like in VPNs, SDP
provides access for specific authorized users to get specific applications on servers.
It means SDP architecture does not provide network connectivity to every port on the
server. Only the authorized services or applications can be connected to, from the
client side. This feature of SDP architecture makes it more secure than VPN.
(Figure: the SDP architecture. The controller consults authentication, authorization and access control services such as location, IdP, PKI and MFA services; accepting hosts (AHs) sit in front of the servers providing services; clients (IHs) connect through the cloud.)
for authentication. Accepting hosts communicate with the controller but do not
respond to any other unauthenticated entity. The specification does not mention
how the servers are authorized by the AHs in the client-to-gateway configu-
ration; however, this will need to be done prior to the AH connecting to the
controller. When the AH communicates with the controller, it also needs to make
the controller aware of the services that are behind it.
3. An initiating host that wants to avail a service first needs to be authenticated and
registered with the controller.
4. Once the controller authenticates an IH, it selects which servers can provide the required service to the IH and which AHs they belong to.
5. The SDP controllers inform the AHs to accept connections from the authorized IH.
6. The SDP controllers send the IH the list of AHs it can communicate with, along with any other information or optional policies needed for communication.
7. The IH starts to communicate with the AHs by establishing a mutual TLS connection with each AH to perform secure communication. Note that an IH does not communicate directly with a server in the client-to-gateway configuration; instead, the communication goes through the AH, which acts as a proxy.
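The controller's part in this workflow can be sketched as follows. The class below is purely illustrative; the data structures and method names are assumptions, not part of the SDP specification.

# Illustrative controller-side view of the client-to-gateway workflow (steps 2-6).
class Controller:
    def __init__(self, identity_service):
        self.identity_service = identity_service   # e.g. a back-end IdP/MFA/PKI service
        self.services = {}                          # service_id -> {"ah": ..., "name": ...}

    def register_ah(self, ah, offered_services):
        # Step 2: an authenticated AH announces the services behind it.
        for svc in offered_services:
            self.services[svc["id"]] = {"ah": ah, **svc}

    def login_ih(self, ih_id, credentials):
        # Steps 3-6: authenticate the IH, tell the relevant AHs to accept it,
        # and return the list of services and AHs the IH may use.
        if not self.identity_service.authenticate(ih_id, credentials):
            return []
        allowed = [svc for svc in self.services.values()
                   if self.identity_service.authorized(ih_id, svc["id"])]
        for svc in allowed:
            svc["ah"].accept_connections_from(ih_id)   # step 5
        return allowed                                  # step 6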
verification is successful. The SPA protocol is based on the HOTP protocol defined in RFC 4226 [18].
In SDP, an SPA packet is sent from the client to the server; the server in this case can be either the controller or the AH. The server does not need to respond to this packet. The format of the SPA packet is shown in Figure 7.6. Here, the IP and TCP fields are the IP and TCP headers; the AID is the agent ID, or the universal identifier of the client wishing to communicate with the server; the counter is a variable input which changes with each packet and helps protect against replay attacks; and the OTP is created using the HOTP protocol. The OTP can be created by simply hashing the secret seed with the variable counter (a short sketch of this construction follows the list of benefits below). As the seed is secret and the counter is variable, this creates a unique secret password for every packet. The SDP spec 1.0 does not provide much implementation detail for the SPA packet. The SPA implementation of fwknop [17], however, randomizes the packet by adding 16 bytes of random data to each packet. In addition, to protect the integrity of the packet, it creates an MD5 digest over the entire packet to ensure the packet is not modified before it reaches the recipient. SPA provides the following benefits to an SDP architecture:
1. It blackens the server (either AH or the controller). The server does not respond to
any connection attempt from anyone unless a valid SPA packet has been received.
2. This helps in mitigating DoS attacks as the server doesn’t spend any resources in
processing spurious packets.
3. It helps in detecting attacks as any connection attempt that does not start with an
SPA packet can immediately be classified as an attack.
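The OTP construction just described can be illustrated with a short, self-contained sketch. The following C fragment derives a per-packet one-time password from a shared secret seed and a packet counter using HOTP (RFC 4226) via OpenSSL's HMAC; the seed value, the counter value and the six-digit output length are illustrative assumptions rather than values taken from the SDP 1.0 specification.

/*
 * Minimal HOTP (RFC 4226) sketch for deriving the per-packet SPA OTP
 * described above. The seed, counter and digit count are illustrative
 * assumptions, not values mandated by the SDP 1.0 specification.
 * Build with: cc spa_hotp.c -lcrypto
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>

/* Compute an HOTP value: HMAC-SHA1(seed, counter), dynamically truncated. */
static uint32_t hotp(const uint8_t *seed, size_t seed_len,
                     uint64_t counter, unsigned digits)
{
    uint8_t msg[8];
    /* RFC 4226 requires the counter as an 8-byte big-endian value. */
    for (int i = 7; i >= 0; i--) {
        msg[i] = (uint8_t)(counter & 0xff);
        counter >>= 8;
    }

    uint8_t mac[EVP_MAX_MD_SIZE];
    unsigned int mac_len = 0;
    HMAC(EVP_sha1(), seed, (int)seed_len, msg, sizeof(msg), mac, &mac_len);

    /* Dynamic truncation: low nibble of the last byte selects an offset. */
    unsigned off = mac[mac_len - 1] & 0x0f;
    uint32_t bin = ((uint32_t)(mac[off] & 0x7f) << 24) |
                   ((uint32_t)mac[off + 1] << 16) |
                   ((uint32_t)mac[off + 2] << 8)  |
                   (uint32_t)mac[off + 3];

    uint32_t mod = 1;
    for (unsigned i = 0; i < digits; i++)
        mod *= 10;
    return bin % mod;
}

int main(void)
{
    const uint8_t seed[] = "shared-secret-seed";   /* hypothetical seed  */
    uint64_t counter = 42;                         /* per-packet counter */

    /* Each new counter value yields a fresh one-time password. */
    printf("SPA OTP for counter %llu: %06u\n",
           (unsigned long long)counter,
           (unsigned)hotp(seed, sizeof(seed) - 1, counter, 6));
    return 0;
}

Because the counter changes with every packet, each call yields a fresh OTP, which is what lets the server silently drop any packet whose OTP does not verify.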
[Figure: AH-controller message exchange. The AH sends an SPA packet followed by its services message.]
[Figure: AH services message, format and example. Format: {"services": [{"port": <Server port>, "id": <32-bit Service ID>, "address": <Server IP>, "name": <Service name>}]}; example: {"services": [{"port": 443, "id": 12345678, "address": 100.100.100.100, "name": "SharePoint"}]}]
[Figure 7.9 (excerpt): SDP message formats. (b) Login response message: command 0x01, 16-bit status code; command 0x02: no command-specific data; (d) AH service message: command 0x04, JSON-formatted array of services; (e) IH authenticated message (sent from controller to AH): 256-bit AH session ID; (g) open connection request message: command 0x07, 64-bit mux ID; command 0x08: 16-bit status code, 64-bit mux ID; (i) data message: command 0x09, 16-bit data length, 64-bit mux ID; (j) closed connection message.]
[Figure: IH-controller message exchange. Open TCP connection, SPA packet, services message.]
where sid is the session ID for the IH, and id is the service ID of the service the IH wants to use. The seed and counter fields will be used by the AH to validate the IH's SPA packet.
[Figure 7.11: services message sent from the controller to the IH, format and example. Format: {"services": [{"address": <AH IP>, "id": <32-bit Service ID>, "name": <Service name>, "type": <Service type>}]}; example: {"services": [{"address": 200.200.200.200, "id": 12345678, "name": "SharePoint", "type": "https"}]}]
send a logout request message to the controller indicating that it wants to quit the SDP. There is no response from the controller to this request, and the format is as shown previously in Figure 7.9(c). When the IH has been authenticated and is logged in, the controller needs to inform the IH of the available services and the IP addresses of the servers offering those services. The controller does this by sending a services message to the IH. The format of this message is shown in Figure 7.9(f). The JSON-formatted array can hold information about multiple services. An example with one service is shown in Figure 7.11.
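A controller implementation has to serialise this services message before sending it to the IH. The following C sketch builds the JSON body for a single service using the field names from Figure 7.11; the struct layout, buffer size and example values are illustrative assumptions rather than details taken from the specification.

/*
 * Sketch: serialising the controller's services message (cf. Figure 7.11)
 * for a single service. Field names follow the figure; everything else
 * (struct layout, buffer size, example values) is an illustrative assumption.
 */
#include <stdio.h>
#include <stdint.h>

struct sdp_service {
    const char *address;   /* AH IP address       */
    uint32_t    id;        /* 32-bit service ID   */
    const char *name;      /* human-readable name */
    const char *type;      /* e.g. "https"        */
};

static int build_services_message(char *buf, size_t len,
                                  const struct sdp_service *svc)
{
    return snprintf(buf, len,
        "{\"services\": [{\"address\": \"%s\", \"id\": %u, "
        "\"name\": \"%s\", \"type\": \"%s\"}]}",
        svc->address, (unsigned)svc->id, svc->name, svc->type);
}

int main(void)
{
    struct sdp_service svc = { "200.200.200.200", 12345678u,
                               "SharePoint", "https" };
    char msg[256];
    build_services_message(msg, sizeof(msg), &svc);
    puts(msg);   /* JSON body carried by the services message */
    return 0;
}

For multiple services, an implementation would simply emit one such object per service inside the JSON array.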
7.3.5.5 IH–AH protocol
Once the AH is online and knows the services it can offer, and the IH has been authenticated by the controller and has received the list of services and the AHs which offer them, they are ready to perform the client–server communication. In the IH–
AH protocol, the communication is always initiated by the IH. The initial message
exchange during the IH–AH protocol is similar to the ones in AH-controller and IH-
controller protocols. The IH opens a TCP connection and sends an SPA packet to get
authenticated by the AH. Once the authentication is successful, IH and AH perform
the mutual TLS handshake. This will allow both IH and AH to send and receive data
securely. The IH then sends an open connection request message as shown in Figure
7.9(g) to the AH to signal its request for a service. The message consists of an 8-bit
command and a 64-bit mux ID. The 64-bit mux ID is used to multiplex connections
for multiple services between the same IH–AH pair. The leading 32 bits in the mux ID
come from the 32-bit unique service ID of the service the IH is requesting. The trailing 32 bits identify a unique session ID for the session between the IH and the AH. The AH
then responds with an open connection response message which indicates whether
the connection was successful or not. The message format is shown in Figure 7.9(h).
Once a connection has been established, IH and AH can begin data transmission to
each other. The data transmission on either side is done with the help of the data
message, whose format is shown in Figure 7.9(i). This data message will precede the
actual payload, and it consists of a 16-bit field indicating the size of the payload to
follow, along with the command and the Mux ID. The data message does not need a
response or an acknowledgment message. If either the IH or the AH needs to close the connection, a closed connection message is sent. The closed connection message is the last message of an SDP connection. The receiver of this message closes the connection without any further response. The format of the message is shown in Figure 7.9(j). The messages exchanged between an IH and an AH that is protecting a service requested by the IH are shown in Figure 7.12.
[Figure 7.12: message exchange between the IH, the AH and the protected service. Open TCP connection, SPA packet, data messages carrying raw data, connection closed.]
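The 64-bit mux ID described above lends itself to a very small helper. The sketch below packs the 32-bit service ID into the leading bits and the 32-bit session ID into the trailing bits, and splits a mux ID back into its parts; the on-the-wire byte order is not addressed here, and the example values are arbitrary.

/* Sketch: packing and unpacking the 64-bit mux ID used in the IH-AH
 * protocol. The leading 32 bits carry the service ID and the trailing
 * 32 bits carry the session ID, as described above; the on-the-wire
 * byte order is not addressed here. */
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

static uint64_t make_mux_id(uint32_t service_id, uint32_t session_id)
{
    return ((uint64_t)service_id << 32) | session_id;
}

static void split_mux_id(uint64_t mux_id,
                         uint32_t *service_id, uint32_t *session_id)
{
    *service_id = (uint32_t)(mux_id >> 32);
    *session_id = (uint32_t)(mux_id & 0xffffffffu);
}

int main(void)
{
    uint64_t mux = make_mux_id(12345678u, 0xCAFEBABEu);
    uint32_t svc, sess;
    split_mux_id(mux, &svc, &sess);
    printf("mux=0x%016" PRIx64 " service=%u session=0x%08x\n",
           mux, svc, sess);
    return 0;
}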
In this section, SDP security features and components are discussed as follows:
Zero visibility—SDP was designed to mitigate network-based attacks by dynamically employing a software-based security perimeter to protect any kind of network, including clouds, demilitarized zones, data centers and even small networks [19]. So
far, SDP has proved to be a good defense against network attacks under simulated tests
[19,20]. In various hackathons conducted by CSA so far, no breach has been possible.
Each of the four hackathons conducted by CSA has been centered around specific
themes such as insider threats, DDoS attacks, credential theft and high-availability
public cloud. In each of the scenarios, the attackers were provided information which
7.5 Conclusion
The scenarios in the SDP hackathons mimic the real-world scenarios very closely and,
therefore, it can be said with a high degree of confidence that the SDP architecture
provides an easy-to-deploy and still highly secure way of protecting network infras-
tructure. Having said that, it must be kept in mind that SDP has not been deployed
widely yet and public data about attacks on real-world deployments of SDP does
not exist. There is, however, a considerable amount of excitement in the cloud and open source community regarding SDP, which is seen as a potential solution to the cloud
perimeter security problem. At the time of writing this chapter, the first open source
implementation of the client, the gateway and the controller for the client-to-gateway
SDP configuration was available [21].
The security afforded by SDP, however, comes at a cost. The security cost comes
in the form of connection delay introduced by SDP and reduced throughput. As
explained in Section 7.3.5, before a client can establish a secure connection with a
server in SDP’s client-to-gateway configuration, it has to get authenticated first by
the controller and then by the AH. Each of these authentications has three stages
where the client first opens a TCP connection then gets authenticated using SPA and
then performs a mutual TLS handshake. This has the potential to add a significant connection delay, which will be exacerbated by the network latency of the cloud
itself. The other form of cost in SDP is the reduced throughput. The data message
in the current version of SDP is an 88-bit message, which precedes the transmission
of data. This, along with mTLS's own overhead and the communication between the IH and controller and between the IH and AH to establish secure communication, contributes to reducing the throughput of the SDP communication. The overhead and the maximum
payload can give a rough idea about the throughput of SDP. The additional overhead
and in turn reduced throughput caused by the SDP therefore cannot be ignored.
Thus, although SDP is capable of providing unbreakable (as of yet) security, any
implementation of SDP will need to consider the effects of both connection delay and
reduced throughput.
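To make the framing overhead concrete, the short sketch below computes the fraction of bytes consumed by the 88-bit (11-byte) data message header for a few assumed payload sizes; the payload sizes are arbitrary examples bounded by the 16-bit data length field, and mTLS record and handshake overheads are deliberately ignored.

/* Back-of-the-envelope sketch of SDP framing overhead: each chunk of
 * payload is preceded by an 88-bit (11-byte) data message header. The
 * chunk sizes are assumptions, bounded by the 16-bit data length field;
 * mTLS record and handshake overheads are deliberately ignored. */
#include <stdio.h>

int main(void)
{
    const double header_bytes = 11.0;  /* 8-bit cmd + 16-bit len + 64-bit mux ID */
    const double chunk_sizes[] = { 512.0, 1400.0, 65535.0 };  /* assumed payloads */

    for (int i = 0; i < 3; i++) {
        double payload = chunk_sizes[i];
        double efficiency = payload / (payload + header_bytes);
        printf("payload %6.0f B -> framing efficiency %.4f (%.2f%% overhead)\n",
               payload, efficiency, 100.0 * (1.0 - efficiency));
    }
    return 0;
}

As the sketch suggests, the framing overhead shrinks as the payload per data message grows; in practice the larger costs are the per-connection authentication stages and the mTLS overhead discussed above.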
The analysis and understanding of the SDP presented in this chapter are based
on SDP specification version 1.0 [5] which was available at the time of writing this
chapter. The SDP working group of the CSA is currently working on specification
version 2.0, which will possibly be more detailed and will focus on protecting services
in IoT and cloud computing.
References
[1] SafeNet Inc, Oct. 2014, Baltimore, Available: http://www.safenet-
inc.pt/news/2014/data-security-confidence-index-results/ [retrieved February
21, 2017].
[2] S. Moore, “Gartner Says Worldwide Information Security Spending Will Grow
4.7 Percent to Reach $75.4 Billion in 2015” Gartner, 23 September 2015,
Available: http://www.gartner.com/newsroom/id/3135617 [retrieved February
21, 2017].
[3] Z. Dimitrio and L. Dimitrios, “Addressing cloud computing security issues,”
Future Generation Computer Systems, vol. 28, no. 3, pp. 583–592, 2012.
[4] Software Defined Perimeter Working Group, “Software Defined Perime-
ter,” December 2013. Available: https://downloads.cloudsecurityalliance.
org/initiatives/sdp/Software_Defined_Perimeter.pdf [retrieved February 21,
2017].
Abstract
Transportation Cyber-Physical Systems (TCPS) strive to achieve the seamless inter-
operability of a rich set of sensors embedded in vehicles, roadside units, and other
infrastructure with computing platforms ranging from smartphones to cloud servers,
through a variety of communication mechanisms. A successful TCPS will provide
smart and scalable solutions to some of the major problems facing urban societies today, including high fatalities in road crashes, the time and emission costs of traffic congestion, and the efficient allocation of parking spaces. However, the practicality of such
a TCPS is challenged by (1) stakeholders with different and often conflicting security
and privacy requirements, (2) the demands of real-time data intensive computing and
communication, and (3) a high level of heterogeneity in the types of technologies
deployed.
Transportation Cloud Computing, which is the integration of Cloud Comput-
ing with TCPS, is a promising solution to the challenges listed above for a scalable
implementation. This chapter presents the security, trust, and privacy issues posed
by integrating cloud computing with TCPS as in the first challenge above. We will
survey the state of the art with respect to countermeasures which are capable of provid-
ing improved security and privacy for cloud computing in TCPS. More specifically,
we will first discuss the unique challenges and the current state of the art in TCPS as
well as the integration of cloud computing techniques into a TCPS application sce-
nario. Next, we will present a comprehensive literature review on attack surface and
strategies for cloud computing in TCPS. To address these attacks, we will describe
various techniques to enhance security, trust, and privacy to help better safeguard
cloud computing paradigms for TCPS.
1 New York Institute of Technology (NYIT), Department of Computer Science, USA
2 New York Institute of Technology (NYIT), Department of Electrical and Computer Engineering, USA
8.1 Introduction
There are over 1.2 billion vehicles on the roads worldwide, and the number of electric
vehicles passed the 1 million mark in 2015 [1]. Transportation provides the backbone
of modern society, yet it has also been taking its toll on the environment. Moreover, traffic accidents have claimed 1.25 million lives every year since 2007 and are the top cause of death among people aged 15–29 years [2]. Transportation is one of the most
important application domains of Cyber-Physical Systems (CPS), which are systems
in which physical components are tightly coupled with cyber components, such as
networking, sensing, computing, and control.
The application of CPS in the transportation sector, called transportation CPS
(TCPS), will transform the way people interact with transportation systems, just
as the Internet has transformed the way people interact with information. A desirable
CPS must be safe, secure, and resilient to a variety of unanticipated and rapidly evolv-
ing environments and disturbances in addition to being efficient and inexpensive to
maintain. Today each vehicle is equipped with 70–100 sensors [3]. For instance, one
TCPS application utilizing these sensors will be vehicle prognostics, which will allow
vehicle owners to assess how long the parts in their vehicles will last by the continuous
monitoring and detailed reporting of the state of each part via the on-board prognos-
tics systems [4]. This technique will allow vehicle owners to be more confident that
their cars will not have any issue the next time they are behind the wheel. Moreover, drivers today can get real-time traffic information and avoid traffic jams, so that they can predict their schedules precisely. More importantly, the drivers can be alerted
to an accident occurring in front of them by other vehicles or roadside units (RSUs)
nearby so that they can avoid a fatal accident. The drivers can reserve parking spots
ahead of time and can be directed to available parking spots as they become available
in real time [5].
The effective operation of a TCPS is based on data intensive computing. By
using a network of remote servers to store, manage, and process data rather than
a local server or a personal computer, cloud computing has the potential to enable
transportation CPS to operate in a desirable manner.
However, to reap the benefits of cloud computing, there are some challenges
which must be addressed. One of the key challenges in making TCPS and Trans-
portation Cloud Computing (TCC) ubiquitous is keeping the drivers, passengers, and
pedestrians safe and secure, while preserving their privacy.
In addition, TCPS can generally be used to help enhance the safety and efficiency
of the ground transportation system. One such example is the Traffic Estimation and
Prediction System (TrEPS), which generally offers the predictive information that is
required for proactive traffic control [6]. However, sometimes the TrEPS may receive
confusing or even conflicting traffic information reported by multiple traffic sensors,
some of which may have been compromised by attackers. If the trustworthiness of these traffic sensor data cannot be properly evaluated, then the result may be traffic jams or even life-threatening accidents, because most of the vehicles may be incorrectly redirected to the same route according to the fake traffic information.
Thus, it is critical to ensure the security and trustworthiness of the TCPS.
In this chapter, first the general components of TCPS and TCC are introduced.
Then, attack surfaces in TCPS are discussed in depth, followed by the current state-
of-the-art security mechanisms for the TCPS. Later, both trust and privacy techniques
are evaluated against the unique challenges posed in the TCPS context.
as a bridge between the CAN and LIN buses. Since LIN hardware can cost much less
than CAN hardware, this topology reduces the overall system cost.
8.2.3.3 FlexRay
FlexRay [22] was proposed for X-by-wire systems, such as steer-by-wire, which require higher speed and fault-tolerance. FlexRay is a multi-master bus, and it uses Time Division Multiple Access (TDMA) to assign fixed time slots to each node, guaranteeing bandwidth for each device. FlexRay also has two channels for redundant
communication.
part of a database software offering. Services provided within the transportation cloud, regardless of whether the vehicles are provider, consumer, or both, can be classified into the following categories [27,28]:
8.3.1.1 Network-as-a-Service (NaaS)
Many vehicles today have Wi-Fi and cellular connectivity, although they may not utilize this connectivity at full capacity all the time. Similarly, RSUs can provide Internet connection to the cars that are passing by or parked, as a service. The vehicles with excess bandwidth they would like to share, and the RSUs providing Internet connection, can advertise these network resources. Nearby vehicles that are in need of
Internet access can purchase Internet access from these providers. Furthermore, the
vehicle providing Internet connectivity, and the vehicle which needs the connectivity,
may be out of range of each other. In such a case, intermediate vehicles can act as
relays to carry traffic between the provider vehicle and the consumer vehicle, thus
providing another network service [29].
8.3.1.2 Sensing-as-a-Service (S2 aaS)
Today vehicles are equipped with many sensors such as GPS, temperature sensors,
dashboard cameras. The data from these sensors can be useful for nearby users, as well as for any entity that has an interest in information related to the geographic area a vehicle is in, regardless of where the entity is located in the world. For example,
a nearby pedestrian may use his or her mobile phone GPS to get location informa-
tion. However, GPS can drain the phone battery quickly. Instead, the pedestrian can
collaborate with a nearby vehicle with GPS and save power [30].
When news breaks, the media and law enforcement agencies may want to have
eyes on the scene rapidly, but may not have a nearby camera crew that they can deploy quickly. In such a case, these agencies can tap into the dash cams of vehicles in the area to obtain visual information quickly.
8.3.1.3 Computation-as-a-Service (CaaS)
Intra-vehicle computers from manufacturers such as Nexcom, Logic Supply, or Acrosser have computing power comparable to desktop computers [3]. Yet private vehicles, especially, are parked most of the time, leaving these resources idle. It is conceivable that vehicle owners may want to make the computing power in their vehicles available as part of a CaaS framework.
Similarly, a vehicle may have computational needs that can be offloaded to a more
traditional cloud, where the vehicle is a consumer of the CaaS. Bitam and colleagues proposed the ITS-Cloud, where conventional cloud computing resources provide the backbone of the computing services, and the vehicular computing resources (including on-board computers as well as the mobile devices in the vehicle, e.g., mobile phones) provide a temporary cloud [31,32].
8.3.1.4 Storage-as-a-Service (StaaS)
Storage can be required for keeping the results of large calculations, multimedia files, as well as other resources. Unlike other services, storage requirements can be long term [33]; thus, their cloud model could be different from that of other services. Nevertheless, there are potential applications that can benefit from StaaS. One such example is given by Yu et al. [34], where the need for StaaS is outlined for video surveillance. They reported that many city buses are equipped with high-definition in-bus monitoring cameras. These cameras generate a large amount of data daily, which is expensive to store on-board due to disk cost and physical space. These videos are only available to the decision makers after they are downloaded from the storage on the bus. Yu et al. proposed to use a roadside cloud, which assigns a virtual machine (VM) with the required storage when a bus passes by a roadside coverage area for storing video files. As the bus moves from one coverage area to another (i.e., between roadside cloudlets), the VM is also migrated to the new coverage area. The video can then be requested to be transferred to a datacenter from the roadside cloudlets as needed.
transmissions may occur without the vehicle owner’s permission or knowledge. Even
if the owners are aware of such transmissions, it may also be impossible to disable
them without turning off the desirable wireless capability.
It is difficult to gauge the number of incidents in which attacks have occurred against vehicular networks in the wild due to a dearth of data, in part because of manufacturers' reluctance to share such information [41]. However, such attacks are
no longer theoretical in nature, having been demonstrated against actual vehicles being
driven on public roadways. In 2013, Miller and Valasek demonstrated that attacks
could be launched against various ECUs over a car’s CAN bus with direct physical
access via an OBD-II interface [42]. This included control over the speedometer,
odometer, onboard navigation, steering, braking, and acceleration [42]. The authors
were also able to modify the firmware of some ECUs [42], raising the possibility of
advanced persistent threats which reside on an intra-vehicular network for a prolonged
period of time. Threats which apply to firmware in general also apply to vehicular
systems [43], with a heightened risk due to safety issues associated with transportation
systems.
Initially, the automotive industry downplayed the impact of security research by
emphasizing the implausibility of the direct physical access which was required. In
2015, Miller and Valasek challenged these claims when they extended their attacks
by carrying them out remotely. In the specific instance tested, access was gained
over a long-range cellular network to a vehicle’s entertainment system, granting them
the same control over critical car systems [44]. The results of this research were
widespread, including a recall of over a million vehicles, lawsuits against cellular
carriers, and the aforementioned congressional report on the current state of insecurity
in vehicular systems [41,44].
the CAN bus by adding hardware security modules to each ECU. To coordinate keys
between different components, an ECU acts as a key manager [64]. On the other hand,
if there exists transportation infrastructure, such as RSUs, traffic sensors, and cameras, then these pre-deployed devices are generally regarded as trusted because they are connected via wired networks, which are generally better protected and thus much harder to tamper with or compromise when compared to the vehicles. Thus, in the presence of the various transportation infrastructure equipment, trust management will become a viable and indispensable solution to help better secure TCPS.
An alternative approach is to modify the CAN standard by adding new fields
which can be used as indicators of intrusion [65]. In [66], the authors discuss the
requirements for formally verifying trust within an intra-vehicular network, including identifying all ECUs and their connections to each other and to cloud providers, and establishing rules that govern how these entities are allowed to interact.
Under the assumption that the data collected by TCPS nodes is authenticated by
one of the aforementioned techniques, the next step to creating a trustworthy sys-
tem is to authenticate the messages exchanged between participant nodes. There
have been several different proposals as to how to achieve this, including tech-
niques based on asymmetric cryptography [54], group cryptography [67], and timing
synchronization [68].
As mentioned in Section 8.4.2, TCPS systems that connect to the cloud will
be subject to all the additional security considerations that users of cloud services
face in other contexts. Fortunately, there has been a great deal of research attention
provided to developing security solutions for cloud systems. These include distributed
intrusion detection solutions [69,70] and encryption techniques based on identity-
based cryptography [71] or homomorphic ciphers [72].
observation that is directly made by a node itself, and the first-hand observation can
be collected either passively or actively. If a node promiscuously observes its neigh-
bors' actions, the local information is collected passively. In contrast, the reputation management system can also rely on explicit evidence to assess neighbor behavior, such as an acknowledgment packet during the route discovery process. The other kind of observation is called second-hand, or indirect, observation. Second-hand observation is generally obtained by exchanging first-hand observations with other nodes in the network. The main issues with second-hand observations are related to overhead, false reports and collusion [74,75].
Trust management has been proven to be an important security solution to cope with various misbehaviors in wireless networks, which are the primary enabling technology for TCPS. Therefore, we will start by reviewing well-known trust management schemes for wireless networks.
peers or direct communication with an anchor node. Malicious nodes can be identified if the data they present is invalidated by the validation algorithm.
Li et al. [79] proposed a novel trust management scheme for MANETs, in which
the trust of a mobile node is evaluated from multiple perspectives rather than merely
one single trust scalar. More specifically, the proposed scheme evaluates trustwor-
thiness from three perspectives: collaboration trust, behavioral trust, and reference
trust. Different types of observations are used to independently derive values for these
three trust dimensions. This research work was extended in [80], in which declarative policies are used to better evaluate the trust of a mobile node under different environmental factors.
In [81], the authors proposed a cluster-based hierarchical trust management
scheme for wireless sensor networks to effectively deal with selfish or malicious
nodes, in which the authors consider both trust attributes derived from communi-
cation and from social networks to evaluate the overall trust of a sensor node. To
demonstrate the utility of the proposed hierarchical trust management protocol, the
authors apply it to trust-based geographic routing and trust-based intrusion detection.
For each application, the authors identify the best trust composition and formation to
maximize application performance.
He et al. [82] proposed ReTrust, an attack-resistant and lightweight trust manage-
ment approach for medical sensor networks. In this approach, the authors delegate the
trust calculation and management functionality to master nodes (MNs) so that there
will be no additional computational overhead for resource-constrained sensor nodes
(SNs), which is a critical factor for medical sensor networks. Moreover, the authors
also discuss the possibility to use the ReTrust approach to detect and fight against
two types of malicious attacks in medical sensor networks, namely on-off attacks and
bad mouth attacks.
Chen et al. [83] studied a dynamic trust management protocol for secure routing
optimization in delay tolerant networks (DTNs) in the presence of well-behaved,
selfish and malicious nodes, in which the concept of dynamic trust is highlighted in
order to determine and apply the best operational settings at runtime in response to
dynamically changing network conditions to minimize trust bias and to maximize the
routing application performance. Furthermore, the trust-based routing protocol can
effectively trade off message overhead and message delay for a significant gain in
delivery ratio.
In [84], Wei et al. presented a unified trust management scheme that enhances security in MANETs using recent advances in uncertain reasoning originating from the artificial intelligence community. In the proposed trust management scheme,
the trust model has two components: trust from direct observation and trust from
indirect observation. With direct observation from an observer node, the trust value
is derived using Bayesian inference, which is a type of uncertain reasoning when the
full probability model can be defined. On the other hand, with indirect observation,
which is also called secondhand information that is obtained from neighbor nodes of
the observer node, the trust value is derived using the Dempster–Shafer theory (DST),
which is another type of uncertain reasoning when the proposition of interest can be
derived by an indirect method.
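The direct-observation half of such a scheme is easy to illustrate. The sketch below models a neighbour's cooperation probability with a Beta distribution and performs the Bayesian update as good and bad observations arrive; the uniform prior and the use of the posterior mean as the trust value are illustrative assumptions, not details taken from [84], and the combination of indirect evidence via Dempster–Shafer theory is not shown.

/* Minimal sketch of trust from direct observation via Bayesian updating
 * of a Beta(alpha, beta) belief over a neighbour's cooperation probability.
 * The uniform prior and the use of the posterior mean as the trust value
 * are illustrative assumptions, not details taken from [84]. */
#include <stdio.h>

struct direct_trust {
    double alpha;   /* 1 + number of observed cooperative actions */
    double beta;    /* 1 + number of observed misbehaving actions */
};

static void observe(struct direct_trust *t, int cooperated)
{
    if (cooperated)
        t->alpha += 1.0;
    else
        t->beta += 1.0;
}

/* Posterior mean of the Beta distribution, used here as the trust value. */
static double trust_value(const struct direct_trust *t)
{
    return t->alpha / (t->alpha + t->beta);
}

int main(void)
{
    struct direct_trust t = { 1.0, 1.0 };             /* uniform prior     */
    int observations[] = { 1, 1, 0, 1, 1, 1, 0, 1 };  /* 1 = good, 0 = bad */

    for (int i = 0; i < 8; i++) {
        observe(&t, observations[i]);
        printf("after observation %d: trust = %.3f\n", i + 1, trust_value(&t));
    }
    return 0;
}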
Ren et al. [85] proposed a novel trust management scheme for unattended wireless
sensor networks (UWSNs), which are characterized by long periods of disconnected
operation and fixed or irregular intervals between sink visits. The absence of an
online trusted third party implies that existing WSN trust management schemes are
not applicable to UWSNs. To address this limitation, the authors studied a trust
management scheme for UWSNs to provide efficient and robust trust data storage and
trust generation. For trust data storage, the authors employed a geographic hash table
to identify storage nodes and to significantly decrease storage cost. In addition, the
authors used subjective logic-based consensus techniques to mitigate trust fluctuations
caused by environmental factors. Finally, the authors exploited a set of trust similarity
functions to detect trust outliers and to sustain trust pollution attacks.
In [86], an energy efficient collaborative spectrum sensing (EE-CSS) protocol,
based on trust management, is proposed. The protocol achieves energy efficiency by
reducing the total number of sensing reports exchanged between the honest secondary
users (HSUs) and the secondary user base station (SUBS) in a traditional collaborative
spectrum sensing (T-CSS) protocol.
the signal will be. By this means, the proposed model discourages the sender nodes
from behaving in a malicious fashion. Similarly, cooperation of the sender nodes will
be rewarded proportionally to the signal’s value. This research idea was extended
in [89].
Liao et al. proposed a trust-based approach to decide the likelihood of the accu-
racy of V2V incident reports considering the trustworthiness of the report originator
and those vehicles who have forwarded it [90]. In particular, the approach leverages
the existing V2I communication facilities to collect vehicle behavior information
in a crowdsourcing fashion so as to build a more comprehensive view of vehicle
trustworthiness.
The authors in [91] identified that the problems of information cascading and oversampling, which are common in social networks, also adversely impact trust management schemes in VANETs. Moreover, the authors demonstrated that a simple voting approach to decision making can lead to oversampling and yield incorrect results in VANETs. To address this issue, the authors proposed a new voting scheme, in which each vehicle has a different voting weight according to its distance from the event; in other words, a vehicle closer to the event possesses a higher weight.
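A compact sketch of this distance-weighted voting idea is shown below: each vehicle's vote about an event is weighted so that closer vehicles count more, and the event is accepted if the weighted support exceeds one half. The 1/(1+d) weighting and the acceptance threshold are illustrative assumptions; the scheme in [91] may define them differently.

/* Sketch of distance-weighted voting on an event report: closer vehicles
 * get larger weights. The 1/(1+d) weighting and the 0.5 threshold are
 * illustrative choices; [91] may define the weights differently. */
#include <stdio.h>

struct vote {
    double distance_m;  /* distance of the voting vehicle from the event */
    int    says_true;   /* 1 = confirms the event, 0 = denies it         */
};

int main(void)
{
    struct vote votes[] = {
        {  20.0, 1 }, {  50.0, 1 }, { 400.0, 0 }, { 900.0, 0 }, {  35.0, 1 }
    };
    double weight_true = 0.0, weight_total = 0.0;

    for (int i = 0; i < 5; i++) {
        double w = 1.0 / (1.0 + votes[i].distance_m);   /* closer => heavier */
        weight_total += w;
        if (votes[i].says_true)
            weight_true += w;
    }

    double support = weight_true / weight_total;
    printf("weighted support for the event: %.2f -> %s\n",
           support, support > 0.5 ? "accept" : "reject");
    return 0;
}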
In a recent study conducted by Li et al. [92], an attack-resistant trust management
scheme (ART) was proposed for VANETs, in which the trustworthiness in vehicular
networks is separately evaluated in multiple dimensions: data trust is evaluated based
on the data sensed and collected from multiple vehicles; node trust is assessed in two
aspects, i.e., functional trust and recommendation trust, which indicate how likely a
node can fulfill its functionality and how trustworthy the recommendations from a
node for other nodes will be, respectively.
form this may take, although other sensor data, such as accelerometers and gyro-
scopes, can also reveal data about a vehicle’s location [95]. Connected vehicles can
also be tracked by their connections to cell towers or wireless access points as they
travel through an environment, providing even better location granularity than GPS
coordinates [96].
Different privacy preservation techniques have been developed which provide
varying degrees of privacy guarantees under differing contexts. Various techniques
involving pseudonym rotations based on vehicle time and location have been proposed
[97]. In [98], threshold authentication is used to preserve the privacy of V2X messages
while still allowing for the possibility of verifying message trustworthiness.
Users of a TCPS may also be tracked based on other devices which they carry,
such as cell phones, laptops, or payment tokens. Transportation payment tokens are
increasingly equipped with short-range wireless capabilities, frequently based on
Radio Frequency Identification (RFID) technology [99]. To protect against informa-
tion leakage by payment tokens, techniques from the field of RFID security can be
applied which utilize contextual sensing to determine when it is safe or unsafe to
transmit data [100–102].
Although it may at first seem that privacy and the public nature of mass transporta-
tion are at odds with one another, cryptographic techniques can be used to provide
riders with privacy assurances. In one such proposed approach, e-cash techniques are
combined with re-authentication to create session tickets which can be used to pay
for transportation services without revealing the identity of the passenger [103].
Once collected, transportation cloud services must also take privacy consid-
erations into account during processing by the backend. Common techniques for
providing privacy assurances include aggregating and masking data [104]; however,
it can be difficult to verify the efficacy of such mechanisms on a third-party system.
8.6 Conclusion
In recent years, we have witnessed a rapid growth in the number of smart and interconnected devices deployed in transportation systems, such as the on-board sensors equipped in each vehicle, traffic monitoring sensors installed in RSUs, and
smart phones carried by pedestrians. All these devices sense the surrounding traffic
and other transportation-related information, some of which will be further processed
using the integrated computational capabilities, and each device will closely inter-
act and exchange information with other devices via the communication links. All
these smart devices together with the transportation infrastructure compose the TCPS,
which has the potential to revolutionize the traditional transportation system.
However, it has also been noted that TCPS is vulnerable to various security
threats. Therefore, it is essential to better secure TCPS. In this chapter, we first
summarize the basic idea of TCPS and how cloud computing plays a role in it. The
attack surface of TCPS is then described in detail. We then provide a comprehensive
overview regarding the current state-of-the-art security mechanisms for TCPS. More
specifically, we focus on both trust and privacy techniques which aim at addressing
the unique challenges posed in the TCPS context.
As for the future research directions, we envision that the following research
challenges should be better addressed regarding the security, trust, and privacy issues
in TCPS.
References
[1] Z. Shahan, “1 million electric cars will be on the road in Septem-
ber,” https://cleantechnica.com/2015/08/08/1-million-electric-cars-will-be-
on-the-road-in-september/, accessed: 16-09-2016.
[2] “Global status report on road safety,” World Health Organization, 2015.
[3] S. Abdelhamid, H. S. Hassanein, and G. Takahara, “Vehicle as a resource
(VaaR),” IEEE Network, vol. 29, no. 1, pp. 12–17, January 2015.
[4] B. Fleming, “Advances in automotive electronics,” IEEE Vehicular Technol-
ogy Magazine, pp. 4–13, December 2015.
[5] W. He, G. Yan, and L. D. Xu, “Developing Vehicular Data Cloud Services in
the IoT environment,” IEEE Transactions on Industrial Informatics, vol. 10,
no. 2, pp. 1587–1595, May 2014.
[6] Y. Lin and H. Song, “Dynachina: Real-time traffic estimation and prediction,”
IEEE Pervasive Computing, vol. 4, p. 65, 2006.
[7] G. Karagiannis, O. Altintas, E. Ekici et al., “Vehicular networking: A sur-
vey and tutorial on requirements, architectures, challenges, standards and
solutions,” IEEE Communications Surveys & Tutorials, vol. 13, no. 4,
pp. 584–616, Fourth Quarter 2011.
[37] R. N. Charette, “This car runs on code,” IEEE Spectrum, vol. 46, no. 3, p. 3,
2009.
[38] S. Checkoway, D. McCoy, B. Kantor et al., “Comprehensive experimental
analyses of automotive attack surfaces,” in USENIX Security Symposium.
San Francisco, CA, USA, 2011.
[39] I. Studnia, V. Nicomette, E. Alata, Y. Deswarte, M. Kaâniche, and
Y. Laarouchi, “Survey on security threats and protection mechanisms in
embedded automotive networks,” in Dependable Systems and Networks Work-
shop (DSN-W), 2013 43rd Annual IEEE/IFIP Conference on. Budapest,
Hungary: IEEE, 2013, pp. 1–12.
[40] “Connected car devices,” http://postscapes.com/connected-car-devices/,
accessed: 10-08-2015.
[41] E. Markey, “Tracking & hacking: Security & privacy gaps put American
drivers at risk,” US Senate, 2015.
[42] C. Miller and C. Valasek, “Adventures in automotive networks and control
units,” DEF CON, Las Vegas, NV, USA, vol. 21, pp. 260–264, 2013.
[43] A. Cui, M. Costello, and S. J. Stolfo, “When firmware modifications attack:
A case study of embedded exploitation.” in NDSS, San Diego, CA, USA,
2013.
[44] C. Miller and C. Valasek, “Remote exploitation of an unaltered passenger
vehicle,” Black Hat USA, Las Vegas, NV, USA, 2015.
[45] N. Lu, N. Cheng, N. Zhang, X. Shen, and J. W. Mark, “Connected vehicles:
Solutions and challenges,” IEEE Internet of Things Journal, vol. 1, no. 4,
pp. 289–299, 2014.
[46] T. Bécsi, S. Aradi, and P. Gáspár, “Security issues and vulnerabilities in
connected car systems,” in Models and Technologies for Intelligent Trans-
portation Systems (MT-ITS), 2015 International Conference on. Budapest,
Hungary: IEEE, 2015, pp. 477–482.
[47] T. Heer, O. Garcia-Morchon, R. Hummen, S. L. Keoh, S. S. Kumar, and
K. Wehrle, “Security challenges in the ip-based internet of things,” Wireless
Personal Communications, vol. 61, no. 3, pp. 527–542, 2011.
[48] S. Raza, H. Shafagh, K. Hewage, R. Hummen, and T. Voigt, “Lithe:
Lightweight secure coap for the internet of things,” IEEE Sensors Journal,
vol. 13, no. 10, pp. 3711–3720, 2013.
[49] D. Le, X. Fu, D. Hogrefe et al., “A review of mobility support paradigms for
the Internet,” IEEE Communications Surveys and Tutorials, vol. 8, no. 1–4,
pp. 38–51, 2006.
[50] W. M. Eddy, “At what layer does mobility belong?” IEEE Communications
Magazine, vol. 42, no. 10, pp. 155–159, 2004.
[51] A. M. Vegni, R. Cusani, and T. Inzerilli, Seamless Connectivity Tech-
niques in Vehicular Ad-hoc Networks. INTECH Open Access Publisher,
2011.
[52] J. J. Blum, A. Eskandarian, and L. J. Hoffman, “Challenges of intervehicle
ad hoc networks,” IEEE Transactions on Intelligent Transportation Systems,
vol. 5, no. 4, pp. 347–351, 2004.
Abstract
Manipulating and delivering data in heterogeneous environments such as those
underlying cloud systems is a critical task because of confidentiality issues. Cloud
technology remains vulnerable to data leakage attacks due to its applications in gath-
ering information about multiple independent entities (e.g. end users and VMs) and
sharing cloud resources. Furthermore, cloud users face a greater number of threats than PC users, due to loss of control, loss of privacy and outsourced data storage. Consequently, hackers exploit
security vulnerabilities to launch attacks to take advantage of sensitive data such as
secret keys.
When data is manipulated and shared between different parties in cloud systems, it becomes vulnerable to threats. This chapter explores data vulnerability throughout its life cycle to categorise existing data leakage attack techniques in
terms of where they can be implemented and what can be stolen in this untrusted envi-
ronment, and also classifies data leakage attack techniques according to the type of
data, such as files and secret keys. Furthermore, this study explores core technologies
upon which cloud computing is built, such as the web, virtualisation and cryptog-
raphy, and the vulnerabilities that make them prone to such attacks. We also discuss existing data leakage detection and protection techniques to mitigate and alleviate such attacks.
9.1 Introduction
Cloud computing is a new model of the client–server paradigm that uses Internet services to enable a number of technologies, offering better solutions to end users without requiring knowledge of any of these technologies and services. The richness of
various models in cloud computing makes it a ubiquitous computational environment,
wherein cloud users can access cloud services anywhere, at any time through the
Internet. Moreover, boosting hardware resources on demand and improving flexibility
1 School of Computing, University of Portsmouth, UK
9.2.1 Data-At-Rest
In this state, data is stored in storage devices. In computer systems, the internal hard
disk of a computer is a permanent storage for data; DAR is a representation of data
when it is stored in such devices. In cloud computing, there is a concern about the
location transparency of data due to the complexity of the cloud infrastructure being
hidden from cloud consumers; cloud consumers do not know where their data is or has been stored. This causes hesitancy among organisations considering involvement in cloud systems, which worry about who deals with their sensitive data. Moreover, consumers are also worried about malicious insiders who co-locate on the same storage. Ristenpart et al. [36] proposed a hard disk-based covert channel attack against arbitrary VMs on Amazon EC2. They successfully identified cloud storage
co-residency and transmitted data between the VMs when they were accessing the
same storage devices.
Another concern about large files, due to limited space in main memory, is that
the OS loads a portion of the file in contiguous pages in main memory, and the rest
remains on the storage device of the system (it is coordinated with the same physical
addresses that it – the file – originated from). Bernstein [6] exploited this feature
to construct a covert channel attack that identifies, by timing variation, which parts of the file have already been loaded and which are still on the disk. Accessing those parts of the file that have been accessed recently takes a shorter time than accessing those that have not been touched.
9.2.2 Data-In-Motion
DIM describes the transfer of data over networks, which is delivered from a source to
a destination. Data is at risk in this medium due to eavesdropping attacks, which can inspect packets. Cryptographic technologies, such as Secure Shell (SSH), have been used to protect the data. However, such protection techniques are still vulnerable to information leakage attacks. Song et al. [40] showed a possible leakage attack against network packets, particularly when they are carrying credential information and moving through SSH. The authors developed a Hidden Markov Model (HMM) and a key sequence prediction algorithm to guess the keys pressed by the victim, which form the password.
In cloud systems, Software-as-a-Service (SaaS) enables web technology to
deliver applications as a service to end users. The data in this state represent
request/response between client and server while they communicate. The data needs
to be protected by encrypting it against eavesdropping attacks. However, this does not stop attackers with malicious intentions; they can still analyse packet size and timing to leak sensitive information [9].
9.2.3 Data-In-Use
In a DIU state, data is loaded into computational resources (mainly CPU components
such as registers, caches and main memory). Generally, to load data from storage
devices to CPU registers, it must pass through several hierarchical buffers, includ-
ing main memory and CPU caches (L1, L2 and L3) prior to reaching the registers,
where it is primed for operations such as read and write. Depending on the application, data might have different alignments and organisations when loaded into main memory, according to the data type, such as array, list, linked list, class or structure. In a multitasking environment, physical resources are shared between processes at different layers, such as the OS (page sharing) and the application (shared libraries), and data is frequently used by those resources; the OS therefore alternates processes in their use of those resources when a region of memory is shared between them, such as the lookup table buffer in AES [6]. Recent work has shown vulnerabilities of DIU in multitasking systems at both layers: the OS layer (page sharing) [12,42], where memory deduplication attacks have been achieved, and the application layer (shared libraries).
9.3.3 Cryptography
Cryptography can be considered as a set of algorithms which primarily rely on mathematical theories, with computational support, to be practised in computer systems. It
is widely used in various domains as a data protection solution from malicious parties
who intend to steal interesting information. It can be used by OS or application layers
and for different purposes, such as email services, health records, banking.
It can be an OS level solution to protect data by storing encrypted data on physical
storage devices. In modern OS settings, during OS installation, one of the optional
steps is asking to encrypt user files before storing them on the internal storage device
during the use of the OS. In cloud systems, this is good for cloud consumers because
they are worried about storage location transparency. Moreover, it can be used at
application level on top of OS level. Particularly in web applications, software indus-
tries integrate cryptographic algorithms with page contents to encrypt any data exchanged between client and server for any request/response. This is to protect data from
eavesdropping [9]. This can be useful in the cloud when Platform-as-a-Service (PaaS)
becomes mainstream to deliver enterprise applications through web technology to web
users.
Although cryptography has been widely used in computer and cloud systems, it is still vulnerable to information leakage attacks. In this chapter, we focus on side and covert channel attacks; in computer and cloud systems, such attacks attempt to gain sensitive information, such as the secret keys of cryptographic algorithms. The attacks take advantage of software implementation and hardware architecture vulnerabilities, or a combination of both, rather than attacking the underlying implementation of the algorithms or using brute force. Brute force attacks, for instance, rely on guesswork, wherein the attackers attempt a list of keys or passwords to find the right one.
Stealing sensitive data is possible in all data states (DAR, DIM and DIU) (see Table 9.1), and each state requires special considerations and techniques to achieve the attacks. Data in multitasking/multi-user systems can be processed by different resources, such as CPU caches, memory, storage and network media. Each medium has its own characteristics, on which the attackers rely during their experiments. So, it is crucial to focus on the common features that have already been exploited by previous studies.
9.4.2.2 Timing
Most of the published side and covert channel attacks relied on timing to achieve the attacks. The attackers are interested in hardware activities at a fine granularity, such as the number of cycles needed to access a single cache line in the L1 cache. They utilise timing to measure cache accesses, using their own data, to synchronise with the victim's data in the same cache in shared hardware settings. Modern processors, such as Intel's, offer hardware support for timing in the form of a special counter register with high resolution. Intel also offers the Read Time Stamp Counter (RDTSC) instruction to read the value of the counter register.1 The attackers use the RDTSC instruction to measure cache accesses, using their own data, to synchronise with the victim's data in the same cache. The utilisation of RDTSC has been proposed in attack implementation code for different techniques such as prime+probe [28,35] and flush+reload [56].
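A minimal illustration of this kind of measurement is given below: it times two back-to-back accesses to the same byte using the __rdtsc() compiler intrinsic (GCC/Clang on x86) rather than hand-written assembly. The first access will usually miss the cache and the second will usually hit, producing the timing contrast attackers rely on; real attack code adds careful serialisation (e.g. cpuid or lfence) around the reads, which is omitted here for brevity.

/* Minimal sketch of timing one memory access with the timestamp counter,
 * as used by the probing loops described above. Uses the __rdtsc()
 * intrinsic (GCC/Clang on x86) instead of inline assembly; real attack
 * code adds careful serialisation (e.g. cpuid/lfence) around the reads. */
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>

static uint64_t time_access(volatile const uint8_t *p)
{
    uint64_t start = __rdtsc();
    (void)*p;                     /* the memory access being timed */
    return __rdtsc() - start;
}

int main(void)
{
    static uint8_t buf[4096];

    uint64_t cold = time_access(&buf[0]);   /* likely a cache miss */
    uint64_t warm = time_access(&buf[0]);   /* likely a cache hit  */

    printf("first access: %llu cycles, second access: %llu cycles\n",
           (unsigned long long)cold, (unsigned long long)warm);
    return 0;
}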
9.4.2.3 CPU power consumption
Attackers monitor CPU power consumption activities over time to deduce mathematical operations, such as multiplication and division, which use more CPU components than addition and subtraction operations. This information will be beneficial for attackers
who are targeting cryptographic algorithms to extract secret keys. This is because
when a computer is running cryptographic algorithms, CPU executes a series of mul-
tiplications and divisions which causes the CPU to consume more power. The attacker
then analyses the variation of the power consumption to deduce some or all of the
secret key bits. This chapter will not be focusing on that feature [5,50].
9.4.2.4 Page sharing
Memory page sharing is a notable technique which is the predominant method used in
memory management systems to reclaim memory. This technique is widely utilised
in OSs and hypervisors, such as KVM's KSM [4] and ESX Transparent Page Sharing (TPS) [45]. In this technique, the OS or hypervisor periodically scans memory pages to find identical pages (pages with the same contents), keeps only one copy of each such page and removes the duplicates [32,47]. In virtualisation, when a hypervisor runs multiple VMs of the same OS, it can reclaim a significant amount of memory. Alto et al. [46] statistically showed that more than 40% of memory can be saved when 10 VMs are running.
For instance, let us assume that a file with 10 pages is shared by two VMs (VM1 and VM2); when VM1 makes modifications on two pages, a copy-on-write mechanism is performed by the OS, which creates two private copies of the pages and assigns them to VM1. VM2 no longer has access to VM1's private copies. So, each VM has two private and eight shared pages. VM1 can observe a variation in writing time between shared and private pages: writing on shared pages takes longer than writing on private pages. As a result, previous studies have shown how this feature has been exploited by covert channel attackers in computer [49] and cloud systems [7,42,53].
9.4.2.5 Shared library
Shared libraries are code that is compiled in a way that other programs can link with
it during run-time. As a shared library is loaded into memory, the same memory
locations are shared with multiple programs which are linked with it. This saves
memory space; instead of having multiple copies for each program, only one copy
will be utilised. Another advantage is that it is easy to maintain; any modification on
the library does not affect the linked programs, and all the programs have the most
recent version of the library once it is loaded.
1 RDTSC is an assembly instruction, and it can be used in C and C++ in-line assembly. For more detail on how to use RDTSC in modern Intel CPUs, see [33].
However, shared libraries are another mechanism that provides shared memory between independent processes, and this has a negative impact on data protection. The most visible vulnerability involving shared libraries in past and recent studies is the use of the OpenSSL implementation of AES [6,60], which provides a dynamic shared library, libcrypto, that can be linked with multiple programs in UNIX-based OSs. AES utilises a lookup, or S-Box, table, which is an array of values used during the encryption rounds. When a victim uses this table during encryption, an attacker can observe which elements of the lookup table have recently been accessed by the victim.
utilised the inclusive L3 cache to overcome this issue, and their result was surprisingly higher than previous attacks, namely 751 bps with the same attack settings.
9.4.4 Techniques
Time+Evict (TE) [31,43]: In this technique, it is assumed that a shared library is linked to both the attacker and victim programs concurrently, such as the lookup table in AES; both have access to the lookup table. The attacker monitors cache line(s) that are synchronised with the array.2 The attacker first finds the average time taken for one encryption, then triggers the encryption function and evicts cache lines that have already touched the array. After the eviction, the attacker triggers a series of encryptions and measures them. If any encryption call takes longer than the average time, it indicates that the evicted cache line(s) have been accessed recently by the victim.
Prime+Probe (PP) [28,60]: In this technique, an attacker process monitors a victim process by filling the CPU cache (a cache shared between them, such as the unified L3 or LLC) with its own data. This is to check which of the attacker's cache lines have been evicted by the victim's data. To do that, the attacker process utilises a busy loop that sleeps for a specific time in each iteration, wakes up, and measures the variation in access time to its cache lines. A longer time indicates the cache line has been evicted by the victim and needs to be loaded from a higher memory level. This technique can be applied at all memory levels.
Flush+Reload (FR) [17,56]: This technique is the inverse of prime+probe. Both the attacker and the victim must have access to the same data concurrently; this is a feature of shared libraries, as described in Section 9.4.2.5. The attacker targets a range of addresses in a shared library, which is loaded into main memory, and flushes one or more cache lines referenced in that address range. The attacker needs to be sure that the cache line(s) have been removed from the entire cache hierarchy (L1, L2 and L3). After the flush, the attacker waits until the victim accesses some data in the address range. When the victim attempts to access the flushed cache line(s), it must fetch them from main memory, due to the flush performed by the attacker. Finally, the attacker reloads the flushed lines and measures the access times; a shorter time indicates the cache line(s) have been recently accessed by the victim.
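A stripped-down version of a single flush+reload probe is sketched below. In a real attack, the probed address lies inside a page shared with the victim (for example, a line of the AES lookup table in libcrypto) and the hit/miss threshold is calibrated on the target machine; here a local buffer, a self-inflicted access standing in for the victim, and a fixed threshold are illustrative assumptions.

/* Simplified flush+reload probe on a single address. In a real attack the
 * probed address lies inside a page shared with the victim (e.g. a line
 * of the AES lookup table in libcrypto) and the hit/miss threshold is
 * calibrated; here a local buffer and a fixed threshold stand in for both. */
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>

#define THRESHOLD_CYCLES 150ULL   /* assumed hit/miss boundary */

/* Flush the line, allow (simulated) victim activity, then reload and time it. */
static int probe(volatile uint8_t *target)
{
    _mm_clflush((const void *)target);   /* evict the line from all levels */
    _mm_mfence();

    /* window in which the victim may or may not touch *target;
     * here we touch it ourselves to stand in for a victim access */
    (void)*target;

    uint64_t start = __rdtsc();
    (void)*target;                       /* reload and time the access    */
    uint64_t delta = __rdtsc() - start;

    return delta < THRESHOLD_CYCLES;     /* fast reload => victim access  */
}

int main(void)
{
    static uint8_t shared[64];
    printf("line recently accessed: %s\n", probe(shared) ? "yes" : "no");
    return 0;
}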
Flush+Flush (FF) [13,14]: This attack technique is composed of two flushes. It differs from the flush+reload technique by avoiding the reload stage, which keeps the cache free of misses; thus, detection of such an attack based on cache misses will be difficult. In this technique, the attacker relies on the timing variation of a series of flush instructions rather than on monitoring cache line accesses.
2 The lookup table is represented as an array when it is loaded into main memory.
In the following, we describe a generic attack model in cloud systems and general
steps to achieve such attacks. In addition, we historically show the previous attacks,
over the last 15 years, against various shared resources.
In the earlier sections, the vulnerabilities of cloud components which might lead to the occurrence of data leakage attacks were demonstrated. The achievement of a data leakage attack depends on what the attacker aims to steal (e.g. secret keys). The foregoing showed the importance of cryptographic algorithms in real systems for protecting data in various application domains by encrypting it with secret keys. This has motivated attackers to deploy sophisticated attack techniques to obtain the secret keys; the attack model described here therefore targets secret keys. A data leakage attack is achieved in two steps: placement and running the attack.
After identifying the data type, the attacker needs to place its malicious processes on the same physical machine as the victim, due to the co-resident nature of the attack. Placing a new VM instance next to a target is cheaper in a lab than in real systems such as EC2 [36,55], because cloud providers in real systems attempt to hide the complexity of the cloud infrastructure and data storage to prevent cloud cartography. Thus, the attacker needs to take further action to find its victim. To overcome this problem, Ristenpart et al. [36] successfully established a covert channel attack in a real cloud system, Amazon EC2, by discovering the EC2 mapping (internal and external network address spaces corresponding to instance creation) using network probing techniques. This was useful in the early stage for the attacker to learn the internal map of EC2 and place the attack process in the same zone as the victim, which affords a chance of being co-located on the same physical server.
The next step is observation. In this stage, the attacker utilises attack techniques, such as prime+probe and flush+reload, as described in Section 9.4.4. The attacker starts monitoring the victim's activities on the shared hardware.
In recent years, these attacks have been studied very well across various on-board
resources such as CPU caches (L1 [60], L2 [48] and L3 or LLC [56]) and memory
pages [59]. Table 9.1 depicts relevant attacks against physical resources.
Stealing secret keys from cryptographic algorithms depends on the nature of
the algorithm. For instance, the AES algorithm [6,8] encrypts plaintext using lookup
tables. Such a table is an array holding values used during the encryption rounds;
it is the most critical component of the implementation and has been targeted by
attackers. AES attackers are interested in cache line accesses and cache contention
in order to determine which cache lines, holding candidate elements of the lookup
table, have recently been used by the victim.
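As an illustration of how such monitoring might look in practice, the hedged sketch below applies the flush-and-timed-reload probe to every cache line of a hypothetical shared AES lookup table; the table pointer, the threshold and the victim synchronisation are assumptions, and real attacks add statistical post-processing to turn the hit pattern into key-byte candidates.

/*
 * Illustrative sketch (not a complete attack): monitor which cache lines
 * of a shared AES lookup table are touched by the victim.  "table" is a
 * hypothetical pointer into a crypto library mapped read-only into the
 * attacker's address space; THRESHOLD is a machine-dependent assumption.
 */
#include <stdint.h>
#include <stddef.h>
#include <x86intrin.h>

#define LINE        64          /* cache-line size in bytes                 */
#define TABLE_SIZE  1024        /* one 256-entry 32-bit T-table (assumed)   */
#define THRESHOLD   120         /* cycles separating hit from miss (assumed)*/

extern const uint8_t *table;                       /* hypothetical shared T-table   */
extern uint64_t timed_reload(const void *addr);    /* probe from the sketch above   */

/* One Flush+Reload round over every line of the table: hits[i] counts how
 * often line i was found cached, i.e. recently used by the victim.  Lines
 * that stay cold exclude the key-byte candidates that map to them. */
void probe_table(unsigned hits[TABLE_SIZE / LINE], unsigned rounds)
{
    for (unsigned r = 0; r < rounds; r++) {
        for (size_t off = 0; off < TABLE_SIZE; off += LINE)
            _mm_clflush(table + off);              /* evict all table lines */
        /* ... wait here for the victim to perform one encryption ...       */
        for (size_t off = 0; off < TABLE_SIZE; off += LINE)
            if (timed_reload(table + off) < THRESHOLD)
                hits[off / LINE]++;
    }
}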
The RSA algorithm [56], by contrast, does not rely on lookup tables (S-boxes) but on
the mathematical operations square and multiply. Attackers are therefore interested in
tracing the execution of the victim's program rather than its memory accesses.
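The toy routine below illustrates why execution tracing works against such implementations: in a textbook square-and-multiply loop the multiply branch executes only for 1-bits of the exponent, so observing which code is fetched reveals the key. This is a simplified 64-bit sketch (using a GCC/Clang 128-bit extension), not the GnuPG code attacked in [56].

/*
 * Minimal square-and-multiply modular exponentiation, showing why tracing
 * execution leaks exponent bits: the multiply branch runs only when the
 * current key bit is 1, so the sequence of instruction-cache lines fetched
 * mirrors the secret exponent.  Toy arithmetic, not a real RSA implementation.
 */
#include <stdint.h>

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    return (uint64_t)(((__uint128_t)a * b) % m);   /* GCC/Clang extension */
}

uint64_t modexp(uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t result = 1;
    for (int i = 63; i >= 0; i--) {
        result = mulmod(result, result, mod);      /* always: square        */
        if ((exp >> i) & 1)
            result = mulmod(result, base, mod);    /* only if bit i == 1    */
        /* An attacker who observes whether the "multiply" code was fetched
         * (e.g. via Flush+Reload on its cache line) recovers bit i of exp. */
    }
    return result;
}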
In attacks against the L1 cache, Zhang et al. [60] constructed a side channel attack
utilising the L1 instruction cache to extract private keys from a victim VM co-resident
with the attacker VM; both were running the libgcrypt shared library. The authors
addressed the sources of noise (hardware and software) affecting the attacker VM and
were able to reduce the noise during the observation stage by using a combination of
SVM and HMM to deduce key bits with a low false-negative rate.
In attacks against the L2 cache, Ristenpart et al. [36] introduced a cross-VM covert
channel attack against the CPU L2 cache. Alongside it, the authors targeted large files to
convey messages between two VMs, with the aim of identifying VM co-residency on
shared storage devices; they exploited hard disk contention patterns by recording
variations in access time to certain portions of the files shared between the VMs.
Subsequently, Xu et al. [55] revisited this attack and improved the resolution of the
covert channel model, achieving higher bandwidth and a lower error rate.
Following this, attackers improved leakage attacks by utilising different resources,
such as the L3 cache, and reduced the time needed to recover entire keys. In earlier
studies, sharing a core between the attack and victim VMs was one of the required
settings: before 2014, the most targeted cache levels were L1 and L2, and the victim
process needed to run on the same core as the attacker process. Zhang et al.'s [60]
attack model used inter-processor interrupts (IPIs) to force the victim process to
migrate from a different core and be scheduled alongside the attacker process. Yarom
and Falkner [56], however, introduced a new direction for side channel attacks by
utilising the L3 cache and taking advantage of its inclusiveness. They proposed the
flush+reload technique to extract the components of the private key, recovering
approximately 97.7% of the key bits, from the GnuPG implementation of RSA. Here,
the attacker and victim processes resided on different cores, in a shared-page setting,
and the authors successfully constructed an LLC-based channel between two unrelated
processes in a virtualised environment.
As mentioned earlier, the deduplication feature is a key function in virtualisation:
it allows the host to reclaim a large amount of memory by merging pages with identical
contents. Previous studies have shown that this feature has been exploited in cloud
systems [42,53,55]. Suzaki et al. [42] proposed a matching technique to identify
running applications (sshd and apache2 on Linux, Firefox and IE6 on Windows XP)
and to detect that a targeted file had been downloaded in a browser on the victim's VM.
Bosman et al. [7] proposed a JavaScript-based side channel attack against the Microsoft
Edge browser to retrieve hashed HTTP passwords. This weakness encouraged software
vendors to disable the feature. However, this has not stopped attackers from leaking
information through hardware resources. Irazoqui et al. [22] showed that, even with
deduplication disabled, leaking information is still feasible. The authors utilised huge
pages, with which an attacker can derive enough physical address bits from the virtual
address, and these same bits are then used for cache addressing.
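The arithmetic behind this observation can be sketched as follows: with a 2 MiB huge page the attacker knows 21 low-order address bits, which is enough to cover the line-offset and set-index bits of a typical LLC. The geometry constants below are common but machine-dependent assumptions, not values taken from the cited work.

/*
 * Sketch of why huge pages help an LLC attacker: with a 2 MiB huge page the
 * low 21 address bits are identical in the virtual and physical address, and
 * those bits contain the cache-set index.  Geometry (64-byte lines, 2048 sets
 * per LLC slice) is a typical but machine-dependent assumption.
 */
#include <stdint.h>
#include <stdio.h>

#define LINE_BITS      6        /* 64-byte cache lines                  */
#define SET_BITS       11       /* 2048 sets per slice (assumed)        */
#define HUGE_PAGE_BITS 21       /* 2 MiB huge-page offset               */

/* LLC set index derived purely from the offset inside a huge page. */
static unsigned llc_set_index(uintptr_t vaddr)
{
    uintptr_t page_offset = vaddr & ((1UL << HUGE_PAGE_BITS) - 1);
    return (unsigned)((page_offset >> LINE_BITS) & ((1U << SET_BITS) - 1));
}

int main(void)
{
    /* With 4 KiB pages only bits 0-11 are known, which is not enough to
     * cover LINE_BITS + SET_BITS = 17 bits; with huge pages it is.      */
    uintptr_t some_vaddr = 0x7f0000212345UL;   /* illustrative address   */
    printf("LLC set index: %u\n", llc_set_index(some_vaddr));
    return 0;
}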
9.5.1 OS level
Previous studies have shown that the OS fails to hide the fine-grained activity of the
underlying hardware in multitasking systems, and that it also fails to safely manage
the shared libraries used by the applications running on top of it. Researchers have
therefore proposed various ways of customising existing OSs to mitigate such attacks.
Recent intensive studies have demonstrated various ways of achieving such attacks.
This has led CPU designers and the software industry to make drastic changes in order
to mitigate them and to place barriers in front of attackers. For example, attacks against
memory deduplication in virtualised systems led software vendors to disable the
feature by default, as in Amazon EC2 [22,28]. Zhou et al. [62] proposed
CACHEBAR, a kernel-space solution that provides concrete protection of pages
shared across VMs in PaaS.
Irazoqui et al. [22] proposed the S$A attack against the LLC in a large-page setting to
retrieve AES secret keys. Because an attacker using large pages can recover enough
physical address bits to compute cache addresses, the authors, based on this limitation
of the attack, proposed countermeasures: disabling large pages and supporting private
cache slices for each VM, which prevents cache interference between VMs. An attacker
would then be unable to deduce which cache slices have been used by the victim.
Such attacks can also be detected through performance analysis, because the nature of
the attack degrades the performance of the system.
Zhang et al. [58] designed a protection model, CloudRadar, to detect cross-VM
side channel attacks against the LLC in public PaaS cloud services, without requiring
any hardware or software configuration changes. Their model combines signature-based
and anomaly-based detection techniques that rely on hardware performance counters.
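A hedged sketch of the kind of signal such detectors build on is given below: it samples a hardware performance counter (last-level-cache misses) through the Linux perf_event interface and flags unusually high miss rates. The threshold and sampling interval are illustrative assumptions, not parameters of CloudRadar itself.

/*
 * Hedged sketch of performance-counter-based monitoring: count LLC misses
 * for the calling process via perf_event and report anomalous intervals.
 * Threshold and interval are illustrative assumptions.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int open_llc_miss_counter(pid_t pid)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CACHE_MISSES;   /* last-level cache misses */
    attr.disabled = 0;
    attr.exclude_kernel = 1;
    /* pid = process to monitor, cpu = -1 (any), no group, no flags */
    return (int)syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0);
}

int main(void)
{
    const uint64_t threshold = 100000;     /* misses per interval (assumed) */
    int fd = open_llc_miss_counter(0);     /* 0 = monitor the calling process */
    if (fd < 0) { perror("perf_event_open"); return 1; }

    uint64_t prev = 0;
    for (int i = 0; i < 10; i++) {
        uint64_t count = 0;
        usleep(100000);                                /* 100 ms interval   */
        if (read(fd, &count, sizeof(count)) != sizeof(count)) break;
        if (count - prev > threshold)
            printf("interval %d: anomalous LLC-miss rate (%llu)\n",
                   i, (unsigned long long)(count - prev));
        prev = count;
    }
    close(fd);
    return 0;
}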
9.6 Conclusion
This chapter has reviewed previous studies on data leakage attack techniques (side and
covert channels) and highlighted the essential resources, and their characteristics, that
have been used in such attacks. Furthermore, it introduced the technical requirements
for achieving such an attack. The chapter has demonstrated the importance of data and
its usage by different technologies in the cloud environment, while it is moved through
insecure communication channels and manipulated in untrusted computational environments.
It has also categorised data leakage attack techniques according to the architectural
layer in which the attack can be achieved and the resources used to perform it,
highlighting what type of data is targeted by each attack.
References
[1] Onur Aciiçmez. Yet another microarchitectural attack: Exploiting i-cache. In
Proceedings of the 2007 ACM Workshop on Computer Security Architecture,
CSAW ’07, pages 11–18. New York, NY, USA: ACM, 2007.
[2] Onur Acıiçmez, Billy Bob Brumley, and Philipp Grabher. New results
on instruction cache attacks. In International Workshop on Cryptographic
Hardware and Embedded Systems, pages 110–124. Springer, 2010.
[3] Hassan Aly and Mohammed ElGayyar. Attacking AES using Bernstein’s attack
on modern processors. In Progress in Cryptology – AFRICACRYPT 2013,
pages 127–139. Springer, 2013.
[4] Andrea Arcangeli, Izik Eidus, and Chris Wright. Increasing memory density
by using ksm. In Proceedings of the Linux Symposium, pages 19–28. Citeseer,
2009.
[5] Utsav Banerjee, Lisa Ho, and Skanda Koppula. Power-based side-channel
attack for AES key extraction on the atmega328 microcontroller, 2015.
[6] Daniel J Bernstein. Cache-timing attacks on AES, 2005.
[7] Erik Bosman, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. Dedup est
machina: Memory deduplication as an advanced exploitation vector, 2016.
[8] Samira Briongos, Pedro Malagón, José L Risco-Martín, and José M Moya.
Modeling side-channel cache attacks on AES. In Proceedings of the Summer
Computer Simulation Conference, page 37. Society for Computer Simulation
International, 2016.
[9] Shuo Chen, Rui Wang, XiaoFeng Wang, and Kehuan Zhang. Side-channel
leaks in web applications: A reality today, a challenge tomorrow. In 2010
IEEE Symposium on Security and Privacy, pages 191–206. IEEE, 2010.
[10] Cheng-Kang Chu, Wen-Tao Zhu, Jin Han, Joseph K Liu, Xu Jia, and Zhou
Jianying. Security concerns in popular cloud storage services. IEEE Pervasive
Computing, 12(4):50–57, 2013.
[11] Qian Ge, Yuval Yarom, David Cock, and Gernot Heiser. A survey of microar-
chitectural timing attacks and countermeasures on contemporary hardware.
Journal of Cryptographic Engineering, pages 1–27, 2016.
[12] Daniel Gruss, David Bidner, and Stefan Mangard. Practical memory dedupli-
cation attacks in sandboxed javascript. In European Symposium on Research
in Computer Security, pages 108–122. Springer, 2015.
[13] Daniel Gruss, Clémentine Maurice, and Klaus Wagner. Flush+Flush: A
stealthier last-level cache attack. arXiv preprint arXiv:1511.04594, 2015.
[14] Daniel Gruss, Clémentine Maurice, Klaus Wagner, and Stefan Mangard.
Flush+Flush: A fast and stealthy cache attack. arXiv preprint
arXiv:1511.04594, 2015.
[15] Daniel Gruss, Raphael Spreitzer, and Stefan Mangard. Cache template attacks:
Automating attacks on inclusive last-level caches. In 24th USENIX Security
Symposium (USENIX Security 15), pages 897–912, 2015.
[16] Shay Gueron. Advanced encryption standard (AES) instructions set. Intel,
http://softwarecommunity.intel.com/articles/eng/3788.htm, accessed 25 Aug
2008.
[17] David Gullasch, Endre Bangerter, and Stephan Krenn. Cache games-bringing
access-based cache attacks on AES to practice. In 2011 IEEE Symposium on
Security and Privacy, pages 490–505. IEEE, 2011.
[18] Berk Gulmezoglu, Mehmet Inci, Gorka Irazoqui, Thomas Eisenbarth, and Berk
Sunar. Cross-VM cache attacks on AES, 2016.
[19] Berk Gülmezoğlu, Mehmet Sinan Inci, Gorka Irazoqui, Thomas Eisenbarth,
and Berk Sunar. A faster and more realistic Flush+Reload attack on AES. In
International Workshop on Constructive Side-Channel Analysis and Secure
Design, pages 111–126. Springer, 2015.
[20] Hermine Hovhannisyan, Kejie Lu, and Jianping Wang. A novel high-speed
ip-timing covert channel: Design and evaluation. In 2015 IEEE International
Conference on Communications (ICC), pages 7198–7203. IEEE, 2015.
[21] Ralf Hund, Carsten Willems, and Thorsten Holz. Practical timing side channel
attacks against kernel space aslr. In Security and Privacy (SP), 2013 IEEE
Symposium on, pages 191–205. IEEE, 2013.
[22] Gorka Irazoqui, Thomas Eisenbarth, and Berk Sunar. S$a: A shared cache
attack that works across cores and defies vm sandboxing and its application
to AES. In 2015 IEEE Symposium on Security and Privacy, pages 591–604.
IEEE, 2015.
[23] Gorka Irazoqui, Mehmet Sinan Inci, Thomas Eisenbarth, and Berk Sunar. Wait
a minute! A fast, cross-vm attack on AES. In International Workshop on Recent
Advances in Intrusion Detection, pages 299–319. Springer, 2014.
[24] Mehmet Kayaalp, Nael Abu-Ghazaleh, Dmitry Ponomarev, and Aamer Jaleel.
A high-resolution side-channel attack on last-level cache. In Proceedings of
the 53rd Annual Design Automation Conference, page 72. ACM, 2016.
[25] Taesoo Kim, Marcus Peinado, and Gloria Mainar-Ruiz. Stealthmem: System-
level protection against cache-based side channel attacks in the cloud. In
Presented as Part of the 21st USENIX Security Symposium (USENIX Security
12), pages 189–204, 2012.
[26] Butler W Lampson. Dynamic protection structures. In Proceedings of the
November 18–20, 1969, Fall Joint Computer Conference, pages 27–38. ACM,
1969.
[27] Fangfei Liu, Qian Ge, Yuval Yarom et al. Catalyst: Defeating last-level cache
side channel attacks in cloud computing. In 2016 IEEE International Sympo-
sium on High Performance Computer Architecture (HPCA), pages 406–418.
IEEE, 2016.
[28] Clémentine Maurice, Christoph Neumann, Olivier Heen, and Aurélien Fran-
cillon. C5: cross-cores cache covert channel. In International Conference on
Detection of Intrusions and Malware, and Vulnerability Assessment, pages
46–64. Springer, 2015.
[29] Peter Mell and Tim Grance. The nist definition of cloud computing,
2011.
[30] Michael Neve, Jean-Pierre Seifert, and Zhenghong Wang. A refined look
at Bernstein’s AES side-channel analysis. In Proceedings of the 2006 ACM
Symposium on Information, Computer and Communications Security, pages
369–369. ACM, 2006.
[31] Dag Arne Osvik, Adi Shamir, and Eran Tromer. Cache attacks and counter-
measures: The case of AES. In Topics in Cryptology – CT-RSA 2006, pages
1–20. Springer, 2006.
[32] Ying-Shiuan Pan, Jui-Hao Chiang, Han-Lin Li, Po-Jui Tsao, Ming-Fen Lin,
and Tzi-cker Chiueh. Hypervisor support for efficient memory de-duplication.
In Parallel and Distributed Systems (ICPADS), 2011 IEEE 17th International
Conference on, pages 33–39. IEEE, 2011.
[33] Gabriele Paoloni. How to benchmark code execution times on intel ia-32
and ia-64 instruction set architectures. Intel Corporation, September, 123,
2010.
[34] N Penchalaiah and Ravala Seshadri. Effective comparison and evaluation of
des and rijndael algorithm (AES). International Journal of Computer Science
and Engineering, 2(05):1641–1645, 2010.
[35] Colin Percival. Cache missing for fun and profit, 2005.
[36] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey,
you, get off of my cloud: Exploring information leakage in third-party com-
pute clouds. In Proceedings of the 16th ACM conference on Computer and
communications security, pages 199–212. ACM, 2009.
[37] Asaf Shabtai, Yuval Elovici, and Lior Rokach. A Survey of Data Leakage
Detection and Prevention Solutions. Springer Science & Business Media,
2012.
[38] Gaurav Shah and Matt Blaze. Covert channels through external interference.
In Proceedings of the 3rd USENIX Conference on Offensive Technologies
(WOOT’09), pages 1–7, 2009.
[39] Gaurav Shah, Andres Molina, Matt Blaze et al. Keyboards and covert channels.
In Usenix security, volume 6, pages 59–75, 2006.
[40] Dawn Xiaodong Song, David Wagner, and Xuqing Tian. Timing analysis of
keystrokes and timing attacks on ssh. In Proceedings of the 10th Confer-
ence on USENIX Security Symposium, volume 10, SSYM’01, Berkeley, CA,
USA:USENIX Association, 2001.
[41] Salvatore J Stolfo, Malek Ben Salem, and Angelos D Keromytis. Fog
computing: Mitigating insider data theft attacks in the cloud. In Security
and Privacy Workshops (SPW), 2012 IEEE Symposium on, pages 125–128.
IEEE, 2012.
[42] Kuniyasu Suzaki, Kengo Iijima, Toshiki Yagi, and Cyrille Artho. Memory
deduplication as a threat to the guest os. In Proceedings of the Fourth European
Workshop on System Security, page 1. ACM, 2011.
[43] Eran Tromer, Dag Arne Osvik, and Adi Shamir. Efficient cache attacks on
AES, and countermeasures. Journal of Cryptology, 23(1):37–71, 2010.
[44] Yukiyasu Tsunoo, Teruo Saito, Tomoyasu Suzaki, Maki Shigeri, and Hiroshi
Miyauchi. Cryptanalysis of des implemented on computers with cache. In
International Workshop on Cryptographic Hardware and Embedded Systems,
pages 62–76. Springer, 2003.
[45] Marcel van den Berg. Paper: Vmware esx memory resource management:
Transparent page sharing, 2013.
[46] INC VMWARE. Understanding memory resource management in vmware esx
server. Palo Alto, California, USA, 2009.
[47] Carl A Waldspurger. Memory resource management in vmware esx server.
ACM SIGOPS Operating Systems Review, 36(SI):181–194, 2002.
[48] Cong Wang, Qian Wang, Kui Ren, Ning Cao, and Wenjing Lou. Toward secure
and dependable storage services in cloud computing. Services Computing,
IEEE Transactions on, 5(2):220–232, 2012.
[49] Zhenghong Wang and Ruby B Lee. Covert and side channels due to processor
architecture. In ACSAC, volume 6, pages 473–482, 2006.
[50] Jun Wu, Yong-Bin Kim, and Minsu Choi. Low-power side-channel attack-
resistant asynchronous s-box design for AES cryptosystems. In Proceedings
of the 20th Symposium on Great Lakes Symposium on VLSI, pages 459–464.
ACM, 2010.
[51] Zhenyu Wu, Zhang Xu, and Haining Wang. Whispers in the hyper-space:
High-speed covert channel attacks in the cloud. In Presented as Part of the
21st USENIX Security Symposium (USENIX Security 12), pages 159–173,
2012.
[52] Zhenyu Wu, Zhang Xu, and Haining Wang. Whispers in the hyper-space: high-
bandwidth and reliable covert channel attacks inside the cloud. IEEE/ACM
Transactions on Networking (TON), 23(2):603–614, 2015.
[53] Jidong Xiao, Zhang Xu, Hai Huang, and Haining Wang. A covert channel
construction in a virtualized environment. In Proceedings of the 2012 ACM
Conference on Computer and Communications Security, pages 1040–1042.
ACM, 2012.
[54] Leslie Xu. Securing the enterprise with intel AES-NI. Intel Corporation, 2010.
[55] Yunjing Xu, Michael Bailey, Farnam Jahanian, Kaustubh Joshi, Matti Hiltunen,
and Richard Schlichting. An exploration of l2 cache covert channels in virtu-
alized environments. In Proceedings of the Third ACM Workshop on Cloud
Computing Security Workshop, pages 29–40. ACM, 2011.
[56] Yuval Yarom and Katrina Falkner. Flush+Reload: A high resolution, low noise,
L3 cache side-channel attack. In 23rd USENIX Security Symposium (USENIX
Security 14), pages 719–732, 2014.
[57] Younis A Younis, Kashif Kifayat, Qi Shi, and Bob Askwith. A new prime
and probe cache side-channel attack for cloud computing. In Computer
and Information Technology; Ubiquitous Computing and Communications;
Dependable, Autonomic and Secure Computing; Pervasive Intelligence and
Computing (CIT/IUCC/DASC/PICOM), 2015 IEEE International Conference
on, pages 1718–1724. IEEE, 2015.
[58] Tianwei Zhang, Yinqian Zhang, and Ruby B Lee. Cloudradar: A real-time
side-channel attack detection system in clouds. In International Symposium
on Research in Attacks, Intrusions, and Defenses, pages 118–140. Springer,
2016.
[59] Yinqian Zhang, Ari Juels, Alina Oprea, and Michael K Reiter. Homealone:
Co-residency detection in the cloud via side-channel analysis. In 2011 IEEE
Symposium on Security and Privacy, pages 313–328. IEEE, 2011.
[60] Yinqian Zhang, Ari Juels, Michael K Reiter, and Thomas Ristenpart. Cross-vm
side channels and their use to extract private keys. In Proceedings of the 2012
ACM Conference on Computer and Communications Security, pages 305–316.
ACM, 2012.
[61] Yinqian Zhang, Ari Juels, Michael K Reiter, and Thomas Ristenpart. Cross-
tenant side-channel attacks in paas clouds. In Proceedings of the 2014 ACM
SIGSAC Conference on Computer and Communications Security, pages 990–
1003. ACM, 2014.
[62] Ziqiao Zhou, Michael K Reiter, and Yinqian Zhang. A software approach to
defeating side channels in last-level caches. arXiv preprint arXiv:1603.05615,
2016.
Chapter 10
Cloud computing and personal data processing:
sorting-out legal requirements
Ioulia Konstantinou1 and Irene Kamara1
Abstract
1. Vrije Universiteit Brussel, Research Group on Law, Science, Technology and Society (LSTS), Belgium.
This research is partially based on research conducted for the Research Project SeCloud and the report
Konstantinou I., et al. 'Overview of applicable legal framework and general legal requirements: Deliverable
3.1 for the SeCloud project', SeCloud project, 2016.
The 'Security-driven engineering of Cloud-based Applications' ('SeCloud') project is a 3-year project
running from 2015 to 2018, funded by Innoviris, the Brussels Institute for Research and Innovation. It aims
to address the security risks of cloud-based applications in a holistic and proactive manner, built upon
different aspects of the security problem: not only technical, but also organisational and societal.
See more information on http://www.securit-brussels.be/project/secloud/
The project comprises four perspectives: architecture, infrastructure, programming and process.
The 'process' perspective of SeCloud aims at assisting in overcoming legal barriers when developing
cloud-based applications.
index of key obligations and responsibilities for cloud service providers and cloud
clients, but also for further research purposes (i.e. comparative analysis with other
legal frameworks).
Keywords
cloud computing, personal data, General Data Protection Regulation, responsibilities,
legal requirements, data transfers
The Cloud makes it possible to move, share and re-use data seamlessly
across global markets and borders, and among institutions and research
disciplines [2].
ENISA, the European Union Agency for Network and Information Security,
highlighted scalability, elasticity, high performance, resilience and security together
with cost-efficiency as benefits of cloud computing for public authorities [3]. The
possibilities opened up by cloud computing enable numerous applications, such as
in mobile health, the Internet of Things and online commerce, but at the same time
pose significant risks to the protection of the personal data of the individuals whose
data are stored in the cloud. Data security risks include loss of governance,
isolation failure and insecure or incomplete data deletion [4]. In addition, the exercise
of individuals' rights to access, modify or erase personal data concerning them
is more challenging in the cloud context. In the cloud model [1],
users' data are transferred to cloud-based applications and are stored either by these
applications or by third parties (cloud services), along with the data of other users.
Cloud applications, and the entities related to them, often collect and process personal
data and have responsibilities and legal obligations to protect and process such
data in compliance with the law.
The EU has thorough legislation for the protection of personal data. The new General
Data Protection Regulation (GDPR) [5], which was recently finalised, introduces
detailed provisions for the protection of personal data. In addition, EU data protection
legislation has what is often called an extra-territorial effect: under conditions explained
in the next section, it is also applicable outside the EU 'territory' (jurisdiction).
Therefore, EU legislation is relevant not only to organisations established in the EU,
but also to organisations established outside the EU.
Cloud computing entails data transfers from one data centre to another. From a legal
point of view, the locations of the data centre, the cloud client and the CSP
are in principle crucial for determining the applicable law. In general, data flows raise
several questions as to the governing law applicable to the data processing operations.
Within the EU, there is a new personal data protection legal framework, the GDPR,
which will be directly3 applicable in all EU Member States from 2018 onwards.
Even beyond CSPs established in the EU Member States, the GDPR is relevant
to cloud clients and CSPs that are not established in the EU. As De Hert and
Czerniawski explain, the GDPR introduces new factors, beyond 'territory', for assessing
whether an EU Member State has jurisdiction over a data protection case [6]. Article
3 of the GDPR introduces new criteria for deciding GDPR applicability: 'offering
goods or services to' or 'monitoring the behaviour of' data subjects in the EU by
a controller or processor not established in the EU [7].
2. Cloud actors in the context of this article are the cloud service provider and the cloud client.
3. It is important to explain the difference between an EU Directive and an EU Regulation. An EU Directive
is a legislative act that sets out a goal that all EU Member States must achieve; however, it is up to each
Member State to devise its own laws on how to reach that goal. It therefore requires implementation by
the Member States, as it is only a framework law. An EU Regulation, on the other hand, is a binding
legislative act, directly applicable in its entirety across the EU; no national implementation is required.
As it is immediately applicable and enforceable by law in all Member States, it offers a higher level
of law harmonisation across the EU. http://europa.eu/eu-law/decision-making/legal-acts/index_en.htm and
http://ec.europa.eu/legislation/index_en.htm.
The wording 'in the Union' implies physical presence in the EU of the individual whose
personal data are processed (the data subject), but not necessarily residence.
Article 3 GDPR sets out the territorial scope of the Regulation.
In essence, the GDPR applies to (1) EU companies that process personal data,
regardless of whether the processing takes place in the EU or not, and (2) non-EU
companies that process personal data of data subjects who are in the EU, where the
processing activities relate to (i) the offering of goods or services to EU data
subjects or (ii) the monitoring of individuals' behaviour that occurs within the EU.
To provide an example: when a US-established cloud provider offers targeted
services to individuals in the EU, irrespective of whether those individuals are citizens
of an EU Member State or US citizens residing in France, the legal relationship of the
US cloud provider with those individuals is governed by the EU GDPR as regards the
data processing activities concerning them. This is because EU legislation does not
make its protection depend on whether the individual is an EU citizen; the protection
is offered to any individual, provided that the other conditions of Article 3 GDPR are met.
This extra-territorial effect of the EU data protection legislation (GDPR) has been
discussed and criticised [8], among other things for creating legal uncertainty for CSPs.
A CSP might be subject to different legislations and thus be caught in a network of
different and sometimes conflicting legal rules [6]. Despite the discussion revolving
around this effect and the criticism, the cloud computing actors analysed below need to
be aware of the broadened applicability of the EU data protection framework. If their
processing activities fall within the scope of the GDPR, they must comply with their
legal obligations.
4. Directive 2002/58/EC applies to providers of electronic communication services made available to the
public and requires them to ensure compliance with obligations relating to the secrecy of communications
and personal data protection, as well as rights and obligations regarding electronic communications
networks and services. In cases where cloud computing providers act as providers of a publicly available
electronic communication service, they will be subject to this Directive. The Directive is currently under
legislative reform.
5. Directive 95/46/EC needed to be implemented at a national level, requiring transposition into national
law by the national legislature of each Member State. The General Data Protection Regulation is directly
applicable in all Member States: it applies automatically in each Member State and does not require any
national implementation by the Member States.
6. Article 99 GDPR.
obligations for data controllers, such as ‘privacy by design’ and ‘privacy by default’,
accountability, data protection impact assessments (DPIAs), personal data breach
notifications, as well as the right to be forgotten and the right to data portability. The
technologically neutral approach of EU data protection rules in Directive 95/46/EC
is maintained. The GDPR embraces the technological developments but does not
focus on any specific technologies. Therefore, it also applies to cloud computing
services [12].
The EU data protection legislation uses two key terms to designate the persons responsible
for complying with the legal obligations: data controller and data processor. Article
4(7) GDPR defines a controller as 'the natural or legal person, public authority,
agency or any other body that alone or jointly with others determines the purposes
and means of the processing of personal data'. Article 4(8) GDPR defines a data
processor as the 'natural or legal person, public authority, agency or any other body
that alone or jointly with others, processes personal data on behalf of the controller'.
One of the most important and challenging aspects of the data protection legal
framework in the cloud computing context is the applicability of the notions of
'controller' and 'processor'. The allocation of these roles is the crucial factor that
determines responsibility for compliance with data protection rules. Cloud computing
involves a range of different actors. To establish the specific obligations, duties and
responsibilities of each actor under the data protection legal framework, it is first
necessary to define, refine and assess the role of each of the actors involved. The
Article 29 Working Party (WP29), an advisory body comprised of EU national data
protection authorities,7 stressed in its Opinion 1/2010 on the concepts of 'controller'
and 'processor' that:
the first and foremost role of the concept of controller is to determine who
shall be responsible for compliance with data protection rules, and how
data subjects8 can exercise the rights in practice. In other words: to allocate
responsibility [13].
Therefore, two general key criteria can be extracted from the WP29 opinion on
how to determine who is the controller and who is the processor in each case: allocation
of responsibility and responsibility for compliance.
7. Under the GDPR, the Article 29 Working Party will be replaced by the European Data Protection Board (EDPB).
8. The data subject is the person whose personal data are collected, held or processed.
(https://secure.edps.europa.eu/EDPSWEB/edps/site/mySite/pid/74#data_subject)
the cloud client/data controller may not be the only entity that can solely
determine the ‘purposes and means of the processing’. More and more often,
the determination of the essential elements of the means, which is a prerog-
ative of the data controller, is not in the hands of the cloud client. In this
respect, the cloud provider, who happens to have the technical background,
typically designs, operates and maintains the cloud computing IT infras-
tructure (be it simply the basic hardware and software services as in IaaS,
9. According to the ICO (Information Commissioner's Office, the UK data protection authority), identifying
the data controller in a private cloud should be quite straightforward, because the cloud customer will
exercise control over the purpose for which the personal data will be processed within the cloud service. If a
cloud provider simply maintains the underlying infrastructure, then it is likely to be a data processor. In
a community cloud, more than one data controller is likely to access the cloud service; they could act
independently of each other or could work together. If one of the data controllers also acts as a cloud
provider, it will also assume the role of a data processor in respect of the other data controllers that use
the infrastructure. When using a public cloud, a cloud client may find it difficult to exercise control over
the operation of a large cloud provider. However, when an organisation contracts for cloud computing
services based on the cloud provider's standard terms and conditions, this organisation is still responsible
for determining the purposes and the means of the personal data processing (pp. 7–9).
Furthermore, it is often the CSP who develops the standard contracts or Service
Level Agreements (SLAs) offered to the cloud client, based on its technical
infrastructure and business type. The cloud client therefore has little or no
leeway to modify the technical or contractual means of the service [10].
Data processors need to consider the deployment model/service model of the
cloud in question (public, private, community or hybrid/IaaS, SaaS or PaaS) and
the type of service contracted by the client. Processors are responsible for adopting
security measures in line with those in EU legislation as applied in the controller’s and
the processor’s jurisdictions. Processors must also support and assist the controller
in complying with (exercised) data subjects' rights [16]. The GDPR, in that respect,
takes a broader view of the role of processors in personal data processing than
Directive 95/46/EC, under which processors are only indirectly liable. Under the new regime, certain EU data
protection law requirements, for instance on accountability, data security and data
breach notifications, will apply for the first time directly to data processors. Apart
from these requirements, Article 82 GDPR specifically holds processors directly liable
(par.1) and establishes the conditions under which individuals may claim damages
from them (par.2): in cases where a processor has not complied with GDPR obligations
addressed specifically to processors or in cases it has acted contrary to or outside of
the lawful instructions of the data controller [17]. Article 28(3) GDPR provides for
detailed rules on the terms of the contract which appoints processors.
10.4.2 Sub-contractors
Cloud computing services may involve many contracted parties who act as data pro-
cessors. Data processors may also sub-contract services to additional sub-processors.
In that case, the data processor has an obligation to provide the cloud client with any
relevant information regarding the service sub-contracted, e.g. its type, the character-
istics of current or potential sub-contractors and guarantees that these entities act/will
act in compliance with the GDPR. To ensure the allocation of clear responsibilities
for data processing activities, all the relevant obligations must also apply to the sub-
processors through contracts between the cloud provider and sub-contractor, describ-
ing the provisions, terms and requirements of the contract between the cloud client and
the cloud provider [14]. These conditions are also mentioned in Article 28(4) GDPR,
which stresses that in case of sub-contracting for specific processing activities, the
same data protection obligations as set out in the contract or another legal act between
the controller and the processor shall be imposed on the sub-processor. These obliga-
tions should provide sufficient guarantees, specifically around data security. In case
of sub-processor’s non-compliance with the imposed obligations, the initial processor
remains fully liable to the controller for the performance of these obligations.
The data processor may sub-contract its activities on the basis of the agreement
of the data controller (or 'consent', if the controller is a natural person). This consent
must be given at the beginning of the service and must be accompanied by an obligation
on the part of the data processor to inform the data controller of any intended changes
concerning the addition or replacement of sub-contractors. The controller reserves the
right to object to such changes or to terminate the contract at any time [14]. This opinion
is enshrined in Article 28(2) GDPR, which essentially permits sub-contracting only
with the controller's consent [17].
The Article 29 Working Party proposes that there should be guarantees for direct
liability of the processor towards the controller for any breaches caused by the sub-
processor(s). Another proposal is the creation of a third-party beneficiary right in
favour of the controller in the contracts signed between the processor and sub-
processor(s), or the signing of these contracts on behalf of the data controller,
rendering the controller a party to the contract [14]. Even though the Article 29 Working
Party opinions are not mandatory, cloud actors should consider this 'soft law'10 interpretation
of the legislation by the EU regulators, in this case the data protection authorities.
The following table summarises the terminology and basic roles described above:
Table 10.1 Basic roles of cloud actors from an EU data protection perspective

Cloud client: Determines the purpose of the processing and decides on the delegation of all or part of the processing activities to an external organisation. Data protection role: data controller or joint controller. GDPR definition: a data controller 'alone or jointly with others, determines the purposes and means of the processing of personal data' (Art. 4(7) GDPR).

Cloud service provider: Provides the various types of cloud computing services. Data protection role: when the cloud provider supplies the means and the platform, acting on behalf of the cloud client (data controller), the cloud provider is a data processor; a joint controller role is also possible. GDPR definition: a data processor 'processes personal data on behalf of the controller' (Art. 4(8) GDPR).

Sub-contractors: If the cloud client (data controller) consents, the cloud providers (data processors) may also sub-contract services to additional sub-processors (sub-contractors). Data protection role: from a data protection perspective, the sub-contractors are data processors.
10. 'Soft law' could be defined as rules that are neither legally binding nor completely lacking legal significance.
These rules, which are not directly enforceable, include guidelines, policy declarations, codes of conduct, etc.
'Soft law' is often contrasted with 'hard law', which refers to traditional, directly enforceable, law.
10.5.1.1 Transparency
The principle of transparency is crucial for fair and legitimate processing of personal
data in cloud computing. In accordance with Articles 5(1)(a) and 12 GDPR, the cloud
client, when acting as data controller, is obliged to provide the data subject, whose
personal data or data related to the data subject are collected, with the controller’s
identity and the purpose of the processing. Further relevant information shall also be
provided, such as the recipients or categories of recipients of the data, which can also
include processors and sub-processors, to the extent that such further information is
necessary to guarantee fair processing towards the data subject [18]. Transparency in
the above sense is also enshrined in Recitals 39 and 58 GDPR.
Transparency must also be guaranteed in the relationship between cloud client,
CSP and sub-contractors. The cloud client can assess the lawfulness of the processing
of personal data in the cloud only if the CSP keeps the cloud client up-to-date about
incidents that might occur, access requests and other issues.
Another aspect of transparency in the cloud computing context is the necessary
information the cloud client, when acting as data controller, must obtain, regarding
all the sub-contractors involved in the respective cloud service and the locations of all
data centres in which personal data may be processed. This is crucial because only
then can an assessment be made of whether personal data may be transferred to a
third country outside the European Economic Area that does not ensure an adequate
level of protection within the meaning of the GDPR [14].
process personal data for other purposes that are incompatible or conflicting with the
original purposes. Furthermore, the CSP, when acting as a data processor, is not allowed
to use personal data for its own purposes [15,18,19]. If it does, then the CSP is a data
controller and, subsequently, all the relevant obligations for data controllers apply to it.
Moreover, further processing, inconsistent with the original purpose(s), is also
prohibited for the cloud provider or one of its sub-contractors. In a typical cloud
computing scenario, a larger number of sub-contractors may be involved; therefore,
the risk of further personal data processing for incompatible purposes may be quite
high. In order to mitigate the risk of further processing, the contract between CSP and
cloud client should entail technical and organisational measures and provide guaran-
tees for the logging and auditing of relevant processing operations on personal data
that are performed by the cloud provider or the sub-contractors, either automatically
or manually (e.g. by employees).
as data protection by design and by default, data security breach notifications and
Data Protection Impact Assessments. Apart from the general improvement of the
data subject’s protection in relation to the Data Protection Directive, these enhanced
responsibilities of the data controller are also considered a major improvement for
the protection of personal data in the cloud computing environment [21].
However, the EDPS11 observes that some of the new obligations may be difficult
to comply with if the data controller is considered to be the cloud client. As mentioned
earlier, in many cases the cloud client may be considered the data controller, due to the
cloud client's capacity to determine the means and the purposes of processing. In
practice, however, it is not always easy to match the concept of the controller one-to-one
with that of the cloud client. The EDPS identifies mainly the following data controller
obligations as being difficult for the cloud client to carry out: the implementation of
policies to ensure that personal data processing is compliant with the GDPR, data
security requirements, DPIAs, data protection by design and data breach notifications.
This is mainly because, in practice, such measures are usually implemented by the CSP.
The cloud client, especially in business-to-consumer relationships, has little or no
control over issues such as the security policies of the CSP. It is therefore paradoxical
to assign responsibility to the cloud client for such issues [10].
The processor, on the basis of Article 28 GDPR, is required to co-operate with
the controller in order to fulfil the latter's obligation to respond to data subjects'
rights, and to assist the data controller in ensuring compliance with the security
requirements, data breach notifications, DPIAs and prior consultation. However, the
ultimate responsibility lies with the controller [10].
11. The European Data Protection Supervisor (EDPS) is an independent supervisory authority established in
accordance with Regulation 45/2001, on the basis of Article 286 of the EC Treaty. The EDPS' mission is to
ensure that the fundamental rights and freedoms of individuals – in particular their privacy – are respected
when the EU institutions and bodies process personal data.
12. Article 28(1) GDPR.
controller and the processor need to consider when selecting technical and organi-
sational measures. Such criteria include the state of the art, the scope, context and
purposes of processing, but also the costs of implementation.
Under the GDPR, both the controller and the processor are obliged to assess
the risks arising from the processing and the nature of the data processed, and to
design their measures accordingly. Regarding technical and organisational measures
in cloud computing, the EDPS highlights that, due to the complexity of cloud computing,
all parties involved, controllers and processors, should perform risk assessments for
the processing under their control. Comprehensive risk assessment and security
management in a cloud environment require co-operation and co-ordination between
the different parties involved, as the overall level of security is determined by the
weakest link. In a cloud environment used by multiple clients, security failures of one
client could affect the security of other clients, unless the service has put in place
robust and secure measures to separate services and data between clients and make
mutual interference impossible [12]. Informing cloud users about the cloud provider's
risk assessment and security measures, and helping them understand their effectiveness
and limitations, would enable cloud users to take the necessary measures themselves,
as the EDPS further observes [10].
Data controllers can achieve compliance with their security obligations when they have
extensive and accurate information allowing them to assess whether the CSP fully
complies with its security obligations as processor or controller. The
introduction of data breach notification in the GDPR (Articles 33 and 34) imposes
the obligation on data controllers to inform data protection authorities and, when
the breach is likely to result in high risk to the rights and freedoms of individ-
uals, data subjects about personal data breaches. CSPs, therefore, would have to
report any personal data breaches that occur in their services, either directly (to
the supervisory authorities and the individuals), in case they act as controllers,
or indirectly (to the cloud client who is the data controller) if they act only as
processors [12].
10.5.2.1 Availability
Providing availability means ensuring timely and reliable access to personal data.
Availability risks include infrastructure problems such as accidental or malicious loss
of network connectivity and accidental hardware failures (on the network or on the
storage/processing systems) [14]. It is the responsibility of the data controller to check
whether the data processor has adopted reasonable measures to cope with the risk of
such interferences, for example backup internet network links, redundant storage and
effective data backup mechanisms [14].
10.5.2.2 Integrity
Data integrity relates to the maintenance and guarantee of the authenticity,
accuracy and consistency of data over their entire life cycle. Data integrity can be
compromised during processing, storage or transmission of the data; it is maintained
when the data are not maliciously or accidentally altered. The notion of integrity can
be extended to IT systems, requiring that the processing of personal data on these
systems remains unmodified [14]. Modifications of personal data can be detected by
cryptographic authentication mechanisms such as message authentication codes,
signatures or cryptographic hash functions (which, unlike message authentication
codes and signatures, do not require secret keys). Interference with the integrity of
IT systems in the cloud can be prevented or detected by means of intrusion
detection/prevention systems (IDS/IPS) [14].
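As a purely illustrative sketch of the message authentication codes mentioned above, the following fragment computes and re-verifies an HMAC over a data object with OpenSSL; the key handling is deliberately simplified and is an assumption of the example, not a recommendation drawn from the cited guidance.

/*
 * Minimal illustration of a keyed message authentication code: an HMAC is
 * computed by the data owner before a record is handed to a cloud service
 * and re-checked on retrieval.  Key handling is deliberately simplified.
 */
#include <stdio.h>
#include <string.h>
#include <openssl/hmac.h>

int main(void)
{
    const unsigned char key[]  = "example-secret-key";          /* assumed key */
    const unsigned char data[] = "record to be stored in the cloud";

    unsigned char tag[EVP_MAX_MD_SIZE];
    unsigned int  tag_len = 0;

    /* Tag computed by the data owner before upload. */
    HMAC(EVP_sha256(), key, (int)(sizeof(key) - 1),
         data, sizeof(data) - 1, tag, &tag_len);

    /* On retrieval the owner recomputes the tag and compares; a mismatch
     * means the stored object was modified in storage or in transit. */
    unsigned char check[EVP_MAX_MD_SIZE];
    unsigned int  check_len = 0;
    HMAC(EVP_sha256(), key, (int)(sizeof(key) - 1),
         data, sizeof(data) - 1, check, &check_len);

    printf("integrity %s\n",
           (tag_len == check_len && memcmp(tag, check, tag_len) == 0)
               ? "verified" : "violated");
    return 0;
}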
10.5.2.3 Confidentiality
The cloud client, when acting as a data controller, must guarantee that the personal
data under its responsibility can only be accessed by authorised persons. In a cloud
environment, encryption may significantly contribute to the confidentiality of per-
sonal data, if applied correctly [15]. Encrypting personal data does not mean that
the individual is no longer identifiable: since the technical data fragmentation processes
that may be used in the provision of cloud computing services, such as encryption,
do not render data irreversibly anonymous, data protection obligations still apply [20].
Encryption of personal data should be used in all cases when the data are 'in transit'
and, where available, when the data are 'at rest' [15]. This should be particularly the
case for data controllers who intend to transfer sensitive data to the cloud or who are
subject to specific legal obligations of professional secrecy. In some cases, (e.g. an
IaaS storage service) a cloud client may not rely on an encryption solution offered
by the CSP but may choose to encrypt personal data prior to sending them to the
cloud. Encrypting data at rest requires attention to cryptographic key management
as data security then ultimately depends on the confidentiality of the encryption
keys [19].
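A minimal sketch of such client-side encryption, assuming the OpenSSL EVP interface and deliberately simplified key and IV handling, might look as follows; in practice the key management arrangement discussed in the next paragraph is the critical part.

/*
 * Hedged sketch of client-side encryption before upload: AES-256-GCM via
 * the OpenSSL EVP interface.  Key and IV generation, storage and rotation
 * are simplified; losing the key renders the data unrecoverable.
 */
#include <openssl/evp.h>
#include <openssl/rand.h>

int encrypt_for_cloud(const unsigned char *plain, int plain_len,
                      const unsigned char key[32],
                      unsigned char iv[12],
                      unsigned char *cipher, unsigned char tag[16])
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int len = 0, cipher_len = 0;

    if (!ctx) return -1;
    if (RAND_bytes(iv, 12) != 1) goto err;                  /* fresh IV      */
    if (EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv) != 1)
        goto err;
    if (EVP_EncryptUpdate(ctx, cipher, &len, plain, plain_len) != 1)
        goto err;
    cipher_len = len;
    if (EVP_EncryptFinal_ex(ctx, cipher + len, &len) != 1) goto err;
    cipher_len += len;
    /* The authentication tag protects integrity as well as confidentiality. */
    if (EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, tag) != 1)
        goto err;
    EVP_CIPHER_CTX_free(ctx);
    return cipher_len;
err:
    EVP_CIPHER_CTX_free(ctx);
    return -1;
}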
Communications between CSP and cloud client as well as between data centres
should also be encrypted. Remote administration of the cloud platform should only
take place via a secure communication channel. If a cloud client plans not only to
store but also to further process personal data in the cloud, it must bear in mind that
encryption cannot be maintained during processing of the data [19]. When encryption
is chosen as a technical measure to secure data, the security of the encryption key is crucial,
through a robust key management arrangement. It is also important to note that the loss
of an encryption key could render the data useless. This could amount to the accidental
destruction of personal data and therefore a breach of the security and confidentiality
principle [15]. Further technical measures aiming at ensuring confidentiality include
authorisation mechanisms and strong authentication (e.g. two-factor authentication).
Contractual clauses may also impose confidentiality obligations on employees of
cloud clients, CSPs and sub-contractors [14].
10.5.2.4 Isolation
Isolation is linked to the purpose limitation principle: although not a legal term,
isolation of the data serves, in technical terms, the data protection principle of purpose
limitation. In cloud infrastructures, resources such
as storage, memory and networks are shared among many users. This creates new
risks for data and renders the possibility of disclosure and further processing for
illegitimate purposes quite high. Isolation as a protective goal, therefore, is meant
to address this issue and to ensure that data are not used beyond their original
purpose, and that confidentiality and integrity are maintained [14].
Isolation may be achieved first by adequate governance and regular review of the
rights and roles for accessing personal data. The implementation of roles with exces-
sive privileges should be avoided (e.g. no user or administrator should be authorised
to access the entire cloud). More generally, administrators and users must only be
able to access the information that is necessary for their legitimate purposes (least
privilege principle) [14].
10.5.2.5 Intervenability
The data subjects have the rights of access, rectification, erasure, restriction and
objection to the data processing [22]. The cloud client must verify that the cloud
provider does not impose technical and organisational obstacles to the exercise of
those rights, even in cases when data are further processed by sub-contractors. The
contract between the client and the CSP should stipulate that the cloud provider is
obliged to support the client in facilitating the exercise of data subjects' rights and to
ensure that the same is safeguarded in its relationship with any sub-contractor [14].
10.5.2.6 Portability
The use of standard data formats and service interfaces by the cloud providers is very
important, as it facilitates inter-operability and portability between different cloud
providers. Therefore, if a cloud client decides to move to another cloud provider,
any lack of inter-operability may make it impossible, or at least difficult, to transfer
the client's (personal) data to the new cloud provider ('vendor lock-in').
The same problem also appears for services that the client developed on a platform
offered by the original cloud provider (PaaS). The cloud client should check whether
and how the provider guarantees the portability of data and services prior to ordering
a cloud service. Preferably, the provider should make use of standardised or open data
formats and interfaces. Agreement on contractual clauses stipulating assured formats,
preservation of logical relations and any costs accruing from the migration to another
cloud provider could be considered guarantees [14].
Data portability is defined in Article 20 and Recital 68 GDPR as an instrument
to further strengthen the control of the data subject over its own data: the data sub-
ject should be allowed to receive personal data concerning the data subject which it
provided to a controller in a structured, commonly used, machine-readable and inter-
operable format, and to transmit it to another controller. To implement this right,
it is important that, once the data have been transferred, no trace is left in the original
system. In technical terms, it should be possible to verify the secure erasure of data
[10,12].
10.5.2.7 IT accountability
In terms of data protection, IT accountability has a broad scope: it refers to the ability
of the parties involved in personal data processing to provide evidence that they took
appropriate measures to ensure compliance with the data protection principles [14].
13. The list of the processing operations for which data protection impact assessments should be mandatory is non-exhaustive.
by independent internal or external auditors. Article 28, accordingly, specifies the data
protection measures that the processors must take. Besides these provisions, cloud-
computing-specific codes of conduct drawn up by the industry and approved by the
relevant data protection authorities could also be a useful tool to improve compliance
and trust among the various actors. The codes of conduct model is present in both
Directive 95/46/EC and the GDPR (Articles 27 and 40, respectively; Recitals 77 and
98 GDPR are also relevant) [10].
14. Under the current legal framework, the Commission has adopted several adequacy decisions, with respect
to Andorra, Argentina, Australia, Canada, Switzerland, the Faeroe Islands, Guernsey, the State of Israel, the
Isle of Man, Jersey, US PNR and the US Safe Harbor. Such decisions will remain in force until amended,
replaced or repealed by a Commission Decision (Article 45(9) GDPR).
15. Article 46 GDPR.
16. More specifically, these exemptions, which are tightly drawn, for the most part concern cases where
the risks to the data subject are relatively small or where other interests (public interests or those of the
data subject himself) override the data subject's right to privacy. As exemptions from a general principle,
they must be interpreted restrictively. Furthermore, Member States may provide in domestic law for the
exemptions not to apply in particular cases. This might be the case, for example, where it is necessary to
protect particularly vulnerable groups of individuals, such as workers or patients [10].
the CSPs, they should all contain minimum guarantees on essential aspects. These
guarantees might include the requirement to enter into written agreement with sub-
processors, by which they commit to the same data protection obligations, prior
information/notices of the cloud customer on the use of sub-processors, audit clause,
third-party beneficiary rights, rules on liability and damages, supervision, etc. [10].
In addition to the standard contractual clauses (SCCs), cloud providers could offer
customers provisions that build on their pragmatic experiences as long as they do not
contradict, directly or indirectly, the SCCs approved by the Commission or prejudice
fundamental rights or freedoms of the data subjects [31]. Nevertheless, companies
may not amend or change the SCCs; doing so would imply that the clauses are no longer
'standard' [31].
Another ground is binding corporate rules (BCRs). BCRs are personal data
protection policies adhered to by the controller or processor established in an EU
Member State. These policies facilitate transfers of personal data to a controller or
processor in one or more third countries within a group of undertakings or a group of
enterprises engaged in a joint economic activity (Article 4(20) GDPR). Thus, BCRs
refer to data transfers within the same company, for instance the CSP establishments
in several countries or companies with a joint economic activity. Article 29 Working
Party has developed (based on the Directive 95/46/EC) BCRs for processors which
will allow the transfer within the group for the benefit of the controllers without
requiring the signature of contracts between processor and sub-processors per client
[32]. Such BCRs for processors would enable the provider’s client to entrust their
personal data to the processor while being assured that the data transferred within
the provider’s business scope would receive an adequate level of protection [14]. The
GDPR includes (Article 47 GDPR) detailed minimum content requirements for the
BCRs to be a valid ground for data transfers, such as: the legally binding nature,
complaint procedures, personnel training, tasks of the data protection officer and
others.
Another ground for data transfers likely to be widely used in the case of cloud
computing is the soft law grounds, i.e. approved codes of conduct and approved data
protection certifications. Adherence to approved codes of conduct and certifications
based on the conditions of the GDPR may bring several benefits to data controllers and
processors in terms of accountability, administrative fines imposed by the data pro-
tection authorities (Article 83 GDPR) and data transfers. The process of conforming
to such soft law instruments might entail intensive third-party audits (for instance
from accredited certification bodies indicated in Article 43 GDPR) and follow-up
activities to maintain the certification and the seal. The cloud clients and service
providers, however, might consider choosing this option, as it is more flexible than
standard data protection clauses and BCRs. Transfers by means of codes of conduct
or certification mechanisms need to be accompanied by binding and enforceable
commitments on behalf of the controller or processor in the third country that they
shall apply the appropriate safeguards, including those concerning data subject rights.
Data transfers in the absence of an adequacy decision can also be based on
contractual clauses between controller or processor and the controller, processor or
recipient of the personal data in the third country or organisation, after authorisation
of the competent supervisory authority (Article 46 (3) GDPR). This ground can also
be relevant for cloud actors.
10.7 Conclusions
The use of cloud computing models facilitates and accelerates the creation and pro-
cessing of big data collections and the production of new services and applications.
When these big data collections contain personal data, specific risks and challenges
for privacy and data protection arise, and it is imperative that appropriate safeguards
be implemented.
Privacy and data protection in the context of cloud computing must not, in any
case, be inferior to the level of protection required in any other data processing context.
The cloud computing model can only be developed and applied legally if the data
protection standards are not lowered compared to those applicable in conventional
data processing operations.
The extra-territorial effect of the new EU GDPR induces not only broader legal
rules but also new challenges. Not only EU, but also non-EU cloud computing service
providers are highly affected and need to be aware of the updated EU data protection
framework to process personal data in compliance with their legal obligations. Cloud
providers, not established in the EU, that offer targeted services to individuals in the
EU, irrespective of whether these individuals are citizens of an EU Member State,
are governed by the GDPR regarding the data processing activities concerning those
individuals.
The cloud client-cloud provider relationship is often a data controller–processor
relationship. Exceptionally, the cloud provider may act as a controller and, therefore,
have full responsibility for compliance with all the legal obligations deriving from
EU data protection law, or a joint controller together with the cloud client. The cloud
client, when acting as a data controller, is also responsible for selecting a CSP that
guarantees compliance with EU data protection legislation. The GDPR renders both
data controllers and data processors directly liable for certain EU data protection law
requirements, such as accountability, data security and data breach notifications.
As far as contracts between CSPs and cloud clients are concerned, processing may
only be delegated to sub-processors if the controller consents to such activity.
Furthermore, CSPs' contracts with sub-contractors should stipulate the provisions of
the original contract with the cloud client.
The GDPR requirements in the context of cloud computing may be divided into
two broad categories: (1) compliance with the fundamental data processing principles
and (2) technical and organisational measures implementation.
Transparency entails the detailed provision of information on behalf of the cloud
providers to the cloud clients about all data protection relevant aspects of their services
and, accordingly, the relevant notifications to the data subjects by the cloud clients.
The purpose specification principle guarantees that personal data are not processed
for purposes other than the original one(s), whether by the cloud provider or any
sub-contractors. Furthermore, once personal data are no longer necessary
for the specific purposes they were collected and processed, the cloud client must
ensure that they are erased from wherever they are stored. Finally, the cloud client
and the CSP must be able to ensure and demonstrate accountability through the
adoption and implementation of appropriate data protection policies and technical and
organisational measures that their processing activities comply with the requirements
of the EU Data Protection Law.
Regarding technical and organisational measures, such measures should be guar-
anteed and included in the contract between the cloud client and the cloud provider
and be reflected in the provider and sub-contractors’ relationship. Technical measures
must ensure availability, integrity and confidentiality of the data. Isolation is a pro-
tective goal meant to address the risk that data is used beyond its original purpose
and to maintain confidentiality and integrity.
Moreover, the data subjects have the rights of access, rectification, erasure, blocking
and objection. The exercise of the data subject rights should be assisted by the cloud
service provider. Interoperability and data portability are facilitated by the cloud
providers' use of standard data formats and service interfaces.
In case of cross-border transfers, the EU data protection legislation, both the
Directive 95/46/EC and the new GDPR, includes safeguards and provisions that enable
such transfers. A cloud client or cloud provider acting as data controller needs to pre-
serve the necessary safeguards for the protection of personal data in the framework of
the general accountability principle, but also make use of the means established in the
legislation for legitimate data transfers. The GDPR requires that not only controllers
but also processors implement appropriate safeguards for international data transfers.
In the absence of an adequacy decision, transfers of data to non-adequate third coun-
tries require specific safeguards via the use of special arrangements (e.g. the EU-US
Privacy Shield), Standard Contractual Clauses or Binding Corporate Rules.
References
[1] Mell P., Grance T. ‘The NIST definition of cloud computing’. Communications
of the ACM. 2010;53(6):50.
[2] European Commission. European Cloud Initiative – Building a Competi-
tive Data and Knowledge Economy in Europe. Communication from the
Commission to the European Parliament, the Council, the European
Economic and Social Committee and the Committee of the Regions,
COM (2016) 178 final, 2016. Available from: https://ec.europa.eu/digital-single-market/en/news/communication-european-cloud-initiative-building-competitive-data-and-knowledge-economy-europe.
[3] ENISA. Security Resilience in Governmental Clouds. 2011. Avail-
able from: https://www.enisa.europa.eu/publications/security-and-resilience-
in-governmental-clouds.
[4] ENISA. Cloud Computing: Benefits, Risks and Recommendations for Infor-
mation Security. 2009. Available from: http://www.enisa.europa.eu/act/rm/
files/deliverables/cloud-computing-risk-assessment/at_download/fullReport.
[5] European Parliament and Council of the European Union. Regulation 2016/679
of the European Parliament and of the Council of 27 April 2016 on the protec-
tion of natural persons with regard to the processing of personal data and on
the free movement of such data, and repealing Directive 95/46/EC (General
Data Protection Regulation), L 119/14.5.2016.
[6] De Hert, P., Czerniawski M. ‘Expanding the European data protection scope
beyond territory: Article 3 of the General Data Protection Regulation in
its wider context’. International Data Privacy Law. 2016; 6(3):230–243.
doi:10.1093/idpl/ipw008.
[7] Hon W.K., Hörnle J., Millard C. ‘Data protection jurisdiction and cloud com-
puting – When are cloud users and providers subject to EU data protection law.
The cloud of unknowing’. International Review of Law, Computers & Technol-
ogy. 2012; 26(2–3):129–164, p. 33. Available from: http://www.tandfonline.com/doi/abs/10.1080/13600869.2012.698843.
[8] Svantesson D.J.B. ‘Extraterritoriality and targeting in EU data privacy law:
The weak spot undermining the regulation’. International Data privacy Law.
2015;5:230–33.
[9] European Parliament and Council. Directive 95/46/EC of the European Parlia-
ment and of the Council of 24 October 1995 on the protection of individuals
with regard to the processing of personal data and on the free movement of
such data (Data Protection Directive). OJ L 281, 23.11.1995.
[10] European Data Protection Supervisor. ‘Opinion of 16 November 2012 on
the Commission’s Communication on “Unleashing the potential of Cloud
Computing in Europe”’. 2012. Available from: https://secure.edps.europa.eu/EDPSWEB/webdav/shared/Documents/Consultation/Opinions/2012/12-11-16_Cloud_Computing_EN.pdf, p. 14.
[11] European Parliament and the Council. Directive 2002/58/EC of the European
Parliament and of the Council of 12 July 2002 concerning the processing
of personal data and the protection of privacy in the electronic communi-
cations sector (Directive on privacy and electronic communications), OJ L
201, 31.07.2002 p. 37, as amended by European Parliament and the Council,
Directive 2009/136/EC of the European Parliament and of the Council of 25
November 2009, OJ L 337, 18.12.2009.
[12] Hon W.K., Kosta E., Millard C., Stefanatou D. ‘Cloud accountability: The
likely impact of the proposed EU data protection regulation’. Queen Mary
School of Law Legal Studies Research Paper. 2014(172). Available from:
http://papers.ssrn.com/sol3/Papers.cfm?abstract_id=2405971.
[13] Article 29 Data Protection Working Party. ‘Opinion 1/2010 on the con-
cepts of “controller” and “processor” WP 169’. 2010. Available from:
http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-
recommendation/files/2010/wp169_en.pdf.
[14] Article 29 Data Protection Working Party. ‘Opinion 05/2012 on Cloud
Computing’ WP 196, 2012. Available from: http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2012/wp196_en.pdf.
Chapter 11
Waikato Data Privacy Matrix
Abstract
Data privacy is an expected right of most citizens around the world, but there are many
legislative challenges within the boundary-less cloud computing and World Wide Web
environments.
The Waikato Data Privacy Matrix outlines our global project for the alignment of
data privacy laws, focusing on Asia Pacific data privacy laws and their relationships
with those of the European Union and the United States. While some alignment already
exists between the European Union and the United States, there is a lack of research
on Asia Pacific alignment within its own region and across other regions. The Waikato
Data Privacy Matrix also suggests potential solutions to address some of the issues
that may arise when a breach of data privacy occurs, in order to ensure an individual
has their data privacy protected across the boundaries within the Web.
11.1 Introduction
Privacy of an individual is a widely discussed issue in the legal arena, but with the
introduction of cloud services, privacy concerns have also made their way into the
computing realm [1]. Laws made by governments can sometimes be confusing to an
everyday citizen. In recent years, legislation has been enacted to protect the privacy of
an individual or society, but this has come under fire [2]. These discussions have been
fuelled by the large amount of media coverage and publicity about leaks of personal
data and breaches of data privacy, including the 2013 National Security Agency (NSA)
leaks [3]. This publicity has increased awareness of data privacy limitations and rights,
and highlighted a need for clarification around trans-national legislation and an
effective way of aligning it across countries, so that an everyday user (e.g. a consumer
or small business) can understand any privacy concerns that may relate to them or
their data processed or stored by third parties.
1. Cyber Security Lab, University of Waikato, New Zealand
The emergence of the Internet of Things (IoT) [4] and the adoption of cloud
services [5] present important research foci for ensuring users and vendors
can put trust in these technologies and services by knowing the requirements of
different countries' legislation. The amount of data and personal information stored
on, or transferred to, servers across the trans-national jurisdictions in which devices
reside creates a need for a better understanding of the global data privacy legislation
that may have repercussions for users' business or privacy.
There are many legislative challenges within boundary-less cloud computing and
World Wide Web environments. Despite its importance, the legal side of the security
ecosystem seems to be in a constant effort to catch up. Recent issues show how a lack
of alignment can cause confusion. An example is the "right to be forgotten" case in
2014 involving a Spanish man and Google Spain. He requested the removal of a link
to an article about an auction of his foreclosed home, for a debt that he had subsequently
paid. However, misalignment of data privacy laws caused further complications to
the case.
The Waikato Data Privacy Matrix (WDPM) provides an easy way to cross-reference
different trans-national legislation against a set of predefined domain areas, assisting
a user to see which laws govern their data wherever in the world that data may be
located. The WDPM can also be utilised by governments, and in particular the
legislature, to see gaps which may appear in their own legislation and allow them to
propose changes to improve it or align it with that of other countries.
11.2 Background
11.2.1 Justification
Many users of cloud services do not have a legal background or legal understand-
ing. Cloud services have been incorporated into everyday life, and the geographical
boundaries which once contained a legal jurisdiction are now being blurred. Legislation
is an important function in society, set down by the legislature, to define acceptable
behaviours and the punishments that apply when those behaviours are not followed.
Cloud users and vendors need to know what legislation will impact on them and
their data wherever it is in the world.
The Department of the Prime Minister and Cabinet (DPMC) in New Zealand
released the updated Cyber Security strategy, on 10 December 2015, replacing the
2011 version. The strategy outlines the government’s response to addressing the threat
of cybercrime to New Zealanders. Connect Smart conducted a survey in 2014 on cyber
security practices; 83% of those surveyed said they had experienced a data breach in
some way (22% saying they had email accounts compromised). More concerning,
61% of those affected did nothing to change their behaviour [6]. The new version
has four principles:
● partnerships are essential
● economic growth is enabled
● national security is upheld
● human rights are protected online
Cyber resilience involves detection, protection and recovery from cyber incidents,
as well as creating action plans for disaster recovery.
Cyber capability refers to educating the public and providing them with the
necessary tools they may need. It focuses on individuals, businesses, government
departments and organisations to build better cyber security capabilities and aware-
ness. The success of this goal will allow all levels of New Zealanders to have the
knowledge and tools available to protect themselves against a cyber threat. This prin-
ciple should also have the potential to increase the skills in the cyber security industry,
allowing businesses and organisations to have the technical staff to support the rest
of their information technology (IT) team.
Addressing cybercrime looks at prevention of cybercrime, but also has an extra
component, in the “National Plan to Address Cybercrime”, which identifies cyber-
crime issues and challenges and ways they can be addressed. Much of this relies on
raising awareness so the public can help themselves.
International co-operation is the last goal and is vital to mitigating the risk of
cybercrime. This looks at building international partnerships within the Asia Pacific
(APAC) region.
New Zealand is not the only country to release a new cyber security strategy [7].
Australia released their 4-year strategy in April 2016 which outlines five themes:
● a national cyber partnership
● strong cyber defences
● global responsibility and influence
● growth and innovation
● a cyber smart nation
The Australian and New Zealand strategies have similar goals in mind – ultimately,
educating citizens, providing tools and fostering international co-operation.
Thousands of new businesses are started every year; some of these will not even
get off the ground, and of the ones that do, around 10% will fail within the first
year and around 70% will fail within 5 years [8]. One of the biggest points of failure
comes down to business revenue. Even a semi-established company that needs to
break out into an overseas market to get more customers may not have enough
revenue to hire a legal team, or even a single professional, to give it all of the legal
advice needed to successfully launch the business in an overseas jurisdiction.
Legal bills can be very expensive. Although legislation is freely available in most
countries around the world it may not be easy to navigate.
for the first quarter of 2016 [11]. This is the same as the result of this survey when
conducted for the fourth quarter of 2015, with Microsoft, IBM and Google coming in
under Amazon [12].
In January 2016, RightScale – an organisation deploying and managing appli-
cations in the cloud – conducted its annual State of the Cloud Survey of the latest
cloud computing trends which focuses on cloud users and cloud buyers. There were
1,060 IT professionals who participated in the survey, and of these participants 95%
were using cloud services [13]. To utilise cloud computing, it is essential to have
multiple data centres located in different parts of the country or the world, to ensure
lower latency for the customers using the cloud service. Google has many data servers
scattered across the globe, but it is unclear the precise number of data centres that
Google operates [14]. Although this is good for users who have their data stored in
these places, it makes it difficult to know what laws apply to their data. Even if a user
has data stored in the United States (US) their data may be subjected to different state
laws depending on which part of the country it is stored in. What makes matters more
unclear is when a user has their data stored in multiple data centres in different parts
of the world. Internet addresses are not physical addresses, which allows them to be
easily spoofed, making it harder to locate where the data came from or showing the
data is residing in an entirely different country. There is a clear need for policy makers
to collaborate on these laws so there is a global alignment which does not produce
any surprises for users of these services.
metadata from US citizens, including sender, recipient, and time stamp of email
correspondences from Internet users [19].
Since the leaks, the Information Technology and Innovation Foundation (ITIF), an
industry-funded think tank that focuses on the intersection of technological innovation
and public policy, estimated the leaks could cost cloud computing companies up to
$35 billion in lost revenue [3].
The fallout from this exposure forced countries that were using data centres in the
US to open data centres in their own countries or look for other places to store data.
Russia responded to this news by passing a new law which required all tech companies
operating inside Russian borders to use only servers located within Russia. This is one
way of avoiding the need for global alignment, but using such domestic data centres
comes at an extremely high cost for the companies [3]. It also forced users of cloud services to
centres to another part of the world where the laws were unknown to them.
11.2.4 PRISM
The PRISM program was launched in 2007 after the enactment of the Foreign Intelli-
gence Surveillance Act (FISA). PRISM was carried out by the NSA to collect stored
Internet communications and use data mining techniques to look for patterns of ter-
rorist or other potential criminal activity within the communications. There were at
least nine major US Internet companies participating in this program which included
Microsoft in 2007, Yahoo in 2008, Google, Facebook and Paltalk in 2009, YouTube
in 2010, AOL and Skype in 2011 and Apple in 2012 [20]. The basic idea behind the
program was for the NSA to have the ability to request data on specific persons of
interest. Permission is given by the Foreign Intelligence Surveillance Court (FISC), a
special federal court set up under the FISA. There are still questions about the operation
of the FISC and whether its actions are in breach of the US constitution.
In October 2015, the Court of Justice of the European Union (CJEU) declared the
Safe Harbor invalid, following the case of Max Schrems.
After the invalidation, a draft of the new EU–US Privacy Shield [22] emerged. The
draft Privacy Shield was announced in February 2016, and is an adaptation of the Safe
Harbor Agreement. In a press release in February 2016 the European Commission
stated that the new Privacy Shield would “provide stronger obligations on companies
in the European Union (EU) to protect the personal data of Europeans and stronger
monitoring and enforcement by the US Department of Commerce and Federal Trade
Commission, including through increased co-operation with European Data Protec-
tion Authorities” [23]. Three new elements were included in the new Privacy Shield
framework:
The Privacy Shield was signed off on 8 July 2016 by the European Commission
and the Department of Commerce of the United States.
The Privacy Shield was open to companies from 1 August 2016, so by August
2017 the questions around how legitimate this Privacy Shield will be should be
answered. All going well, it should be able to restore and start to rebuild trust with
citizens around the use, protection and stewardship of data [24].
The new and approved version of the Privacy Shield contains numerous
clarifications for the privacy principles.
The first relates to data integrity and purpose limitation, which clarifies the
purpose of data usage and that it is reliable for its intended use; meaning it must be
up to date and complete.
The choice principle allows the data subject to opt-out if their data will be dis-
closed to a third party or used for a different purpose, and clarifies the use for direct
marketing.
The principle on accountability for onwards transfers clarifies the obligation to
all parties involved, of the processing of data being transferred to ensure the same
level of protection despite the location of that party.
The access principle is probably the most important principle in the Privacy
Shield. It allows a data subject to query an organisation if they are processing any
personal data related to them, which the organisation needs to respond to, in a rea-
sonable time. However, the problem here is what constitutes 'reasonable'. This is a
subjective interpretation of the word, so it may cause some problems in the future.
It also allows for the data subject to correct, amend, or delete personal data that is
inaccurate or has been processed in violation of the principles. This aligns with the
EU directives and regulations.
The principle on Recourse, Enforcement and Liability clarifies how complaints
are handled, and sets out eight levels of redress that must be handled in a specific
Waikato Data Privacy Matrix 249
order, which would be used for EU citizens if their complaint is not resolved to their
satisfaction [25].
Since the Privacy Shield was built upon parts of the Safe Harbor Agreement,
companies still need to self-certify. With the extra principle components, citizens
from the EU are better protected than before, and there is more transparency in this
agreement.
Once in force, the GDPR will be legally binding on all member states of the
EU. This will also extend the scope to all organisations who may operate within
the EU or process data of EU citizens whether they are headquartered there or
not [30,31].
There has been much discussion around the effects the GDPR will have on the
data privacy landscape. The general consensus is that the GDPR will have a positive
effect. The new principles in the GDPR aim to give back the control to citizens over
their data. The GDPR will set the new standard for data privacy.
1. Regulation on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC.
2. Case C-362/14 Schrems v. Data Protection Commissioner [2015] ICLR.
3. Google Spain SL, Google Inc. v. Agencia Española de Protección de Datos, Mario Costeja González [2014] C-131/12.
“While we believe the FBI’s intentions are good, it would be wrong for the
government to force us to build a backdoor into our products. And ultimately,
we fear that this demand would undermine the very freedoms and liberty our
government is meant to protect.”
Tim Cook, Chief Executive Officer of Apple
legislation in the countries specified; however, it will mostly give the main piece of
legislation relating to data privacy. The handbook then summarises the selected topic,
for example, if the user clicks on “Authority” it will give an overview of who the
authority is. In New Zealand’s case it just gives contact details for the Office of the
Privacy Commissioner.
A possible solution is the WDPM, a Rosetta Stone-like matrix which helps
to align data privacy laws throughout Asia Pacific, the EU and the US. It does this
by having a set of seven predefined domains, each of which includes control
specifications. (The Rosetta Stone [51] was a stone uncovered in 1799, inscribed
with writing in two languages – Egyptian and Greek – in three scripts: hieroglyphic,
demotic and Greek.) The first domain is "Legislative Framework", which includes
six control specifications. Next to each control specification it lists the names of
the documents relevant to that specification. The document name in the first domain
gives the user the full name of the document and a link they can click which will take
them to that document.
The WDPM directs a user to a specific section, article, schedule or part in the
applicable legislation. This saves the user from hunting through government or other
websites to find the relevant legislation they need, directing them instead to the specific
part of that legislation where they can see what the law states. The WDPM allows a
user to see if there are any similar laws to do with that control within some of the
countries located in the Asia Pacific, the EU or the US.
One example of the WDPM is the control specification from the pre-collection
process domain that relates to whether consent is required from the individual involved
in the collection; it directs the user to several different documents. In New Zealand,
three documents are identified: the Privacy Act 1993 (the legislation), section 6 of the
Act, titled "Information Privacy Principles", and then principle 3. The use of the WDPM
allows users to quickly and painlessly find and identify the relevant information relating
to consent. When a user looks at the country in question, they are also able to see
Australia, China and the UK, which helps the user to see immediately that there is
some law around consent in these countries.
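To make the structure concrete, the following minimal sketch shows how one row of such a matrix could be represented programmatically, using the New Zealand consent example above. The field names, nesting and lookup helper are illustrative assumptions for this chapter only and do not reflect the actual WDPM spreadsheet schema.

# Illustrative sketch only: one WDPM-style entry mapping a control
# specification to the legislative references recorded for each jurisdiction.
wdpm_entry = {
    "domain": "Pre-collection process",
    "control_specification": "Is consent required from the individual involved in the collection?",
    "jurisdictions": {
        "New Zealand": [{
            "document": "Privacy Act 1993",
            "section": "Section 6 - Information Privacy Principles",
            "provision": "Principle 3",
        }],
        # Entries for Australia, China, the UK, etc. would sit alongside.
    },
}

def lookup(entry, country):
    """Return the legislative references recorded for a country, if any."""
    return entry["jurisdictions"].get(country, [])

print(lookup(wdpm_entry, "New Zealand"))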
The contributions of a variety of privacy experts have helped shape the WDPM,
as has the advice of over 20 experts from academia and industry. This has helped
to provide some peer review of the information and directions within the WDPM.
The WDPM only focuses on general data privacy legislation at a federal level,
meaning only legislation which covers a country as a whole was examined. Due to
the scope of the project, only general data privacy was researched; this did not include
legislation relating to health and finance. Only 12 countries were included in
the WDPM as a start; the next step would be to add multiple other countries to create
a more global and comprehensive alignment tool. The delivery of the WDPM is also
an important consideration: its current form is a large Excel spreadsheet, and a web
application will need to be introduced to improve the user experience.
To ensure the WDPM is a truly global tool, a wider range of countries will also
need to be added.
11.6.1 Vision
To help to complete the future work and promote the WDPM and data privacy, the
Data Privacy Foundation Inc. has been setup. The foundation has the following vision:
(a) To assist in achieving global alignment of data privacy laws by identifying gaps
and shortfalls in country and regional laws and legal systems, thereby ensuring
full legal protection of data.
(b) To establish the premier, knowledge based, definitive global authority on data
privacy.
(c) To provide knowledge, tools, training, consultancy and events to assure data
privacy across the globe.
(d) To establish, build, and sustain data privacy knowledge databases by
harnessing collaborative, open source, scalable contributions and technologies.
(e) To facilitate delivery of data privacy at a level not achievable or limited by
any one organisation or country.
The foundation will help to create a comprehensive and robust global alignment
tool for all types of data privacy legislation. There is a lot of work to be done to include
these extra additions but this is a crucial development to create a truly global tool,
and the benefit of having the foundation will help to extend the reach of this research.
By having access to federal and state legislation combined with case law, users
and governments have a tool which gives them extensive information and direction
to data privacy legislation around the globe.
References
[1] Vic (J.R.) Winkler, “Cloud Computing: Privacy, confidentiality and the
cloud,” June 2013 (Last Accessed on 24 October 2016). [Online]. Available:
https://technet.microsoft.com/en-us/library/dn235775.aspx
[2] I. Georgieva, “The Right to Privacy under Fire – Foreign Surveillance under
the NSA and the GCHQ and Its Compatibility with Art. 17 ICCPR and Art.
8 ECHR,” Utrecht J. Int’l & Eur. L., vol. 31, p. 104, February 27, 2015
(Last Accessed on 24 October 2016). [Online]. Available: www.utrechtjournal.
org/articles/10.5334/ujiel.cr/
[3] N. Arce, “Effect of NSA Spying on US Tech Industry: $35 Billion? No.
Way More,” June 10, 2015 (Last Accessed on 22 July 2016). [Online].
Available: http://www.techtimes.com/articles/59316/20150610/effect-of-nsa-
spying-on-us-tech-industry-35-billion-no-way-more.htm
[4] K. L. Lueth, “Why the Internet of Things Is Called Internet of Things: Def-
inition, History, Disambiguation,” December 19, 2014 (Last Accessed on
24 October 2016). [Online]. Available: https://iot-analytics.com/internet-of-
things-definition/
[5] L. Columbus, “Roundup of Cloud Computing Forecasts And Market Esti-
mates, 2016,” March 13, 2016 (Last Accessed on 24 October 2016). [Online].
Available: http://www.forbes.com/sites/louiscolumbus/2016/03/13/roundup-
of-cloud-computing-forecasts-and-market-estimates-2016/#5557f3c574b0
[6] Department of the Prime Minister and Cabinet (New Zealand), “National
Plan to Address Cybercrime,” December 10, 2015 (Last Accessed on
24 August 2016). [Online]. Available: http://www.dpmc.govt.nz/sites/all/
files/publications/nz-cyber-security-cybercrime-plan-december-2015.pdf
[7] Department of the Prime Minister and Cabinet (Australia), “Aus-
tralia’s Cyber Security Strategy,” April 21, 2016 (Last Accessed on
24 August 2016). [Online]. Available: https://cybersecuritystrategy.dpmc.
gov.au/assets/img/PMC-Cyber-Strategy.pdf
[8] S. Nicholas, “Surviving and Thriving with Your New Business,” September
22, 2015 (Last Accessed on 24 August 2016). [Online]. Available: http://www.stuff.co.nz/business/better-business/72295224/Surviving-and-thriving-with-your-new-business
[9] “The History of Cloud Computing” (Last Accessed on 27 August 2016).
[Online]. Available: http://www.eci.com/cloudforum/cloud-computing-
history.html
[10] A. Mohamed, “A History of Cloud Computing” (Last Accessed on 27 August
2016). [Online]. Available: http://www.computerweekly.com/feature/A-
history-of-cloud-computing
[11] Business Cloud News, “AWS, Google, Microsoft and IBM Pull Away from
Pack in Race for Cloud Market Share,” April 29, 2016 (Last Accessed on
27 August 2016). [Online]. Available: http://www.businesscloudnews.com/
2016/04/29/aws-google-microsoft-and-ibm-pull-away-from-pack-in-race-for-
cloud-market-share/
[12] J. Tsidulko, “Keeping Up with the Cloud: Top 5 Market-Share Leaders,”
February 11, 2016 (Last Accessed on 27 August 2016). [Online]. Avail-
able: http://www.crn.com/slide-shows/cloud/300079669/keeping-up-with-the-cloud-top-5-market-share-leaders.htm/pgno/0/6
[13] K. Weins, “Cloud Computing Trends: 2016 State of the Cloud Survey,”
February 9, 2016 (Last Accessed on 6 April 2016). [Online]. Available: http://www.rightscale.com/blog/cloud-industry-insights/cloud-computing-trends-2016-state-cloud-survey
[40] K. Zetter, “Apple’s FBI Battle Is Complicated. Here’s What’s Really Going
On,” February 18, 2016 (Last Accessed on 29 August 2016). [Online].
Available: https://www.wired.com/2016/02/apples-fbi-battle-is-complicated-
heres-whats-really-going-on/
[41] D. Sulivan, “How Google’s New Right to Be Forgotten Form Works: An
Explainer,” May 30, 2014 (Last Accessed on 28 July 2016). [Online]. Available:
http://searchengineland.com/google-right-to-be-forgotten-form-192837
[42] L. Clark, “Google’s ‘Right to Be Forgotten’ Response Is ‘disappointingly
clever’,” May 30, 2014 (Last Accessed on 28 July 2016). [Online]. Available:
http://www.wired.co.uk/article/google-right-to-be-forgotten-form
[43] D. Sulivan, “Google to Remove Right-To-Be-Forgotten Links Worldwide,
For Searchers in European Countries,” February 10, 2016 (Last Accessed
on 28 July 2016). [Online]. Available: http://searchengineland.com/google-
to-remove-all-right-to-be-forgotten-links-from-european-index-242235
[44] “The Right to Be Forgotten – Between Expectations and Practice,” Novem-
ber 20, 2012 (Last Accessed on 28 July 2016). [Online]. Available:
https://www.enisa.europa.eu/publications/the-right-to-be-forgotten
[45] “Data Protection Laws of the World” (Last Accessed on 30 August
2016). [Online]. Available: https://www.dlapiperdataprotection.com
#handbook/world-map-section
[46] “Global Heat Map” (Last Accessed on 30 August 2016). [Online]. Available:
http://heatmap.forrestertools.com/
[47] H. Shey, E. Iannopollo, M. Barnes, S. Balaouras, A. Ma, and B. Nagel,
“Privacy, Data Protection, and Cross-Border Data Transfer Trends in Asia
Pacific,” March 4, 2005 (Last Accessed on 30 August 2016). [Online]. Avail-
able: https://www.forrester.com/report/Privacy+Data+Protection+And+CrossBorder+Data+Transfer+Trends+In+Asia+Pacific/-/E-RES131051#figure2
[48] J. Rohlmeier, “International Data Protection Legislation Matrix” (Last
Accessed on 30 August 2016). [Online]. Available: http://web.ita.doc.gov/ITI/
itiHome.nsf/51a29d31d11b7ebd85256cc600599b80/4947d6deb021a9648525
6d48006403af?OpenDocument
[49] Baker & McKenzie, “Global Privacy Handbook,” 2016 (Last Accessed
on 30 August 2016). [Online]. Available: http://globalprivacymatrix.
bakermckenzie.com/
[50] Baker & McKenzie, “Firm Facts,” 2016 (Last Accessed on 30 August 2016).
[Online]. Available: http://www.bakermckenzie.com/-/media/files/about-
us/firm-facts-final.pdf?la=en
[51] M. Cartwright, “Rosetta Stone,” January 3, 2014 (Last Accessed on 16
December 2016). [Online]. Available: http://www.ancient.eu/Rosetta_Stone/
[52] “EU Countries Ranked for ‘influence potential’,” July 29, 2009 (Last
Accessed on 28 April 2016). [Online]. Available: http://www.euractiv.com/
section/future-eu/news/eu-countries-ranked-for-influence-potential/
Chapter 12
Data provenance in cloud
Alan Yu Shyang Tan1, Sivadon Chaisiri1, Ryan Ko Kok Leong1, Geoff Holmes1 and Bill Rogers1
Abstract
One of the barriers of cloud adoption is the security of data stored in the cloud.
In this chapter, we introduce data provenance and briefly show how it is applicable
for data security in the cloud. Building on this, we discuss the underlying question
of how data provenance, required for empowering data security in the cloud, can
be acquired. The strengths and weaknesses of two methodologies for provenance
acquisition, active collection and reconstruction, are discussed. The goal is to provide
an understanding of the current state of the art in generating provenance, such that
better methodologies and solutions can be developed.
1. Cyber Security Lab, University of Waikato, New Zealand
From these examples, one can observe that data provenance is useful for addressing
questions about the why, how and where [9] of data. Addressing these questions
becomes even more relevant for data in the cloud. Kaufman highlighted cloud users
not knowing where their data is stored as a main concern in his discussion of data
security in the cloud [10]. The abstraction offered by cloud services further complicates
matters when it comes to enforcing legal requirements surrounding data management,
such as compliance checking and auditing. Because of these concerns, there is an
incentive to apply data provenance to the cloud. In addition, the use of data provenance
is not limited to data auditing and verification. It can also be used for data security.
(Figure: cloud architecture layers – servers and storage form the physical layer.)
provenance, on which their proposed solution relies, is obtained. In the rest of this
chapter, we discuss how data provenance can be acquired in the cloud.
(Figure: the mail delivery example – a mail item (m1) travels from the sender (S) via local depot L1, regional depots R1 and R2 and local depot L2 to the receiver (R); the provenance of the mail records each handover, e.g. S, received by Postman1, delivered by L1, received by Truck, delivered by R1, and so on.)
received, time it was received at the depot, time it was sent out of the depot and other
information are tracked at the depot. However, each depot does not track where
or which other depots the item had passed through prior to it reaching the depot.
Similarly, log files are considered to be the result of logging mechanisms deployed
at specific points within a computer system. Most of these mechanisms only capture
events observed within their assigned scope and perspective. For example, a logger
for an application only captures events happening in relation to the application and
within the application’s execution space (e.g. memory region or execution stack).
Thus relationships between objects and processes outside the scope of the application
(e.g. across other log files) are unknown.
On the other hand, the provenance of a piece of data describes how the data is managed
and accessed by different applications and entities in a system. As such, it contains
information that intersects with different log files. Referring back to the mail example,
provenance of the mail shows the origin of the mail, the depots that have processed
the mail and other information that describes how the mail eventually reaches the
current depot. This provenance can be retrieved from a centralised system, that stores
information from all depots, using the bar code or ID assigned to each mail item.
Because of its relational properties, provenance is commonly visualised as a
graph, as shown in the example illustrated in Figure 12.2. On the other hand, visu-
alising events in a log file (e.g. an application log) will result in a graph that shows
the states and objects the application interacted with directly connected to the node
representing the application. The resulting graph is similar to a ‘one-step’ provenance
graph, as illustrated in Figure 12.3.
(Figure 12.3: a 'one-step' provenance graph – the application node is directly connected to the files (File1, File2, File3) and events (Event1, Event2) it interacted with.)
Hence, to capture information that allows the relational aspects and the entities of
the provenance graph to be modelled, customised tools and techniques are required.
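As a simple illustration of this relational view, the sketch below models the provenance of the mail example as a list of (subject, relation, object) edges and walks it to recover the item's path. The relation names and helper function are illustrative assumptions and are not taken from any particular provenance model.

# Illustrative sketch: the provenance of mail item m1 as graph edges,
# in contrast to a single depot's log, which records only the hop it saw.
mail_provenance = [
    ("m1", "sent by",      "Sender S"),
    ("m1", "received by",  "Local Depot L1"),
    ("m1", "delivered to", "Regional Depot R1"),
    ("m1", "delivered to", "Regional Depot R2"),
    ("m1", "received by",  "Local Depot L2"),
    ("m1", "delivered to", "Receiver R"),
]

def trace(item, edges):
    """List, in recorded order, the entities that handled the item."""
    return [entity for subject, _, entity in edges if subject == item]

print(" -> ".join(trace("m1", mail_provenance)))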
1. We adopt the term provenance collectors for solutions that actively collect provenance from systems.
(Figure 12.4: dependency explosion – Process1 reads from a file at t1 and later writes to Process2, File2 and Process3 at t2, t3 and t4; it cannot be determined which write edge propagated the data that was read, and the other entity nodes may further branch out the provenance graph.)
monitoring the system through these components, the proposed solutions are able to
observe the activities of each application.
However, because these components operate at the fine-grained layer of the
system, the events observed are detailed, high in volume and semantically mismatched
with user activities observed in the more coarse-grained layers of the system (e.g. the
user layer) [30]. For example, an event that shows an application reading data from a file
can be translated to a series of read system calls in the kernel. Just by looking at the
system calls, it is difficult to deduce whether the series of read calls are generated
from a single or multiple read events in the application layer. Another major downside
with capturing provenance through observing the OS is the dependency explosion
problem [31].
Given two events, A and B, B is said to be causally dependent on A (A → B), if,
A happens before B and B is generally dependent on A [32]. An example of a pair of
causally dependent system calls is when an application reads data from a file and writes
the data to other objects. If the write happens after the read, the write is considered to
be causally dependent on the read. This is because the application needs to first obtain
the data before it can output it.
Because little semantic information relating to the data or the activity is captured, it is
difficult to discern whether a pair of causally dependent input–output system calls is
related. A single read by an entity may be connected to a group of writes executed by
the same entity after the read. In such situations, it is difficult to accurately determine
to which other entities the data being read propagated. As a result, all causally
dependent edges and their sub-graphs have to be included in the provenance. To
complicate matters, the other entities may further branch out the graph, causing an
explosion of nodes and edges in the provenance graph. The end result is an explosion
of dependencies, as illustrated in Figure 12.4.
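The following minimal sketch, which is not any specific collector's algorithm, illustrates why this conservative treatment explodes: with only timestamps and process identities available, every write that is causally dependent on a read must be kept as a possible propagation of the data that was read.

# Illustrative sketch: each event is (timestamp, process, operation, object).
events = [
    (1, "Process1", "read",  "File"),
    (2, "Process1", "write", "Process2"),
    (3, "Process1", "write", "File2"),
    (4, "Process1", "write", "Process3"),
]

def possible_propagations(events):
    edges = []
    for t_r, proc_r, op_r, src in events:
        if op_r != "read":
            continue
        for t_w, proc_w, op_w, dst in events:
            # Conservatively keep every causally dependent write (same process,
            # later timestamp) as a potential flow of the data that was read.
            if op_w == "write" and proc_w == proc_r and t_w > t_r:
                edges.append((src, proc_r, dst))
    return edges

# All three writes are retained, and each destination entity (Process2, File2,
# Process3) would in turn branch further, producing the explosion of
# Figure 12.4.
print(possible_propagations(events))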
12.3.1.2 Application reporting
Another alternative to collecting provenance in systems is to modify applications
managing the data to output provenance information whenever the relevant functions
that managed the data, provenance that shows how the data is being transformed by
those applications is unknown.
Assuming a scenario where the provenance for a set of critical files (e.g. medical
records) or data is lost, an intuitive approach is to infer the relationship between files,
thereby reconstructing the data provenance, by comparing the content between files.
Magliacane et al. [38,39], Deolalikar and Laffitte [40] and Aierken et al. [41] proposed
using similarity measures such as cosine similarity and longest common subsequence
to compare and determine whether two files are related. Files found to be related are
then grouped and ordered based on timestamps, such as date of creation, found in their
metadata. The ordered group of files then forms the coarse-grained provenance for
the oldest file, showing the order of revisions for that file. The underlying assumption
is that different revisions of a file bear similarities in content. However, there are two
weaknesses in the proposed solutions. First, the reconstructed provenance only shows
a possible ordering of the transformation sequence, from the original file to the latest
revision; the output does not capture details of how and what was changed between
revisions. Second, the approaches do not consider the case where the content of two
revisions of the same file can be drastically different.
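A minimal sketch of this idea is given below, under the stated assumption that revisions of a file share content. It is not the cited authors' actual pipeline: a real system would use proper tokenisation, TF-IDF weighting or longest common subsequence, and file-system metadata rather than the hard-coded modification times used here.

from collections import Counter
from math import sqrt

def cosine_similarity(text_a, text_b):
    # Bag-of-words cosine similarity over whitespace tokens.
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def reconstruct_revision_chain(files, threshold=0.5):
    """files: list of (name, mtime, content) tuples. Returns a coarse-grained
    provenance chain: files related to the oldest file, ordered old to new."""
    _, _, base_content = min(files, key=lambda f: f[1])
    related = [f for f in files if cosine_similarity(base_content, f[2]) >= threshold]
    return [name for name, _, _ in sorted(related, key=lambda f: f[1])]

files = [
    ("report_v1.txt", 100, "cloud provenance survey draft"),
    ("report_v2.txt", 200, "cloud provenance survey draft with experiments added"),
    ("holiday.txt",   150, "photos from the summer trip"),
]
print(reconstruct_revision_chain(files))   # ['report_v1.txt', 'report_v2.txt']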
Nies et al. [42] recognised these two weaknesses and proposed a solution that
compares two pieces of data (e.g. news articles, files) based on their semantic prop-
erties. Information such as named entities and descriptive metadata annotations that
are embedded in the file are extracted as semantic properties of the data. The authors
argued that comparing two pieces of data at the semantic level allows the relationship
between the data to be established even if the contents differ in length or in the words
used. Details of changes made between two directly dependent revisions are then
inferred by identifying the differences in content. These differences are then mod-
elled into fine-grained provenance relations using concepts defined in the PROV-DM
model [43].
Although these solutions show data provenance can potentially be reconstructed
by comparing the similarity between two pieces of data, the approach is not scalable
as the number of data pieces (e.g. virtual objects) or files increases. In public cloud
storage services, such as Dropbox, it is common to find many millions of file
objects belonging to different users stored in the underlying infrastructure. As such,
scalability of the approach is a factor that cannot be overlooked when considering
alternative approaches for acquiring provenance from a cloud environment.
Groth et al. [44] explored the idea of formulating data provenance reconstruction
as an Artificial Intelligence (AI) planning problem. The authors assumed that the set of
transformations that can take place between two pieces of data, and the initial and final
states of a piece of data, are known. A prototype based on A* search [45] was built that
searches for all possible sequences of transformations that could explain how the data
reaches its final state. However, the authors highlight the unbounded search space and
the possibility of the algorithm returning incomplete results as challenges that need
to be addressed before the approach is viable.
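A minimal sketch of that formulation is shown below, using breadth-first search over a toy set of transformations instead of A*. The transformations, states and depth bound are illustrative assumptions; the exponential growth of the queue also hints at why an unbounded search space is problematic.

from collections import deque

# Illustrative, known transformations that could have been applied to the data.
TRANSFORMATIONS = {
    "anonymise": lambda d: d.replace("alice", "<redacted>"),
    "uppercase": lambda d: d.upper(),
    "truncate":  lambda d: d[:20],
}

def reconstruct(initial, final, max_depth=4):
    """Return a plausible sequence of transformation names, or None."""
    queue = deque([(initial, [])])
    while queue:
        state, path = queue.popleft()
        if state == final:
            return path
        if len(path) >= max_depth:
            continue
        for name, fn in TRANSFORMATIONS.items():
            queue.append((fn(state), path + [name]))
    return None

print(reconstruct("alice visited the cloud data centre",
                  "<REDACTED> VISITED THE CLOUD DATA CENTRE"))
# ['anonymise', 'uppercase']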
Deviating from the use of information surrounding the data, Huq et al. [46]
analysed the source code of a given application.2 Based on the grammar used in the
2. In the cited paper, the authors analysed Python scripts. Hence, the term 'script' is used in place of source code.
analysed code, an abstract syntax tree is generated. The proposed graph-building
engine then parses the syntax tree, generating a new object for every node in
the tree. Users are prompted for details, such as whether the node reads or writes
data, for each object generated. The end result is a provenance graph that describes
the workflow of the application and shows how data is transformed within the
application's execution space. Unfortunately, such a provenance graph does not show
how data is communicated or transformed outside of the application's execution
space.
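The sketch below gives a flavour of this static approach using Python's standard ast module. The example script, the restriction to literal open() calls and the simple read/write classification are simplifying assumptions made here; the cited system's interactive, per-node graph construction is not reproduced.

import ast

SCRIPT = """
raw = open("readings.csv").read()
cleaned = raw.strip()
open("cleaned.csv", "w").write(cleaned)
"""

def extract_file_io(source):
    """Return (reads, writes): file names appearing in literal open(...) calls."""
    reads, writes = [], []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "open" and node.args
                and isinstance(node.args[0], ast.Constant)):
            filename = node.args[0].value
            mode = (node.args[1].value
                    if len(node.args) > 1 and isinstance(node.args[1], ast.Constant)
                    else "r")
            (writes if ("w" in mode or "a" in mode) else reads).append(filename)
    return reads, writes

reads, writes = extract_file_io(SCRIPT)
# A coarse workflow-style provenance fragment for the script's execution space.
print("script used:", reads)        # ['readings.csv']
print("script generated:", writes)  # ['cleaned.csv']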
The type of reconstruction algorithm that can be applied to the cloud depends heavily
on the type of information obtainable from the environment (e.g. an algorithm that
works at the physical layer may not work for the virtual layer). Currently proposed
approaches face issues such as an unbounded search space over the possible sequences
of transformations and a limited ability to reconstruct fine-grained provenance. These
issues will need to be resolved before provenance reconstruction can become a viable
alternative for acquiring provenance in the cloud.
References
[1] Y. L. Simmhan, B. Plale, and D. Gannon, “A Survey of Data Provenance in
e-Science,” ACM SIGMOD Record, vol. 34, no. 3, pp. 31–36, 2005. [Online].
Available: http://doi.acm.org/10.1145/1084805.1084812
[2] Y. S. Tan, R. K. L. Ko, and G. Holmes, “Security and Data Accountabil-
ity in Distributed Systems: A Provenance Survey,” in Proceedings of the
15th IEEE International Conference on High Performance Computing and
Communications (HPCC’13), 2013.
[3] L. Carata, S. Akoush, N. Balakrishnan, et al., “A Primer on Provenance,”
Communications of the ACM, vol. 57, no. 5, pp. 52–60, May 2014.
[4] S. M. S. D. Cruz, M. L. M. Campos, and M. Mattoso, “Towards a Taxonomy of
Provenance in Scientific Workflow Management Systems,” in IEEE Congress
on Services, 2009, pp. 259–266.
[5] L. Moreau, “The Foundation for Provenance on the Web,” Foundations and
Trends in Web Science, Journal, vol. 2, pp. 99–241, 2010.
[31] K. H. Lee, X. Zhang, and D. Xu, “High Accuracy Attack Provenance via
Binary-based Execution Partition,” in Proceedings of Annual Network and
Distributed System Security Symposium, April 2013, San Diego, CA.
[32] L. Lamport, “Time, Clocks, and the Ordering of Events in a Distributed
System,” Communication of the ACM, vol. 21, no. 7, pp. 558–565, Jul. 1978.
[Online]. Available: http://doi.acm.org/10.1145/359545.359563
[33] P. J. Guo and M. Seltzer, “Burrito: Wrapping Your Lab Notebook in Compu-
tational Infrastructure,” in USENIX Workshop on the Theory and Practice of
Provenance (TaPP), 2012.
[34] P. Ruth, D. Xu, B. Bhargava, and F. Regnier, “E-notebook Middleware
for Accountability and Reputation Based Trust in Distributed Data Sharing
Communities,” in Proceedings of Second International Conference on Trust
Management, 2004, pp. 161–175.
[35] Mulesoft.org, “Anypoint Platform,” https://developer.mulesoft.com/ (Last
accessed: 14/03/17), 2017.
[36] M. D. Allen, A. Chapman, B. Blaustein, and L. Seligman, Capturing Prove-
nance in the Wild. Berlin: Springer, 2010, pp. 98–101. [Online]. Available:
http://dx.doi.org/10.1007/978-3-642-17819-1_12
[37] K.-K. Muniswamy-Reddy, U. Braun, D. A. Holland, et al., “Layering in
Provenance Systems,” in Proceedings of the 2009 Conference on USENIX
Annual Technical Conference, ser. USENIX’09. Berkeley, CA, USA:
USENIX Association, 2009, pp. 10–10. [Online]. Available: http://dl.acm.org/
citation.cfm?id=1855807.1855817
[38] S. Magliacane, “Reconstructing Provenance,” The Semantic Web – ISWC,
pp. 399–406, 2012.
[39] S. Magliacane and P. Groth, “Towards Reconstructing the Provenance of Clin-
ical Guidelines,” in Proceedings of Fifth International Workshop on Semantic
Web Applications and Tools for Life Science (SWAT4LS), vol. 952, 2012.
[Online]. Available: http://ceur-ws.org/Vol-952/paper_36.pdf
[40] V. Deolalikar and H. Laffitte, “Provenance as Data Mining: Combining File
System Metadata with Content Analysis,” in Proceeding of First Workshop on
Theory and Practice of Provenance (TAPP’09), 2009.
[41] A. Aierken, D. B. Davis, Q. Zhang, K. Gupta, A. Wong, and H. U.
Asuncion, “A Multi-level Funneling Approach to Data Provenance Recon-
struction,” in Proceedings of the 2014 IEEE 10th International Conference on
e-Science – Volume 02, ser. E-SCIENCE’14. Washington, DC, USA: IEEE
Computer Society, 2014, pp. 71–74. [Online]. Available: http://dx.doi.org/
10.1109/eScience.2014.54
[42] T. D. Nies, S. Coppens, D. V. Deursen, E. Mannens, and R. V. de Walle,
“Automatic Discovery of High-Level Provenance using Semantic Similarity,”
in Proceedings of the Fourth International Conference on Provenance and
Annotation of Data and Processes (IPAW), 2012, pp. 97–110.
[43] L. Moreau and P. Missier, “PROV DM – W3C Working Group Note,”
https://www.w3.org/TR/2013/REC-prov-dm-20130430/ (Last accessed:
13/06/2016), April 2013.
Abstract
Cloud services continue to attract organizations with advantages such as reduced
costs. While there are advantages, securing the cloud is an ongoing challenge for
cloud providers and users. Cyber-threats are penetrating cloud technologies and
exposing flaws in them. Data Provenance as a Security Visualization Service (DPaaSVS)
and Security Visualization as a Cloud Service (SVaaCS) for cloud technologies are
solutions to help track and monitor data in the cloud. Whether data is at rest or in
transit, security visualization empowers cloud providers and users to track and monitor
their data movements. Security visualization refers to the concept of using visualization
to represent security events. In this chapter, we (1) provide our security visualization
standardized model and (2) provide the security visualization intelligence framework
model, and finally discuss several security visualization use-cases.
13.1 Introduction
Security for data in cloud computing has been provided in many ways, mainly
through past and existing research, technology solutions and innovations.
Cloud providers and platforms offer attractive services which span the wider spectrum
of both business and home users. However, existing solutions are tailored towards
the interests of cloud providers, and largely towards business organizations rather
than cloud end-users. Traditional computer security technologies such as firewalls,
antivirus solutions and web proxies that scan and filter malicious packets were the
foundation of computer security. These solutions faced challenges when cloud
technologies emerged. The ability to protect data from exploitation, data breach
and data leakage has been a challenge. And while cloud technologies provide efficient
1. Cyber Security Lab, Department of Computer Science, University of Waikato, New Zealand
services to desirable users [1], current cloud technology providers are faced with data
security challenges. For example, an attacker is able to:
user-centric visual approaches for customers when using cloud services and providing
situation awareness to the cloud users [6].
While visualization has been used across multiple research domains, ranging
from its origins in art to the medical/pharmaceutical domain, researchers and businesses
are tapping into visualization mainly for two purposes: visualization for exploration
and visualization for reporting. The reason is that visualization bridges the gap
between an anxious user and the medium being examined (visualizations, reports,
etc.). Visualization allows users to psychologically and intellectually build visual
views of the examined medium, which are naturally processed by human perception
and cognition.
Although many visualizations are already used in cloud technologies and across visualization domains, the focus of this chapter is on effective security visualization, with use-cases primarily for law enforcement – "Threat Intelligence and Data Analytics" – the use of user-centric visualizations and, more generally, other security-minded organizations such as financial institutions and academia. Sub-objectives to address effective intelligence reporting are stated below:
presents new use-cases with the law enforcement approach of leveraging security
visualization for Bitcoin investigations and malware threat visualizations. Finally,
Section 13.6 provides the concluding remarks of this chapter.
lack the assessment factor. Security visualization in a quick-to-action form, with presentable insights directed towards law enforcement, is a catalyst for minimizing the time spent addressing and analyzing cyber-security issues.
Awareness of data security in organizations and on the Internet is created through security situational awareness. Adding visualization tools that support data security, such as NVisionIP [18], PortVis [19] and VisFlowConnect [20], allows security experts to increase their knowledge of the current state of their network. NVisionIP's primary focus is on detection, leveraging visual representations of suspected segments of the network. PortVis's emphasis is on security event data discovery; it monitors data flow over Transmission Control Protocol (TCP) ports for security events. The VisFlowConnect tool enhances the ability of security administrators to detect and investigate anomalous data traffic between an internal local network and external networks. With the functionalities mentioned above, these tools are able to discover and observe a variety of interesting network traffic patterns and behaviors.
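As a purely illustrative sketch (not drawn from NVisionIP, PortVis or VisFlowConnect themselves), the following Python fragment shows the kind of per-port aggregation that underlies such port-activity views; the flow-record fields and values are assumptions made for the example.

    from collections import Counter, namedtuple

    # A simplified flow record; real tools ingest NetFlow or packet captures
    # with many more fields. These field names are assumptions for illustration.
    Flow = namedtuple("Flow", ["src_ip", "dst_ip", "dst_port", "bytes"])

    def port_activity(flows):
        """Aggregate traffic volume per destination TCP port.

        Returns a Counter mapping port -> total bytes, which a visualization
        could render as a bar chart or heat map of port activity.
        """
        activity = Counter()
        for flow in flows:
            activity[flow.dst_port] += flow.bytes
        return activity

    # Hypothetical flow records for a small network.
    flows = [
        Flow("10.0.0.5", "203.0.113.7", 443, 1200),
        Flow("10.0.0.9", "203.0.113.7", 443, 800),
        Flow("10.0.0.5", "198.51.100.2", 6667, 50000),  # unusually large flow on an IRC port
    ]
    for port, total in port_activity(flows).most_common():
        print(f"port {port}: {total} bytes")

Sorting the aggregated counts (as most_common() does here) is what lets an analyst's eye land on the anomalous port first, which is the effect the tools above achieve visually.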
However, data security challenges continue to arise, especially for cloud technologies. These challenges, which are listed in [21], include:
The data security challenges mentioned also affect how security visualization is
used when representing security events and scenarios. The introduction of security
visualization into data security research has helped to identify and expose specific
security needs that are targeted to data in the cloud.
Security often requires standards, policies and guidelines to help safeguard a set of platforms when organizations handle sensitive data. For any security model to function effectively, certain requirements have to be met. These requirements include standards, guidelines, policies and training around security visualization, with the aim of giving users the most useful insights for a given visualization purpose.
Security visualization standards are published materials that attempt to safeguard the security visualization environment. They are important and contribute largely to how visualizations provide security insights to users. Whether a visualization is useful depends on how simply, clearly, effectively and appropriately the intended type of security event is visually represented. Effective security visualization approaches rely heavily on users' perception and cognitive abilities to connect the dots between users and visualizations.
From the threat-landscape perspective, researchers and developers often ask the following questions:
The purpose of this section is to outline how effective and efficient the use of security visualization can be within law enforcement operations and investigations. Regardless of the existing security challenges faced by security firms [27], such as big data analytics, there is always a need to improve how security tools perform. While other methods also aid law enforcement operations and investigations, applying security visualization to these operations can minimize the time spent on investigations and other work processes. To achieve that, law enforcement needs to create a set of visualization standards which act as indicators in various processes. For example, in INTERPOL's current "International Notice System" [28], the color Red in Red Notices indicates a "Wanted Person", Yellow Notices indicate a "Missing Person", and so on [29]. This helps law enforcement personnel to act quickly upon such alerts when they are received at the National Central Bureaus (NCBs) in different member countries and offices.
While INTERPOL's Notice System uses colors to indicate different human investigation processes, there is a need to expand the idea and develop a visualization standard for law enforcement. This means creating similar color and shape standards to visually represent various cyber-attacks. For example, in any network and system investigation, the color "Green" can be used to represent normal web traffic, suspicious traffic can be visually represented with the color "Yellow", and malware traffic can be visually represented with the color "Red". Network traffic can also be symbolized using "Circles" instead of "Rectangles", simply to distinguish it from INTERPOL's Notice System.
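A minimal sketch of such a convention is given below; the category names, colors and shapes are assumptions used for illustration rather than an established law-enforcement standard.

    # Hypothetical visual-encoding standard for network-traffic investigations.
    # Colors follow the scheme described above; the circle shape distinguishes
    # network-traffic events from the colors of INTERPOL's Notice System.
    TRAFFIC_ENCODING = {
        "normal": {"color": "green", "shape": "circle"},
        "suspicious": {"color": "yellow", "shape": "circle"},
        "malware": {"color": "red", "shape": "circle"},
    }

    def encode_event(category):
        """Return the visual encoding for a traffic category, falling back to a
        grey marker so that unclassified events remain visible on screen."""
        return TRAFFIC_ENCODING.get(category, {"color": "grey", "shape": "circle"})

    print(encode_event("malware"))  # {'color': 'red', 'shape': 'circle'}
    print(encode_event("unknown"))  # {'color': 'grey', 'shape': 'circle'}

Keeping the mapping in one shared table is what makes such a standard enforceable: every visualization application reads the same encoding rather than choosing its own colors.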
Other cyber threats and attacks can be visualized differently; however, keeping the standard consistent across different law-enforcement visualization applications will gradually establish common ground for visual reports and investigations. As a result, information processing will be fast and effective, because humans naturally have the cognitive ability to process visual images and information faster than they read words.
Figure 13.1 DPaaSVS and SVaaS: cloud computing security visualization framework (the figure shows web applications, DPaaSVS, the cloud provider and the added security visualization framework)
users of the cloud today [30]. An added sub-component in working towards the concept of a Security-as-a-Service (SECaaS) model [31] is to observe user activities and web traffic and to keep track of what is happening in cloud services [32]. Adding the security-visualization-as-a-service model to security in the cloud gives both providers and customers visibility into cloud processes. It instils confidence and a trust relationship between cloud customers and the service offered. Fujitsu's security visualization dashboard, for example, has helped to improve security governance in the cloud and enabled customers to visualize the efficiency and cost-effectiveness of information security measures [33].
The use of visual dashboards in business management software packages has given businesses the opportunity to include visual elements in their entire reporting framework. Amazon's QuickSight, a business-intelligence service for Big Data, provides visual analytics [34]. Google's Cloud Datalab tool offers data exploration, analysis, visualization and machine-learning capabilities [35]. Microsoft Azure offers Power BI, a Big Data solution with live dashboards and attractive visualizations [36]. The Power BI tool shown in Figure 13.2 illustrates how visualization dashboards are used to visually analyze data and present interesting insights; in Figure 13.2, Power BI is used to show a person's movements. Dashboard visualizations aim to consolidate most information onto a single screen view, showing users all the information they need. Adding the concept of Security Visualization as a Cloud Service (SVaaCS) to cloud services to enhance such dashboards helps to improve the way data security is presented with visualization.
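To make the SVaaCS idea more concrete, the sketch below (an assumed, simplified example, not any vendor's API) reduces a stream of cloud audit events into per-tenant counts of the kind a single-screen security dashboard would display.

    from collections import defaultdict

    # Hypothetical audit events emitted by a cloud service; field names are assumptions.
    events = [
        {"tenant": "acme", "action": "login", "outcome": "failure"},
        {"tenant": "acme", "action": "login", "outcome": "failure"},
        {"tenant": "acme", "action": "download", "outcome": "success"},
        {"tenant": "globex", "action": "login", "outcome": "success"},
    ]

    def dashboard_summary(events):
        """Count action/outcome pairs per tenant, producing the compact numbers a
        dashboard widget (e.g. 'failed logins today') would render."""
        summary = defaultdict(lambda: defaultdict(int))
        for event in events:
            key = event["action"] + ":" + event["outcome"]
            summary[event["tenant"]][key] += 1
        return {tenant: dict(counts) for tenant, counts in summary.items()}

    print(dashboard_summary(events))
    # {'acme': {'login:failure': 2, 'download:success': 1},
    #  'globex': {'login:success': 1}}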
a. A colored figure of the "DPaaSVS and SVaaS: Cloud Computing Security Visualization Framework." [Online]: https://crow.org.nz/security-visualization/Figure-13.1_SaaSSecVis
b. A colored figure of the "Azure's Power BI Analytic Visualization." [Online]: https://crow.org.nz/security-visualization/Figure-13.2_PowerBI_2
c. A colored figure of "A Bitcoin Transaction Address Visualization." [Online]: https://crow.org.nz/security-visualization/Figure-13.3_Bitcoin_Address_Output_Tree
d. A colored figure of the "SVInt: Bitcoin Transaction Visualization." [Online]: https://crow.org.nz/security-visualization/Figure-13.4_Blockchain_Vis
e. A colored figure of the "BitNodes - Bitcoin Live Map Visualization." [Online]: https://crow.org.nz/security-visualization/Figure-13.5_BitNodes
f. A colored figure of the "SVInt: Threat Intelligence Visualization." [Online]: https://crow.org.nz/security-visualization/Figure-13.6_ThreatGeoMap
Figure 13.7 ForensicTMO visualization analytics (labels shown in the figure include Spyware, Phishing, Sexual Impulse, Psychiatric Illness and Undefined)
g. A colored figure of the "ForensicTMO Visualization Analytics." [Online]: https://crow.org.nz/security-visualization/Figure-13.7_Forensic_Extension_Attrib
13.6 Conclusion
Cloud computing technologies have changed and shaped the way Internet services are offered. When using cloud services, users have to remain mindful of how their data are being managed and stored. Security is always a challenge for both cloud providers and cloud users.
In this chapter, SVaaCS is used to bridge the gap between users and cloud platforms. With use-cases from the law enforcement domain, SVaaCS – a sub-component of SECaaS – highlights how security visualizations are transforming cloud technologies. From an analytical perspective, DPaaSVS highlights security by tracking, monitoring and enabling user interaction with visual features for cloud services. SVaaCS and DPaaSVS connect users' perception with their data activities while their data are stored and processed in the cloud. Finally, security visualization creates a sense of trust and confidence for both cloud providers and customers.
References
[1] H. Jin, S. Ibrahim, T. Bell, W. Gao, D. Huang, and S. Wu, "Cloud types and services," in Handbook of Cloud Computing, B. Furht and A. Escalante, Eds. Boston, MA: Springer, 2010, pp. 335–355.
[2] I. Muttik and C. Barton, "Cloud security technologies," Information Security Technical Report, vol. 14, no. 1, pp. 1–6, 2009.
[3] C. D. Hundhausen, S. A. Douglas, and J. T. Stasko, "A meta-study of algorithm visualization effectiveness," Journal of Visual Languages & Computing, vol. 13, no. 3, pp. 259–290, 2002.
[4] "Kaspersky Security Bulletin 2015. Overall statistics for 2015 – Securelist." [Online]. Available: https://securelist.com/analysis/kaspersky-security-bulletin/73038/kaspersky-security-bulletin-2015-overall-statistics-for-2015/. [Accessed: 12-Jan-2017].
[5] D. M. Best, A. Endert, and D. Kidwell, "7 key challenges for visualization in cyber network defense," in Proceedings of the Eleventh Workshop on Visualization for Cyber Security, Paris, France: ACM, 2014, pp. 33–40.
[6] I. Kotenko and E. Novikova, "Visualization of security metrics for cyber situation awareness," in 2014 Ninth International Conference on Availability, Reliability and Security, 2014, pp. 506–513.
[35] “Cloud Datalab – Interactive Data Insights Tool,” Google Cloud Plat-
form. [Online]. Available: https://cloud.google.com/datalab/. [Accessed:
20-Mar-2017].
[36] "Azure brings big data, analytics, and visualization capabilities to U.S. Government." [Online]. Available: https://azure.microsoft.com/en-us/blog/azure-brings-big-data-analytics-and-visualization-capabilities-to-u-s-government/. [Accessed: 20-Mar-2017].
[37] J. Garae, R. K. L. Ko, and S. Chaisiri, "UVisP: User-centric visualization of data provenance with Gestalt principles," in 2016 IEEE Trustcom/BigDataSE/ISPA, Tianjin, China, August 23–26, 2016, pp. 1923–1930.
[38] “Bitnodes live map.” [Online]. Available: https://bitnodes.21.co/nodes/live-
map/. [Accessed: 20-Mar-2017].
[39] S. Stricot-Tarboton, S. Chaisiri, and R. K. L. Ko, "Taxonomy of man-in-the-middle attacks on HTTPS," in 2016 IEEE Trustcom/BigDataSE/ISPA, Tianjin, China, August 23–26, 2016, pp. 527–534.
[40] S. Krasser, G. Conti, J. Grizzard, J. Gribschaw, and H. Owen, "Real-time and forensic network data analysis using animated and coordinated visualization," in Proceedings of the Sixth Annual IEEE SMC Information Assurance Workshop, IEEE, 2005, pp. 42–49.
Index