A Dynamic Cloud With Data Privacy Preservation
Title: A Dynamic Cloud with Data Privacy Preservation
Permalink: https://escholarship.org/uc/item/03g6171c
Author: Bahrami, Mehdi
Publication Date: 2016
License: https://creativecommons.org/licenses/by/4.0/
Peer reviewed | Thesis/dissertation
by
Mehdi Bahrami
Doctor of Philosophy
Committee in charge:
Professor Mukesh Singhal, Chair
Professor Florin Rusu
Professor Dong Li
Professor Hamid R. Arabnia
(External Committee Member, University of Georgia)
Fall 2016
A Dynamic Cloud with Data Privacy Preservation
Copyright 2017
by
Mehdi Bahrami
The Dissertation of Mehdi Bahrami is approved, and it is acceptable in quality and form
for publication on microfilm and electronically:
______________________________________________________________________________
Professor Dong Li

______________________________________________________________________________
Professor Florin Rusu

______________________________________________________________________________
Professor Hamid R. Arabnia (University of Georgia)

______________________________________________________________________________
Professor Mukesh Singhal, Chair

2016
Abstract
Mehdi Bahrami
The emerging field of Cloud Computing provides elastic on-demand services over the
Internet or over a network. According to the International Data Corporation (IDC), cloud
computing has two major issues: i) architecture issues, such as a lack of standardization, a
lack of customization; and ii) users’ data privacy. In this study, we focus on both
issues. The first part of this study proposes a dynamic cloud architecture, DCCSOA,
which:
i) enables a single service layer to interact with all native cloud services (e.g., IaaS,
PaaS, SaaS and any cloud-based services);
ii) provides a standardization for existing services and future services in the cloud;
iii) customizes native cloud services based on users’ group requests.
The second part of this study focuses on users’ data privacy preservation on the
proposed architecture. Users’ data privacy can be violated by the cloud vendor, the vendor’s
authorized users, other cloud users, unauthorized users, or external malicious entities.
Encryption of data on the client side is one solution for preserving data privacy in the
cloud; however, encryption methods are complex and expensive for mobile devices, such
as smart phones, to run on each file. We propose a novel light-weight data privacy
method (DPM) by using a chaos system for mobile cloud users to store data on multiple
clouds. The proposed method enables mobile users to store data in the clouds while
preserving their data privacy.
We evaluate both the proposed dynamic architecture and the proposed data privacy
preservation method. Our experimental results show that, on the one hand, DCCSOA
enhances standardization by offering a flexible cloud architecture and minimizing
modifications to native cloud services; on the other hand, DPM achieves superior
performance over regular encryption methods in terms of computation time.
In Memory of My Father
who was a great and kind teacher and an invaluable, supportive person.
This work would not have been possible without all your support.
Contents
Motivation
IoT devices and their limitation
Related Works
The Proposed data privacy Scheme for IoT Devices
Experimental Setup
Experimental Results
Summary of Chapter
Chapter 7
An EHR Platform based on DCCSOA
Introduction
7.1.1 Migration of EHR systems
7.1.2 Data Security of EHR systems
7.1.3 Data Privacy of EHR systems
Background
The proposed EHR platform
Experimental Setup
Experimental Results
Related Work
Summary of chapter
Chapter 8
Data Privacy Preservation for Cloud-based Databases
Introduction
Background
8.2.1 Security parameters of DPM
The size of chunks
The number of repeated initial parameters
The proposed DPM-based Schema for cloud databases
Security Analysis
Experimental Setup
Experimental Results
Related Works
List of Figures
Figure 2.3. One snapshot of DTSL layer and connection to cloud value-added services
Figure 2.4. An example of a template (IaaSx template) with two different back-ends for two different platforms
Figure 3.2. Split the header of files and substitute header of file #1 with file #2
Figure 3.3. Two different patterns to store chunks in three files
Figure 3.7. A deviation of chunks positions in scramble file with different parameter values
Figure 3.8. A statistical deviation of position in original file and scramble files
Figure 3.9. (a) the original image; (b) a scrambled image based on the proposed method; (c) a cipher image based on JPEG encoder; (d) cipher image based on AES encryption
Figure 4.2. The evaluation results of one and six sets of 𝜉 with different initial values
Figure 5.2. One snapshot of DTSL and its interaction with two heterogeneous cloud platforms
Figure 6.3. A general view of the proposed data privacy method for MIoT
Figure 6.4. An example of submitting 8-bit generated data from IoT device with 128 KB buffer, and 64 KB stream data to cloud
Figure 6.5. The repetition rate for the first 152 values of 𝜖𝑖 in 𝜓𝑖
Figure 7.1. A view of EHR template with implementation of DPM and its connection to cloud value-added services
Figure 7.2. Experimental results: a comparison between the performance of DPM and AES on the proposed platform
Figure 8.1. The proposed DPM-based schema for cloud databases
Figure 8.2. The difference between the original address and the scrambled address
Figure 8.3. A comparison between AES encryption and DPM on NoSQL databases
Figure 8.4. The response time difference between AES and DPM
Figure 8.5. A comparison of data binding latency between AES encryption and DPM
List of Tables
Table 1.1. Other service layers in cloud architecture
Table 2.1. A comparison between different cloud architectures and cloud platforms
List of Algorithms
List of Codes
Acknowledgements
A Ph.D. program is a long journey, and this work would not have been completed
without the support of numerous people. It has been a privilege to work and collaborate
with faculty, students, communities, and staff at the University of California, Merced.
First, all that has been completed was advised by Professor Mukesh Singhal, who is
a bright professor, leader, friend, and an incredible supervisor. His patience, constructive
feedback, and inspiration encouraged me to complete this work. His invaluable
support and encouragement allowed me to move forward successfully in the cutting-edge
areas of cloud computing, data security, and data privacy preservation. I learned from
Professor Singhal how to do research and how to transfer my research from the lab to
the real world.
I received valuable support and direction from Professor Florin Rusu, Professor Dong
Li, and Professor Arabnia in the fields of databases, parallel computing, and distributed
computing, respectively.
I have been delighted to work with Ms. Kathleen Cadden and Dr. Kelvin Lwin as a
teaching assistant. Ms. Kathleen Cadden helped me to deliver a set of high-tech software
tools to students with diverse computer science backgrounds. Dr. Kelvin Lwin helped
me to prepare a set of software engineering projects.
I would like to thank the Margo F. Souza Leadership Center, which prepared me to
be a leader in my field. I have been awarded several honors, generous leadership awards,
and fellowship awards from the center, including the Margo Souza Entrepreneur in
Training Award and the Distinguished Leadership Award.
I would like to thank Mr. Akira Itoh, Dr. Wei-Peng Chen, and Mr. Takaki Kamiya
from Fujitsu Laboratory of America, Inc. for their direction in the field of security of
Information-Centric Networking and Natural Language Processing. My collaboration
with these incredible scientists resulted in the best demo award at ACM ICN 2016, two
United States patents pending, and three technical papers.
I would also like to thank Dr. Liguang Xie from Virginia Tech, Dr. Arshia Khan from
University of Minnesota, and Dr. Ashish Kundu from IBM Thomas J. Watson for their
collaboration and their constructive feedback.
I had a chance to meet several people from different departments who helped me
with their support. First, I would like to thank Ms. Belinda Braunstein from the Center
for Engaged Teaching and Learning for her kindness, her valuable time, and her
supportive direction in teaching. She was one of the most valuable people I met during
orientation day, and I would be happy to work with her in the future.
I would like to thank the UC Merced Graduate Division, the School of Engineering, and
their staff for their invaluable support in the form of several research and teaching
fellowship awards. These awards have always encouraged me to enhance my services to
both the UC Merced community and the fields of computer science.
I am deeply grateful for receiving all of this invaluable support at the University of
California, Merced.
In the first chapter, we review the concept of cloud computing and big data tools
(Bahrami and Singhal 2015a). This chapter introduces key points of cloud computing, the
architecture of cloud computing, and available big data tools in the cloud computing
environment.
Another challenge in cloud computing is users’ data privacy, which is the second goal
of this study. In Chapter 3, we introduce a novel light-weight data privacy method for
mobile cloud users (Bahrami and Singhal 2015c and Bahrami 2015f). By the end of this
chapter, the reader understands the current data privacy issues in mobile cloud computing
as well as our proposed privacy preservation method.
Recently, the use of cloud-assisted IoT devices has become popular around the world.
Due to storage and computation limitations on IoT devices, cloud computing provides an
opportunity for these tiny computing machines to outsource their data and computation
to cloud environments; however, there are two challenges:
First, how to use heterogeneous cloud computing architectures for interacting with a variety
of IoT devices; and second, how a user may maintain their own data privacy on IoT devices.
cloud computing systems. The results of this study will appear in (Bahrami and Singhal
2016d).
Chapter 6 answers the second question regarding users’ data privacy preservation
by introducing a novel DPM-based solution for Cloud-assisted IoT devices (Bahrami et
al. 2016b).
Chapter 7 introduces a use case of DPM and DCCSOA for electronic healthcare
systems. This chapter describes a novel platform, presented in (Bahrami et al. 2016b),
that preserves patients’ data privacy in cloud-based electronic health record systems.
Finally, Chapter 9 concludes this study and provides future research directions on
cloud computing architecture and data privacy preservation for mobile cloud users and
cloud-assisted IoT devices.
Chapter 1
Introduction
Capturing data from different sources allows a business to use Business Intelligence
(BI) (Matheson 1998) capabilities. These sources could be consumer information, service
information, products, advertising logs, and related information such as the history of
product sales or customer transactions. When an organization uses BI technology to
improve services, we characterize it as a “smart organization” (Matheson 1998). The
smartness of these organizations has different levels, which depend on the accuracy of
their decisions; greater accuracy of data analysis yields “smarter” organizations. For this
reason, we collect a massive amount of data from people, actions, sensors,
algorithms, and the web, which forms “Big Data.” This digital data collection grows
exponentially each year. According to (Manyika 2011), big data refers to datasets whose
size is beyond the ability of typical database software tools and applications to capture,
store, manage and analyze.
An important task for any organization is data analysis, which can distill a large
volume of data into a smaller amount of valuable information; even so, it still requires
collecting a massive amount of data.
Big data has become a complex issue in all disciplines of science. In scientific big data,
several solutions have been proposed to overcome big data issues in the fields of life
sciences (Buscema et al. 2008 and Howe et al. 2008), education systems (Hanna 2004),
material sciences (Wilson 2013), and social networks (Tan 2013).
Some examples of the significance of big data for generating, collecting and
computing are listed as follows:
● It is predicted that data production will be 44 times greater in 2020 than it was
in 2009. This data could be collected from a variety of resources, such as traditional
databases, videos, images, binary files (applications) and text files;
● Facebook stores, accesses, and analyzes 30+ Petabytes of user generated data
which includes a variety of data, such as images, videos and texts.
● In 2008, Google was processing 20,000 Terabytes of data (20 petabytes) per
day (Schonfeld 2014).
● Decoding the human genome originally took 10 years to process; now it can be
achieved in one week with distributed computing on big data.
● Big data is a top business priority and drives enormous opportunities for
business improvement. Wikibon’s own study projects that big data will be
a $50 billion business by 2017 (Rigsby 2014).
● Macy’s Inc. provides real-time pricing. The retailer adjusts pricing in near
real-time for 73 million items for sale, based on demand and inventory
(Davenport 2013).
● Visa processes more than 172,800,000 card transactions each day
(Fairhurst 2017).
Most public data resources are available on the Internet, such as multimedia stream
data, social media data, and text. This variety of data shows that we are not facing only
structured data, but also unstructured data, such as multimedia files (including video,
audio and images), and Twitter and Facebook comments. Unstructured data causes
complexity and difficulty in analyzing big data. For example, a corporation that analyzes
user comments and user-shared data on social media could recognize customer favorites
and provide the best offers.
International Data Corporation (IDC) is an American market research, analysis and advisory firm specializing in
information technology, telecommunications, and consumer technology.
To collect and process big data, we can use Cloud Computing Technology. Cloud
computing is a new paradigm for hosting clusters of data and delivering different
services over a network or the Internet. Hosting clusters of data allows customers to store
and compute a massive amount of data in the cloud. This paradigm allows customers to
pay on a pay-per-use basis and enables them to grow (or shrink) their computing and
storage needs on demand. These features allow customers to pay for infrastructure for
storing and computing based on the current size of their big data and transactions.
Capturing and processing big data are related to improving the global economy,
science, social affair, education and national security; processing of big data allows us to
propose accurate decisions and acquire knowledge from raw data.
This chapter aims to show the role of cloud computing in dealing with big data and
intelligent computing. This chapter is organized as follows: Section 1.2 discusses a
definition and characteristics of big data. In Section 1.3, we discuss important
opportunities and challenges in handling big data. In Section 1.4, we discuss cloud
computing and key architectural components for dealing with big data. In this section,
we review how each service layer of a cloud computing system could handle big data
issues. In Section 1.5, we provide a list of cloud-based services and tools for dealing with
big data. A summary of cloud computing implementation models is described in Section
1.6. We review some major cloud computing issues in Section 1.7. We summarize the
chapter in Section 1.8.
● “Velocity” which indicates the speed for data processing in terms of response
time. This response time could be a batch, real-time or stream response-time;
● “Veracity” which indicates the level of accuracy in the data. For example, a sensor
that generates data can report a wrong value rather than an accurate one.
Big data could have one or more of the above characteristics. For example, storing
and computing on social data could involve a very large volume of data (volume) and a
specific response time for computing (velocity), but it may not have the variety and
veracity characteristics.
As another example, analyzing public social media data regarding the purchase history
of a customer could provide a future favorite purchase list when she searches for a new
product. In this case, big data has all of the characteristics: volume, because a massive
amount of data is collected from public social media networks; velocity, because the
response time is limited to near real-time when a customer searches for a product;
variety, because the data may come from different sources (social media and purchase
history); and lack of veracity, because data from customers in social media networks may
have uncertainty. For instance, a customer could like a product in a social media network
not because it is the product of her choice, but because it is used by her friend.
Another important question in big data is, “How large is big data?” We can answer this
question based on our current technology. For example, (Jacobs 2009) states that in the
late 1980s at Columbia University, 100 GB of data was stored as big data via an IBM
3850 MSS (Mass Storage System), which cost $40K per GB. In 2010, the Large Hadron
Collider (LHC) facility at CERN produced 13 petabytes of data (Gewin 2008). So what we
call big data depends on the cost, speed and capacity of existing computing and storage
technologies. For example, in the 1980s, 100 GB was big data because the storage
technology at that time was expensive and had low performance. However, by 2010,
the LHC processed 13 petabytes as big data, which is 1.363 × 10^5 times the volume of
the IBM 3850 MSS big data of the 1980s.
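The 1.363 × 10^5 figure can be reproduced with a quick back-of-the-envelope calculation (a sketch only; it assumes binary units, i.e., 1 PB = 1024^2 GB):

```python
# Rough check of the ratio quoted above: 13 PB (LHC, 2010) vs.
# 100 GB (IBM 3850 MSS, late 1980s), assuming binary units.
ibm_3850_gb = 100              # late-1980s "big data" at Columbia, in GB
lhc_gb = 13 * 1024 ** 2        # 13 PB expressed in GB (1 PB = 1024**2 GB)

ratio = lhc_gb / ibm_3850_gb
print(f"LHC / IBM 3850 ratio: {ratio:.3e}")  # -> 1.363e+05
```

With decimal units (1 PB = 10^6 GB) the ratio would be 1.3 × 10^5, so the quoted value implies binary units.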
Collecting information can not only help us to avoid car accidents but can also help
us to make accurate decisions in many systems, such as business financial systems
(Rigsby 2014), education systems (Siegel 2000), and treatment systems, e.g., Clinical
Decision Support Systems (Berner 2007).
Some important opportunities are provided by big data. They are listed as follows:
● Analyze big data to improve business processes and business plans, and to
achieve business plan goals for a target organization (The target organization
could be a corporation, industry, education system, financial system,
government system or global system.)
On the other hand, we have several issues with big data. The challenges of big data
arise in various domains, including storing big data, computing on big data, and
transferring big data. We discuss these issues below:
Storage Issues
A database is a structured collection of data. In the late 1960s, flat-file models, which
were expensive and slow, were used for storing data. For these reasons, relational
databases emerged in the 1970s. Relational Database Management Systems (RDBMS)
employ Structured Query Language (SQL) to store, edit and retrieve data.
Lack of support for unstructured data led to the emergence of new technologies, such
as the BLOB (Binary Large Object) in the 2000s. Unstructured data may refer to
multimedia data. It may also refer to irregularly or randomly repeated column patterns
that vary from row to row within each file or document. BLOBs can store all data types
in most RDBMSs.
In addition, SQL databases cannot handle a massive amount of data well because
retrieving and analyzing the data takes too long. So “NoSQL”, which stands for “Not
Only SQL” and “Not Relational”, was designed to overcome this issue. NoSQL provides
a scalable partitioned table that can distribute data over many servers. NoSQL is well
suited to cloud computing because, in the cloud, a data storage server can be added or
removed at any time. This capability allows for the addition of unlimited data storage
servers to the cloud.
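As an illustration of the partitioned-table idea (a toy sketch, not the design of any particular NoSQL system; the node names are made up), each row key can be mapped to one of many storage servers by hashing:

```python
import hashlib

# Toy sketch of hash partitioning: each row key is assigned to one
# storage server based on a hash of the key.
def partition(key: str, servers: list) -> str:
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

servers = ["node-1", "node-2", "node-3"]
for key in ["user:42", "user:43", "order:7"]:
    print(key, "->", partition(key, servers))

# Adding a server changes the mapping for many keys; production NoSQL
# systems therefore use consistent hashing to minimize such remapping
# when servers are added or removed.
servers.append("node-4")
```

The sketch shows why elasticity matters: the set of servers is just a list that can grow at any time, which mirrors the cloud property described above.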
This technology allows organizations to collect a variety of data, but increasing the
volume of data still increases the investment cost. For this reason, capturing high-quality
data can be more useful to an organization than collecting a bulk of data.
Computing Issues
When we store big data, we need to retrieve, analyze and modify it. The important
part of collecting data is analyzing big data and converting raw data into valuable
information that can improve a business process or decision making. This challenge
can be addressed by employing a cluster of CPUs and RAM in cloud computing
technology.
Transfer Issues
Transferring big data is another issue. Here, we face several sub-issues. Transfer
speed indicates how fast we can transfer data from one location/site to another. For
example, transferring DNA data, which is a type of big data, from China to the United
States suffers delays in the backbone of the Internet, which causes problems when the
data is received in the United States (Marx 2013). BGI (one of the largest producers of
genomic data, the Beijing Genomics Institute in Shenzhen, China) could transfer 50
genomes with an average size of 0.4 terabytes each through the Internet in 20 days,
which is not an acceptable performance (Marx 2013).
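To put these numbers in perspective, a rough estimate of BGI's effective end-to-end throughput can be computed from the figures above (a sketch only; it assumes decimal units and sustained, uninterrupted transfer):

```python
# Back-of-the-envelope estimate of BGI's effective transfer throughput:
# 50 genomes x 0.4 TB each, transferred over 20 days (decimal units assumed).
total_bits = 50 * 0.4e12 * 8       # 20 TB expressed in bits
seconds = 20 * 24 * 3600           # 20 days in seconds
mbit_per_s = total_bits / seconds / 1e6
print(f"effective throughput: {mbit_per_s:.1f} Mbit/s")  # -> 92.6 Mbit/s
```

Under these assumptions the sustained rate is below 100 Mbit/s, far under backbone capacity, which illustrates why transfer speed is a genuine bottleneck for big data.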
Traffic jam: transfers of big data could happen between two local sites, between
cities, or worldwide via the Internet, but at any scale such transfers can result in very
heavy network traffic.
Accuracy and privacy: we often transfer big data through unsecured networks, such
as the Internet. Data transferred through the Internet must be kept secure from
unauthorized access. Accuracy aims to transfer data without losing any bits.
Cloud Computing
Several traditional solutions have emerged for dealing with big data, such as
supercomputing, distributed computing, parallel computing, and grid computing.
However, elastic scalability is important in big data, and it can be supported by cloud
computing services. Cloud computing has several capabilities for handling big data. It
supports the two major needs of big data described in the following sections: storing big
data and computing on big data. Cloud computing provides a cluster of resources
(storage and computing) that can be extended at any time. These features allow cloud
computing to become an emerging technology for dealing with big data.
In this section, we first review important features of cloud computing systems and a
correlation of each of them to big data. Second, we discuss a cloud architecture and the
role of each service layer in handling big data.
The major characteristics of cloud computing as defined by the U.S. National Institute
of Standards and Technology (NIST) (Liu et al. 2011) are as follows:
This characteristic covers the following features: (i) an economical model of cloud
computing that enables consumers to order required services (computing machines
and/or storage devices), where the requested service can scale rapidly upward or
downward on demand; (ii) machine-managed provisioning that does not require any
human to control the requested services. The cloud architecture manages on-demand
requests (increases or decreases in service requests), availability, allocation, subscription
and the customer’s bill.
This feature is interesting for a start-up business, because it allows the business to
start with traditional or normal datasets and grow them into big data as it receives
requests from customers or as its data grows during the business’s progress.
Resource pooling
A cloud vendor provides a pool of resources (e.g., computing machines, storage devices
and network) to customers. The cloud architecture manages all available resources via
global and local managers for different sites and local sites, respectively.
This feature allows big data to be distributed on different servers, which is not
possible with traditional models, such as supercomputing systems.
Service Accessibility
A cloud vendor provides all services through broadband networks (often via the
Internet). The offered services are available via a web-based model or heterogeneous
client applications (Singhal 2013). The web-based model could be an Application
Programming Interface (API) or web services, such as those described by the Web
Service Description Language (WSDL). Heterogeneous client applications are also
provided by the vendors. Customers can run applications on heterogeneous client
systems, such as Windows, Android and Linux.
This feature enables partners to contribute to big data. These partners could provide
cloud software applications, infrastructure or data. For example, several applications
from different sites could connect to a single-data or transparent multiple-data
warehouse for capturing, analyzing or processing of big data.
Measured Service
Cloud vendors charge customers using a metering capability that provides billing for
a subscriber based on a pay-per-use model. This service of the cloud architecture
manages all cloud service pricing, subscriptions and metering of used services. This
capability of a cloud computing system allows an organization to pay for the current size
of its datasets and then pay more as the dataset size increases. This service allows
customers to start with a low investment.
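The pay-per-use idea can be sketched as follows (a toy example; the rates and the `monthly_bill` helper are made up for illustration, not any vendor's actual pricing):

```python
# Toy sketch of metered, pay-per-use billing. The rates below are
# hypothetical and only illustrate the economic model.
RATES = {"storage_gb_month": 0.02, "cpu_hour": 0.05}

def monthly_bill(storage_gb: float, cpu_hours: float) -> float:
    """Bill a subscriber only for the resources actually used."""
    return (storage_gb * RATES["storage_gb_month"]
            + cpu_hours * RATES["cpu_hour"])

# A start-up pays for its current dataset size and pays more as it grows:
print(monthly_bill(storage_gb=500, cpu_hours=200))       # -> 20.0
print(monthly_bill(storage_gb=50_000, cpu_hours=8_000))  # -> 1400.0
```

The same subscriber moves from a small bill to a larger one only as usage grows, which is why this model lets customers start with a low investment.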
The architecture of a cloud computing system specifies the overall system and the
requirements of each component and sub-component. Cloud architecture allows cloud
vendors to analyze, design, develop and implement big data solutions.
Cloud vendors provide services through service layers in cloud computing systems.
The major categories are divided into four service layers: Infrastructure-as-a-Service
(IaaS), Platform-as-a-Service (PaaS), Software-as-a-Service (SaaS) and Business
Intelligence (BI); other service layers are assigned to the major service layers as shown
in Figure 1.1, such as Data-as-a-Service (DaaS), which is assigned to the IaaS layer.
Each service is discussed in Section 1.5.
[Figure 1.1: the major cloud service layers, including BI (Business Process aaS, Business Intelligence aaS) and SaaS]
The IaaS model offers storage, processors and fundamental hardware to cloud
customers. This model covers several services, such as firmware, hardware, utilities,
data, databases, resources and infrastructure. It allows clients to install operating
systems, receive the quoted infrastructure, and develop and deploy required software
applications. This model is often implemented via virtualization, which enables multiple
users/tenants to work on shared machines while preserving each tenant’s privacy.
Amazon Elastic Compute Cloud (Amazon EC2) provides virtual and scalable
computing systems at the IaaS layer. Amazon EC2 customers can define instances of a
variety of operating systems (OSs). Each OS and the required hardware, such as CPUs
and RAM, can be customized by a customer on the fly. Customers must create an
Amazon Machine Image (AMI) in order to use Amazon EC2. The AMI contains the
required applications, operating system (the customer can select various operating
systems, such as Windows or Linux versions), libraries, data and system configuration.
Amazon EC2 uses Amazon S3, a cloud storage service that stores data, and the AMI is
uploaded into S3.
The impact of big data in this service layer is higher than in the other service models of cloud computing systems, because IaaS users can access and define the required data framework, computing framework, and network framework.
In a data framework, users can define structured, semi-structured, and unstructured data. Structured and semi-structured data can be defined via traditional databases, such as RDBMS and OODBMS; in these models, a schema is defined before data are added to the database. All data frameworks, and in particular unstructured data, can be handled by cloud databases such as Hadoop, which is based on the MapReduce programming model. MapReduce allows data to be stored and processed on a cluster of resources; Hadoop implements this model and provides a family of open-source databases, applications, and analytics tools.
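The MapReduce model mentioned above can be sketched in single-process Python. This is an illustrative simplification under assumed names (not Hadoop's API): the map phase emits key-value pairs and the reduce phase aggregates them, while real Hadoop distributes both phases across a cluster.

```python
from collections import defaultdict

# Single-process sketch of the MapReduce model; function names and the
# sample text are illustrative. Real Hadoop runs these phases on a cluster.

def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in the input split."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each distinct key."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

counts = reduce_phase(map_phase("Big data needs big storage"))
```

In a cluster deployment the mappers run in parallel over input splits and the framework shuffles pairs with the same key to one reducer; the single-process version preserves that logic while fitting in a few lines.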
In a network framework, users benefit significantly because they have access to the required network controls, such as network cards and Internet connectivity. For example, they can access regular network transfer infrastructure such as an Optical Carrier (OC) 768 backbone (Cartier 2014), which is capable of transferring 39,813.12 Mbit/s.
This access to the data, computing, and network frameworks allows users to control the required hardware like an administrator in an IT department, yet without worrying about maintenance.
PaaS is a platform provided by a cloud vendor. The PaaS model does not require users to set up any software, programming language, environment, application designer, or tool. Developers use the vendor's platform, libraries, and programming languages to develop their applications. This model accelerates the delivery of cloud applications. PaaS allows developers to focus on software application development without worrying about operating system maintenance, as they must in IaaS. PaaS provides services for software programmers to develop and deploy their applications with an abstraction over the hardware layer.
The role of PaaS in handling big data is smaller than that of IaaS, because restrictions and limitations are applied to PaaS users working on the data, computing, and transfer frameworks. In this service layer, users are limited to the cloud vendor's frameworks. For example, Google App Engine provides a platform that supports Python, Java, PHP, Go, and MySQL-compatible Cloud SQL for developing applications. In this service layer, users therefore cannot access other languages, such as C# or C++, or the server hardware. However, developers can still build, deploy, and run scalable applications on cloud computing systems. These applications can capture a massive amount of data from anywhere and use a cluster of CPUs for computing and analytics on big data.
In the traditional model of software, customers purchase software applications and install them on local computers. The SaaS model, by contrast, provides applications in the cloud through a network and does not require customers to install applications on their local computers. This level supports scalability through an architectural design that adds dynamic load-balancing for growing or shrinking cloud servers. Most applications in the cloud are developed at this level.
The impact of SaaS on big data is less than that of PaaS because, in this service layer, users can only use the provided applications and resources, and the layer offers limited capabilities to developers. However, users can still work on big data that was added beforehand or is captured by the provided infrastructure. For example, Google Apps, such as Gmail, provide services on the web, and users cannot add to or manipulate the data captured by the server; users are limited to a web-based interface for email operations, such as sending an email.
The BIaaS layer sits on top of the cloud architecture service layers and aims to provide the required analytic models for cloud customers.
Cloud computing can provide information granularity and granular computing infrastructures (Pedrycz 2013).
Cloud-based BI can reduce total development cost, because cloud computing systems provide an environment for agile development and reduce maintenance costs. Moreover, such BI often cannot be implemented on a traditional system, because the current volume of data for analysis is massive. BI-as-a-Service (Zorrilla et al. 2013) is another example that shows how BI can migrate to cloud computing systems as a software application in the SaaS layer.
One of the major challenges of traditional computing is the analysis of big data. Cloud computing at the BIaaS layer can handle this issue by employing a cluster of computing resources. For example, SciDB (Cudré-Mauroux 2009) is an open-source, cloud-based database management system (a NoSQL DBMS) for scientific applications, with several functions for analyzing big data in fields such as astronomy, remote sensing, and climate modeling.
The major service models of cloud computing are BIaaS, IaaS, PaaS, and SaaS. As shown in Table 1.1, we assign each service to one of the major service models.
Table 1.1. Services assigned to the major service models, with each service's role in big data:
- Business-Process-as-a-Service (BPaaS) (Accorsi 2011): related to BIaaS; automated tool support; role: analysis of big data.
- Business-Intelligence-as-a-Service (BIaaS) (Hunger 2010): related to BIaaS; integrated approaches to management support; role: analysis of big data.
- Simulation Software-as-a-Service (SimSaaS) (Tsai 2011): related to SaaS; simulation service with a MTA configuration model; role: analysis of big data.
- Testing-as-a-Service (TaaS) (Candea et al. 2010): related to SaaS; software testing environments; role: testing big data tools.
- Robot-as-a-Service (RaaS) (Chen et al. 2010): related to PaaS; service-oriented robotics computing; role: action on big data.
- Privacy-as-a-Service (PaaS) (Itani et al. 2009): related to PaaS; a framework for privacy-preserving data sharing with a view toward practical application; role: big data privacy.
- IT-as-a-Service (ITaaS) (Foster et al. 2005): related to IaaS; outsourcing an IT department's resources (on Grid infrastructure at that time); role: maintaining big data.
- Hardware-as-a-Service (HaaS) (Stanik et al. 2012): related to IaaS; transparent integration of remote hardware, distributed over multiple geographical locations, into an operating system; role: capturing and maintaining big data.
- Database-as-a-Service (DBaaS) (Curino et al. 2011): related to IaaS; (1) a workload-aware approach to multi-tenancy, (2) a graph-based data partitioning algorithm, and (3) an adjustable security scheme; role: storing big data.
- Data-as-a-Service (DaaS) (Truong et al. 2009): related to IaaS; analyzing major concerns for data as a service; role: storing big data.
- Big-Data-as-a-Service (Zheng 2014): related to all layers; service generation for big data; role: generating big data.
an IT department to migrate from the traditional model to the cloud computing system and does not require data to be migrated to another location (such as a cloud vendor's location). This model is implemented for local trusted users. It still allows scalability, on-demand self-service, and elastic service. However, this model requires a high investment in maintenance, recovery, disaster control, security control, and monitoring.
Several open-source applications have been developed for establishing private cloud computing systems based on the IaaS and SaaS service layers. For example, CloudIA is a private cloud computing system at HFU (Doelitzscher et al. 2011). The targeted users of the CloudIA project are HFU staff and students running e-Learning applications, and external people for collaboration purposes.
The public model is the regular model of a cloud computing system. This model is provided by a cloud vendor who supports billing and a subscription system for public users. Unlike the private model, this model does not require a high investment, because consumers pay on a pay-per-use basis for cloud storage or cloud computing services on demand.
The hybrid model composes private and public clouds. This model connects a private cloud to a public cloud through a network connection, such as the Internet.
Scalability: This model is also useful for extending the scalability of a private cloud computing system, because in the case of limited resources at a peak time, a cluster of new resources can be added temporarily from another cloud.
In the digital age, when big data is costly to customers and a system disaster can destroy an organization, migrating applications and databases from the traditional model to the cloud is difficult, because:
Cloud customers need to have a contract with one or more cloud vendors (often one cloud vendor), and they must use the provided operating systems, middleware, APIs, and/or interfaces. Data and applications are dependent on the platforms or on the cloud vendor's infrastructure. This dependency on cloud services has several issues. For example, security is the major concern in cloud computing systems. Cloud features such as a shared resource pool and multi-user/tenancy cause security issues because the resource pool is shared among users, which can expose users' data and privacy to others.
Unsecured connections to the vendor, network access security, Internet access security, and cloud vendors' user security have emerged as other major security concerns arising from access to the cloud via the Internet.
“Bringing back in-house may be difficult” (a 79.8% issue rate) and “Hard to integrate with in-house IT” (a 76.8% issue rate) indicate that customers are afraid of migrating data and software applications to cloud computing systems, because the migration is difficult to integrate with IT departments and it is difficult to return data to the IT department. “Lack of interoperability standards” (80.2%) is another cloud issue, showing that cloud computing requires greater interoperability with other cloud computing systems. As also indicated in this report, “Not enough ability to customize” (a 76.0% issue rate) shows that cloud computing systems require a dynamic architecture and customization.
Some studies, such as (Juve 2009), show that existing cloud computing systems (Amazon EC2 in this case) cannot deliver cost-effective performance for HPC applications compared with tightly-coupled hardware, such as Grid Computing or Parallel Computing systems.
Chapter Summary
In this chapter, we discussed a definition of big data, the importance of big data, and
major big data challenges and issues. We understand that, if we analyze big data with
business intelligence tools, we may provide a catalyst to change an organization to a
smart organization. We discussed the importance of cloud computing technology as a
solution to handle big data for both computing and storage. We reviewed the capabilities
of cloud computing systems that are important for big data, such as resource scalability,
resource shrink-ability, resource pool sharing, on-demand servicing, elastic servicing,
and collaboration with other cloud computing systems. We explained cloud architecture
service layers and the role of each service layer in handling big data. We discussed how
business intelligence could change big data to smaller valuable data by using cloud
computing services and tools. Finally, we discussed major cloud computing system
issues that need to be addressed for cloud computing to become a viable solution for
handling big data.
Chapter 2
Dynamic Cloud Architecture
In the previous chapter, we described the definition of big data and introduced a set of cloud-based tools for collecting and analyzing big data. We also defined the architecture of cloud computing, which is divided into multiple layers. This chapter summarizes some critical challenges of cloud architecture, as well as our proposed dynamic architecture to overcome these issues.
Introduction
Cloud computing is based on distributed and parallel computing systems that provide elastic storage and computing resources over the Internet. As described in the previous chapter, the cloud computing paradigm allows customers to pay for their resource usage on a pay-per-use model, and enables customers to scale their storage and computing resources up or down on demand.
Motivation
Cloud computing services rely on the vendor's infrastructure. This dependency causes several issues, which are described in (IDC Enterprise Panel 2009; Moreno-Vozmediano et al. 2013; Sasikala et al. 2013). For example, according to the IDC survey (IDC Enterprise Panel 2009), 79.8% of respondents say “Bringing back in-house may be difficult” is an issue, and
76.8% say “Hard to integrate with in-house IT” is an issue. These issues indicate that consumers are afraid of migrating to cloud computing systems, because the migration is difficult to integrate with IT department services and it is difficult to return data to the IT department. The survey also shows that 80.2% say “Lack of interoperability standards” is a concern; thus, cloud computing requires interoperability with other cloud computing systems. As further indicated in this report, 76.0% of respondents answer that “Not enough ability to customize” is an issue. Similar significant concerns about cloud computing have been reported recently in other studies (Moreno-Vozmediano et al. 2013; Sasikala et al. 2013; Shayan 2013). All of these concerns show that cloud computing systems require flexibility in defining a variety of services that meet specific cloud users' requirements. This flexibility can be implemented by a customizable architecture that allows a vendor to define a service for each group of users.
Cloud vendors provide several services to their customers through a general multi-tier architecture (IaaS, PaaS and SaaS). Although this architecture serves many customers' requests, a customer may have a specific request and must adapt it to the offered services, because each offered service is intended to satisfy general rather than unique user requests. For example, when a customer requests a service in PaaS for developing an image processing application, the customer has the same access to Application Programming Interfaces (APIs) as other customers who develop a web mining application on a cloud. However, an image processing application requires specific functions (e.g., spatial transformations) that are different in type and not useful for a web mining application, which requires other specific functionality (e.g., spatial indices). This example shows that customization of a service by a cloud vendor allows the vendor to provide a unique service to each customer. A customized service allows customers to have a simple system or API rather than a complex system or API that tries to satisfy different users' demands. For example, a cloud vendor could define a customized service that only satisfies a small group of partners or users, such as a group of users who only need a Voice-over-IP (VoIP) service in a cloud computing system.
Another concern in cloud computing is the increasing demand for introducing and migrating a variety of services to cloud computing systems. Although each service provides a new feature, such as Simulation-as-a-Service (Tsai et al. 2011) or Robot-as-a-Service (Chen et al. 2010), it aggravates migration and complexity issues, due to the lack of standardization and customization respectively, because each cloud-based service has its own features, requirements, and outputs. For example, Robot-as-a-Service provides a platform to control robot devices through a cloud computing system.
Related Work
Currently, we do not have a generally accepted standard for cloud computing. Unlike the Internet, which was developed by U.S. government agencies (Kaufman 2012), such as ARPA (Leiner 2009), cloud computing has been developed by several open-source groups and leading companies, such as Microsoft and Amazon. Therefore, several independent cloud architectures have been developed.
To the best of our knowledge, no effective architecture exists that supports dynamic customization. As previously discussed, the lack of customization ability is one of the major issues in existing cloud architectures, and this drawback creates other issues discussed in Section 2.2, such as migration issues. There are several ways to overcome this drawback by implementing customization at different levels of cloud computing systems. As shown in Figure 2.1, we divide customization of cloud computing systems into the conceptual level, the architecture level, and the implementation level. In the following section, we review related work at each level of customization.
adoptable, because the authors did not provide specific details of the implementation methods for diverse environments.
(i) Service-Oriented Architecture (SOA)-based (Perrey et al. 2003): Tsai et al. (2010) provided SOCCA, which combines the Enterprise SOA style with the cloud style, and Zhang et al. (2009) provided the CCOA architecture, based on SOA with a scalable service feature; however, these cloud architectures do not provide customization at each service layer;
(ii) the Cloud Reference Architecture (CRA) (Liu et al. 2011), developed by NIST. This architecture has five primary actors: Cloud Consumer, Cloud Provider, Cloud Broker, Cloud Auditor, and Cloud Carrier;
(iii) open forums, such as the OGF Open Cloud Computing Interface (Metsch et al. 2010), the Cloud Computing Interoperability Forum (CCIF)29, Deltacloud (Bist et al. 2013), DMTF30, OpenStack (Bist et al. 2013), the Open Cloud Consortium31, and the Open Cloud Computing Interface (OCCI)32 (Grossman et al. 2010).
The idea behind most of these open-source clouds is to provide a common interface that covers the major cloud platforms. However, in this chapter, we propose an architecture that allows vendors to define and implement their own specific services through a standardized layer across all vendors' platforms. In the proposed architecture, the vendor uses a layer to provide standard services to its customers. Vendors are not required to modify their platforms; they can provide an extension layer on top of their existing cloud platforms.

29 Available from: http://www.cloudforum.org/
30 Available from: http://dmtf.org/standards/cloud
31 Available from: http://opencloudconsortium.org
32 Available from: http://occi-wg.org/about
Existing cloud architectures do not provide a solution for facilitating different services. In addition, existing cloud architectures are static and cannot easily provide customization of services.
The CCM has several drawbacks. For example, if an architecture is non-functional, the implementation model cannot be efficient: a limitation on network access at the PaaS layer limits CCM applications (i.e., through the lack of access to a protocol). The implementation of CCM also has drawbacks, because the model depends on an architecture with specific requirements, such as the type of programming language. These issues show how the disadvantages of a cloud architecture can cause problems in the implementation.
A dynamic architecture for cloud computing allows cloud vendors to customize their services. As shown in Figure 2.2, the architecture is based on SOA. The SOA features enable the architecture to provide several independent services that work together as a system and can run on different cloud computing systems. The proposed architecture can customize value-added cloud services (the resources offered on a cloud computing system). In the proposed architecture, a dynamic layer represents all heterogeneous services and can customize services on demand.
A cloud vendor defines several different services on demand at the DTSL. Each defined service is a Template, which is integrated with one or multiple value-added cloud services. Cloud vendors can set up, configure, and provide different templates to their customers based on the different value-added service layers in a cloud computing system. As illustrated in Figure 2.3, a template at the back-end of the DTSL is dynamic, and it interacts with one or multiple value-added cloud services.
[Figure 2.3 fragment: the DTSL spans a front-end facing end-users and a back-end facing cloud developers, above the SaaS, PaaS, and IaaS layers]
A cloud vendor can define several templates at the FTaaS, where each template provides cloud services to end-users. The FTaaS allows different vendors to define the same template for their customers. This feature provides an independent value-added service to customers who need to migrate data and applications from one cloud to another.
Cloud vendors are able to define their own service layers with a BTaaS. BTaaSs differ from vendor to vendor, and they translate heterogeneous services into general templates. For example, in Figure 2.4, if two vendors (V1 and V2) provide different IaaSs (IaaS1 and IaaS2), each vendor can provide a template as IaaSx at the FTaaS. The BTaaS in V1 differs from the BTaaS in V2, and each vendor uses its own BTaaS to configure the back-end of its IaaSx.
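The FTaaS/BTaaS split can be illustrated with a short sketch. All class and method names here are assumptions for illustration, not part of any vendor's API: each vendor implements its own BTaaS adapter, while both adapters expose the same FTaaS template interface.

```python
# Illustrative sketch only: names are assumed, not a real vendor API.

class IaaSTemplate:
    """Common front-end template (FTaaS) that users program against."""
    def create_vm(self, cpus: int, ram_gb: int) -> str:
        raise NotImplementedError

class Vendor1BTaaS(IaaSTemplate):
    """V1's back-end adapter: translates the template to IaaS1."""
    def create_vm(self, cpus: int, ram_gb: int) -> str:
        return f"v1-instance({cpus}cpu,{ram_gb}gb)"  # would call V1's native API

class Vendor2BTaaS(IaaSTemplate):
    """V2's back-end adapter: translates the template to IaaS2."""
    def create_vm(self, cpus: int, ram_gb: int) -> str:
        return f"v2-node({cpus}/{ram_gb})"  # would call V2's native API

# The same user code runs unchanged against either vendor's back-end:
instances = [backend.create_vm(cpus=2, ram_gb=8)
             for backend in (Vendor1BTaaS(), Vendor2BTaaS())]
```

Because user code depends only on the template interface (IaaSx at the FTaaS), switching from V1 to V2 requires no change on the user's side; only the vendor-specific BTaaS differs.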
The dynamic customization feature of the BTaaS layer enables a cloud vendor to customize its own services while providing standard services through the templates.
A cloud vendor can edit a layer by adding, editing, or removing a template, as shown in Figure 2.5. In this figure, the rows represent cloud value-added services (the traditional service layers), which are static, such as IaaS, PaaS, and SaaS, or other service layers, such as future services. The columns represent templates, which are dynamic and can be defined by a cloud vendor on demand. Four templates are defined in Figure 2.5; for example, a user who has access to T1 can use the SaaS layer, and a user who has access to T3 can access the SaaS and PaaS layers.
The templates can be implemented for any kind of cloud service, including traditional services. For instance, a cloud vendor can define several services, such as Business-Intelligence-as-a-Service (BIaaS) and IaaS, as templates. In this case, Figure 2.5 would change: the rows would represent the IaaS and BIaaS layers, and the columns could be defined by the vendor.
The number of columns is dynamic and is defined by a vendor. Each column stands for a template. The vendor defines several templates, each of which makes use of resources in one or multiple layers of a cloud computing system. For example, in Figure 2.5, T1, T2, T3, and T4 are cloud templates (shown in orange). T1 interacts with the SaaS value-added service layer; T2 interacts with all value-added service layers in the cloud computing system; T3 interacts with two value-added service layers (SaaS and PaaS); and T4 interacts with the two lower-level value-added service layers (PaaS and IaaS).
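The template-to-layer mapping of Figure 2.5 can be written down as a simple data structure. This is an illustrative sketch; the dictionary and the access-check helper are assumptions, not components of the proposed system.

```python
# Each template maps to the value-added service layers it interacts with,
# mirroring Figure 2.5 (names assumed for illustration).
TEMPLATES = {
    "T1": {"SaaS"},
    "T2": {"SaaS", "PaaS", "IaaS"},
    "T3": {"SaaS", "PaaS"},
    "T4": {"PaaS", "IaaS"},
}

def can_access(template: str, layer: str) -> bool:
    """A user of `template` may use `layer` only if the vendor mapped it."""
    return layer in TEMPLATES.get(template, set())

# A T4 user can work on PaaS and IaaS simultaneously, but not SaaS:
t4_ok = can_access("T4", "PaaS") and can_access("T4", "IaaS")
```

Adding a column to Figure 2.5 corresponds to adding one entry to the mapping, which is what makes the set of templates dynamic while the service layers stay fixed.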
The customer groups include end-users, developers, and third-party users (with an end-user or developer role). They use the Cloud Client Dashboard to interact with the FTaaS and use cloud resources. Each user has the option of working on several cloud value-added services simultaneously by interacting with a template. For instance, a developer who uses the T4 template in Figure 2.5 can work on PaaS and IaaS simultaneously: she can work on IaaS to install a new application (App1) on the server while using PaaS to develop a Mashups application that requires App1.
Customizable architecture
The dynamic component of the proposed architecture (the DTSL) allows cloud vendors to modify and customize their cloud architecture on demand. This customization mitigates cloud architectural issues, such as the lack of usability of cloud computing, because a cloud vendor can define a new template that covers several services, enabling customers to have an integrated service. This offering is more attractive to a variety of user groups because a vendor is able to provide different customized services via different templates. For example, in traditional cloud computing systems, a telecom user (Pal et al. 2011) who needs one or more network functions must find a cloud vendor who provides IaaS and subscribe to that service. With the proposed architecture, a cloud vendor can instead define multiple services (e.g., a VPN service and a storage service) in a template for a group of users, such as telecom users.
Dynamic Abstraction
The proposed architecture abstracts and encapsulates higher-level service layers from lower-level service layers by defining a template in the DTSL that exposes lower-level services to advanced customers, exposes higher-level services to regular users, or exposes a customized service drawn from both levels to a group of users. For example, in Figure 2.5, a vendor offers template T4 to advanced customers who need services at the PaaS and IaaS service layers. The reason for this exposure is to improve flexibility and accessibility for customers who need access to different and multiple services.
The DTSL facilitates customers' migration to the cloud and back to the in-house IT department, because a cloud vendor can provide a template at the DTSL that has features similar to those of the in-house IT systems or of other cloud vendors. For example, in Figure 2.5, customers who interact with T4 can access IaaS to set up an operating system and use a cloud platform simultaneously.
Security
The DTSL is divided into the FTaaS and the BTaaS. This segmentation improves cloud security because customers have access only to the FTaaS service layer, which is isolated from the other value-added cloud services. This isolation makes the DTSL more secure. In addition, any data security and privacy method can be implemented as a template in the DTSL. For instance, Chapter 7 describes a new template for electronic healthcare systems, along with its implementation for maintaining data privacy.
Standardization
One of the major issues in cloud computing is the lack of standardization, which causes the vendor lock-in problem. DCCSOA addresses this by providing a dynamic service layer (the DTSL) that enables different vendors to offer the same front-end (FTaaS) service layer. When different cloud vendors provide the same FTaaS to their customers, the customers can transfer their data and applications to other vendors, or to their private clouds, by defining a similar FTaaS.
literature did not provide information related to a feature of their platform, or they did not consider the feature. We use the following features in our comparison: customization and standardization with minimal modification to the architecture and services; the capability of supporting interoperability; and support for new cloud-based services. In Table 2.1, each row represents a study or product: a conceptual model, a cloud architecture, a cloud platform, or a tool. The columns represent the following items: the level of customization, which indicates the ease with which vendors can customize their own architectures; the level of standardization, which indicates the level of modification needed to provide a standardized cloud computing system; and, in the last column, the service capabilities that show which architectures, platforms, or tools can support future services with ease.
Low-level customization indicates customization at the implementation level, because each application or product needs to be modified. For example, MC provides a solution that modifies each service to provide a customized cloud computing system. Medium-level customization indicates customization at the conceptual level and the architecture level (unadoptable), because both the architecture and the existing applications must be modified. For example, CRA provides a new architecture without adapting new features to the existing architecture. High-level customization indicates customization at the architecture level, with new features adapted to the existing architecture. For example, CCIF provides adoptable services through a uniform cloud interface. This level requires minimal modifications to achieve customization with a standard model. The solutions of interest are those with high-level customization, because they provide customized services with minimal modifications to the existing architecture (conceptual level) and existing services (implementation level). DCCSOA provides an independent service (TaaS) to support customization of existing services. DCCSOA does not require modifying the existing architecture or the existing services to achieve customization with minimal modifications; it only requires modifying and adapting the templates of each service.
Low-level standardization represents maximal modifications to major cloud services to provide a standard service across different cloud computing systems. For example, all components of CCM must be modified to provide a standardized cloud computing system. High level indicates standardization with fewer modifications; for example, CCIF provides a solution for standardization through a uniform cloud interface. The solutions of interest are those with high-level standardization, because they do not require modifying the existing architecture or existing services to achieve standard cloud computing. DCCSOA provides different templates at the FTaaS to provide a uniform

Table 2.1. A comparison between different cloud architectures and cloud platforms
Conceptual Level:
- Cloud Computing Interoperability Forum (CCIF): Customization: High; Standardization: High (if all vendors implement the Unified Cloud Interface); Interoperability: Low (via Unified Cloud Interface); *aaS support: ×
- Deltacloud (Bist et al. 2013): Customization: Medium (at API level); Standardization: High (via REST-based API); Interoperability: ×; *aaS support: ×
- DMTF: Customization: High (via message exchange); Standardization: Medium (via message exchange); Interoperability: Medium (via Common Information Model); *aaS support: ×
- Open Cloud Consortium: Customization: Medium (at API level); Standardization: High (via REST-based API); Interoperability: Low (via Unified Cloud Interface); *aaS support: ×
- Open Cloud Computing Interface (OCCI) (Grossman et al. 2010): Customization: Medium (at API level); Standardization: High (via REST-based API); Interoperability: Low (via Unified Cloud Interface); *aaS support: ×

Platform:
- OpenStack (Bist et al. 2013): Customization: Medium; Standardization: × (private cloud); Interoperability: × (private cloud); *aaS support: ×
36 | CHAPTER 2.
interface for different types of the existing services that cloud be implemented in different
cloud vendor’s systems with different architectures.
The last column shows the capability of supporting cloud-based (*aaS) services. Other
related work (methods, platforms, and architectures) did not consider this feature as part
of their proposed solutions. DCCSOA allows a cloud vendor to define, deploy, customize,
and standardize new services via FTaaS and BTaaS. DCCSOA enables a cloud vendor to
add new services by implementing a heterogeneous service and adapting the service at
the front-end layer (FTaaS) to provide a customizable and standardized service with
minimal modifications and with ease. This comparison shows that our proposed
architecture (DCCSOA) allows vendors to define a dynamic, standardized, and
customizable cloud architecture with the capability of supporting interoperability.
DCCSOA requires minimal modifications to the architecture and services while providing
maximal customization.
These major topics are divided into minor evaluation items, as shown in Table 2.2.
The icon "☺" in Table 2.2 indicates an advantage of DCCSOA for the selected parameter topic.
For instance, under the fine-grained services topic, the proposed method provides an
advantage in the flexibility feature. More details about each parameter can be found in (Bianco et al. 2007).
We consider the following general scenario to evaluate the proposed method:
Scenario SC1: “User U1 uses a platform as follows: P1 runs on top of Cloud1 to
provide service S1, and U1 is willing to transfer data and applications to P2, which runs
on top of Cloud2 for the same service. When U1 needs to transfer data and applications
from P1 to P2, the administrator of P2 needs to define the same service on P2. Both platforms
(P1 and P2) are bound to their target cloud vendor's services (S1 and S2).”
Summary of chapter
In this chapter, we proposed a new Dynamic Cloud Computing Service-Oriented Architecture
(DCCSOA). The proposed architecture addresses the major existing cloud computing
issues, such as data and application migration between different clouds, transfer to the
cloud or back to in-house IT, data and application lock-in issues, and a lack of
standardization and customization. DCCSOA provides a dynamic and customizable
service layer (DTSL). The DTSL simplifies these issues by defining a layer, the template,
with the same features across DTSL. A template is divided into front-end (FTaaS) and
back-end (BTaaS) layers. The defined templates can be customized by a cloud vendor for
different groups of users. DCCSOA also allows different cloud vendors to provide
similar cloud services through a template, which establishes standardization between
different cloud computing systems. We discussed how the proposed architecture supports
existing and future services by using the DTSL at the BTaaS, which can be configured for
specific cloud services. We evaluated the proposed method based on an SOA evaluation.
The results show that the proposed architecture, DCCSOA, provides several advantages
over existing cloud architectures and platforms, such as minimal modifications for
providing standardization and customization. Chapter 7 describes a new template for
electronic healthcare systems along with its implementation for maintaining data privacy.
Chapter 3
Data Privacy Preservation in Cloud
Introduction
The cloud computing paradigm refers to a set of virtual machines (VMs) that provide
computing and storage services through the Internet. Cloud computing uses
virtualization technology to provide VMs on top of distributed computing systems.
Cloud computing provides several advantages over traditional in-house IT (Singhal et al.
2013), such as on-demand services, a pay-per-use basis, and elastic resources, which have
rapidly made cloud computing a popular technology in different fields, such as the IT
business, mobile computing systems, and health systems (Adibi 2013, Rodrigues 2012).
(Huang et al. 2011) define Mobile Cloud Computing (MCC) as rooted in the mobile
computing and cloud computing paradigms, allowing mobile users to offload their data
and process the data in the cloud. MCC provides big data storage for mobile users at a
lower cost and with more ease of use. As of today, some popular MCC cloud vendors for
storage services, such as Google and Dropbox, provide a free standard storage service
with 15 GByte and 2 GByte capacity, respectively³³.

³³ Recorded on November 13, 2014 from https://drive.google.com and https://www.dropbox.com

MCC provides massive online storage for mobile users, but it aggravates user data
privacy issues because users have to trust third parties (cloud vendors and their
partners). For instance, in a recent study, (Landau 2014) reports several privacy
challenges that arise when users trust cloud vendors and a vendor shares data with a
third-party, or when other unauthorized users could gain access to these data. In another study,
(Kumar et al. 2010) suggest that not all MCC applications can save energy on mobile
devices by offloading data. Finally, in an earlier study on cloud computing, (Ristenpart
et al. 2009) show that a VM can be a vulnerable component when users use shared cloud
resources.
The rest of the chapter is organized as follows: in Section 3.2, we briefly review
background materials for this study. In Section 3.3, we present the proposed method and
its requirements. In Section 3.4, we implement the proposed method and present its
experimental results. We also present a statistical security model of the proposed method
to show the level of security. In Section 3.5, we investigate different attack scenarios
against the proposed method. In Section 3.6, we compare the proposed method with
existing methods. Finally, in Section 3.7, we conclude the chapter.
Background
In the proposed method, we use the JPEG file format as a case study. In this section, we
briefly review the JPEG file format and its encoder/decoder requirements.
Each JPEG file has a header that defines metadata, such as the canvas of the image
with a specific dimension (width and height), resolution, camera information, GPS
information, and compression information.
Each JPEG file (including the header and the content) consists of several segments,
and each segment begins with a marker. The raw data of a JPEG file includes several
markers (ITU 1993). For example, each JPEG file begins with the “0xFFD8” marker, which
indicates that the binary is an image and allows an image viewer application to decode
the binary file and show the image. Each marker begins with ‘0xFF’. Table 3.1 describes
some important markers.
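As an illustration of this marker structure, the sketch below (not the author's implementation; the marker subset and the naive scan are illustrative, and a real parser must also honor byte stuffing, where 0xFF 0x00 inside entropy-coded data is not a marker) checks the SOI marker and lists marker bytes in a byte stream:

```python
# Minimal sketch: detect the JPEG SOI marker and scan for marker bytes.
# MARKERS is a small illustrative subset of the markers defined in ITU T.81.
MARKERS = {0xD8: "SOI (start of image)", 0xD9: "EOI (end of image)",
           0xC0: "SOF0 (baseline DCT)", 0xC4: "DHT (Huffman table)",
           0xDA: "SOS (start of scan)"}

def is_jpeg(data: bytes) -> bool:
    # Every JPEG file begins with the two-byte marker 0xFF 0xD8.
    return data[:2] == b"\xFF\xD8"

def scan_markers(data: bytes):
    # Each marker begins with 0xFF followed by a non-0x00, non-0xFF byte.
    found = []
    for i in range(len(data) - 1):
        if data[i] == 0xFF and data[i + 1] not in (0x00, 0xFF):
            found.append(MARKERS.get(data[i + 1], hex(data[i + 1])))
    return found

sample = b"\xFF\xD8\xFF\xC4\x00\x04\x00\x00\xFF\xDA\x00\x02\xFF\xD9"
print(is_jpeg(sample))        # True
print(scan_markers(sample))
```

Once the header and markers are scrambled, as described below, such a scan finds nothing, which is exactly the obstacle the method relies on.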
A JPEG image encoder uses a lossy form of compression based on the discrete cosine
transform (DCT) method (ITU 1993) to compress an image. The byte sequence of a raw
JPEG image file contains multiple chunks of Minimum Coded Units (MCUs), as described
in (ITU 1993). Each MCU block stores an 8×8-pixel block of the image.
● The privacy model must satisfy a balance between computation overheads and
maintaining security.
● Unlike default MCC offloading methods that submit original files to the MCC for
encryption, the proposed method can be run on mobile devices to provide data
privacy, and then the protected result is stored on the MCC.
The proposed method splits files into multiple files and uses a pseudo-random
permutation to scramble chunks in each split file. The proposed method reads a JPEG file
as a binary file rather than using a JPEG encoder/decoder to protect each pixel or the color
of each pixel.
In this method, we have two phases to split files into multiple files and to recombine
the files as follows:
● Disassembly of an image: the method splits an image file into multiple binary files,
dividing the original file into (i) one file that contains the header of the original
file, and (ii) multiple files that contain the content of the original file. The
content of each split file consists of multiple chunks of the original file. The chunks
are distributed across multiple files based on a pattern, and the chunks in each file
are randomly scrambled using the chaos system. The output of this phase (the split
files) is stored in MCC(s).
● Assembly of split files: the method recombines all split files to reconstruct the original
file. In this phase, the following steps proceed: (i) read all scrambled files from the
MCC(s); (ii) use the chaos-system random arrays (which were used in the first
phase) to reorder the chunks in each split file; and (iii) use the pattern to
reconstruct the original file.
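The two phases above can be sketched as follows. This is a hedged, local approximation of the method: a seeded random shuffle stands in for the chaos-system PRP, a round-robin distribution stands in for the user-defined pattern, and cloud storage is not modeled.

```python
import random

def disassemble(data: bytes, header_size: int, chunk: int, n_files: int, seed: int):
    # Phase 1: split off the header, chunk the content, distribute chunks
    # round-robin across n_files, then scramble the chunks inside each file.
    header, body = data[:header_size], data[header_size:]
    chunks = [body[i:i + chunk] for i in range(0, len(body), chunk)]
    files = [chunks[i::n_files] for i in range(n_files)]
    orders = []
    for k, f in enumerate(files):
        order = list(range(len(f)))
        random.Random(seed + k).shuffle(order)   # stand-in for the chaos PRP
        files[k] = [f[j] for j in order]
        orders.append(order)
    return header, files, orders

def assemble(header: bytes, files, orders):
    # Phase 2: undo the per-file scramble, then undo the pattern.
    unscrambled = []
    for f, order in zip(files, orders):
        restored = [None] * len(f)
        for pos, j in enumerate(order):
            restored[j] = f[pos]
        unscrambled.append(restored)
    body = b""
    for i in range(max(len(f) for f in unscrambled)):
        for f in unscrambled:
            if i < len(f):
                body += f[i]
    return header + body

original = bytes(range(100))
h, fs, od = disassemble(original, header_size=20, chunk=8, n_files=3, seed=42)
assert assemble(h, fs, od) == original   # round trip recovers the file
```

In the real method the shuffle orders are reproduced from the chaos-system parameters rather than stored, so only the key material needs to be kept on the device.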
We divide the original JPEG file into the header and the content because the header of
the original file carries some important privacy information. This division also makes
recombining the split files more complex, because it removes important JPEG markers
from the original image file.
where 𝑐𝑚𝑎𝑥 represents the maximum number of chunks in 𝐹𝑖𝑙𝑒𝑖 and is defined as
follows:

𝑐𝑚𝑎𝑥 = ⌈(𝑆𝑖𝑧𝑒𝑖 − 𝐻𝑆𝑖𝑧𝑒𝑖) / 𝐵𝑢𝑓𝑓𝑒𝑟⌉ (2)
where 𝑆𝑖𝑧𝑒𝑖 represents the size of 𝐹𝑖𝑙𝑒𝑖 (in bytes), 𝐵𝑢𝑓𝑓𝑒𝑟 represents the size of each chunk
(in bytes), and 𝐻𝑆𝑖𝑧𝑒𝑖 represents the size of the header of the original 𝐹𝑖𝑙𝑒𝑖 (in bytes).
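A quick worked example of 𝑐𝑚𝑎𝑥, using the chapter's later configuration (10-KByte chunks, a 3057-KByte maximum file) and an assumed 1-KByte header; the header size is illustrative, not given in the text:

```python
import math

size_i = 3057      # Size_i, KByte (maximum file size in the later experiment)
hsize_i = 1        # HSize_i, KByte (assumed for illustration)
buffer_kb = 10     # Buffer, KByte

# c_max = ceil((Size_i - HSize_i) / Buffer): number of content chunks
c_max = math.ceil((size_i - hsize_i) / buffer_kb)
print(c_max)  # 306
```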
Figure 3.1 illustrates a general view of the proposed method, which allows a mobile
device to split an original JPEG file into a header file and three content files. The split files
are submitted to two MCCs. A user can configure his/her application to set: (i) the number
of split files, (ii) the size of chunks, and (iii) the cloud user account(s) information for
uploading files to the MCC(s).
We use the JPEG markers to split the header of the original file from the content and
to find important JPEG markers. If someone has access to the split files and assembles
different files without accessing the header, or if the person creates a new header for a
JPEG file, he/she still cannot simply retrieve the file. For example, Figure 3.2 shows two
pictures, (a) and (b), with the same size and the same resolution, taken by a smart
phone. The right-most frame (c) shows the result of assembling the header of the first file
with the content of the second file. As shown in Figure 3.2, only the size and the
resolution can be retrieved, and the assembler still cannot retrieve the content of the
image. To protect an image from a person who attempts to assemble a part of the image
by assembling files, first, we use a pattern to distribute the chunks of the image to different
files. The pattern can be defined as an input by a user to indicate how to distribute a
sequence of bytes in a split file. Second, we use chaos theory to randomly distribute each
chunk in each split file.
Figure 3.2. Splitting the headers of two files and substituting the header of file #1 with the header of file #2
The proposed method uses two steps to disassemble an original JPEG file into a
number of chunks and to randomly scramble the chunks in each split file, as follows:
3.3.2 Pattern
The proposed method divides each file into 𝑐𝑚𝑎𝑥 chunks (binary codes) and distributes
the chunks to multiple split files in a different order, as will be discussed in the next phase.
In this phase, the method uses a pattern that determines how chunks are distributed to
multiple split files. A pattern can be defined as a key by a user, or it can be selected
randomly by a predefined method. A user can define different patterns to provide
different distribution strategies.
For example, Figure 3.3 shows two different patterns for disassembling a JPEG file. In
this figure, the original file includes a header and nine chunks of content. In 𝑃𝑎𝑡𝑡𝑒𝑟𝑛𝐴,
the proposed method reads two consecutive chunks from the original file and stores each
chunk in one of two files. The first chunk is stored in 𝐹𝑖𝑙𝑒1,2 and the second chunk is stored
in 𝐹𝑖𝑙𝑒1,3. At the end of reading 𝑆𝑜𝑢𝑟𝑐𝑒 𝐹𝑖𝑙𝑒1, 𝐹𝑖𝑙𝑒1,2 contains {𝐵1, 𝐵3, 𝐵5, 𝐵7} and 𝐹𝑖𝑙𝑒1,3
contains {𝐵2, 𝐵4, 𝐵6, 𝐵8}. In Figure 3.4, 𝑃𝑎𝑡𝑡𝑒𝑟𝑛𝐵 shows another pattern for storing chunks.
In this pattern, every four consecutive chunks of the original file are stored in two files.
The first and the fourth chunks are stored in 𝐹𝑖𝑙𝑒2,2; the second and the third chunks are
stored in 𝐹𝑖𝑙𝑒2,3. At the end of reading 𝑆𝑜𝑢𝑟𝑐𝑒 𝐹𝑖𝑙𝑒2, 𝐹𝑖𝑙𝑒2,2 contains {𝐵1, 𝐵4, 𝐵5, 𝐵8, 𝐵9}
and 𝐹𝑖𝑙𝑒2,3 contains {𝐵2, 𝐵3, 𝐵6, 𝐵7}. If someone has access to all the contents, he/she still
cannot assemble the files, because he/she needs access to the pattern as a key to assemble
the split files.
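The two example patterns can be reproduced in a few lines. The sketch below is illustrative, but its outputs match the chunk sets stated above for 𝑃𝑎𝑡𝑡𝑒𝑟𝑛𝐴 (an eight-chunk read) and 𝑃𝑎𝑡𝑡𝑒𝑟𝑛𝐵 (a nine-chunk read):

```python
chunks9 = [f"B{i}" for i in range(1, 10)]   # content chunks B1..B9

def pattern_a(chunks):
    # alternate consecutive chunks between two split files
    return chunks[0::2], chunks[1::2]

def pattern_b(chunks):
    # in each group of four, the 1st and 4th go to file 1, the 2nd and 3rd to file 2
    f1, f2 = [], []
    for i, c in enumerate(chunks):
        (f1 if i % 4 in (0, 3) else f2).append(c)
    return f1, f2

a1, a2 = pattern_a(chunks9[:8])   # the Pattern A example reads eight chunks
b1, b2 = pattern_b(chunks9)       # the Pattern B example reads nine chunks
print(a1, a2)   # ['B1','B3','B5','B7'] ['B2','B4','B6','B8']
print(b1, b2)   # ['B1','B4','B5','B8','B9'] ['B2','B3','B6','B7']
```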
[Figure: 𝑃𝑘 values produced by the chaos system over 300 iterations]
The proposed method uses the following set to provide non-convergent, non-periodic
pseudo-random numbers:

{𝑃𝑘}, 𝑘 = 0, …, 𝜔 (4)
The proposed method uses the following equation to find the location of 𝐶ℎ𝑢𝑛𝑘𝑘 in
a split file:
Reading and writing a JPEG file as a binary file adds complexity for unauthorized
users attempting to retrieve the image. For example, let us assume that the original file
has two consecutive bytes (i.e., ‘FFD8’, which indicates the start of an image). First, we
split these two consecutive bytes into two chunks (i.e., ‘FF’ and ‘D8’). Second, we
distribute the chunks to different locations in a file (i.e., at position 0 and position 2047);
then a JPEG decoder cannot retrieve this file, because it cannot find the JPEG markers. In
this case, the JPEG decoder or an operating system cannot recognize this binary file as a
JPEG format.
The best option is to select a buffer size where 𝐵𝑢𝑓𝑓𝑒𝑟 mod 2 = 1, which splits two
consecutive bytes into two chunks.
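A small demonstration of why an odd buffer size helps: with an odd chunk size, a two-byte marker such as FF D8 can straddle a chunk boundary, so its bytes land in different split chunks. The data bytes and marker offset below are illustrative:

```python
BUFFER = 5  # odd chunk size (Buffer mod 2 = 1)
data = b"\x00\x11\x22\x33\xFF\xD8\x66\x77\x88\x99"  # FF D8 at offsets 4-5

# chunk the file into odd-size pieces
chunks = [data[i:i + BUFFER] for i in range(0, len(data), BUFFER)]

# the two marker bytes end up in different chunks
print(chunks[0][-1] == 0xFF and chunks[1][0] == 0xD8)  # True
```

After the per-file scramble relocates those chunks, a decoder scanning any single split file no longer sees the two marker bytes adjacent to each other.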
The procedure (Equations 6–11) extends upper-bound collisions and lower-bound
collisions to upper and lower available addresses, respectively. If 𝑃𝑜𝑠𝑘−1 < 𝑃𝑜𝑠𝑘,
the procedure finds an upper available position. If 𝑃𝑜𝑠𝑘−1 > 𝑃𝑜𝑠𝑘, the procedure finds a
lower available position. Some exceptions are listed as follows: (i) 𝑘 is the maximum
address number: the procedure finds the maximum position 𝑙 where 𝑘 > 𝑙; (ii) 𝑘 is the
minimum address number: the procedure finds the minimum position 𝑙 where 𝑘 < 𝑙; (iii)
there is no available upper-bound position: the procedure finds the maximum 𝑙 where
𝑘 < 𝑙; (iv) there is no available lower-bound position: the procedure finds the minimum 𝑙
where 𝑘 > 𝑙. Figure 3.5 shows the result of Equations (6–11) when 𝜇 = 3.684 and 𝑃0 = 0.9999.
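A hedged sketch of the position generator with a simple conflict remover, using the logistic map with the parameters given above (𝜇 = 3.684, 𝑃0 = 0.9999). The probing rule loosely follows the direction-based procedure described in the text (it wraps around instead of handling the boundary exceptions separately) and is not the author's exact algorithm:

```python
def chaos_positions(n_slots: int, mu: float = 3.684, p0: float = 0.9999):
    # Propose one slot per chunk from the logistic map; on a collision,
    # probe upward or downward (depending on the direction of the
    # sequence) until a free slot is found.
    positions, used = [], set()
    p = p0
    prev = -1
    for _ in range(n_slots):
        p = mu * p * (1 - p)             # logistic map iteration
        pos = int(p * n_slots) % n_slots
        if pos in used:                  # conflict: probe for a free slot
            step = 1 if pos >= prev else -1
            while pos in used:
                pos = (pos + step) % n_slots
        used.add(pos)
        positions.append(pos)
        prev = pos
    return positions

# one slot per 10-KByte chunk of a ~3070-KByte file
pos = chaos_positions(307)
assert sorted(pos) == list(range(307))   # every chunk gets a unique slot
```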
[Figure 3.5: chunk positions 𝑃𝑜𝑠𝑘 mapped to available positions 𝑃𝑜𝑠𝑙]
testing. Another approach is a statistical model that describes the deviation between the
original file and the output. This section presents these two approaches to evaluate the
proposed method.
3.4.1 Implementation
We use the proposed method to disassemble and store 21 JPEG files of different sizes
taken with a smart phone. We compare the disassembly phase against encryption
methods and the assembly phase against decryption methods.
To evaluate the proposed method, we select three different values of 𝜇 and keep the
other initial values the same, as follows: 𝑃0 = 0.9999, the maximum size of an input file is
3057 Kbyte (the maximum size of the images in our dataset, described in Section 3.4.2),
and the size of each chunk (buffer size) is 10 Kbyte. Figure 3.7 shows the deviation
between a chunk's position in the original file and its position in the scrambled file for
different values of 𝜇.
[Figure: experimental results for the 21 JPEG files, plotted against file size (KB)]
In Figure 3.7, the X-axis represents eight selected pairs (𝑃𝑜𝑠𝑘, 𝑃𝑜𝑠𝑘+1) and the Y-axis
represents the deviation between the positions of 𝑃𝑜𝑠𝑘 and 𝑃𝑜𝑠𝑘+1 in the scrambled file.
As shown in this diagram, every two consecutive positions in a scrambled file have
different deviations. The different values of 𝜇 provide different models of file scrambling.
As shown in this figure, the deviation of each curve is different for different initial values
of 𝜇.
We also compare the position of each chunk in the original file and in the scrambled
file. We use the configuration of Figure 3.7 and assume the original file is scrambled into
one file that includes the header and the content of the original file. Figure 3.8 shows the
deviation of a chunk's position in the original file from its new position in the scrambled
file. As shown in this figure, each chunk is relocated to a different location in the
scrambled file, and the deviation varies randomly from chunk to chunk.
Figure 3.7. Deviation of chunk positions in the scrambled file for different parameter values
Figure 3.8. Statistical deviation between chunk positions in the original file and the scrambled files
An attacker needs to assemble all split files and all chunks in each split file to
retrieve an image. The proposed method produces a scrambled binary file that obstructs
JPEG decoders from retrieving the image, because a JPEG decoder requires specific
markers, which are scrambled across the split files.
We assume that the attacker wants to retrieve a JPEG file of 2 MByte in size and that
the attacker uses an Intel Core i7-4770K CPU with 127273 MIPS at 3.9 GHz. The following
scenarios can be mounted against the proposed method:
3.5.1 Scenario 1
Assumptions: the attacker has access to all split files but does not know {𝑃𝑖} or the
size of each chunk.
In this scenario, since the attacker does not have information about {𝑃𝑖}, the attacker must
run a brute-force attack to assemble the split files and reorganize the scrambled chunks
in the split files. It requires a minimum of 𝑂((𝑛! − 1) − 𝜕), where 𝜕 is the number of similar
bytes in all files and 𝑛 is the size of the file (in bytes). In this case, the attacker needs to try
2.229077716E+9381 permutation combinations to reconstruct the image (𝜕 ≈ 0).
Retrieving an image in this scenario with this CPU configuration is infeasible; it
would require 2.027100061111045012201E+9371 years of computation.
3.5.2 Scenario 2
Assumptions: the attacker has access to all split files and knows the size of each chunk
(10 Kbyte) but does not know {𝑃𝑖}.
In this scenario, the attacker still needs to run a brute-force attack, but the search space
shrinks because the size of the scrambled files (3070 Kbyte) is divided by the size of each
chunk (10 Kbyte). In this case, the attacker needs to try a minimum of 307! − 1 permutation
combinations to reconstruct the scrambled file, which requires an infeasible computation
(6.677321883507716595116E+621 years) to cover all permutation combinations.
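A quick sanity check on the Scenario 2 search space (the year figures quoted above depend on further CPU assumptions; here we only count the orderings):

```python
import math

# 3070 KByte of scrambled data in 10-KByte chunks gives 307 chunks,
# so a brute-force attacker faces 307! - 1 possible orderings.
n_chunks = 3070 // 10
search_space = math.factorial(n_chunks) - 1
digits = len(str(search_space))
print(n_chunks, digits)   # 307 chunks; a number with 632 decimal digits
```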
3.5.3 Scenario 3
Assumptions: the attacker has access to all split files and knows the size of the chunks
(10 Kbyte) but does not know that the method is based on a chaos system.
In this scenario, the attacker needs to use a brute-force attack against the proposed
method, which requires a computation of 𝑂(((𝑛/10)! − 1) − 𝜕), where 𝜕 is the number of
similar bytes in all files and 𝑛 is the size of the original file (3070 Kbyte). In this case, the
attacker needs 𝑂(307!) computations.
3.5.4 Scenario 4
Assumptions: the attacker has access to one of the multiple split files.
In this scenario, since every two consecutive chunks are stored in different clouds (i.e.,
using Pattern A in Figure 3.3), the attacker could only retrieve a part of an image by using
a brute-force attack. We can estimate the probability of finding the size of each chunk as
follows:
Related Works
Several studies (Lian et al. 2004, Podesser et al. 2002, Choo et al. 2007, Ye et al. 2010, Ra et
al. 2013) have been conducted on image encryption. Unlike these studies, which address
encryption methods based on JPEG encoders, our proposed method provides light-weight
data privacy based on the binary file. As described in Section 3.3, using JPEG encoders
imposes computation overheads on mobile devices, such as smart phones. Our proposed
method operates on the binary file rather than encrypting image pixels or the color of
each pixel.
(Podesser et al. 2002) proposed a selective encryption method for mobile devices to
encrypt part of an image. However, in this method, part of the image remains visible to
everyone. (Choo et al. 2007) proposed a light-weight method for real-time multimedia
transmission. Although this method provides efficient performance over AES
encryption, it still needs heavy computation on a smart phone to encrypt an average
3-MByte JPEG image file in real time (see Section 3.5 for the comparison).
Unlike existing studies, our proposed method requires three steps to reconstruct a
JPEG image file, each of which is difficult for an attacker:
i) Recognizing the type of the scrambled files: it is difficult to identify the file type
because the header (which includes the JPEG markers) is split from the content,
and all the chunks in each file are scrambled.
ii) Finding and assembling the scrambled files: since the split files are distributed
across two or more clouds, it is impractical to reconstruct an image file completely.
iii) Reconstructing the original file from the split scrambled files: this requires heavy
computation to retrieve a partial or full image.
The proposed method hides the JPEG markers from image decoders, which prevents
JPEG decoders from retrieving the metadata of a scrambled image. For example, Figure
3.9 shows: (a) an original image; (b) a scrambled JPEG file based on the proposed method
(if we save the file with a JPG extension); (c) a cipher image based on a JPEG encoder; and
(d) a cipher image based on AES. As shown in this figure, neither the proposed method
nor AES allows the information or the size of the image to be retrieved. However, the
metadata of an image can be retrieved from a cipher image based on a JPEG encoder. As
described in Section 3.4, our proposed method provides better performance than existing
methods. The proposed method can be implemented for different applications in cloud
computing systems, such as (Bahrami et al. 2013), to collect information with a web
crawler and maintain the privacy of the information in a cloud computing system. The
method can also be applied to eHealth systems (Rodrigues et al. 2013) to maintain data
privacy.
Summary of chapter
In this chapter, we proposed a new data privacy method for storing JPEG files on multi-
cloud computing systems. Since the method has low complexity, we have shown that the
implemented method provides a cost-effective solution for mobile devices that have
limited energy and resources, such as CPU and RAM. The proposed method splits each
file into multiple chunks, distributes the chunks across multiple split files, and scrambles
the chunks based on a chaos system. The proposed method incurs low computation
overheads and can efficiently run on a smart phone. It restricts unauthorized users,
including cloud vendors and their partners, from reconstructing a JPEG image file. We
compared our proposed method against other encryption methods to demonstrate its
performance superiority over existing methods. Furthermore, we investigated several
important security attack scenarios against the proposed method to evaluate its level of
security.
Acknowledgement
The implementation work of the application was supported by Microsoft Windows
Azure through a Windows Azure Educator Award.
Chapter 4
Parallel DPM for Mobile Cloud Users
The previous chapter described DPM for mobile cloud users. In this chapter, we extend
the method by parallelizing it on a GPU. The next chapters present different use case
scenarios for deploying DPM on top of DCCSOA.
Introduction
Cloud computing and parallel computing paradigms introduce several advantages for
processing computation-heavy methods. Our study in this chapter is based on these two
paradigms, which are described in the following.
computing systems. However, using complex security methods on mobile devices raises
resource limitation challenges. Therefore, not all complex security methods can protect
user data privacy while providing a balance between resource power and computation
speed.
Threat Model
In this section, we describe the specific threats against which our proposed technique
protects user privacy.
If a mobile cloud user wishes to outsource her photos to a cloud, such as Google
Drive or Dropbox, but would not like to share the original content with the cloud vendor,
she may encrypt the content and submit the encrypted photos to the cloud. However,
encrypting each file may drain the battery in a short period of time. Alternatively, she
might use a PRP method to scramble the content of all photos at the bit level, and then
submit the scrambled photo data to the cloud. In this case, the PRP generator should be
secure and use a lengthy chunk of data (e.g., a large number of bits) to permute the
original content before submitting it to the cloud. As another example, if a user submits
a text file or a database file, she might use a PRP to scramble the original content in order
to submit confused content to the cloud. In the case of a database, a query can even be
performed on the scrambled data without reconstructing the original content. The details
of the implementation of DPM for cloud-based datasets are described in Chapter 8.
Generally, our proposed method should use the available resources on a mobile device
(e.g., a cell phone) to scramble the original content. Our proposed method uses the GPU
on the mobile device to parallelize the method in order to save the device's power
resources and to run faster than on the device's CPU. In addition, public data analysis
tools from cloud vendors, or their third-party applications, are not able to simply access
the original user's photo, text file, or database. For instance, third-party applications and
bulk data analysis tools cannot process users' data without reconstructing the original
data from the scrambled data. If users use a complex model of PRP, reconstructing the
original file becomes more difficult for cloud vendors or their third-party partners.
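As a toy illustration of the database case (a highly simplified stand-in for the Chapter 8 scheme; the scrambling function and all names are hypothetical): if every stored value is scrambled with the same deterministic PRP, an exact-match query can be answered by scrambling the query literal and comparing scrambled forms, without reconstructing the original content.

```python
import random

def scramble(value: bytes, seed: int) -> bytes:
    # Deterministic byte-level permutation keyed by (seed, length);
    # a seeded shuffle stands in for the real PRP.
    perm = list(range(len(value)))
    random.Random(f"{seed}-{len(value)}").shuffle(perm)
    return bytes(value[i] for i in perm)

rows = [b"alice", b"bob", b"carol"]
db = [scramble(r, seed=11) for r in rows]   # what the cloud stores

query = scramble(b"bob", seed=11)           # scrambled on the device
matches = [i for i, v in enumerate(db) if v == query]
print(matches)  # [1]
```

The cloud only ever sees scrambled values and scrambled query literals; the seed never leaves the device.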
In the previous chapters, we described how a mobile device accesses the shuffle
addresses for different chunk sizes of an original file. In this chapter, we consider the
implementation of DPM on a GPU to generate PRP numbers, permute the original
content, and outsource the scrambled content to a cloud, which prevents bulk data
analysis tools from reviewing the user's content. The GPU-based DPM runs the process
on GPU cores instead of the CPU in order to improve DPM performance.
The rest of this chapter is organized as follows: the next section presents the
motivation of the study and the major challenges in maintaining data privacy while using
cloud computing. Section 4.4 presents the related work. Section 4.5 presents the
background of this study. Section 4.6 presents the proposed method on the GPU. The
experimental setup and its results are described in Section 4.7. The security analysis of
DPM is presented in Section 4.8, which states the security assumptions and the level of
security of the proposed method. Finally, Section 4.9 concludes the study.
Motivation
The following two options can be considered for shuffling data:
i) using an online PRP generator to produce shuffle addresses;
ii) using a set of pre-generated arrays of PRP (offline mode), as described in Chapter 3.
The first option causes an issue on a mobile device because the computation time of
generating PRP is expensive on mobile devices. The second option is preferred because
it removes the additional cost of generating PRP. However, it uses mobile device storage,
which could cause indirect issues. In this chapter, we consider the first option, but we
generate the arrays of PRPs on-the-fly and distribute the process to multiple cores of a
GPU in order to reduce the computation time.
The key challenge on which we focus is data privacy for mobile users, because when
a user outsources data to the cloud, data privacy can be violated by the cloud vendor, the
vendor's partners, hackers, malicious entities, or even other cloud users.
Related Work
To the best of our knowledge, no research has been published on implementing PRP
on a GPU, because PRP generation is sequential by nature, which conflicts with
parallelism. However, some studies have been published on implementing other random
number generators on GPUs or FPGAs. For instance, (Thomas et al. 2009) compare the
performance of three types of random number generators on a CPU, a GPU, and an
FPGA. In this study, the authors use an appropriate algorithm, such as the uniform,
Gaussian, or exponential distribution, for each hardware platform in order to achieve
efficient power peaks and computations. This study shows that the performance of the
different random number generators relies on their platform. In this chapter, we consider
the CUDA platform on a GPU in order to optimize the performance and power
consumption, which is not investigated in (Thomas et al. 2009). In another study, (Tsoi et
al. 2003) implemented two different random number generators for embedded
cryptographic applications on an FPGA. The first is a true random number generator
(TRNG), described by (Killmann et al. 2001), which is based on oscillator phase noise; the
second is a bit-serial implementation of Blum Blum Shub (BBS), described in (Blum et al.
1986). The study shows that a TRNG is recommended for low-clock-frequency
processors. Since a GPU often consists of thousands of cores that are slower and smaller
than CPU cores, we consider this fact when designing the small-scale generators in our
study, described in Section 4.5. This consideration makes PRP highly suited to the target
platform. In a similar study, (Manssen et al. 2012) evaluate different random number
generators with different granularities. There are also some studies on processing AES
on a GPU (Manavski et al. 2007, Shao et al. 2010, Li et al. 2012) and on using similar
methods for security processing on a GPU (Wang et al. 2009).
DPM saves 72% battery power over the AES encryption method, because DPM can run
in O(1) time for each chunk and requires O(n) time for n chunks.
𝐹 is a PRP if:
(iii) for every efficient distinguisher 𝐷: |Pr(𝐷^{𝐹𝐾}(1^𝑛) = 1) − Pr(𝐷^{𝑓𝑛}(1^𝑛) = 1)| < 𝜀(𝑠), where 𝐾 ← {0,1}^𝑠 (4)
As discussed in Equation (4), the PRP provides a uniform distribution over all
generated elements of 𝐹. This property of PRP satisfies the important perfect secrecy
criterion introduced in Shannon's theory (Shannon 1949) for encryption functions.
𝜉 = {𝑃𝑘}, 𝑘 = 0, …, 𝜔 (6)
The DPM splits the original content into 𝜔 chunks. Then, it uses 𝜉 to shuffle the
contents of the chunks. We employed the conflict-remover algorithm described in
Chapter 3 to provide a set of unique addresses for each chunk based on the input
parameter (𝜇) and the pattern selected by the user.
Both processes need heavy computation, and we can accelerate DPM by running both
on the GPU. However, we face several challenges when we implement both processes of
DPM in parallel. The following sub-sections explain these challenges as well as a possible
solution for each.
4.6.1 Generating 𝝃
The original content, which is the input to DPM, comes from different sources,
depending on the type of application where DPM is employed. For instance, in Chapter
8, we employ DPM for a database management system where the input is a set of database
queries; in Chapter 3, we employed DPM to protect the data privacy of JPEG files, where
each chunk is composed of one or multiple Minimum Coded Unit (MCU) blocks; and in
Chapter 7, we employ DPM for electronic healthcare systems, where data privacy plays
a key role.
The nature of generating a set of 𝜉 is a sequential process, which stands against data parallelism. We treat ℳ as a 2D-array for generating PRP addresses in parallel, which allows each GPU core to generate a different set of 𝜉; each GPU core corresponds to a partial part of 𝜉. We map the original input (𝒟) to a 2D-array (ℳ) to maximize the usage of GPU cores. Then, we apply 𝜉 to ℳ in the next step. Figure 4.1 shows an example of mapping the original input (𝒟) to the 2D-array (ℳ). In this figure, ℳ = ⋃_{𝑖=0}^{𝛿} 𝒟_𝜅, where 𝛿 is the maximum number of chunks of the original content/file and 𝜅 is the size of each chunk. The generator produces 𝑛 sets of 𝜉 for 𝑚 chunks.
Code 4.1 shows how each thread gets the same seed with different sequence numbers. RowCell and ColCell represent the number of rows and columns of ℳ_{𝑚,𝑛}, respectively. This configuration provides the best performance because each block of threads receives a unique initial seed and each block provides a unique set of 𝜉. During the initialization (the config function), ColCell is treated as an application parameter; RowCell and ColCell together determine the size of ℳ. In our experiment (Section 4.7.1), we describe different configurations of the application in order to implement different sizes of ℳ.
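Code 4.1 itself is CUDA; as a hedged host-side analogue in Python, the same-seed/different-sequence idea (each block reuses one master seed but draws from its own independent stream, as curand_init does with a sequence number) can be sketched as follows, where block_stream and the seed-combining rule are illustrative assumptions:

```python
import random

def block_stream(master_seed, block_id, count):
    """Each 'block' reuses the same master seed but a distinct sequence
    number (block_id), yielding an independent, reproducible stream --
    a host-side analogue of curand_init(seed, sequence, ...)."""
    # Combine seed and sequence number deterministically (illustrative rule).
    rng = random.Random(master_seed * 10007 + block_id)
    return [rng.randrange(2 ** 16) for _ in range(count)]

row_cells, col_cells = 4, 8  # stand-ins for RowCell and ColCell
streams = [block_stream(1234, b, col_cells) for b in range(row_cells)]

# Streams are reproducible per block and differ across blocks.
assert streams[0] == block_stream(1234, 0, col_cells)
assert streams[0] != streams[1]
```

Reusing one master seed keeps the whole run reproducible, while distinct sequence numbers keep per-block streams independent.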
Code 4.2 shows an implementation of the PRP generator over RowCell and ColCell in the main() function. In this code, Line 1 allocates space for results on the host. Space allocation for results on the device is defined in Lines 2-4. The 𝜉 set is configured in Line 4, and the PRP generator is called in Line 6.
4.6.2 Applying 𝝃 to 𝓜
Once the random addresses generated in the previous section have been stored in 𝜉, different solutions are available for applying 𝜉 to ℳ, as follows:
The second option is the implementation of applying 𝜉 to ℳ. In this case, two elements of ℳ need to be exchanged when the PRP shows that an exchange is required between two 𝜅-bit cells: ℳ_𝑖, the original address, and ℳ_𝑗, the destination address of the exchange. If we consider each element of ℳ as a 𝜅-bit element, then the minimum memory requirement for the implementation is 𝜅. When DPM needs to exchange the contents of two cells of ℳ (ℳ_𝑖 and ℳ_𝑗) based on 𝜉, one of the following possibilities arises: (𝑖) ℳ_𝑖 = ℳ_𝑗: In this case, if we assume that 𝜅 = 1, then no exchange is required because the contents of both bits are the same, and we can save the computational overhead of exchanging them. If 𝜅 > 1, then all 𝜅 bits of ℳ_𝑖 must equal all 𝜅 bits of ℳ_𝑗. (𝑖𝑖) ℳ_𝑖 ≠ ℳ_𝑗: The exchange is required. In this case, ℳ_𝑖′ and ℳ_𝑗′ are the final values of ℳ_𝑖 and ℳ_𝑗 after the exchange process, respectively. Table 4.1 summarizes these two conditions.
𝓜_𝒊   𝓜_𝒋   𝓜_𝒊′   𝓜_𝒋′   Exchange is required
 0      0      0      0      N
 0      1      1      0      Y
 1      0      0      1      Y
 1      1      1      1      N
ℳ_𝑖′ = 0 ⨁ ℳ_𝑗   (7)
ℳ_𝑗′ = 0 ⨁ ℳ_𝑖   (8)
Therefore, by using Equations 7 and 8, we do not need to run a full exchange function, which optimizes the exchange step.
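A minimal sketch of this optimized exchange for 𝜅 = 1 (Python, illustrative only): the exchange is skipped when the two cells are equal (rows 00 and 11 of Table 4.1), and otherwise each cell is written from the other, per Equations 7 and 8:

```python
def exchange(m, i, j):
    """Exchange two 1-bit cells of M per Table 4.1: skip when equal,
    otherwise write each cell from the other (Equations 7 and 8),
    avoiding a temporary-variable swap."""
    if m[i] == m[j]:
        return m                          # no exchange required
    m[i], m[j] = 0 ^ m[j], 0 ^ m[i]       # M_i' = 0 xor M_j ; M_j' = 0 xor M_i
    return m

# The four rows of Table 4.1:
assert exchange([0, 0], 0, 1) == [0, 0]
assert exchange([0, 1], 0, 1) == [1, 0]
assert exchange([1, 0], 0, 1) == [0, 1]
assert exchange([1, 1], 0, 1) == [1, 1]
```

The equality test saves the exchange work in half of the cases on average, which matters when millions of cell pairs are processed per kernel launch.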
cudaProfilerStart();
cudaProfilerStop();
By default, the first CUDA API call starts the profiler (in this case, cudaGetDevice initializes the profiler). An example of the profiler output is shown in List 4.1. In this example, the profiler reports 10384 CUDA API calls, where 98.27% of the time is spent executing cudaMemcpy, which includes the following functions: cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, and cudaMemcpyDeviceToDevice.
Figure 4.2.a shows a single set of 𝜉 and Figure 4.2.b shows six sets of 𝜉 with different initial values. In these figures, each point represents a value of 𝜉; the X-axis represents the iteration number and the Y-axis represents the value of the PRP function. The figures illustrate that the proposed method generates different sets of 𝜉, where no set introduces a conflict with the values of the other 𝜉 sets. Figure 4.3 shows the distribution variances of 𝜉, where each set of 𝜉 values is illustrated in Figure 4.2. As shown in Figure 4.3, the values of each set are uniformly distributed over the range of 𝜉 values. As clearly shown in Figures 4.2.b and 4.3.b, there is no spot or cluster pattern that would help an attacker estimate values by knowing (partial) values of one or more 𝜉 sets.
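The two properties checked visually above, uniqueness within a set (no conflicts) and full, uniform coverage of the address range, can be verified programmatically. A small Python sketch under the assumption that each 𝜉 set is a permutation of the address space (the seeded PRNG is only a stand-in for the PRP):

```python
import random

def xi_set(size, seed):
    """One xi set: a permutation of addresses 0..size-1 (no conflicts)."""
    perm = list(range(size))
    random.Random(seed).shuffle(perm)
    return perm

sets = [xi_set(64, seed) for seed in range(6)]  # six sets, distinct seeds
for s in sets:
    # Within a set, every address is unique -- no conflict.
    assert len(set(s)) == len(s)
    # Every set covers the full range -- each value appears exactly once,
    # which is the discrete analogue of a uniform distribution.
    assert sorted(s) == list(range(64))
```

Because every value occurs exactly once per set, the empirical frequency of each address is identical, matching the uniformity shown in Figure 4.3.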
Figure 4.2. The evaluation results of one and six sets of 𝜉 with different initial values
Overlap of values from different sets of 𝜉 is one of the security challenges for the proposed method. Each 𝜉 provides a permutation model. An overlap between values of different 𝜉 sets would allow an attacker to derive one permutation model by knowing one or more other permutation models. Another challenge is finding a pattern between different subsets of 𝜉. As shown in Figure 4.3, the curves do not show similar patterns, even within a single set of 𝜉.
(a) A set of 𝜉; (b) six sets of 𝜉
[Figure 4.3: distribution variances of the 𝜉 sets shown in Figure 4.2]
As a result, the proposed method is capable of increasing the size of the input data with minimal transfer cost between the CPU and GPU.
The evaluation results show that the 2D-array of size 128*128 provides better performance than the other input sizes. However, energy consumption is another parameter that can be assessed in order to provide an overall result for this performance evaluation. This evaluation also provides an overall view of the performance of the proposed method.
Security Analysis
In this section, we describe the security assumption and the level of security for the
proposed method.
Let 𝑆𝐶(ℳ) be the scramble function of DPM on an n-core GPU. Perfect secrecy, as described in Shannon's theory, concerns the probability of two different encrypted messages, in our study 𝑆𝐶(ℳ_𝑖) and 𝑆𝐶(ℳ_𝑗), and is defined as follows:
where 𝜉_𝑖 and 𝜉_𝑗 are different sets of PRP with different initial values; 𝜇 and 𝑃_0 are defined in Equation 5. 𝑀 is the set of all original messages and 𝐶 consists of permutated messages based on a set of 𝜉 values.
To prove Lemma 1, we must prove the following sub-lemmas, Lemma 1.1 and Lemma 1.2:
Lemma 1.1: Given c (scrambled data), the adversary cannot learn about 𝑚_𝑖 and 𝑚_𝑗 (two different original messages). Therefore, we must generate different outputs for all different inputs.
Proof: Each separate original content in DPM should be scrambled with a different set of 𝜉 to avoid similarity between 𝑆𝐶(𝜉, 𝑚_𝑖) and 𝑆𝐶(𝜉, 𝑚_𝑗). Each set of 𝜉 is generated independently by a GPU core ℂ_𝑚. Each core uses different initial values to generate different 𝜉 sets without any conflict with other 𝜉 sets, or with minimal partial conflict with other sets.
∀ 𝐹𝑖𝑙𝑒_𝑖, 𝑐: Pr_𝜁[𝑆𝐶(𝜁, 𝐶𝑜𝑛𝑡𝑒𝑛𝑡_𝑖) = 𝑐] = #{𝜁 ∈ Ζ such that 𝑆𝐶(𝜁, 𝐶𝑜𝑛𝑡𝑒𝑛𝑡_𝑖) = 𝑐} / |Ζ|
Since the initialization value of each 𝜉 is different for each GPU-core (the security
[Figure: profiling results for input sizes 32*64 to 128*128 - panels (a)-(d) report API calls and the timing of cudaMalloc, cudaLaunch, cuDeviceGetAttribute, and cudaGetDeviceProperties against Input Size]
assumption), an attacker with access to the scrambled content is not able to learn about 𝑚_𝑖 and 𝑚_𝑗 if and only if the attacker cannot learn the sequence of 𝜉 values, which means the attacker must not have knowledge of the parameters of the 𝜉 generator. As our evaluation of the generated PRP shows in Figures 4.2 and 4.3, the attacker is therefore not able to learn about 𝑚_𝑖 and 𝑚_𝑗 by accessing 𝑐. ∎
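The probability statement above can be checked by brute force for a toy message: when the chunks are distinct, every scramble key 𝜁 yields a distinct 𝑐, so each reachable 𝑐 occurs with probability exactly 1/|Ζ|. A small Python sketch (illustrative only, not the GPU implementation):

```python
from itertools import permutations
from fractions import Fraction
from collections import Counter

def SC(zeta, content):
    """Scramble: place content[i] at address zeta[i]."""
    out = [None] * len(content)
    for i, p in enumerate(zeta):
        out[p] = content[i]
    return tuple(out)

content = ("a", "b", "c")               # three distinct chunks
Z = list(permutations(range(3)))        # all possible zeta (|Z| = 6)
counts = Counter(SC(z, content) for z in Z)

# For every reachable c: Pr[SC(zeta, content) = c] = #{zeta: SC = c} / |Z| = 1/|Z|
for c, n in counts.items():
    assert Fraction(n, len(Z)) == Fraction(1, len(Z))
```

Enumerating all |Ζ| keys confirms the uniform probability in the equation; for real file sizes the key space is far too large to enumerate, which is exactly what the adversary faces.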
Lemma 1.2: The 𝜉 generator has perfect secrecy for all GPU cores.
Proof: The PRP must provide a uniform distribution for all entries of n bits as follows:
∀ 𝑥 ∈ 𝑈: 𝑃(𝑥) = 1/|𝑈|, where 𝑈 = {0,1}^𝑛.
Since each GPU core generates a unique set of 𝜉 values, the probability of all 𝜉 sets is equal, and 𝑃(𝑥) = 1/|𝑈| for each GPU core, which satisfies the generator condition of perfect secrecy.
∎
Summary of Chapter
Cloud computing offers users new opportunities to efficiently outsource data and applications. Data privacy is one of the major issues in cloud computing systems. In Chapter 3, we introduced a light-weight data privacy method (DPM) that allows users to protect their data before submitting the original file to the cloud. Graphics Processing Units (GPUs) allow parallel processes to run efficiently. A GPU kernel is able to process computationally intensive tasks on the client side by using a GPU platform, such as the NVIDIA CUDA Toolkit.
In this chapter, pseudo-random numbers are used for the permutation of an original file. On the security side, we considered the security assumptions of the method and assessed the generated pseudo-random numbers, their distribution, and perfect secrecy in order to analyze the security of the proposed method on multiple GPU cores.
Chapter 5
This chapter extends DCCSOA, which is described in Chapter 2, to support the Internet-of-Things (IoT). The next chapter describes how DPM can be implemented on the architecture proposed in this chapter.
Introduction
This chapter describes a convergence of two recent and popular paradigms: cloud computing and the Internet-of-Things (IoT). Together, these two paradigms define a new cost-effective model for a network of devices connected through the Internet. In this section, we define the two paradigms and the advantages of each when we plan to collect and process a large amount of data. Processing big data efficiently in the cloud environment is an opportunity to use a variety of IoT devices to turn raw data into intelligent data.
The cloud enables users to increase or decrease the capacity of virtual storage and the number of virtual processing machines on-demand. The cloud vendor bills cloud users on a pay-per-use basis. This dynamic resource allocation allows users to support any number of requests without paying additional costs to maintain a large number of processing machines and a massive amount of data. Cloud computing has become popular with both startup businesses and corporations because startup
CLOUD-ASSISTED IOT BASED ON DCCSOA | 73
businesses can start with a low investment, and corporations can save maintenance costs when the number of user requests decreases.
Since cloud vendors maintain all storage and processing units, cloud users do not need to pay additional maintenance costs, or even recovery costs in a disaster situation. Cloud vendors also deploy global backup servers to prevent loss of users' data.
Another advantage of cloud-assisted IoT is that it provides a hub that allows IoT devices from around the world to collaborate with each other while using the cloud as the main connection.
The rest of the chapter is organized as follows: the next section describes the architecture issue of cloud-assisted IoT as one of the major challenges users face when they want to use different cloud platforms. Section 5.3 describes a cloud architecture based on Service-Oriented Architecture (SOA) (Perrey 2003) and DCCSOA that allows different cloud vendors to define a generic and dynamic platform in order to facilitate the transfer of users' data, applications and IoT devices. Section 5.4 explains the characteristics of big data and how DCCSOA can support them. Section 5.5 describes the advantages of the proposed architecture, such as standardization between heterogeneous cloud platforms, discusses its security, and reviews a case study of implementing data security on the proposed architecture. Section 5.6 reviews related work, and finally, Section 5.7 concludes this chapter.
On one hand, less modification on the user's side (the IoT device) means more standardization between heterogeneous cloud platforms and more independent cloud services; on the other hand, less modification to the architecture means offering a cost-effective model that supports a variety of user requests with minimal customization costs.
The proposed architecture for cloud-assisted IoT is divided into three layers as follows:
i. Cloud vendor: This layer shows the implementation of a cloud vendor. Different cloud vendors offer heterogeneous cloud architectures at this level. Each cloud vendor offers a variety of cloud services, which are called value-added services. For instance, Infrastructure-as-a-Service (IaaS) provides virtual infrastructure, such as virtual storage and virtual CPUs, to users, and Platform-as-a-Service (PaaS) offers a development platform.
At this level, DCCSOA allows any type of cloud service, on heterogeneous platforms with different interfaces, to be offered by a cloud vendor.
ii. Cloud-assisted IoT: This layer provides a generic interface on top of heterogeneous cloud platforms to the client side. The Dynamic Template Service Layer (DTSL) is divided into two sub-layers as follows:
𝑇(𝐾_𝑠) = Σ_{𝑖=1}^{𝑚} 𝑖𝑛𝑝𝑢𝑡_{𝑘,𝑖} + Σ_{𝑗=1}^{𝑛} 𝑜𝑢𝑡𝑝𝑢𝑡_{𝑘,𝑗}   (1)
where 𝑘 ∈ 𝑃 and 𝑃 is a set of heterogeneous cloud platforms, and each given service 𝑠 has 𝑚 input parameters and 𝑛 output parameters.
iii. Client: end-users' applications, including software and IoT devices, are located in this layer. If an IoT vendor offers cloud-assisted storage and computing units, the vendor can use the dashboard component (as illustrated in Figure 5.1) to collaborate/interact with the cloud vendor on one side and its customers on the other side. An IoT vendor who uses a cloud vendor's infrastructure to provide IoT services can use heterogeneous cloud vendors while using a generic interface (a defined template) on top of each cloud vendor.
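As an illustrative sketch of Equation (1), the template size 𝑇(𝐾_𝑠) is simply the number of input plus output parameters a template must wrap for one platform's version of a service. In Python, with entirely hypothetical service descriptions (the names below are not part of DCCSOA):

```python
# Hypothetical service descriptions for two heterogeneous platforms;
# all names are illustrative only.
services = {
    "vendor_a.storage": {"inputs": ["bucket", "key", "data"],
                         "outputs": ["status"]},
    "vendor_b.storage": {"inputs": ["container", "blob", "data", "acl"],
                         "outputs": ["status", "etag"]},
}

def template_size(service):
    """T(K_s): total number of input plus output parameters (Equation 1),
    a rough measure of what a template must wrap for platform k."""
    s = services[service]
    return len(s["inputs"]) + len(s["outputs"])

assert template_size("vendor_a.storage") == 4   # m = 3, n = 1
assert template_size("vendor_b.storage") == 6   # m = 4, n = 2
```

Comparing 𝑇(𝐾_𝑠) across platforms gives a quick estimate of how much interface surface each BTaaS binding must cover.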
where 𝑠 is a service at 𝐹𝑇𝑎𝑎𝑆 and 𝑆𝑎𝑡 is a satisfaction function, which is defined as follows:
𝑆𝑎𝑡(𝑠): ℛ → 𝒪   (3)
A client is able to access a template without concern about the location of the data source and other related configuration. In this example, the client accesses the data source through 𝐹𝑇𝑎𝑎𝑆, which is defined as a web service.
DCCSOA allows users to interact with a template at FTaaS that provides cloud services. The user does not need any knowledge of the cloud value-added services because they are implemented at BTaaS. This independence is a great opportunity for IoT users who use different IoT devices, as it allows them to freely transfer their data and applications from one vendor to another.
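The FTaaS/BTaaS separation described above is essentially an adapter pattern: the client codes against one generic front-end while each back-end template binds to a vendor's native API. A minimal Python sketch, assuming two hypothetical vendor APIs (nothing here is DCCSOA's actual interface):

```python
class BTaaS:
    """Back-end template: binds the generic call to one vendor's native API."""
    def __init__(self, vendor_put):
        self._put = vendor_put
    def store(self, key, data):
        return self._put(key, data)

class FTaaS:
    """Front-end template: the single generic interface clients see."""
    def __init__(self, btaas):
        self._btaas = btaas
    def store(self, key, data):
        return self._btaas.store(key, data)

# Two hypothetical, differently shaped vendor back-ends.
vendor_a, vendor_b = {}, {}
ft_a = FTaaS(BTaaS(lambda k, d: vendor_a.__setitem__(k, d)))
ft_b = FTaaS(BTaaS(lambda k, d: vendor_b.__setitem__(k, d)))

for ft in (ft_a, ft_b):       # identical client code against both vendors
    ft.store("sensor1", b"42")
assert vendor_a["sensor1"] == vendor_b["sensor1"] == b"42"
```

Because the client only ever touches FTaaS, migrating to another vendor means swapping the BTaaS binding, not rewriting the client or IoT firmware.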
In this study, we consider that each IoT device sends data to cloud vendors. The data is processed through FTaaS and the corresponding BTaaS, and then it is submitted to the cloud. Big data is defined by four characteristics - volume, velocity, variety and veracity - which are described in Chapter 1. We describe the requirements of each of these characteristics as well as our recommended solutions to maintain them based on the proposed cloud-assisted IoT architecture.
5.4.1 Volume
Big data is characterized by extremely high volume, indicated by the size of the data. Typically, each IoT device generates only a small portion of big data; however, when a large number of devices periodically generate small data and all of it is collected, the result is a large volume of data.
5.4.2 Velocity
Velocity indicates the speed of data processing in terms of response time. The response time could be batch, real-time, or stream response time. When we consider the velocity of only the portion of data generated by a single IoT device in a short period of time, it is not a challenge; but if we consider long-term data collection from a large number of IoT devices, processing the data in a timely manner becomes a challenge. In this case, the cloud computing system sometimes must support real-time responses. A real-time response requires a High-Performance Computing (HPC) system. In Chapter 4, we discuss a similar challenge for mobile users who need data privacy. We use a Graphics Processing Unit (GPU) rather than a CPU, which allows thousands of threads to each run a portion of the task, with each thread processed on a GPU core. In this case, the proposed architecture is able to efficiently process a request by parallelizing the task and splitting it among multiple GPU cores. In order to offer this type of service (i.e., a real-time system), we may bind a BTaaS to a set of GPUs and provide a function as an interface at FTaaS. Therefore, any task submitted to the function at FTaaS can be processed in real-time at its corresponding BTaaS, where the task actually runs on thousands of GPU cores. Some similar studies on implementing parallel tasks in the cloud can be found in (Suttisirikul 2012 and Oikawa 2012). These systems can be connected to BTaaS to perform HPC on cloud computing.
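The split-and-parallelize idea above can be sketched on the host side in Python using a thread pool as a stand-in for GPU cores (this is an analogue for illustration, not the GPU binding itself; process_chunk is a hypothetical per-core task):

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """Stand-in for the per-core portion of the task."""
    return sum(chunk)

def parallel_process(data, workers=4):
    """Split the task into chunks and process them concurrently --
    a CPU-thread analogue of fanning work out to GPU cores."""
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, chunks))

data = list(range(1000))
assert parallel_process(data) == sum(data)
```

The design point is that the FTaaS caller only sees a single function; the chunking and fan-out happen behind the interface at the corresponding BTaaS.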
5.4.3 Variety
Variety represents heterogeneity/diversity in the data collected from different IoT devices. A related challenge is analyzing this variety of data, including structured, unstructured and semi-structured data. A variety of IoT devices might generate video files, text files and JSON-based output files; the data includes structured data such as databases, semi-structured data such as text, and even unstructured data such as multimedia (e.g., voice, images, videos). Cloud computing is capable of processing this variety of formats. Some cloud-based tools allow users to analyze a variety of data; for example, Talend (see Chapter 1) is a data integration, data management, enterprise application integration and big data software tool. Similar cloud-based tools are available to solve the variety issue, where each tool can be bound to BTaaS and the functionality of each application can be exposed to the users at FTaaS. For instance, if Talend is bound to BTaaS, it can provide data integration at the DTSL layer, while one or multiple data-integration functions can be implemented in the FTaaS layer. Flexibility in defining services through multiple FTaaS definitions allows a cloud vendor to support a variety of data types for IoT devices.
5.4.4 Veracity
Veracity is the level of accuracy of the data. For example, a sensor may generate a wrong value (e.g., an IoT device that reports an inaccurate temperature). The proposed architecture is able to verify the accuracy of collected data when a minimum of two similar IoT devices are located in the same environment and connected to an FTaaS (e.g., two IoT devices that measure temperature are connected to FTaaS). The architecture allows the veracity of data to be reviewed on the edge (which is also the purpose of Fog Computing (Bonomi 2014)) at FTaaS. In this scenario, if the values collected from two sensors are not the same, an intelligent application might review the history of each device to see which one provided the correct result, or we can cross-check the received data at FTaaS. Reviewing data at FTaaS allows us to remove additional overhead from the back-end of the cloud computing system. Therefore, the veracity of data can be checked without entering the back-end (BTaaS). In this example, the accurate data can be collected at BTaaS and stored in the cloud.
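The edge-side cross-check described above can be sketched as a small function (Python; the tolerance value and the fallback policy are illustrative assumptions, not part of the proposed architecture):

```python
def cross_check(reading_a, reading_b, tolerance=0.5):
    """Edge-side veracity check for two co-located sensors: accept the
    average when they agree within tolerance, otherwise flag the pair
    for review against each device's history before data enters BTaaS."""
    if abs(reading_a - reading_b) <= tolerance:
        return ("ok", (reading_a + reading_b) / 2)
    return ("review", None)

# Two temperature sensors agreeing within tolerance are accepted;
# a large disagreement is flagged instead of being stored.
assert cross_check(21, 22, tolerance=2) == ("ok", 21.5)
assert cross_check(21, 35)[0] == "review"
```

Only readings that pass the check proceed to BTaaS, which is how the architecture keeps inaccurate values from adding overhead at the back-end.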
5.5.1 Standardization
One of the major issues in cloud computing is the lack of standardization between different cloud platforms; as we discussed, it causes the vendor lock-in issue, which does not allow IoT devices, their data, and their applications to freely transfer to another cloud vendor. DCCSOA provides a dynamic service layer (DTSL) to enable different vendors to offer a generic platform by defining the same FTaaS on different cloud platforms. When different cloud vendors provide the same FTaaS to their customers, the customers are able to transfer their data and applications from one vendor to another, or even to their own IT department. It also allows IoT devices to use all functions while moving from one vendor to another.
The templates give the vendor flexibility in offering a generic cloud service. Although the templates defined in Figure 5.2 bind to similar cloud services from both vendors, each template can be bound to different services from different cloud vendors. For instance, 𝑇1 can be bound to SaaSa and PaaSb.
Figure 5.2. One snapshot of DTSL and its interaction with two heterogeneous cloud platforms
The proposed architecture includes a dynamic layer (DTSL) that allows cloud vendors to customize their cloud architecture on-demand, because a vendor is able to define a template for a particular service. Although the template binds to a particular service at the BTaaS layer, it provides a generic and customized service at the FTaaS layer. When a cloud vendor defines a new template that interacts with several cloud services, it gives users an integrated service composed of different cloud services. This integrated service cannot only be customized by cloud vendors or their partners; it can also be provided as a generic service on top of heterogeneous cloud platforms. In addition, offering an integrated service could attract a variety of user groups. For instance, if a cloud vendor offers a message service that allows different applications to contact each other, and a virtual private network (VPN), the cloud vendor can integrate these services and customize them into a template. In this case, a user only needs to subscribe to the single template.
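As a rough sketch of such an integrated template (Python; the message and VPN services below are entirely hypothetical placeholders for a vendor's value-added services):

```python
class Template:
    """A DTSL-style template bundling several value-added services so a
    user subscribes once; service names here are illustrative only."""
    def __init__(self, name, services):
        self.name = name
        self.services = dict(services)
    def call(self, service, *args):
        return self.services[service](*args)

# Hypothetical message and VPN services integrated into one template.
t = Template("messaging_vpn", {
    "message": lambda sender, text: f"{sender}: {text}",
    "vpn_connect": lambda user: f"tunnel established for {user}",
})
assert t.call("message", "app1", "hello") == "app1: hello"
assert t.call("vpn_connect", "app1") == "tunnel established for app1"
```

A single subscription to the template exposes both services through one interface, which is the integration benefit described above.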
Low-level customization customizes services at the Implementation Level, where a cloud developer implements a system. This level of customization is not a cost-effective, practical method for either cloud vendors or users because each individual application needs to be modified.
High-level customization customizes cloud services at the Architecture Level, which adopts new cloud services and customizes existing ones. If the architecture is implemented on top of the existing architecture, it allows a cloud vendor to customize a service with minimal modifications. It also allows existing applications to be used without any modification. Since DCCSOA allows a cloud vendor to modify a template, existing applications can run without modification and the cloud vendor only needs to modify the template.
Summary of Chapter
In the era of big data, cloud computing gives users several advantages: they can not only collect a large volume of data from the Internet-of-Things (IoT) but also process the data efficiently through a variety of tools. The combination of the cloud computing and Internet-of-Things (IoT) paradigms allows users to sense the environment through IoT devices, to outsource data directly from IoT devices to the cloud, and to compute a massive amount of raw data on the cloud.
Although cloud-assisted IoT provides a cost-effective model for users, there is no accepted standard among cloud vendors for supporting different IoT devices, which results in heterogeneous cloud platforms. This causes a lock-in issue for users, who cannot freely transfer their data and applications from one vendor to another, or even back to their own IT department. In this chapter, we proposed a dynamic cloud computing architecture designed on Service-Oriented Architecture (SOA) that allows cloud vendors to offer a generic interface (FTaaS) to their users and supports heterogeneous cloud platforms at the corresponding BTaaS. The SOA-based nature of the architecture allows a cloud vendor to define a generic interface with minimal modifications to its platform in order to avoid additional costs.
The proposed architecture uses a dynamic layer (DTSL) to offer vendor services; it is divided into a front-end layer (FTaaS) and a back-end layer (BTaaS), which binds each generic service to a particular service in a cloud computing system. The proposed architecture allows heterogeneous platforms to provide generic and standard services to users with minimal modifications on the vendor side.
To the best of our knowledge, there is no existing standardization and customization of the cloud for IoT devices. However, a limited number of studies have been conducted on cloud architecture, service customization and standardization between clouds. In this section, we review the most significant studies.
Chapter 6
In the previous chapter, we described how cloud-assisted IoT may use DCCSOA (see Chapter 2) to provide flexible services and dynamic features to users. As we described, the second goal of our study is maintaining users' data privacy. The question this chapter aims to answer is: "How can the DCCSOA-based architecture for cloud-assisted IoT maintain users' data privacy?" This chapter describes a convergence of DPM (Chapter 3) and the cloud-assisted IoT architecture based on DCCSOA (Chapter 5) to answer the question.
Introduction
The Internet of Things (IoT) (Gubbi 2013) paradigm provides a connected network of smart devices through the Internet. The devices include sensors, actuators, cameras and Radio Frequency IDentification (RFID) devices that record current environmental conditions. The goal of the IoT is to make smart decisions based on archived sensed data and the current situation of the environment. IoT is used in different environments, such as healthcare systems, smart homes, smart cities, smart cars and, more recently, self-driving cars. This network of smart devices, comprising sensors and other smart technologies working in tandem and communicating efficiently, is creating a new world of operation called the IoT.
Massive amounts of data need to be collected from IoT objects to achieve smart decision making, but IoT objects are not capable of storing massive amounts of data and hence have to outsource their data. As described in Chapter 3, Mobile Cloud Computing (MCC) (Dinh 2013) provides an efficient platform for IoT devices to outsource their data directly through the Internet. MCC allows IoT objects:
(i) to read data from sensors and archive the data in MCC;
(ii) to process data on cloud computing systems; and (iii) to let other devices retrieve or download the data from everywhere. In addition, the MCC paradigm allows users to pay on a pay-per-use basis and offers the ability to upgrade the size of resources on-demand.
Tiny computing machines such as IoT devices have two challenges, resource limitation and the data privacy issue, which are described in the following sub-sections.
IoT devices are smart, compact devices designed for efficiency, portability and ease of use. These requirements lead designers toward smaller and more compact devices, which in turn have limited resources for accomplishing full-blown operations. In addition, IoT devices have limited storage capacity, generally less than 256 KByte. This storage size is an obstacle to running most encryption methods efficiently on IoT devices.
IoT devices require a light-weight data privacy method while using MCC in order to save resources, such as power consumption, and to maintain data privacy. In Chapter 3, we presented a Data Privacy Method (DPM) for MCC applications on mobile devices with respect to limited battery power. DPM splits a file into several files, and each file is divided into several chunks. The method scrambles the chunks of each split file randomly by using a chaos system (Kocarev 2001). The encryption/decryption method saves 72% battery power over the AES encryption method because DPM can be run in O(1) time complexity and does not add significant overhead to the mobile device.
This chapter introduces the customization of DPM for IoT devices. However, as discussed previously, the biggest challenge in customizing DPM for IoT devices is their limited storage capacity, because scrambling a small amount of data is not secure. We
DPM FOR CLOUD-ASSISTED IOT | 87
propose a data privacy scheme in Section 6.6 that provides a secure method based on
DPM for IoT devices.
The rest of the chapter is organized as follows: the next section reviews the major challenges IoT devices face in maintaining data privacy while using MCC. Section 6.3 presents the motivation of this study. Section 6.4 explains some limitations of IoT devices. Section 6.5 describes the related work for this chapter. Section 6.6 presents the proposed data privacy scheme (DPS) for IoT devices. Section 6.7 presents the experimental setup, and its results in the Contiki simulation tool are described in Section 6.8. Finally, Section 6.9 presents a summary of this chapter.
Motivation
With advancements in technology and communication, the use of IoT devices to enhance and improve daily living has created a plethora of devices. The major issue for mobile and IoT devices in running an encryption method is the method's time complexity. The advantage of DPM for IoT devices is its time complexity: it runs fast on IoT devices with limited resources while requiring heavy computation from an attacker.
The DPM disassembles a file in O(1) time complexity, but in an attack scenario, an adversary needs O(n!) time complexity to assemble the file. In Chapter 3, we explained that increasing the size of n increases the complexity exponentially. As illustrated in Figure 6.1, the worst case of time complexity is O(n!), and the next order is O(2^n). When
a heavy computation is required to retrieve the original data, it creates an obstacle against an adversary using unauthorized data. Practically, a method with O(2^n) complexity can impose more than 100 years of computation when n increases beyond 256.
Another advantage of DPM is the simplicity of the scrambling method, which needs only O(1) time complexity once we have an array of pseudo-random numbers. This advantage allows the method to run on a low-speed CPU. In DPM, a set of random numbers based on a chaos system can be generated, and the method uses this set to scramble chunks of the original data.
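A minimal sketch of deriving scramble addresses from a chaos system (Python; the logistic map, its parameters, and the duplicate-skipping rule are illustrative stand-ins, not the exact generator used in DPM):

```python
def chaos_indices(n, x0=0.7, r=3.99):
    """Generate n unique scramble addresses from a logistic-map chaos
    system (x_{k+1} = r * x_k * (1 - x_k)); duplicate bins are skipped,
    mirroring the role of the conflict-remover step."""
    seen, out, x = set(), [], x0
    while len(out) < n:
        x = r * x * (1 - x)          # next chaotic value in (0, 1)
        idx = int(x * n)             # map the value to an address bin
        if idx not in seen:
            seen.add(idx)
            out.append(idx)
    return out

idx = chaos_indices(16)
assert sorted(idx) == list(range(16))   # a full, conflict-free permutation
assert idx == chaos_indices(16)         # deterministic for the same x0 and r
```

Because the sequence is fully determined by x0 and r, the same tiny state reproduces the whole permutation, which is what makes the approach attractive on devices with very small storage.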
This chapter describes a method with minimal time complexity for maintaining data privacy that imposes maximal time complexity (i.e., O(2^n)) on an adversary attempting to retrieve an original file.
[Figure 6.1: time complexity t(n) against input size n]
The cloud computing paradigm provides cost-effective storage for connected IoT devices and allows each device to outsource its data to cloud computing servers. In addition, the data can be accessed by any device from everywhere.
Some IoT devices may carry sensitive data that requires privacy protection, such as data from healthcare systems or department of defense systems. In addition, there are legal regulations to consider, such as the U.S. Health Insurance Portability and Accountability Act (HIPAA), which does not allow a mobile device to outsource original data to a third party.
[Figure 6.2: specifications of major 6LoWPAN modules; axes report Size (KB) and MCU (bit)]
One of the advantages of IoT technology is a network of sensors that uses the
6LoWPAN protocol. Although the protocol provides secure communication through
IPSec, it is defined for devices with limited resources, and it is almost impossible to
encrypt generated data in an efficient way on an IoT device (Bogdanov et al. 2007).
Figure 6.2 summarizes the specifications of major 6LoWPAN modules. The X-axis
represents the name of each IoT module and its manufacturer, the Y-axis represents the
amount of each resource, and the Z-axis represents the resource type (RAM, Flash, and
CPU core bits). As shown in Figure 6.2, the
[34] https://www.ietf.org/
maximum size of RAM and Flash is less than 96 KB and 512 KB, respectively. It is not
efficient, and sometimes impossible, to run traditional encryption methods on such tiny
computing machines. AES is a round-based encryption method that must run for every
plaintext block with a key length of 128 to 256 bits. Practically, these modules cannot
meet the requirements to run AES encryption (Bogdanov et al. 2007).
Related Works
(Ayuso et al. 2009) presents an encryption method that splits the plaintext to encrypt
data, but most of our target devices, as shown in Figure 6.2, have a low-speed CPU with
an 8-bit core. (Marin 2013) also presents a method for IoT devices to run AES encryption
by splitting the blocks. This method requires processing time to finish the encryption,
during which the device must stop reading from its sensors; it is almost impossible for
an IoT device to stop sensing the target environment or to stop transferring the
generated data while its CPU is busy running an encryption method. Several other
studies, such as (Ayuso et al. 2009) and (Marin 2013), have developed encryption for IoT
devices, but each method runs only on a specific device; e.g., (Marin 2013) is developed
for 16-bit devices on 6LoWPAN. In contrast, the scheme proposed in this chapter can run
on any IoT device, provided the size of n is extended appropriately.
As the related works indicate, we need cloud computing to outsource data in order to
use the cloud's benefits, but encrypting the generated data with traditional security
methods, such as AES, before submitting it to the cloud is not efficient for IoT devices
because the target devices have limited resources, as shown in Figure 6.2.
As described in Figure 6.2, our target IoT devices have limited Flash and RAM
capacity, and each device generates data of small size. If we scramble the generated data
each time, n will be too small and an attacker can retrieve the original data in a short
period of time. For example, if we only scramble 16 chunks and submit them to the
cloud, an attacker can retrieve the original data in O(2^16) time. In this case, the attacker
needs to try 65,536 permutation combinations, which takes only several minutes on a
PC. We can increase n by submitting partial data to the cloud and buffering the rest
locally; we scramble a full buffer before submitting its content to the MCC.
In addition, we consider each bit, rather than each byte or kilobyte, as a chunk of the
original data, which allows us to increase the size of n.
Algorithm 6.1 shows the distribution of the data generated by a sensor, where D_i is
a bit of each sequence of bytes generated by the module's sensor. The algorithm decides
where D_i must be stored based on {ψ_i}, where ψ_i is a set of PRP numbers. If ε_i = 1,
D_i is buffered in B; otherwise, D_i is stored in the next available slot of S. The algorithm
scrambles the buffer B based on ψ_j, where ψ_j is another set of PRP numbers, and
appends the result to S. Finally, the algorithm submits the scrambled data to the MCC.
In Algorithm 6.1, we consider two sets, B and S, to split the generated sequence of bits.
These two sets add complexity by increasing the size of n. S represents the selected bits
that must be submitted to the MCC, and B represents the selected bits that must be
buffered in RAM. If the sizes of RAM and buffer are defined as C_RAM and C_D,
respectively, the buffer is full when C_D = C_RAM.
While (1)
{
    While (C_D != C_RAM)
        If (ε_i == 1)
            {B} ← D_i
        Else
            {S} ← D_i
    If (C_D == C_RAM)
    {
        {S} ← SC_ψj{B}
        STREAM{S}
        {B} ← {}
    }
}
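A compact sketch of Algorithm 6.1 in Python may make the flow concrete. This is an illustration under our own simplifications: `stream` stands in for radio submission to the MCC, and the decision bits and the scramble permutation are passed in as precomputed PRP sets rather than generated on the fly.

```python
def distribute_and_stream(bits, eps, psi_j, c_ram, stream):
    """Sketch of Algorithm 6.1: split sensor bits into buffer B and stream set S.

    bits  -- the generated bit sequence D
    eps   -- decision bits derived from psi_i (1 -> buffer, 0 -> stream)
    psi_j -- a permutation of 0..c_ram-1 used to scramble a full buffer
    """
    B, S = [], []
    for d, e in zip(bits, eps):
        if e == 1:
            B.append(d)                  # buffer the bit locally
        else:
            S.append(d)                  # queue the bit for submission
        if len(B) == c_ram:              # buffer full: scramble and flush
            scrambled = [None] * c_ram
            for i, b in enumerate(B):
                scrambled[psi_j[i]] = b  # relocate each buffered bit
            S.extend(scrambled)
            stream(list(S))              # submit S (including scrambled B) to the MCC
            B, S = [], []
```

Each incoming bit costs a single append plus, on a flush, one array write per buffered bit, consistent with the O(1) per-bit claim above.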
The data generated by an IoT device, D, is defined as a set of bits for each clock cycle
in which the device reads its sensors. After a time period T, the device reads the sensors
and generates a new sequence of bits, D. We assume that the device generates k sets of
n bits, defined as follows:

D = ⋃_{i=0}^{k} {0,1}^n    (2)

D = Σ_{i=0}^{k} B_i + Σ_{i=0}^{k} S_i    (3)
The data privacy scheme (DPS) defines two PRPs based on Equation 1, ψ_i and ψ_j, as
shown in Equations 4 and 6. ψ_i generates the decision bits that determine whether D_i
must be stored in the buffer B or in S for submission to the MCC. ψ_j is used for
scrambling the buffer when B is full.

ε_i = { 0 if P_k mod 2 = 0;  1 if P_k mod 2 ≠ 0 }    (5)

where P_k can be generated from Equation 1 with the initial values μ_i and P_i.
DPS uses B to buffer partial bits based on ψ_i; then DPS scrambles the buffer based
on ψ_j, and finally the scrambled data, SC_ψj{B}, is submitted to the MCC.
DPS adds each bit D_i to B (if ε_i = 1) or to S (if ε_i = 0) while C_D < C_RAM. When
C_D = C_RAM, DPS uses ψ_j to scramble B, where ψ_j is defined as follows:
where ε_i = P_l for i = 0..C_RAM. P_l can be generated from Equation 1 with the initial
values μ_j and P_j.
Finally, the scrambled data, SC_ψj{B}, is transferred to S for submission to the cloud.
The algorithm increases the complexity of retrieving the original data from the
scrambled data, 𝔼_D, by using B and S. The complexity of retrieving the original data is
defined as follows:
O(𝔇(𝔼_D)) = O(2^(C_RAM²))    (7)
where 𝔇(𝔼_D) is the complexity of retrieving the original data from the scrambled data, 𝔼_D.
If we consider the maximum RAM size in Figure 6.2, the time complexity of retrieving
the original data from its scrambled data is O(2^96). However, we increase the size of n
from 96 to 96², which increases the complexity of retrieving the original data from the
scrambled data to O(2^9216) when we consider 1 KB as an input.
Figure 6.4. An example of submitting 8-bit generated data from an IoT device with a
128 KB buffer and 64 KB of stream data to the cloud
Moreover, the complexity can be increased to O(2^9437184) when we consider each bit
as a chunk. In this case, the scheme scrambles an array of 9,437,184 bits, which requires
several hundred years of computation to retrieve the original data.
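The exponents quoted above follow from simple arithmetic; a quick check (illustrative Python; the unit conventions are our reading of the text):

```python
# Sanity-checking the exponents quoted in the text (an illustration, not part of DPM):
n_ram = 96                  # maximum module RAM from Figure 6.2, in KB
n_squared = n_ram ** 2      # the enlarged input size: 96^2
bits = n_squared * 1024     # the same quantity expressed as a bit array

assert n_squared == 9216
assert bits == 9_437_184
assert 2 ** 16 == 65_536    # the 16-chunk example: only 65,536 combinations
```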
An example of transferring generated 8-bit data from the sensors to the MCC is
shown in Figure 6.4. In this example, 8 bits are generated from the sensors, as shown in
the figure. The scheme uses ψ_i in Equation 5 to generate ε_i and uses ε_i to decide the
location of each D_i. The scheme scrambles B based on ψ_j when the 1024th bit is added
to B, and then it clears the buffer. In this case, if we assume ψ_i randomly assigns every
two consecutive bits to B and S, B becomes full after the 256th reading from the sensors
(k = 256). An attacker then needs to run a brute-force algorithm with O(2^2048) time
complexity to retrieve the original data from the scrambled data.
Experimental Setup
We conducted an experiment for the proposed method using a network of sensors in
the Cooja simulator with the Contiki 2.7 operating system (Dunkels 2004), which is
developed in Java. In this experiment, we used Sky motes to emulate the Tmote Sky
(a wireless sensor module) in UDP Sink and UDP Sender roles. A UDP Sink is able to
read sensors, transfer its data, and forward data from one node to another (it works as
a router).
A UDP Sender is only able to read the sensors and transfer its data to the UDP Sink.
Each node is connected to the Internet and obtains an IPv6 address using the
6LoWPAN protocol. All nodes are able to transfer their data to the MCC. We ran the
experiment with 12 nodes for two hours.
Experimental Results
In this experiment, we were interested in the difference in power consumption when
a node runs the proposed data privacy scheme and when it does not. We ran the first
half of the experiment (the first hour) without the proposed scheme and then ran the
proposed scheme for the second half (the second hour). As previously discussed in
Section 6.2, we expected the proposed scheme to run the algorithm in O(1), which means
the algorithm must not introduce additional power consumption during the
experiment.
Figure 6.6 shows the experimental results. Figure 6.6.a shows power consumption
during the two-hour experiment, and the result indicates that the power consumption
of the nodes does not change dramatically. Figure 6.6.a shows the result for 8 different
nodes: nodes 2, 3, 4, and 8 did not use the proposed scheme during the experiment, and
nodes 12, 13, 18, and 23 used the proposed scheme during the second hour of the
experiment. Figure 6.6.b shows that the average power consumption did not change
when we ran the proposed scheme.
Summary of Chapter
In this chapter, we considered Mobile Cloud Computing (MCC) as a solution for
outsourcing the data generated by IoT devices. There are two obstacles for IoT devices
in using MCC. First, submitting the original data to the MCC exposes users' data privacy
to the cloud vendor and the vendor's partners. Second, data encryption for mobile users,
in particular for tiny mobile devices such as IoT devices, is not practical because they
have limited resources, such as a storage capacity of less than 256 KB. In this chapter,
we presented a scheme that allows IoT devices to maintain their data privacy while each
device outsources its data to the MCC directly. We implemented the proposed scheme
on one of the popular simulation tools by simulating the Tmote Sky, which uses IPv6
and the 6LoWPAN protocol. We simulated a network of Tmote Sky motes on Contiki for
two hours. The experimental results show that running the proposed scheme on these
modules does not introduce additional power-consumption overhead.
[Figure 6.5. The repetition rate for the first 152 values of ε_i in ψ_i.]
Chapter 7
Electronic Health Record (EHR) systems collect and process sensitive patient health
data. In order to allow an EHR system to be deployable on heterogeneous cloud vendors
and to protect patients' data, we describe a novel cloud-based EHR platform.
Introduction
Cloud computing provides a cost-effective model through pay-per-use that allows each
individual or healthcare business (Rodrigues 2009) to start a cloud-based service with
minimum investment. The cloud has several major issues, which are described in
Chapter 2. Let us describe these issues for EHR systems. Since cloud computing uses the
Internet as part of its infrastructure, stored data on a cloud is vulnerable to breaches of
both data and network security.
Background
In Chapter 2, we proposed a dynamic cloud computing architecture based on Service-
Oriented Architecture (DCCSOA). The architecture provides a new layer, Template-as-a-
Service (TaaS), on top of a cloud computing system that allows a cloud vendor to
standardize its cloud services by defining TaaS services. TaaS is divided into two sub-
layers: the front-end (FTaaS), which allows different cloud vendors to define a generic
and standard cloud service, and the back-end (BTaaS), which allows a cloud vendor to
bind a defined generic cloud service, FTaaS, to its cloud computing system. In other
words, DCCSOA enables different cloud vendors to standardize their services through
a uniform interface at FTaaS that allows users to transfer their data and applications
from one vendor to another.
In this chapter, we use DCCSOA to provide a template, TaaS, for an EHR system. A
template allows an EHR system to use heterogeneous cloud computing systems. It
provides flexibility, customizability, and standardization for EHR services that need to
run on cloud computing systems.
As previously discussed, data security and data privacy are two major issues in
cloud computing systems for EHR systems. We use a light-weight data privacy method
(DPM), described in Chapter 3, that allows clients to scramble the original data on the
client side before submitting it to the cloud, as well as the AES encryption method, on
the proposed platform. We evaluate the performance of the implemented platform
while clients use these methods. Our contributions in this chapter are as follows:
The rest of the chapter is organized as follows: In the next section, we introduce the
proposed platform based on DCCSOA and the implementation of the proposed platform.
We compare the behavior of DPM against AES on the proposed platform for a massive
healthcare dataset in Section 7.5. We review related work in Section 7.6, and finally, we
conclude our study in Section 7.7.
FTaaS_eH provides a generic and uniform interface with standard services. BTaaS_eH
binds specific cloud value-added services to the uniform service interfaces at FTaaS_eH.
Figure 7.1 illustrates a general view of the cloud stacks for the proposed platform. A
client (end-user) accesses generic and uniform cloud service interfaces through an
eHealth Client Application. The proposed platform can simply be transferred from one
vendor, V1, to another, V2, by using the same FTaaS_eH in another cloud but with a
different BTaaS_eH.
FTaaS_eH is a dynamic layer, and it allows cloud vendors to customize their cloud
services as a template. First, cloud vendors bind the generic and uniform services
defined at FTaaS_eH to their value-added services through BTaaS_eH. As shown in
Equation 2, each service at FTaaS_eH must pass a satisfaction function 𝒮 to propose a
uniform service interface.
𝑆𝑎𝑡(𝑠): ℛ → 𝒪 (2)
Code 7.1, which is also described in Chapter 5, shows an example of how a client
accesses FTaaS through a uniform data access layer with an abstraction over a cloud
service (database access in this case). In this code, a client loads a web service,
FTaaS_Service_Ref, to access services on the proposed platform. Then, the client
requests data access by calling the GetDataList procedure of the web service, and finally,
it binds the result to an object, DataGridView.
DataGridView.DataBind();
On one hand, the services defined at FTaaS are dynamic and can be customized by a
cloud vendor to provide different types of services to clients. Cloud vendors bind
services from BTaaS to their value-added cloud services, which facilitates service
accessibility on heterogeneous cloud services for an EHR system. On the other hand, an
EHR application and its data can be transferred to another cloud vendor with minimal
modifications at the client side. In addition, providing a generic and uniform service is
important for mobile health care devices because software modification for these
devices can be expensive and sometimes requires hardware modifications.
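The FTaaS/BTaaS separation can be sketched as follows. This is a minimal Python illustration of the binding idea only; the class and method names are ours, not part of DCCSOA's actual implementation (which, as Code 7.1 shows, uses a WCF web service).

```python
class BTaaS:
    """Back-end template: binds the generic interface to one vendor's service."""
    def get_data_list(self):
        raise NotImplementedError

class VendorA_BTaaS(BTaaS):
    def get_data_list(self):
        return ["row-from-vendor-A"]      # stands in for a vendor-specific query

class VendorB_BTaaS(BTaaS):
    def get_data_list(self):
        return ["row-from-vendor-B"]

class FTaaS:
    """Front-end template: the uniform service the EHR client codes against."""
    def __init__(self, backend):
        self.backend = backend
    def GetDataList(self):
        return self.backend.get_data_list()

# Moving the platform from vendor A to vendor B changes only the binding,
# never the client-side call:
client_view = FTaaS(VendorA_BTaaS()).GetDataList()
```

The client only ever calls `GetDataList` on the front-end, which is why transferring the EHR system between clouds requires minimal client-side modification.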
Experimental Setup
We implemented the proposed platform through a case study based on a defined
template for an EHR system. The proposed platform provides generic data access at
FTaaS to end-users for accessing an Electronic Medical Record (EMR). We implemented
two methods on the proposed platform to protect patients' data privacy: one is a light-
weight data privacy method (DPM), described in Chapter 3, and the other is AES
encryption (Harrison 2008). These methods allow us to assess the performance of the
proposed platform.
We consider the following scenario for the implementation of the proposed platform.
“A client requests data access to an Electronic Medical Record (EMR), which is implemented
as a web service at FTaaS. FTaaS provides a generic and uniform function to the client. The
request is submitted from FTaaS to BTaaS. Each retrieved response is processed through two
user-data protection methods, DPM and AES encryption. BTaaS is implemented with Windows
Communication Foundation (WCF) (Resnick 2008) and is bound to a SQL database. We ran
different queries at this level and used the data protection methods to evaluate the performance
of the proposed platform. BTaaS's responses are sent to the client at FTaaS by a web service.”
We used an Artificial Large Medical Dataset[35] as our EMR database; it contains
records of 100,000 patients, 361,760 admissions, and 107,535,387 lab observations, with a
total size of 12,494,912 KB (~12.2 GB). We ran 31 different queries on the largest table, lab
observations. Each query retrieved a different number of fields of different sizes. We ran
DPM and AES encryption at BTaaS to protect patients' data privacy on each retrieved
[35] http://www.emrbots.org, retrieved on July 12, 2015
AN EHR PLATFORM BASED ON DCCSOA | 103
field. This allows us to assess the performance of the methods on the proposed platform
by monitoring the computation time of the methods for each field retrieved from the
database. The queries processed in this experiment are based on SELECT DISTINCT TOP
in the T-SQL language; they retrieve from 6 to 30,000 fields, with total query result sizes
ranging from 180 bytes to 911 MB.
In this chapter, we are interested in evaluating both quantity parameters and quality
parameters of the proposed platform.
Scalability: A scalable service provides the same performance when the number of
transactions increases.
Customization: A higher level of this parameter allows a cloud vendor to customize
the provided services with minimum modifications.
Independence of services: A higher level of this parameter allows the administrator to
freely transfer an EHR system to another cloud vendor, or bring it back to a traditional
IT department, with minimal service modifications.
Standardization of services: A higher level of this parameter allows an EHR system to
interact with heterogeneous cloud services with minimal modifications.
Experimental Results
Figure 7.2 illustrates the experimental results for the evaluation of the quantity
parameters of the proposed platform for an EHR system. We ran 31 different queries on
the EMR database. Each query submitted from FTaaS is processed on the proposed
platform to retrieve data from the database at BTaaS. The platform retrieved the
response of each query and ran DPM and AES encryption on each field (result) returned
from BTaaS. Figure 7.2.a shows the performance of the implemented methods on the
proposed platform. The results show that DPM provides better performance than AES
encryption for all query results, as we expected.
Figure 7.2.b illustrates the performance of DPM and AES encryption for different
sizes of an input string when the methods are not run on the proposed platform. We
considered each input string to consist of Unicode characters with a size of 16 bits each.
The X-axis represents the size of the input string, and the Y-axis represents the response
time (milliseconds). In this experiment, we assumed that DPM does not need to generate
a set of PRPs online; instead, it accesses the predefined arrays described in Chapter 3.
Figures 7.2.a and 7.2.b show that the performance of DPM and AES on the proposed
platform (Figure 7.2.a) does not differ from their performance on a single string
(Figure 7.2.b).
The quality parameters, which include service independence and service
standardization, can also be evaluated.
As described in Code 7.1, a client can access the platform by using the provided
generic service. Since the service is independent of the cloud value-added services at
BTaaS, it allows users to interact with the cloud services without being concerned about
their requirements or the type of output of a service. For instance, an application at the
client side in Scenario 1 retrieves data without knowing the type of the database or its
location. The service at FTaaS can be bound to any kind of service at BTaaS.
Different cloud vendors are able to define similar services at FTaaS in Scenario 1,
which allows an EHR system to use standardized services from different clouds.
Related Work
Several cloud-based services and platforms have been developed for EHR systems.
For instance, (Fan et al. 2011) developed a platform that is used for capturing health care
data for processing on the cloud. The platform relies on its own architecture, and the
authors did not describe how it can be implemented for different architectures or how
it can customize services for heterogeneous clouds. As discussed previously, a dynamic
and customizable cloud platform allows administrators to implement an EHR system
and to transfer it to different cloud computing systems. There is also a vendor lock-in
issue, as described in Chapter 2, if a platform's services rely on a specific cloud
architecture. In another study, (Lounis et al. 2012) developed a secure cloud architecture
that focuses only on wireless sensor networks, and the study has limited work on the
architecture itself; it does not discuss architectural features, such as service
modifications or dynamic services. (Magableh et al. 2013) proposed a dynamic
rule-based approach without considering the cloud environment. Finally, (Hoang et al.
2010) focuses on mobile user features in their proposed
Figure 7.2. Experimental results: a comparison between the performance of DPM and
AES encryption on the proposed platform
architecture, and the study does not discuss the overall architecture. In our study in
this chapter, we proposed a dynamic platform for an EHR system, and we showed how
the proposed platform implements a dynamic service at FTaaS.
Summary of chapter
In this chapter, we proposed a dynamic cloud platform for an EHR system based on a
cloud SOA architecture, DCCSOA. The proposed platform can run on top of
heterogeneous cloud computing systems, which allows a cloud vendor to customize and
standardize services with minimal modifications. The platform uses a template layer
which is divided into FTaaS, which allows cloud vendors to define standard, generic,
and uniform services, and BTaaS, which binds the services defined at FTaaS to the cloud
vendor's value-added services. In addition, we implemented a data access scenario on
the proposed platform with two different methods to evaluate its performance. The first
method is a light-weight data privacy method (DPM), and the second is the AES
encryption method. The evaluation shows that the platform is scalable and that the
methods run on the platform do not introduce additional overhead.
Chapter 8
This chapter aims to use DPM, which is described in Chapter 3, to provide users' data
privacy in cloud-based databases. Although the proposed method can be deployed on
traditional SQL databases, we focus only on NoSQL databases in this chapter.
Introduction
As we described in Chapter 3, cloud vendors are most of the time not fully trusted by
users, and users' data is vulnerable to privacy violations by the cloud vendor. Users
have several options for using the cloud. First, users may employ a hybrid cloud (Li et
al. 2013) that allows them to outsource sensitive data to their private storage and use a
public cloud for their non-sensitive data. This option may not be a practical solution due
to the complexity of system integration (Li et al. 2013) and network security issues.
Another option is to encrypt user data before outsourcing it to an untrusted cloud
vendor. However, most well-known encryption methods, such as AES (Daemen et al.
2013), are expensive because they increase computation time due to the
encryption/decryption of data during query processing. The third option is light-weight
data security methods that secure data under certain conditions, which are discussed in
Section 8.2. In this chapter, we are interested in this option, which allows users to protect
their outsourced data with minimal computation overhead. The final option is
outsourcing data without considering users' data privacy.
Several studies (Denning et al. 1986, Popa et al. 2011, Osborn 2011, and Laur et al.
2013) have been conducted to secure a database with different encryption schemas.
Although an encrypted database incurs additional computation overhead to run
queries, it enables users to protect their outsourced data, in particular sensitive
information. In this chapter, we assume that users are willing to protect their outsourced
database on an untrusted cloud vendor. We assume that the vendor must not be able to
108 | CONCLUSION
access the database, while users are able to access the database with minimal
computation overhead.
The rest of the chapter is organized as follows. The next section introduces some
background related to this study. Section 8.3 introduces the proposed schema and its
various components. Section 8.4 presents a security analysis of the proposed schema.
The experimental setup for the implementation of the proposed schema and the
experimental results are discussed in Sections 8.5 and 8.6, respectively. Related work is
discussed in Section 8.7, and finally, Section 8.8 concludes this chapter.
Background
In Chapter 3, we proposed a light-weight data privacy method (DPM) that scrambles
chunks of data based on a chaos system. DPM uses the following equation of a chaos
system to generate sets of distributed random numbers:

P_{i+1} = μ P_i (1 − P_i)    (1)

where P_0 ∈ (0,1) and μ are the two initial parameters of this equation, and i is the index
of each set of ψ.
In other words, ψ provides a set of numbers that does not allow an adversary who
knows P_l to predict future numbers P_m, where m > l. The content of each chunk (a set
of bits or bytes) of the original data (input message) can be scrambled based on the i-th
set of scrambled addresses in ψ_i, which relocates the content of the original data. ψ_i
generates repeated numbers, and DPM uses an algorithm to remove collisions in the
addresses (see Chapter 3) and to cover all addresses of a given chunk of data.
The advantage of DPM is its time complexity. On one hand, a user scrambles a chunk
of data with O(1) time complexity; on the other hand, an adversary needs O(2^n)
computation time to retrieve the original data from the scrambled data when he does
not know the initial parameters, where n is the size of each chunk.
The size of each chunk, n, is important for DPM to provide a sufficient level of
security. For instance, DPM can be considered secure with n > 120 given current
computational capabilities. If an adversary runs an exhaustive search on the scrambled
data, he needs to perform O(2^120) computational steps to retrieve the original data. In
the implementation of the proposed schema, described in Section 8.5, we consider each
bit as an input, which allows us to increase the size of n. If we considered a single field
of a record as an input, it could be small enough for an adversary to retrieve the original
data quickly. We can instead combine multiple fields of a record as one chunk of the
original data and consider the bits of the chunk as the input of DPM in order to increase
the size of n. For instance, a Unicode character in Microsoft SQL Server occupies 2 bytes,
so for an adversary to perform an exhaustive search over a truly scrambled field (see
the next parameter) of 20 characters requires O(2^n) computation steps, where
n = 20 chars × 2 bytes × 8 bits = 320.
DPM needs to run with different initial parameters for each chunk of data (message)
in order to be secure; the proof of this claim is given in Section 8.4.
We can generate a different set of ψ for each original data item, but this adds
computation overhead. Instead, we can precompute the sets of ψ offline and store them
in a database in order to eliminate the online computation overhead. Details of the
implementation of these parameters are discussed in the next section.
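The offline precomputation can be sketched as follows. This is an illustrative Python fragment: the logistic map and next-free-slot collision removal are our assumptions about the Chapter 3 generator, and MapDB is modeled as a plain dictionary keyed by ψ index.

```python
def logistic_prp(n, p0, mu):
    """Build one set of shuffle addresses (a permutation of n positions)."""
    p, free, perm = p0, list(range(n)), []
    for _ in range(n):
        p = mu * p * (1.0 - p)                 # Equation 1: the chaos iteration
        perm.append(free.pop(int(p * len(free)) % len(free)))
    return perm

def build_map_db(n, params):
    """MapDB stand-in: one precomputed psi set per pair of initial values (p0, mu)."""
    return {k: logistic_prp(n, p0, mu) for k, (p0, mu) in enumerate(params)}

# Precompute three psi sets offline for 2,272-bit inputs (the joined-record size
# used in the MapDB example below); the parameter pairs here are arbitrary.
map_db = build_map_db(2272, [(0.999, 3.684), (0.501, 3.9), (0.123, 3.7)])
```

Because every set is generated ahead of time, the online path only performs dictionary lookups, which is what eliminates the runtime generation overhead.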
[Figure 8.1. The proposed schema: user queries pass through a proxy server that uses
KeyDB and MapDB to scramble data stored in SecureDB, which the adversary can
observe.]
The proposed schema for a cloud-based database is illustrated in Figure 8.1. Each
query submitted by a user goes through the proxy server, which scrambles data prior to
running the query operation (insert, update, or select) on the database (SecureDB). The
scrambled data is stored in SecureDB. The proxy server uses MapDB to access different
sets of ψ, as defined in Equation 1. We can remove MapDB by adding a ψ-generator
function that produces the sets of ψ on demand. The proxy server uses KeyDB to store a
user's keys for each record in SecureDB.
SecureDB: This database stores the scrambled data. Authorized and unauthorized
users, including cloud vendor administrators, are not able to retrieve the original data
from this database without knowing the keys stored in KeyDB. Only transactions
submitted from the Proxy Server, which has access to KeyDB, are able to retrieve the
original data. Even if this database is compromised on the cloud, neither an internal nor
an external adversary can retrieve the original data.
KeyDB: This database stores, for each record in SecureDB, an index into the ψ sets
located in MapDB. This database is updated with each insert/update operation, and it is
used for reconstructing a record of SecureDB by providing the ψ for the corresponding
record. KeyDB can be kept locally in order to protect SecureDB from an untrusted cloud
vendor.
MapDB: This is an optional database that collects a set of predefined ψ in order to
avoid the runtime computation overhead of generating ψ with different initial
parameters. For instance, Table 8.1 shows the definition of a Customer dataset with
5 fields, their types, and the size of each field (bytes). If we consider the combination of
all fields as one input to the scrambling process, we need 2,272 bits (284 bytes) to be
scrambled for this table. Joining all fields into one input increases the computation time
for an adversary to retrieve the original data from the scrambled data. In this example,
MapDB stores different shuffle addresses, from bit 1 to bit 2,272, defined by different
initial values of μ and P_0, as discussed for Equation 1. The proxy server uses one record
of MapDB (a set of shuffle addresses) to scramble data and insert it into SecureDB. Then,
the proxy server stores the record number of the inserted data and its corresponding
shuffle-address set from MapDB into KeyDB, which allows the proxy server to retrieve
the data later. MapDB can be shared by several SecureDBs on multiple clouds, and it
protects each SecureDB against an adversary on each cloud.
Proxy Server: This server allows a user to retrieve, update, or insert data in SecureDB.
It runs DPM on each submitted user query. Each user operation, such as Insert, Update,
or Select, needs to be submitted to the Proxy Server. If a new record needs to be added
to the database, the proxy server assigns the index of a ψ to the record, scrambles the
record based on the assigned ψ, and finally stores the index in KeyDB for future record
retrieval.
Algorithm 8.1 shows the insert procedure in the proposed schema, which uses a
user's key and the input record to insert data into SecureDB.
1: i = NewKey(Key)
2: ψ_i = Map(i)
3: NewScrambledRec = Scramble(ψ_i, Record)
4: Rec# = Insert(NewScrambledRec)
5: UpdateKeyDB(i, Rec#, TableName)
First, the procedure obtains an index of ψ and stores it in i, and it loads the i-th set
of shuffle addresses from MapDB into ψ_i (step 2). Second, it scrambles the user's input
record (step 3) and inserts the scrambled data into SecureDB (step 4), storing the record
number in Rec#. Finally, it updates KeyDB with i (the ψ corresponding to the record),
the record number, and the name of the table.
The proxy server uses the record number and its corresponding ψ from MapDB to
reconstruct a record when it needs to retrieve or update it.
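Putting the pieces together, here is a toy version of the proxy's insert/select path. This is an illustrative Python sketch under our own simplifications: the three databases are dictionaries, and NewKey is modeled as round-robin ψ selection.

```python
def scramble(bits, psi):
    out = [None] * len(psi)
    for i, b in enumerate(bits):
        out[psi[i]] = b                 # relocate bit i to its shuffled address
    return out

def unscramble(scrambled, psi):
    return [scrambled[psi[i]] for i in range(len(psi))]

class ProxyServer:
    """Toy proxy: MapDB holds precomputed psi sets; KeyDB maps records to them."""
    def __init__(self, map_db):
        self.map_db = map_db            # {psi index: shuffle addresses}
        self.key_db = {}                # {(table, rec_no): psi index}
        self.secure_db = {}             # {(table, rec_no): scrambled bits}
        self.next_rec = 0

    def insert(self, table, record_bits):
        i = self.next_rec % len(self.map_db)   # NewKey: pick a psi set
        rec_no = self.next_rec
        self.secure_db[(table, rec_no)] = scramble(record_bits, self.map_db[i])
        self.key_db[(table, rec_no)] = i       # remember which psi was used
        self.next_rec += 1
        return rec_no

    def select(self, table, rec_no):
        psi = self.map_db[self.key_db[(table, rec_no)]]
        return unscramble(self.secure_db[(table, rec_no)], psi)
```

Note that SecureDB holds only relocated bits, and reconstruction is possible only through the KeyDB/MapDB indirection that the proxy controls.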
Security Analysis
A schema has a perfect secrecy, if it can pass the following conditions.
i) The adversary cannot learn about two records, 𝑟𝑖 and 𝑟𝑗, when he knows their scrambled data, 𝔰;
ii) The chaos system generator has perfect secrecy.
For the first condition, each record of a table of SecureDB in the proposed schema needs to be scrambled with different initial parameters, in order to avoid similarity between scrambled records. In other words, the proposed schema uses different 𝜓′𝑠, defined with different initial parameters, to prevent an adversary from learning about two original records by knowing their scrambled data.
∎
For the second condition, 𝜓 must provide a uniform distribution of addresses in 𝜓𝑖 for all entries of n bits:

∀ 𝑥 ∈ 𝑈: 𝑃(𝑥) = 1/|𝑈|     (7)

where 𝑈 = {0,1}𝑛.
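Condition (7) can be checked empirically: generate many shuffle-address sets and verify that a given original bit lands on every target address with roughly equal frequency. The sketch below uses `random.sample` as a stand-in for the chaos-based generator, and the toy record length, trial count, and chi-square threshold are assumptions for illustration.

```python
import random
from collections import Counter

n = 16          # toy record length (|U| = 16 target addresses)
trials = 20000  # number of independently generated shuffle-address sets

# Count where bit 0 lands across many psi's. random.sample stands in for the
# chaos-based generator; the real DPM derives each psi from (mu, P0) values.
counts = Counter()
rng = random.Random(42)
for _ in range(trials):
    psi = rng.sample(range(n), n)
    counts[psi[0]] += 1

# Uniformity, P(x) = 1/|U|, means each target address appears ~trials/n times.
expected = trials / n
chi2 = sum((c - expected) ** 2 / expected for c in counts.values())
# For 15 degrees of freedom, a chi2 far above ~25 would suggest bias.
```

The same check can be repeated for every bit position; a strongly non-uniform generator would concentrate some addresses and leak a pattern to the adversary.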
In this case, the generator must produce different addresses with uniform probability. As previously discussed in Section 8.2, the generator provides scrambled addresses in each 𝜓, which is stored in MapDB. DPM uses a set of shuffle addresses in 𝜓 to scramble data. If DPM provides the same probability for each scrambled address in 𝜓, then the differences between the original addresses and the scrambled addresses must not all be the same, and DPM must not reveal any relation between addresses. Figure 8.2 illustrates a statistical model of the first 100 differences between the original addresses and the scrambled addresses in Equation 1, with the initial parameters 𝑃0 = 0.999 and 𝜇 = 3.684, for a length of 921 bits (𝑛). In Figure 8.2, the X-axis represents the address of the original bit and the Y-axis represents the difference between the original address and the final address of the scrambled bit. The result shows that DPM scrambles data with a uniform distribution of distinct differences, which does not allow an adversary to find a pattern between scrambled addresses.
∎
In addition, further security analysis of DPM is discussed in the evaluation section of Chapter 3.
As shown in Figure 8.2, there is no pattern between scrambled addresses that would allow an adversary to predict the addresses. For instance, if an adversary knows that the first bit moves to the 13th bit when it is scrambled, he still cannot predict that the second bit moves to the 48th address, or that the 3rd bit moves to the 180th bit.
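As an illustration of how such shuffle addresses can arise from a chaotic map, the sketch below iterates the logistic map 𝑃𝑘+1 = 𝜇𝑃𝑘(1 − 𝑃𝑘) with the same parameters as above (𝑃0 = 0.999, 𝜇 = 3.684, 𝑛 = 921) and ranks the resulting sequence to obtain a permutation. The ranking construction is an assumption for illustration; the exact DPM generator is defined in Chapter 3.

```python
def logistic_addresses(n, mu=3.684, p0=0.999):
    """Generate one set of shuffle addresses by iterating the logistic map
    P_{k+1} = mu * P_k * (1 - P_k) and ranking the chaotic sequence.
    Ranking is one illustrative way to turn the sequence into a permutation."""
    p, seq = p0, []
    for _ in range(n):
        p = mu * p * (1 - p)
        seq.append(p)
    order = sorted(range(n), key=lambda j: seq[j])
    psi = [0] * n
    for rank, j in enumerate(order):
        psi[j] = rank          # bit j is moved to address psi[j]
    return psi

psi = logistic_addresses(921)              # n = 921 bits, as in Figure 8.2
diffs = [psi[j] - j for j in range(100)]   # the statistic plotted in Figure 8.2
```

Small changes to 𝜇 or 𝑃0 yield a completely different permutation, which is why each record can be scrambled with its own initial parameters.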
Figure 8.2. The difference between the original address and the scrambled address

Experimental Setup
We conducted an experiment based on the proposed schema. We used TPC-H (Council 2008), a standard database benchmark, at a scale of 1 GB, and ran different queries on the Customer dataset. Each submitted query went through the proxy server, which ran DPM and AES encryption separately in order to compare the performance of both methods on the proposed schema. We used ADO.Net (Lerman 2010) on the client side to retrieve and bind data from the database. DPM and AES encryption were implemented as a class, written in C#.Net version 4.5, and executed on a PC with an Intel Core i7 CPU and 8 GB of RAM.
Experimental Results
Figure 8.3 and Figure 8.4 show the experimental results for the performance of the security methods (AES and DPM) on the proposed schema, and Figure 8.5 shows the data binding latency for different ranges of query responses.
In Figure 8.3, the X-axis represents the number of fields requested by a user's query, and the Y-axis represents the total response time (in milliseconds) of AES encryption and DPM on the proposed schema. Figure 8.3.a shows the total response time for 22 queries over a small query range, from 9 fields to 9,000 fields, with an increase of 450 fields per query. Figure 8.3.b shows the total response time for 9 queries over a larger query range, from 9 fields to 81,000 fields, with an increase of 9,000 fields per query. As shown in these figures, DPM provides superior performance over AES encryption. In particular, the response time difference between AES and DPM increases for larger queries. Figure 8.4 shows the response time difference between AES and DPM for the query range of 9 fields to 81,000 fields. In this figure, the X-axis represents the number of requested fields for a given query, and the Y-axis represents the performance difference between AES and DPM. For instance, as shown in this figure, DPM saves 2,909 milliseconds (~3 seconds) of computation time for the database management system (DBMS) over AES for a query requesting 54,000 fields.
In addition, some studies on databases, such as CryptDB (Popa et al. 2011), show that queries can be executed over an encrypted database without decryption. Our proposed method can be used in CryptDB in order to reduce AES encryption overheads.
Related Works
To the best of our knowledge, only a limited number of studies have been conducted on data privacy for cloud-based databases. Most of the studies consider encryption methods, or role-based data access methods on the DBMS side, but any database security method that runs on the server side cannot protect users' data privacy. CryptDB (Popa et al. 2011) is implemented based on RSA and AES encryption; it uses a proxy server to encrypt or decrypt each user's query. Databases like CryptDB can be extended by using DPM in order to remove the additional computation overheads of AES.
Chapter summary
Users face several challenges when they must outsource their data to a cloud computing system. The first challenge is data privacy, because any entity on the cloud vendor's side can violate users' data privacy. The second challenge is data security, because cloud computing is a form of Internet-based service that requires users to access their data through an untrusted, public network. A cloud-based database can be compromised by authorized cloud-vendor users or by unauthorized users. In this chapter, we introduced a schema consisting of several components for cloud-based databases that protects users' data privacy. In the case of a compromised database, the data is accessible only to users who have the key. Although the schema can be implemented with any encryption method, it uses a light-weight data privacy method (DPM) that allows users to efficiently protect each record inserted into the database. We conducted several experiments to evaluate the performance of the proposed schema using DPM and AES encryption. The experimental results show that the proposed schema provides efficient response times when DPM is employed. In addition, we analyzed the security of DPM and the level of users' data protection.
Figure 8.3. A comparison between AES encryption and DPM on NoSQL databases. (a) Response time (ms) versus the number of requested fields in the query, for the small query range; (b) the same comparison for the larger query range.
Figure 8.4. The response time difference between AES and DPM versus the number of requested fields in the query.
Figure 8.5. A comparison of data binding latency between AES encryption and DPM.
Chapter 9
Cloud computing is a trending technology. In order to use the advantages of the cloud, users need to outsource their data and applications to a cloud vendor, which acts as a third party. Outsourcing data to a third party raises several challenges for users, such as transferring data and applications from one vendor to another, transferring data and applications from a vendor to the in-house IT department, and preserving users' data privacy. This thesis addresses these challenges.
The first goal of this study was to improve the architecture-level issues of the cloud. The second was to preserve users' data privacy, and the third goal dealt with different use cases of the two improvements.
of native cloud services. Each template maps FTaaS services to BTaaS and each BTaaS is
bound to native cloud services.
The second goal of this study was data privacy preservation in the cloud computing environment. Data privacy is one of the key challenges for cloud users, because users must outsource their data to the cloud in order to use the advantages of cloud computing. Outsourcing data to cloud computing, or performing computation on that data, raises the question for the user of how the cloud vendor preserves users' data privacy against anyone inside the cloud, including the cloud vendor's third parties. In Chapter 3, we presented a light-weight Data Privacy Method (DPM) that creates an obstacle for an attacker, from inside or outside the cloud, to accessing users' data. The method is light-weight, which allows users to deploy it on a mobile device, such as a cellphone. DPM can be deployed on the client side as discussed in Chapter 3, on the server side as explained in Chapter 7, or as a middle box as described in Chapter 8.
By considering DCCSOA as the host cloud architecture, DPM can run at BTaaS, which allows a cloud vendor to preserve users' data privacy.
We developed different scenarios as proofs of concept and for the third goal of this study, described as follows.
In order to efficiently and securely run DPM for preserving users' data privacy, we developed a parallelization model of DPM, which is explained in Chapter 4. We used CUDA, a GPU-based platform introduced by NVIDIA. The parallel computing model of DPM allows a device to use one or multiple GPUs to perform heavy computations, where each GPU consists of thousands of small, low-speed cores. Each DPM task submitted to a core requires only a small computation. In that chapter, we discussed both the security level of parallel DPM and the performance of each GPU function.
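The data-parallel decomposition described above can be sketched on a CPU, with a thread pool standing in for GPU cores. The chunk size, the per-chunk shuffle-address generation via `random.sample`, and all function names below are illustrative assumptions; the actual implementation uses CUDA.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def scramble_chunk(args):
    """Scramble one chunk with its own shuffle-address set; this is the small
    task that, in the CUDA version, would run on a GPU core."""
    chunk, psi = args
    out = [0] * len(chunk)
    for j, b in enumerate(chunk):
        out[psi[j]] = b
    return out

def parallel_scramble(bits, chunk_size=256, workers=4):
    """Split the input into chunks and scramble them concurrently.
    Each chunk gets its own shuffle-address set (psi) and is independent of
    the others, so the chunks can be processed in parallel."""
    chunks = [bits[i:i + chunk_size] for i in range(0, len(bits), chunk_size)]
    psis = [random.Random(k).sample(range(len(c)), len(c))
            for k, c in enumerate(chunks)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        scrambled = list(ex.map(scramble_chunk, zip(chunks, psis)))
    return [b for ch in scrambled for b in ch], psis
```

Because each chunk is scrambled independently, the work maps naturally onto many small cores, which is the property the GPU implementation in Chapter 4 exploits.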
device. The experimental results show that the proposed scheme does not introduce
additional power consumption overhead.
In the future, we plan to extend the results on data privacy preservation for cloud-assisted IoT in Chapters 5 and 6 by implementing DPM on different IoT devices. We plan to extend the EHR platform by adding more capabilities, such as computing DPM on GPUs for EHR systems.
Finally, DPM can be used in a proxy server (middle box), as explained in Chapter 8, to preserve users' data privacy when a user wishes to outsource data to a cloud database.
As future work for Chapter 8, we plan to extend the schema with a zero-knowledge paradigm that allows users to run queries on scrambled data without reconstructing the data from the database. This will remove additional overheads on the database management system, and it will allow users to protect their data privacy efficiently.
Bibliography
Adibi, S., Nilmini Wickramasinghe, and C. Chan. "CCmH: The Cloud Computing
Paradigm for Mobile Health (mHealth)" The International Journal of Soft Computing and
Software Engineering, 3.3 (2013): 403-410.
Ayuso, Jesús, et al. "Optimization of Public Key Cryptography (RSA and ECC) for 16-bits
Devices based on 6LoWPAN." 1st Int. Workshop on the Security of the Internet of Things,
Tokyo, Japan. 2010.
Bahrami, Mehdi, and Singhal, Mukesh. "The role of cloud computing architecture in big
data." Information granularity, big data, and computational intelligence. Springer
International Publishing, 2015. 275-295.
Bahrami, Mehdi, Singhal, Mukesh, "A Light-Weight Permutation based Method for Data
Privacy in Mobile Cloud Computing." Mobile Cloud Computing, Services, and
Engineering (MobileCloud), 2015 3rd IEEE International Conference on. IEEE, 2015.
Bahrami, Mehdi, Singhal, Mukesh and Zixuan Zhuang, "A Cloud-based Web Crawler
Architecture" in 2015 18th Int. Conf. Intelligence in Next Generation Networks:
Innovations in Services, Networks and Clouds (ICIN 2015), Paris, France, IEEE, 2015.
Bahrami, Mehdi, Mukesh Singhal, "A dynamic cloud computing platform for eHealth
systems." 2015 17th International Conference on E-health Networking, Application &
Services (IEEE HealthCom). IEEE, 2015.
Bahrami, Mehdi. "An Evaluation of Security and Privacy Threats for Cloud-based
Applications." Procedia Computer Science 62 (2015): 17-18.
122 | BIBLIOGRAPHY
Bahrami, Mehdi, Li, Dong, and Singhal, Mukesh, Kundu, Ashish “An Efficient Parallel
Implementation of a Light-weight Data Privacy Method for Mobile Cloud Users”
IEEE/ACM SC’16 – DataCloud Workshop, Utah IEEE, 2016.
Bahrami, Mehdi, Khan, Arshia, & Singhal, M. “An Energy Efficient Data Privacy Scheme
for IoT Devices in Mobile Cloud Computing” (IEEE MS 2016), San Francisco, IEEE 2016.
Bahrami, Mehdi, and Mukesh Singhal. "CloudPDB: A light-weight data privacy schema
for cloud-based databases." 2016 International Conference on Computing, Networking
and Communications (ICNC). IEEE, 2016.
Barry M. Leiner, et al. “A brief history of the internet”, SIGCOMM Comput. Commun.
Rev. 39, 5 (October 2009), 22-31, 2009.
Berner, Eta S. Clinical Decision Support Systems. Springer Science+ Business Media, LLC,
2007.
Bessis, Nik, et al. "The big picture, from grids and clouds to crowds: a data collective
computational intelligence case proposal for managing disasters." P2P, Parallel, Grid,
Cloud and Internet Computing (3PGCIC), 2010 International Conference on. IEEE, 2010.
Bist, Meenakshi, Manoj Wariya, and Amit Agarwal. "Comparing delta, open stack and
Xen Cloud Platforms: A survey on open source IaaS", Advance Computing Conference
(IACC), 2013 IEEE 3rd International. IEEE, 2013.
Blum, Lenore, Manuel Blum, and Mike Shub. "A simple unpredictable pseudo-random
number generator." SIAM Journal on computing 15.2 (1986): 364-383.
Bonomi, Flavio, et al. "Fog computing: A platform for internet of things and analytics."
Big Data and Internet of Things: A Roadmap for Smart Environments. Springer
International Publishing, 2014. 169-186.
Buscema, Massimo, et al. "Auto-Contractive Maps: an artificial adaptive system for data
mining. An application to Alzheimer disease" Current Alzheimer Research 5.5 (2008):
481-498.
Candea, George, Stefan Bucur, and Cristian Zamfir. "Automated software testing as a
service." Proceedings of the 1st ACM symposium on Cloud computing. ACM, 2010.
Chakraborty, Debrup, and Palash Sarkar. "A new mode of encryption providing a
tweakable strong pseudo-random permutation" Fast Software Encryption. Springer
Berlin Heidelberg, 2006.
Chen, Yinong, Zhihui Du, and Marcos García-Acosta, "Robot as a service in cloud
computing", Service Oriented System Engineering (SOSE), 2010 Fifth IEEE International
Symposium on. IEEE, 2010.
Choo, Euijin, et al. "SRMT: A lightweight encryption scheme for secure real-time
multimedia transmission." Multimedia and Ubiquitous Engineering, 2007. MUE'07.
International Conference on. IEEE, 2007.
Curino, Carlo, et al. "Relational cloud: A database-as-a-service for the cloud", 2011.
Daemen, Joan, and Vincent Rijmen, “The design of Rijndael: AES-the advanced
encryption standard”, Springer, 2002.
Daemen, Joan, and Vincent Rijmen. The design of Rijndael: AES-the advanced encryption
standard. Springer Science & Business Media, 2013.
Denning, Dorothy E., et al. "Views for multilevel database security." Security and Privacy,
1986 IEEE Symposium on. IEEE, 1986.
Dinh, Hoang T., et al. "A survey of mobile cloud computing: architecture, applications,
and approaches." Wireless communications and mobile computing 13.18 (2013): 1587-
1611.
Doelitzscher, Frank, et al. "Private cloud for collaboration and e-Learning services: from
IaaS to SaaS." Computing 91.1 (2011): 23-42.
Doraswamy, Naganand, and Dan Harkins. IPSec: the new security standard for the
Internet, intranets, and virtual private networks. Prentice Hall Professional, 2003.
Dunkels, Adam, Bjorn Gronvall, and Thiemo Voigt. "Contiki-a lightweight and flexible
operating system for tiny networked sensors" Local Computer Networks, 2004. 29th Annual
IEEE International Conference on. IEEE, 2004
Fairhurst, Paul. "Big data and HR analytics." IES Perspectives on HR 2014 (2014): 7.
Fan, Lu, et al. "DACAR platform for eHealth services cloud." Cloud Computing
(CLOUD), 2011 IEEE International Conference on. IEEE, 2011.
Foster, Ian, and Steven Tuecke. "Describing the elephant: The different faces of IT as
service." Queue 3.6 (2005): 26-29.
Gewin, V. “The New Networking Nexus”, Nature, vol.451, no.7181, pp. 1024-1025, 2008.
Grossman, Robert L., et al. "An overview of the open science data cloud" Proceedings of
the 19th ACM International Symposium on High Performance Distributed Computing.
ACM, 2010.
Gubbi, Jayavardhana, et al. "Internet of Things (IoT): A vision, architectural elements, and
future directions" Future Generation Computer Systems 29.7 (2013): 1645-1660.
Han, Zhang, et al. "A new image encryption algorithm based on chaos system." Robotics,
intelligent systems and signal processing, 2003. Proceedings. 2003 IEEE international
conference on. Vol. 2. IEEE, 2003.
Harrison, Owen, and John Waldron, “AES encryption implementation and analysis on
commodity graphics processing units”, Springer Berlin Heidelberg, 2007.
Hoang, Doan B., and Lingfeng Chen. "Mobile cloud for assistive healthcare (MoCAsH)"
Services Computing Conference (APSCC), 2010 IEEE Asia-Pacific. IEEE, 2010
Howe, Doug, et al. "Big data: The future of biocuration." Nature 455.7209 (2008): 47-50.
Hu, Bo, et al. "A CCRA Based Mass Customization Development for Cloud Services",
Services Computing (SCC), IEEE International Conference on. 2013.
Huang, Song, Shucai Xiao, and Wu-chun Feng. "On the energy efficiency of graphics
processing units for scientific computing." Parallel & Distributed Processing, IPDPS 2009.
International Symposium on. IEEE, 2009.
Itani, Wassim, Ayman Kayssi, and Ali Chehab. "Privacy as a service: Privacy-aware data
storage and processing in cloud computing architectures." Dependable, Autonomic and
Secure Computing, 2009. DASC'09. Eighth IEEE International Conference on. IEEE, 2009.
Jacob, Adam “The Pathologies of Big Data”, Communication of the ACM, Vol.52, No. 8,
pp.36-44, 2009.
Jonscher, Dirk, and Klaus R. Dittrich. "An approach for building secure database
federations." Proceedings of the 20th International Conference on Very Large Data Bases.
Morgan Kaufmann Publishers Inc., 1994.
Josette Rigsby, Studies Confirm Big Data as Key Business Priority, Growth Driver,
retrieved on Jan 21, 2014 at http://siliconangle.com/blog/2012/07/13/studies-confirm-big-
data-as-key-business-priority-growth-driver
Juve, Gideon, E., Vahi, K., Mehta, G., Berriman, B., Berman, B. P., & Maechling, P.
"Scientific workflow applications on Amazon EC2." E-Science Workshops, 2009 5th IEEE
International Conference on. IEEE, 2009.
Kaufman, Cynthia C. “Getting Past Capitalism: History, Vision, Hope”, Rowman &
Littlefield, 2012.
Kelly, Jeff “Big Data in the Aviation Industry”, Wikibon, Sep 16, 2013, retrieved on March 18, 2014 at: http://wikibon.org/wiki/v/Big_Data_in_the_Aviation_Industry
Killmann, W., Schindler, W.: AIS 31: Functionality Classes and Evaluation Methodology
for True (Physical) Random Number Generators, version 3.1, Bundesamt für Sicherheit
in der Informationstechnik (BSI), Bonn (2001)
Kumar, Karthik, and Yung-Hsiang Lu. "Cloud computing for mobile users: Can
offloading computation save energy?" Computer 43.4 (2010): 51-56.
Landau, Susan. "Highlights from Making Sense of Snowden, Part II: What's Significant in
the NSA Revelations" Security & Privacy, IEEE 12.1 (2014): 62-64.
Laur, Sven, Riivo Talviste, and Jan Willemson. "From oblivious AES to efficient and
secure database join in the multiparty setting." Applied Cryptography and Network
Security. Springer Berlin Heidelberg, 2013.
Lerman, J. Programming Entity Framework: Building Data Centric Apps with the ADO.NET Entity Framework. O'Reilly Media, 2010.
Li, Qinjian, et al. "Implementation and analysis of AES encryption on GPU." High
Performance Computing and Communication & 2012 IEEE 9th International Conference
on Embedded Software and Systems (HPCC-ICESS), 2012 IEEE 14th International
Conference on. IEEE, 2012.
Lian, Shiguo, Jinsheng Sun, and Zhiquan Wang. "A novel image encryption scheme
based-on JPEG encoding." Information Visualisation, 2004. IV 2004. Proceedings. Eighth
International Conference on. IEEE, 2004.
Liu, Fang, et al. "NIST cloud computing reference architecture." NIST special publication
500 (2011): 292.
Lounis, Ahmed, et al. "Secure and scalable cloud-based architecture for e-health wireless
sensor networks." Computer communications and networks (ICCCN), 2012 21st
international conference on. IEEE, 2012.
Magableh, Basel, and Michela Bertolotto, "A Dynamic Rule-based Approach for Self-
adaptive Map Personalisation Services", International Journal of Soft Computing and
Software Engineering, vol.3. no.3, 104, March 2013.
Manavski, Svetlin. "CUDA compatible GPU as an efficient hardware accelerator for AES
cryptography." Signal Processing and Communications, 2007. ICSPC 2007. IEEE 2007.
Manyika, James, et al. "Big data: The next frontier for innovation, competition, and
productivity." (2011).
Marin, Leandro, Antonio Jara, and Antonio Skarmeta Gomez, "Shifting primes:
Optimizing elliptic curve cryptography for 16-bit devices without hardware multiplier."
Mathematical and Computer Modelling 58.5 (2013): 1155-1174.
Marx, Vivien. "Biology: The big challenges of big data." Nature 498.7453 (2013): 255-260.
Matheson, David, and James E. Matheson, “The Smart Organization: Creating Value
through Strategic”, Rand D. Harvard Business Press, 1998.
McAfee, Andrew, and Erik Brynjolfsson. "Big data: the management revolution."
Harvard business review 90.10 (2012): 60-66.
McHugh, Mary L. "The chi-square test of independence." Biochemia Medica 23.2 (2013):
143-149.
Oikawa, Minoru, et al. "DS-CUDA: a middleware to use many GPUs in the cloud
environment." High Performance Computing, Networking, Storage and Analysis (SCC),
2012 SC Companion:. IEEE, 2012.
Osborn, Sylvia. "Database security integration using role-based access control." Data and
Application Security. Springer US, 2001.
Osvik, Dag Arne, et al. "Fast software AES encryption" Fast Software Encryption.
Springer Berlin Heidelberg, 2010.
Pal, Subhankar, and Tirthankar Pal. "TSaaS—Customized telecom app hosting on cloud"
Internet Multimedia Systems Architecture and Application (IMSAA), 2011 IEEE 5th
International Conference on. IEEE, 2011.
Pedrycz, W., Granular Computing: Analysis and Design of Intelligent Systems, CRC
Press/Francis Taylor, Boca Raton, 2013
Perrey, Randall, and Mark Lycett. "Service-oriented architecture." Applications and the
Internet Workshops, 2003. Proceedings. 2003 Symposium on. IEEE, 2003.
Podesser, Martina, Hans-Peter Schmidt, and Andreas Uhl. "Selective bitplane encryption
for secure transmission of image data in mobile environments." Proceedings of the 5th
IEEE Nordic Signal Processing Symposium (NORSIG’02). 2002.
Popa, Raluca Ada, et al. "CryptDB: protecting confidentiality with encrypted query
processing." Proceedings of the Twenty-Third ACM Symposium on Operating Systems
Principles. ACM, 2011.
Ra, Moo-Ryong, Ramesh Govindan, and Antonio Ortega. "P3: Toward Privacy-
Preserving Photo Sharing" NSDI. 2013.
Resnick, Steve, Richard Crane, and Chris Bowen, “Essential windows communication
foundation: for .Net framework 3.5”, Addison-Wesley Professional, 2008.
Ristenpart, Thomas, et al. "Hey, you, get off of my cloud: exploring information leakage
in third-party compute clouds" Proceedings of the 16th ACM conference on Computer
and communications security. ACM, 2009
Rodrigues, Joel JPC, ed. “Health Information Systems: Concepts, Methodologies, Tools,
and Applications”, Vol. 1. IGI Global, 2009.
Rodrigues, Joel JPC, et al. "Analysis of the security and privacy requirements of cloud-
based Electronic Health Records Systems" Journal of medical Internet research 15.8
(2013).
Rodrigues, Joel JPC, et al. "Distributed media-aware flow scheduling in cloud computing
environment" Computer Communications 35.15 (2012): 1819-1827.
Schonfeld, Erick, Google Processing 20,000 Terabytes A Day, And Growing, retrieved
on Jan 21, 2014 at http://techcrunch.com/2008/01/09/google-processing-20000-terabytes-
a-day-and-growing/
Shayan, J., Azarnik, A., et al.,"Identifying Benefits and risks associated with utilizing
cloud computing", International Journal of Soft Computing and Software Engineering,
Vol. 3, No. 3, pp. 416-421, 2013.
Shannon, C.E. “Communication Theory of Secrecy Systems", Bell System Tech. J., Vol. 28,
1949, pp. 656-715.
Shao, Fei, Zinan Chang, and Yi Zhang. "AES encryption algorithm based on the high
performance computing of GPU." Communication Software and Networks, 2010.
ICCSN'10. Second International Conference on. IEEE, 2010.
Singhal, Mukesh, Santosh Chandrasekhar, Gail-Joon Ahn, Elisa Bertino, Ram Krishnan,
Ravi Sandhu and Ge Tingjian, “Collaboration in Multi-Cloud Systems: Framework and
Security Issues”, IEEE Computer, Vol 46, No 2, February 2013, pp. 76-84.
Stanik, Alexander, Matthias Hovestadt, and Odej Kao. "Hardware as a Service (HaaS):
The completion of the cloud stack." Computing Technology and Information
Management (ICCM), 2012 8th International Conference on. Vol. 2. IEEE, 2012.
Tan, Wei, et al. "Social-Network-Sourced Big Data Analytics" Internet Computing, IEEE
17.5 (2013): 62-69.
Thomas, David Barrie, Lee Howes, and Wayne Luk. "A comparison of CPUs, GPUs,
FPGAs, and massively parallel processor arrays for random number generation."
Proceedings of the ACM/SIGDA international symposium on Field programmable gate
arrays. ACM, 2009.
Truong, Hong-Linh, and Schahram Dustdar. "On analyzing and specifying concerns for
data as a service." Services Computing Conference, 2009. APSCC 2009. IEEE Asia-Pacific.
IEEE, 2009.
Tsai, Wei-Tek, Xin Sun, and Janaka Balasooriya, "Service-oriented cloud computing
architecture", Information Technology: New Generations (ITNG), 2010 Seventh
International Conference on. IEEE, 2010.
Tsoi, Kuen Hung, K. H. Leung, and Philip Heng Wai Leong. "Compact FPGA-based true
and pseudo random number generators." Field-Programmable Custom Computing
Machines, FCCM 2003. 11th Annual IEEE Symposium on. IEEE, 2003.
Wang, Wei, et al. "Accelerating fully homomorphic encryption using GPU." High
Performance Extreme Computing (HPEC), 2012 IEEE Conference on. IEEE, 2012.
Wei-Tek Tsai, Wu Li, Hessam Sarjoughian, and Qihong Shao. 2011. SimSaaS: simulation
software-as-a-service. In Proceedings of the 44th Annual Simulation Symposium (ANSS
'11). Society for Computer Simulation International, San Diego, CA, USA, 77-86.
Wilson, Lori A. "Survey on Big Data gathers input from materials community" MRS
Bulletin 38.09 (2013): 751-753.
Ye, Guodong. "Image scrambling encryption algorithm of pixel bit based on chaos map."
Pattern Recognition Letters 31.5 (2010): 347-354.
Yoshikawa, Masaya, and Hikaru Goto. "Security Verification Simulator for Fault Analysis
Attacks", International Journal of Soft Computing and Software Engineering, vol.3, no.3,
71, March 2013.
Young, Mark “Automotive innovation: big data driving the changes”, retrieved on Jan 26, 2014 at http://www.thebigdatainsightgroup.com/site/article/automotive-innovation-big-data-driving-changes
Zhang, Liang-Jie, and Qun Zhou, "CCOA: Cloud computing open architecture", Web
Services, ICWS 2009. IEEE International Conference on. IEEE, 2009.
Zhang, Tao, and Xianfeng Li. "Evaluating and analyzing the performance of RPL in
contiki." Proc. of the first int. workshop on Mobile sensing, computing and
communication. ACM, 2014.
Zibin Zheng, Jieming Zhu, and M.R. Lyu, "Service-Generated Big Data and Big Data-as-a-Service: An Overview," Big Data (BigData Congress), 2013 IEEE International Congress on, pp. 403-410, 2013.
Zorrilla, Marta, and Diego García-Saiz. "A service oriented architecture to provide data
mining services for non-expert data miners." Decision Support Systems 55.1 (2013): 399-
411.