

SPECIAL SECTION ON ARTIFICIAL INTELLIGENCE IN CYBERSECURITY

Received May 4, 2019, accepted May 20, 2019, date of publication June 7, 2019, date of current version June 27, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2921555

Securing Data With Blockchain and AI


KAI WANG 1,2, (Member, IEEE), JIAQING DONG 1, YING WANG 3,
AND HAO YIN 1, (Member, IEEE)
1 Research Institute of Information Technology, Tsinghua University, Beijing 100084, China
2 School of Computer and Control Engineering, Yantai University, Shandong 264005, China
3 School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
Corresponding author: Hao Yin ( [email protected])
This work was supported in part by the China Postdoctoral Science Foundation, under Grant 2017M620786, Shandong Provincial Natural
Science Foundation, China, under Grant ZR2017BF018, and National Natural Science Foundation of China (NSFC), under Grant
61702439.

ABSTRACT Data is the input for various artificial intelligence (AI) algorithms to mine valuable features, yet data on the Internet is scattered everywhere and controlled by different stakeholders who cannot trust each other, and the usage of data in complex cyberspace is difficult to authorize or validate. As a result, it is very difficult to enable data sharing in cyberspace for real big data, and thus for truly powerful AI. In this paper, we propose SecNet, an architecture that can enable secure data storing, computing, and sharing in the large-scale Internet environment, aiming at a more secure cyberspace with real big data and thus enhanced AI with plentiful data sources, by integrating three key components: 1) blockchain-based data sharing with ownership guarantee, which enables trusted data sharing in the large-scale environment to form real big data; 2) an AI-based secure computing platform to produce more intelligent security rules, which helps to construct a more trusted cyberspace; 3) a trusted value-exchange mechanism for purchasing security services, providing a way for participants to gain economic rewards when giving out their data or services, which promotes data sharing and thus achieves better performance of AI. Moreover, we discuss a typical use scenario of SecNet and an alternative way to deploy it, and analyze its effectiveness from the aspects of network security and economic revenue.

INDEX TERMS Data security, data systems, artificial intelligence, cyberspace.

The associate editor coordinating the review of this manuscript and approving it for publication was Chi-Yuan Chen.
2169-3536 © 2019 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 7, 2019 77981
K. Wang et al.: Securing Data With Blockchain and AI

I. INTRODUCTION
With the development of information technologies, the trend of integrating cyber, physical and social (CPS) systems into a highly unified information society, rather than just a digital Internet, is becoming increasingly obvious [1]. In such an information society, data is the asset of its owner, and its usage should be under the full control of its owner, although this is not the common case [2], [3].
Given that data is undoubtedly the oil of the information society, almost every big company wants to collect as much data as possible for its future competitiveness [4], [5]. An increasing amount of personal data, including location information, web-searching behavior, user calls, and user preferences, is being silently collected by the built-in sensors inside the products of those big companies, which brings a huge risk of privacy leakage for data owners [6], [7]. Moreover, the usage of that data is out of the control of its owners, since currently there is no reliable way to record how the data is used and by whom, and thus there are few methods to trace or punish the violators who abuse the data [8]. That is, the lack of ability to effectively manage data makes it very difficult for an individual to control the potential risks associated with the collected data [9]. For example, once the data has been collected by a third party (e.g., a big company), the lack of access to this data hinders an individual from understanding or managing the risks related to the data collected from him or her. Meanwhile, the lack of immutable recording of data usage increases the risk of abuse [10].
If there is an efficient and trusted way to collect and merge the data scattered across the whole CPS to form real big data, the performance of artificial intelligence (AI) will be significantly improved, since AI can handle massive amounts of data carrying a huge amount of information at the same time, which would bring great benefits (e.g., achieving enhanced security for data) and even give AI the ability to exceed human capabilities in more areas [11]. According to the research in [12], if given a large amount of data at
a scale orders of magnitude larger, even the simplest current AI algorithms (e.g., perceptrons from the 1950s) can achieve performance that beats many state-of-the-art technologies today. The key lies in how to make data sharing trusted and secure [13]. Fortunately, blockchain technologies may be a promising way to achieve this goal, using consensus mechanisms throughout the network to guarantee data sharing in a tamper-proof way with embedded economic incentives [14], [15]. Thus, AI can be further empowered by blockchain-protected data sharing [16]–[18]. As a result, enhanced AI can provide better performance and security for data.
In this paper, we aim at securing data by combining blockchain and AI, and design a Secure Networking architecture (termed SecNet) to significantly improve the security of data sharing, and thereby the security of the whole network, even the whole CPS.
In SecNet, to protect data, one of the biggest challenges is where and how to store data, because users have to give their data to service providers if they want to use certain services or applications [1], [3]. This is caused by the inherent coupling of user data and applications in current service mechanisms, which significantly hinders the development of data protection and application innovation. Inspired by the concept of the Personal Data Store (PDS) from openPDS [5] and the Private Data Center (PDC) from HyperNet [1], SecNet adopts PDC instead of PDS, as PDC is more suitable to deploy and to deal with this problem, since it provides a more secure and intelligent data storage system via physical entities instead of software-based algorithms as in openPDS. Each PDC serves as a secured and centralized physical space for each SecNet user, where his/her data lives. Embedding PDC into SecNet allows users to monitor and reason about what their data is used for, why, and by whom, meaning that users can truly control every operation on their own data and achieve fine-grained management of access behaviors. Besides PDC, other choices can also be applied for data storage in SecNet according to certain requirements (see Section V).
The trust-less relationship between different data stakeholders significantly thwarts data sharing across the whole Internet; thus the data used for AI training or analysis is limited in amount as well as partial in variety. Fortunately, the rise of blockchain technologies brings a hopeful, efficient and effective way to enable trusted data sharing in a trust-less environment, which can help AI make more accurate decisions thanks to the real big data collected from more places in the Internet. SecNet leverages the emerging blockchain technologies to prevent the abuse of data, and to enable trusted data sharing in trust-less or even untrusted environments. For instance, it can enable cooperation between different edge computing paradigms to improve the whole system performance of edge networks [19]. The reason why blockchain can enable trusted mechanisms is that it can provide a transparent, tamper-proof metadata infrastructure to faithfully record all usage of data [17]. Thus, SecNet introduces blockchain-based data sharing mechanisms with ownership guarantee, where any data ready for sharing should be registered into a blockchain, named the Data Recording Blockchain (DRB), to announce its availability for sharing. Each access to data by other parties (not the data owner) should also be validated and recorded in this chain. In addition, the authenticity and integrity of data can be validated only by the DRB. Besides, SecNet enables economic incentives between different entities if they share data or exchange security services, by embedding smart contracts on data to trigger automatic and tamper-proof value exchange. In this way, SecNet guarantees data security and encourages data sharing throughout the CPS.
Furthermore, data is the fuel of AI [11], and it can greatly help to improve the performance of AI algorithms if data can be efficiently networked and properly fused. Enabling data sharing across multiple service providers can be a way to maximize the utilization of scattered data in separate entities with potential conflicts of interest, which can enable a more powerful AI. Given enough data and blockchain-based smart contracts [20] for secure data sharing, it is not surprising that AI can become one of the most powerful technologies and tools to improve cybersecurity, since it can check huge amounts of data more quickly to save time, identify and mitigate threats more rapidly, and give more accurate predictions and decision support on the security rules that a PDC should deploy. Besides, with machine learning [21] embedded, AI can constantly learn patterns by applying existing data or artificial data generated by a GAN [22] to improve its strategies over time, strengthening its ability to identify any deviation in data or behaviors on a 24/7/365 basis. SecNet can apply these advanced AI technologies in its Operation Support System (OSS) to adaptively identify more suspicious data-related behaviors, even if they have never been seen before. In addition, swarm intelligence can be used in SecNet to further improve data security, by collecting different security knowledge from the huge number of intelligent agents scattered everywhere in the CPS, with the help of trusted exchange mechanisms for incentive tokens [23].
The rest of this paper is organized as follows. Section II overviews related work. Section III presents the SecNet architecture. Section IV gives a typical use scenario of SecNet in the medical care area. Section V discusses an alternative way to deploy a different data storage model in SecNet. Section VI analyzes both the security improvement of the network system and the incentive for users to share learned security rules. Finally, Section VII concludes this paper and gives some future directions.

II. RELATED WORK
Data security is among the key concerns of any network architecture, and is the basis for AI algorithms to improve, due to their requirement for huge amounts of data from as many places as possible in the Internet. Meanwhile, with a more powerful AI, data security can be further protected at a higher level
as an enhanced AI can figure out advanced and complicated threats more easily than normal AI.
To enhance the security of data in CPS, a number of efforts have been made. The work in [3] presents an architecture named Amber to decouple data from web applications, which gives web users control over their personal data and provides a powerful web-wide query function to search personal data. To extend the decoupling of data and applications from web services only to all kinds of applications, the research group from the Media Lab at the Massachusetts Institute of Technology designed openPDS [5], which acts as a secured virtual space for users to collect, store and manage their data, preventing all kinds of applications from operating on data directly. In addition, openPDS introduces a new service paradigm named SafeAnswer, to dynamically protect data privacy by reducing the dimensions of personal data.
Besides, the emerging blockchain technology provides an efficient and effective way to guarantee the security of data in CPS, by providing tamper-proof and traceable recording features as well as incentive mechanisms. The authors in [8] develop the OriginChain system to realize the transparency and tamper-proof features of the metadata when the supply chain traces products. OriginChain enables all related parties to obtain the same trusted data and adapt to dynamic environments and regulations. The authors in [10] propose a blockchain-based MeDShare system to effectively manage and protect medical records, as well as share medical data among cloud repositories, with guarantees on data provenance, auditing and control. The work in [17] overviews the background of blockchain and Intrusion Detection Systems (IDS) in detail, discusses how to apply blockchain technologies to IDS, and gives reasonable guesses about possible hidden dangers in this direction. Besides, the work in [15] designs a blockchain-based incentive mechanism for crowdsensing applications, with privacy preservation and data security guarantees.
Furthermore, AI is also a promising way to enhance data security in CPS, since it can deeply analyze huge amounts of data, learn hidden patterns and then make accurate predictions, helped by the availability of enormous data and increased computational power. The work in [11] gives a detailed overview of the use of AI for big data as well as the use of big data for AI, and also puts forward some development directions, including how to improve data security with AI. The work in [16] highlights that AI can gain better performance if provided with a huge amount of data to achieve a better base model, and calls for more efforts to build larger valuable datasets, to empower AI for better security of data. Furthermore, the work in [21] presents a comprehensive survey of AI methods for cyber security. In addition, the work in [20] aims at creating a market where participants can exchange machine learning models for rewards, making AI more practical and accessible to everyone, and thus providing more AI solutions for better security of data.
All the ideas and solutions above propose to protect data security by designing a new service paradigm supporting the decoupling of data and application, by designing a specific blockchain to meet the demands of certain applications, or by integrating AI algorithms as a functional component to analyze data security. However, none of them treats the problem of data security from the view of architecture. To fill this gap, SecNet tries to construct a common and general networking architecture by combining the power of AI and blockchain at a large scale, which can support dynamic updates of all these functional components separately at any time as needed, to efficiently and effectively improve data security for all applications.
It is worth noting that SecNet is different from HyperNet [1]. First, AI in HyperNet mainly acts as a virtual personal assistant to protect the privacy of a single PDC user, while AI in SecNet is also in charge of generating artificial data for training more robust security rules, which can be used to enhance AI again. Second, SecNet specifies how to securely share security rules with the help of a detailed on-chain smart contract, which HyperNet lacks. In addition, SecNet aims at achieving a more secure cyberspace by sharing not only user data but also security rules produced by AI, while HyperNet only aims at securely sharing user data. Last but not least, PDC is only one of the data storage solutions for SecNet (see Section V), yet it is the only solution for HyperNet.

III. THE SECNET ARCHITECTURE
SecNet is built as an architecture for a more secure cyberspace, by integrating three key components: 1) blockchain-based data sharing with ownership guarantee; 2) an AI-based secure computing platform based on big data to produce intelligent and dynamic security rules; 3) a trusted value-exchange mechanism for purchasing security services.
Figure 1 illustrates the overall architecture of SecNet. Nodes in SecNet are connected with Blockchain-based Networking. In the network, nodes communicate with each other and reach a consensus based on blockchain techniques. Meanwhile, they cooperate through the execution of smart contracts. In order to reach a consensus, either on node state or on smart-contract execution results, each node contains a blockchain ledger to sync state with other nodes. In terms of data, SecNet nodes are equipped with a data storage module and an access control module for data security. SecNet nodes also have an Operation Support System (OSS) module which enables AI-based secure computing (ASC) for generating knowledge and security rules from data.

FIGURE 1. The SecNet architecture.

A. DATA SHARING GUARANTEED BY BLOCKCHAIN
For data protection, SecNet adopts the Private Data Center (PDC) from HyperNet [1], and integrates a blockchain-based protection mechanism for data sharing between untrusted entities.
PDC provides physical security for data, leveraging advanced architectural and engineering approaches to
operating AI-based OSS. One important feature PDC provides is uniform data access control, which comes from two aspects. The first is uniform data representation (UDR). UDR represents data in a standard form, in which data is self-describing and can be easily parsed by applications conforming to the UDR standard, which makes data sharing among entities convenient. With UDR, various kinds of data have a uniform representation to data consumers, which naturally mitigates the data format problem in an environment where different applications have different data formats. The second is uniform access control (UAC). UAC is very similar to the access control schemes used in many file systems. It is concerned with giving agents (users, groups, applications and more) access to perform various kinds of operations (read, write, append, etc.) on data. With UAC, PDC can easily decide whether a request for data from a specific entity is legal or not. Besides the representation aspects, PDC also provides a uniform data identifier (UDI) platform for data identification and routing. With UDI, PDC is capable of identifying the source, version, ownership and many other attributes of data, which makes it possible to manage and exchange data objects between different entities and applications. The UDI platform in PDC is decentralized, so user data can be managed in a decentralized way and no service provider can control the data; thus the abuse of data and data leakage are avoided.
Every PDC is housed in nondescript facilities, and physical access is strictly controlled both at the perimeter and at building ingress points by professional security staff as well as GAN-based improved rules, providing data access only for legitimate users who have such privileges.
Every entity (e.g., a user or an institute) has a PDC to store data. All the data produced in cyberspace related to an entity is stored in a corresponding PDC, and can be merged and computed to form a knowledge system, to further improve data security.
Before any data can be shared over the Internet, it should be registered into the DRB to announce its availability for sharing. DRB is in charge of not only data naming, but also data validation and recording the behavior of data interactions. That is, any interaction with this data is recorded by DRB, and the authenticity and integrity of data can be validated only by DRB.

B. AI-BASED SECURE COMPUTING
Data is important to its owner, and different types of data can be produced by reshaping the raw data according to different requirements and scenarios. For example, the health information of a user stored in PDC can be extracted and reorganized into structured medical data, which is very convenient for buyers from hospitals, research institutes and health application developers.
All the data of an entity in cyberspace is stored in PDC, and thus its security is of great importance to its owner, as the data is in fact the digital clone of the entity in the real world. To protect data, SecNet introduces the ASC component into the OSS of every PDC.
AI is one of the core capabilities integrated in PDC. Various kinds of machine learning techniques have been invented for different AI tasks, for instance pattern matching, computer vision and self-driving. Currently, different AI techniques are being investigated for handling different data types. These data-specific AI functions can be treated as a large set of "solution islands": academia and industry have produced numerous isolated software components and mechanisms that deal with various parts of intelligence separately. PDC works as an AI operation platform, integrating individual AI components into a coherent, intelligent system of a broader nature. Different AI functionalities collaborate with each other in PDC and act as an intelligent system.
For secure computing, at the very beginning stage, ASC can integrate a Generative Adversarial Network (GAN) [22] module to generate more powerful and evolving security rules, and enable a secure and intelligent OSS for PDC. The GAN module of ASC can learn the current security rules of a PDC, and then generate malicious but legitimate-looking access requests for some private data to confuse and confound the OSS of the PDC, aiming to make the OSS lose the ability to classify whether an access request is illegal or not.
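The generate-and-classify loop described for the GAN module can be illustrated with a deliberately small sketch. It is not a GAN: there are no neural networks, the generator is a simple search over one hypothetical request feature (records_requested), and the OSS is a single threshold rule. All names (AccessRequest, ThresholdOSS, craft_evasive_request, harden) and all numbers are assumptions made for illustration and are not part of SecNet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical access request; real requests would carry many features."""
    records_requested: int
    malicious: bool  # ground-truth label, known only inside the training loop


class ThresholdOSS:
    """Toy stand-in for the OSS classifier: one rule on request size."""

    def __init__(self, max_records):
        self.max_records = max_records

    def is_illegal(self, request):
        return request.records_requested > self.max_records


def craft_evasive_request(oss, legit_max, start=1000, step=50):
    """Generator stand-in: shrink a malicious request until the OSS accepts it.

    Returns None when no malicious-sized request (above legit_max) gets through.
    """
    size = start
    while size > legit_max:
        candidate = AccessRequest(size, malicious=True)
        if not oss.is_illegal(candidate):
            return candidate
        size -= step
    return None


def harden(oss, legit_max, max_rounds=100):
    """Alternate between generating evasive requests and tightening the rule."""
    rounds = 0
    for _ in range(max_rounds):
        fake = craft_evasive_request(oss, legit_max)
        if fake is None:  # the generator can no longer fool the OSS
            break
        # "Retrain": tighten the rule just enough to reject this fake.
        oss.max_records = fake.records_requested - 1
        rounds += 1
    return rounds


oss = ThresholdOSS(max_records=500)  # initial, overly permissive rule
rounds = harden(oss, legit_max=100)  # legitimate users ask for at most 100 records
print(rounds, oss.max_records)       # prints "8 149"
```

In this toy run the rule tightens from 500 to 149 over eight rounds while requests for up to 100 records stay legal. The generator only probes sizes in steps of 50, so the rule stops at 149 rather than 100; a real ASC module would replace both stand-ins with learned models.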

After extensive rounds of generating and classifying by the GAN module, the OSS of the PDC would become much more intelligent and powerful, and fake access requests for data would have little chance to defeat such a secure and intelligent OSS.
Different entities can share their computation results with each other, protected by blockchain, to achieve higher performance and lower energy consumption.

C. VALUE-EXCHANGE FOR SECURITY SERVICES
Besides the security concerns of every single PDC, the Internet has its own threats. For instance, various cyber attacks and computer viruses move across the Internet, and they are evolving all the time, which makes protection from the view of each single PDC insufficient. To further improve the security of data in cyberspace, the fragmented data scattered across the Internet should be combined to produce more useful security strategies and more intelligent security rules.
Blockchain together with SGX-based smart contract execution in SecNet bootstraps the improvement of security rules, enabling well-defined security rules owned by one entity to be acquired by others in exchange for value. The owner of a security rule can register that rule on the blockchain with its name together with other metadata, and place a smart contract over the registered rule, hoping to get paid by those who acquire the rule through smart contract execution.
Algorithm 1 represents a sample smart contract for data sharing, where data means user data or security rules produced by AI in PDC. Data owners register their data on the blockchain with procedure RegisterData, supplying the hash of the data (typically derived from the data content), a description, the address for fetching the data, and the desired price for subscribing to this data. Owners can also withdraw registered data with procedure WithdrawData. After registration, data consumers can view data descriptions on the blockchain and subscribe to specific data with procedure SubscribeData, which charges them the price the owner designated and grants the subscriber permission to access that data. A subscriber of specific data can request the corresponding permission via the smart contract with procedure RequestData (RequestDataWithAddress or RequestDataWithHash in Algorithm 1), which returns an access token for that data. With the access token, the consumer can fetch the desired data from the storage system at the corresponding address. SGX ensures that the process of smart contract execution cannot be tampered with by users, and guarantees the value-exchange process. With SGX-based smart contract execution, the blockchain ledger can guarantee the value-exchange process (e.g., proper value is paid for certain security services), provided the related smart contract is employed properly.

Algorithm 1 Smart Contract on Data
Require:
    mapping (hash => struct) public Data;
    mapping (pubkey => int) public balance;
    mapping (address => hash) private reverseIndex;
procedure RegisterData(hash, description, address, price)
    Data[hash].owner ← msg.sender
    Data[hash].address ← address
    Data[hash].description ← description
    Data[hash].price ← price
    Data[hash].subscribers ← []
    reverseIndex[address] ← hash
    return TRUE
end procedure
procedure WithdrawData(hash)
    require (Data[hash].owner == msg.sender)
    reverseIndex[Data[hash].address] ← NULL
    Data[hash] ← NULL
end procedure
procedure SubscribeData(hash)
    require (balance[msg.sender] >= Data[hash].price)
    balance[msg.sender] -= Data[hash].price
    Data[hash].subscribers += msg.sender
end procedure
procedure RequestDataWithAddress(address)
    require (reverseIndex[address] != NULL)
    hash ← reverseIndex[address]
    require (msg.sender ∈ Data[hash].subscribers)
    return AccessToken for address with TTL
end procedure
procedure RequestDataWithHash(hash)
    require (msg.sender ∈ Data[hash].subscribers)
    address ← Data[hash].address
    return AccessToken for address with TTL
end procedure

Security rules can be categorized into different types, and each type of rule can be treated as specific data with its own name. Thus, security rules can be shared based on their names, and blockchain can enable different PDCs to share their security strategies and rules with each other by name, and record every operation on this name, to replenish the security rules and improve the security of the involved PDCs, as well as guarantee the traceability of every data interaction. A PDC can acquire security services by learning the security rules of other PDCs as described.

IV. USE SCENARIO
SecNet will enable a wide range of applications thanks to the inherent embedding of AI and blockchain. One typical case for SecNet deployment and application is trusted medical data sharing among different trust-less parties, to support an intelligent and secure medical data management ecosystem, which is the key to a global health care system.

A. NECESSITY OF IMPLEMENTING SECNET FOR MEDICAL CARE
The traditional way of medical data management is inefficient for building a global health care system. On the one hand, medical data is nowadays stored in diversified health care environments and controlled by different entities which
After the RMD is excluded from malicious access behavior


according to the analyzing result from ASC as well as its
submodule GAN, the Access Control module communica-
tions with the Data Storage module for the RMD and then
triggers the on-chain smart contract SC1 between HB and HA
on the requested data MDAlice , and maybe necessarily triggers
the smart contract SC2 between HB and Alice. The former
regulates the value that HA should pay for the requested data
from HB , and the latter for the value that HB should transfer
to Alice since the ownership of MDAlice belongs to her.
When HA receives the requested data MDAlice , correspond-
ing value (e.g., tokens, coins, electric cash) is transferred
from HA to HB and from HB to Alice, according to the smart
contracts SC1 and SC2 respectively. That is, HB gains rewards
by providing storing service for Alice’s medical data, and
Alice is also paid by allowing her medical data to be shared
with HB .
To exploit the data for some further information that may
FIGURE 2. Medical data sharing using SecNet.
be helpful to HB , the Knowledge Computing module of PB
will merge the new received MDAlice with related data storing
may have different commercial requirements. The lack of in its Data Storage module, and may decompose the data into
trust mechanisms for data provenance, auditing and control, different types of data components (e.g., disease name, dis-
makes the sharing of valuable data impossible. Moreover, ease duration, patient name, patient age, drug-using records,
in most cases, patients have to collect their medical records etc.), to exploit further information and potential findings.
by themselves and then provide them to different institutions
(e.g., different hospitals), although these medical records may V. ALTERNATIVE WAY FOR SECNET
be stored several times in other institutes before, because The data storage in SecNet is provided by PDC, and the secu-
different institutions cannot easily share medical records due rity of data is the responsibility of the PDC’s owner. In this
to no standard format for data or no economic incentive. way, data is under control of its owner, and any interaction
On the other hand, medical data carries its owner’s privacy with data can be monitored locally in PDC.
information, but unfortunately, patients in fact lack authority over the usage of these data. Additionally, to obtain better medical care services, patients have no choice but to give out their medical data, because of the mismatch between the need for accurate analysis of medical data and the patients' lack of knowledge in medical care.

To solve the problems above, SecNet employs 1) blockchain-based data sharing with ownership guarantee, 2) smart contracts to regulate the interactions between trust-less entities, and 3) AI-based secure computing for behavior analysis, to effectively provide data provenance, auditing and control, as well as behavior tracking, in a tamper-proof way. With these characteristics provided by SecNet, the detailed workflow to achieve trusted medical data sharing is as follows.

B. MEDICAL DATA SHARING WORKFLOW USING SECNET
As shown in Figure 2, suppose that the hospital HA wants to use Alice's medical data MDAlice, which is currently stored in another hospital HB, to support a very important medical experiment. HA needs to access its PDC PA, and then send the data request RMD containing the metadata/identifier IDR to the PDC PB belonging to HB.

When PB receives the RMD from PA, the Access Control module analyzes the RMD with the help of the ASC module in OSS, and meanwhile records this request behavior to the Blockchain Ledger, waiting for state synchronization.

However, if SecNet users want to store their data in a secure cloud, provided by a big company that has a great reputation and the ability to guarantee data security, rather than in their own PDCs, the philosophy of the InterPlanetary File System (IPFS) [24] may be a choice: the Data Storage module of the PDC is replaced with a distributed file system where data objects are exchanged within one Git repository, as shown in Figure 3.

In this way, the PDCs coordinate and maintain a data storing network, where all data is treated equally, fragmented into data pieces, and then scattered across the whole network. Thus, the privacy and the survivability of data can be protected better than by storing all the data of a user in a single PDC. For instance, if malicious parties destroy or hack into some PDCs, they may get only some pieces of different data but cannot easily obtain a complete data object containing valuable information, which significantly reduces the chance of privacy leakage and lowers the risk that data is completely destroyed because of centralized storage in a local PDC.

However, the disadvantage is that it becomes very difficult to enable personalized knowledge computing or AI-based secure computing that exploits all the personal data of a certain user, because these data are scattered across the whole SecNet rather than stored in the data owner's PDC. One possible solution is to construct some secure computing nodes in SecNet, where data can flow in yet only answers but no
high-dimensional data can flow out, to support AI-based computing and knowledge extraction for certain users, without harming data security and privacy.

FIGURE 3. Alternative storage model of SecNet.

VI. PERFORMANCE ANALYSIS
In this section, we evaluate the design of SecNet in two aspects: vulnerability when suffering notorious network attacks such as Distributed Denial of Service (DDoS) attacks, and revenue for contributors who provide the security rules on blockchain.

A. VULNERABILITY OF ARCHITECTURE
DDoS attacks continue to be one of the most serious network attacks on both the Internet infrastructure [25] and its applications [26]. Attackers can use this type of attack to exhaust the bandwidth resources of popular and critical Web applications, making these services unavailable to users or even blocking Internet connectivity for a large part of a country, which can result in huge economic loss. For example, even a single minute of service downtime can cost up to 22000 dollars in revenue [26]. In SecNet, because every Internet user shares security rules, which results in more comprehensive knowledge of network security, the vulnerability that can be exploited by DDoS attackers will decrease dramatically. That is, SecNet can greatly reduce the impact of the notorious DDoS attacks. For a scenario in which DDoS attacks happen independently and identically, the number of attacks being detected can be considered to follow the Poisson distribution. In this case, we assume that all users will report their learned security rules to the blockchain once they suffer DDoS attacks. Figure 4 shows how the vulnerability that can be exploited by DDoS attackers (the probability that SecNet can be attacked by DDoS attacks) varies with the number of shared security rules, where four different security factors (λ = 0.2, 0.4, 0.6, 0.8) are considered. The security factor indicates the severity of network threats (e.g., the frequency of DDoS attacks). The results show that the vulnerability of SecNet reduces dramatically as the number of shared security rules increases. This is because the growth in the number of shared security rules leads to more comprehensive knowledge of network security for all the participants, which makes it more difficult for attackers to launch a successful DDoS attack that avoids detection by the growing set of security rules.

FIGURE 4. Vulnerability of SecNet when suffering DDoS security.

B. REVENUE FOR CONTRIBUTORS
The security level of SecNet will improve continuously if every contributor shares his own security rules on blockchain with each other, since all participants in the system then have more security knowledge to protect against attacks. The revenue for each contributor is a key factor affecting contributor initiative.

Firstly, we investigate how the revenue for each participant varies when sharing security rules publicly for a more secure network, with different levels of rule quality control. Considering the quality effect of the real market, the revenue for every contributor will increase linearly at the very beginning stage but at different rates, yet will vary in different directions after the number of shared security rules exceeds a threshold. Accordingly, in this simulation, we set the increasing rate of the revenue of a contributor at the very beginning stage as the quality control level of the shared security rules. The quality control level represents the degree to which a rule can completely block an attack. In the evaluations, three quality control levels for the shared security rules (αq = 0.5, 0.7, 0.95) are investigated. After the quality effect of the real market is formed, the revenue for the contributor with the highest quality will increase at a much higher rate than the others. That is, our theoretical model fits the form of piecewise functions. Figure 5 illustrates the modeling results: the revenue for a contributor will increase at a higher rate if the shared security rules are of higher quality, especially after the quality effect of the real market is formed. This is because a high-quality security rule can characterize the network threats more accurately, and thus is more effective in countering threats than those of lower quality. In addition, although the whole revenue
for all contributors increases, the revenue for every single contributor is very different. As can be seen from the figure, when the quality effect of the real market is formed, contributors who share high-quality security rules benefit much more quickly while other contributors earn little. This is because the majority of consumers will prefer to choose high-quality security rules, which are more effective in protecting themselves than those with lower quality and little effect.

FIGURE 5. Revenue when sharing security rules with varying rule quality.

Then, the effect of different rule pricing strategies on the revenue for each participant when sharing security rules is investigated. In our analysis model, for a certain type of network threat with similar characteristics, the security rule with a higher price has a more detailed description of attack characteristics and faster attack detection performance, yet may only be suitable for the security level requirements of high-end customers (e.g., commercial banks, government data centers). For the majority of ordinary individual consumers, security rules with similar functionality but at a lower price may be preferred. In the evaluations, three price levels for the shared security rules (αp = 1.05, 1.5, 2) are investigated. The price level indicates the ratio of a fixed price to its fair market value. Figure 6 indicates that if the price of a rule that is to be shared publicly is set unreasonably high, the revenue of the rule publisher may decrease. This is because other participants may choose to download other security rules with similar function yet lower price. That is, every security rule has an inherent valuation. In fact, the price of a rule should be determined by the safety benefits it brings, and an intelligent and fair pricing service for the shared security rules may need to be integrated into the SecNet system in the future.

FIGURE 6. Revenue when sharing security rules with different rule price.

VII. CONCLUSION
In order to leverage AI and blockchain to address the problem of data abuse, as well as to empower AI with the help of blockchain for trusted data management in a trust-less environment, we propose SecNet, a new networking paradigm focusing on secure data storing, sharing and computing instead of communicating. SecNet provides data ownership guarantees with the help of blockchain technologies, and an AI-based secure computing platform as well as a blockchain-based incentive mechanism, offering a paradigm and incentives for data merging and more powerful AI to finally achieve better network security. Moreover, we discuss the typical use scenario of SecNet in a medical care system, and give alternative ways of employing the storage function of SecNet. Furthermore, we evaluate its improvement in network vulnerability when countering DDoS attacks, and analyze the incentive aspect of encouraging users to share security rules for a more secure network.

In future work, we will explore how to leverage blockchain for access authorization on data requests, and design secure and detailed smart contracts for data sharing and AI-based computing services in SecNet. In addition, we will model SecNet and analyze its performance through extensive experiments based on advanced platforms (e.g., integrating IPFS [27] and Ethereum [28] to form a SecNet-like architecture).

REFERENCES
[1] H. Yin, D. Guo, K. Wang, Z. Jiang, Y. Lyu, and J. Xing, "Hyperconnected network: A decentralized trusted computing and networking paradigm," IEEE Netw., vol. 32, no. 1, pp. 112–117, Jan./Feb. 2018.
[2] K. Fan, W. Jiang, H. Li, and Y. Yang, "Lightweight RFID protocol for medical privacy protection in IoT," IEEE Trans. Ind. Informat., vol. 14, no. 4, pp. 1656–1665, Apr. 2018.
[3] T. Chajed, J. Gjengset, J. Van Den Hooff, M. F. Kaashoek, J. Mickens, R. Morris, and N. Zeldovich, "Amber: Decoupling user data from Web applications," in Proc. 15th Workshop Hot Topics Oper. Syst. (HotOS XV), Warth-Weiningen, Switzerland, 2015, pp. 1–6.
[4] M. Lecuyer, R. Spahn, R. Geambasu, T.-K. Huang, and S. Sen, "Enhancing selectivity in big data," IEEE Security Privacy, vol. 16, no. 1, pp. 34–42, Jan./Feb. 2018.
[5] Y.-A. de Montjoye, E. Shmueli, S. S. Wang, and A. S. Pentland, "openPDS: Protecting the privacy of metadata through SafeAnswers," PLoS ONE, vol. 9, no. 7, 2014, Art. no. e98790.
[6] C. Perera, R. Ranjan, and L. Wang, "End-to-end privacy for open big data markets," IEEE Cloud Comput., vol. 2, no. 4, pp. 44–53, Apr. 2015.
[7] X. Zheng, Z. Cai, and Y. Li, "Data linkage in smart Internet of Things systems: A consideration from a privacy perspective," IEEE Commun. Mag., vol. 56, no. 9, pp. 55–61, Sep. 2018.
[8] Q. Lu and X. Xu, "Adaptable blockchain-based systems: A case study for product traceability," IEEE Softw., vol. 34, no. 6, pp. 21–27, Nov./Dec. 2017.
[9] Y. Liang, Z. Cai, J. Yu, Q. Han, and Y. Li, "Deep learning based inference of private information using embedded sensors in smart devices," IEEE Netw. Mag., vol. 32, no. 4, pp. 8–14, Jul./Aug. 2018.
[10] Q. Xia, E. B. Sifah, K. O. Asamoah, J. Gao, X. Du, and M. Guizani, "MeDShare: Trust-less medical data sharing among cloud service providers via blockchain," IEEE Access, vol. 5, pp. 14757–14767, 2017.
[11] D. E. O'Leary, "Artificial intelligence and big data," IEEE Intell. Syst., vol. 28, no. 2, pp. 96–99, Mar. 2013.
[12] A. Halevy, P. Norvig, and F. Pereira, "The unreasonable effectiveness of data," IEEE Intell. Syst., vol. 24, no. 2, pp. 8–12, Mar. 2009.
[13] Z. Cai and X. Zheng, "A private and efficient mechanism for data uploading in smart cyber-physical systems," IEEE Trans. Netw. Sci. Eng., to be published. doi: 10.1109/TNSE.2018.2830307.
[14] A. Dorri, M. Steger, S. S. Kanhere, and R. Jurdak, "BlockChain: A distributed solution to automotive security and privacy," IEEE Commun. Mag., vol. 55, no. 12, pp. 119–125, Dec. 2017.
[15] J. Wang, M. Li, Y. He, H. Li, K. Xiao, and C. Wang, "A blockchain based privacy-preserving incentive mechanism in crowdsensing applications," IEEE Access, vol. 6, pp. 17545–17556, 2018.
[16] C. Sun, A. Shrivastava, S. Singh, and A. Gupta, "Revisiting unreasonable effectiveness of data in deep learning era," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 843–852.
[17] W. Meng, E. W. Tischhauser, Q. Wang, Y. Wang, and J. Han, "When intrusion detection meets blockchain technology: A review," IEEE Access, vol. 6, pp. 10179–10188, 2018.
[18] J.-H. Lee, "BIDaaS: Blockchain based ID as a service," IEEE Access, vol. 6, pp. 2274–2278, 2017.
[19] K. Wang, H. Yin, W. Quan, and G. Min, "Enabling collaborative edge computing for software defined vehicular networks," IEEE Netw., vol. 32, no. 5, pp. 112–117, Sep./Oct. 2018.
[20] A. B. Kurtulmus and K. Daniel, "Trustless machine learning contracts; evaluating and exchanging machine learning models on the ethereum blockchain," 2018, arXiv:1802.10185. [Online]. Available: https://arxiv.org/abs/1802.10185
[21] A. L. Buczak and E. Guven, "A survey of data mining and machine learning methods for cyber security intrusion detection," IEEE Commun. Surveys Tuts., vol. 18, no. 2, pp. 1153–1176, 2nd Quart., 2016.
[22] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial networks," 2014, arXiv:1406.2661. [Online]. Available: https://arxiv.org/abs/1406.2661
[23] E. C. Ferrer, "The blockchain: A new framework for robotic swarm systems," 2017, arXiv:1608.00695. [Online]. Available: https://arxiv.org/abs/1608.00695
[24] IPFS. Accessed: Jun. 5, 2019. [Online]. Available: https://ipfs.io/
[25] S. T. Zargar, J. Joshi, and D. Tipper, "A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks," IEEE Commun. Surveys Tuts., vol. 15, no. 4, pp. 2046–2069, 4th Quart., 2013.
[26] A. Praseed and P. S. Thilagam, "DDoS attacks at the application layer: Challenges and research perspectives for safeguarding Web applications," IEEE Commun. Surveys Tuts., vol. 21, no. 1, pp. 661–685, 1st Quart., 2019.
[27] J. Benet, "IPFS - Content addressed, Versioned, P2P file system," 2014, arXiv:1407.3561. [Online]. Available: https://arxiv.org/abs/1407.3561
[28] G. Wood, "Ethereum: A secure decentralised generalised transaction ledger," Ethereum Project Yellow Paper, 2018. Accessed: Jun. 5, 2019. [Online]. Available: https://ethereum.github.io/yellowpaper/paper.pdf

KAI WANG received the B.S. and Ph.D. degrees from Beijing Jiaotong University. He is currently a Postdoctoral Researcher with the Research Institute of Information Technology, Tsinghua University, China. He is an Assistant Professor with the School of Computer and Control Engineering, Yantai University, China. His current research interest includes cyberspace security. He has published more than 20 papers in prestigious international journals and conferences (e.g., the IEEE Network, Information Sciences), and serves as the TPC Member of IPCCC 2018/2019, the Guest Editor of International Journal of Digital Multimedia Broadcasting, and a Technical Reviewer for many important international journals (e.g., ACM Computing Surveys).

JIAQING DONG received the B.S. degree in computer science from Peking University. He is currently pursuing the Ph.D. degree in computer science with Tsinghua University. His research interests include knowledge discovering, software-defined networking, and mobile network measurement.

YING WANG received the B.S. degree from Hunan University, and is currently pursuing the master's degree in software engineering with the Wuhan University of Technology. Her research interests include networking architecture, blockchain-based applications, and data mining.

HAO YIN received the B.S., M.E., and Ph.D. degrees from Huazhong University of Science and Technology, Wuhan, China, in 1996, 1999, and 2002, respectively, all in electrical engineering. He is a Professor with the Research Institute of Information Technology (RIIT), Tsinghua University. He was elected as the New Century Excellent Talent of the Chinese Ministry of Education in 2009, and won the Chinese National Science Foundation for Excellent Young Scholars in 2012. His research interests include multimedia communication and computer networks.
