On The Design of A Blockchain-Based System To Facilitate Healthcare Data Sharing
Anastasia Theodouli, Stelios Arakliotis, Konstantinos Moschou, Konstantinos Votis, Dimitrios Tzovaras
CERTH / ITI
Thessaloniki, Greece
{anastath, saraklio, konsmosc, kvotis, tzovaras}@iti.gr
Abstract—Blockchain technology, though originally designed for keeping financial ledgers, has recently found applications in many different fields including healthcare. Sharing healthcare data for research purposes will boost research innovation in this area. That being said, healthcare data sharing raises many privacy and security issues for the Patients who share their data. In this work, we present the potential of Blockchain technology to facilitate (i) private and auditable healthcare data sharing and (ii) healthcare data access permission handling by proposing a blockchain-based system architecture design.

Keywords- blockchain; healthcare data; smart contracts; privacy; security; pseudonymity; auditing; data integrity

I. INTRODUCTION

A Blockchain consists of a continuously growing list of records called blocks. Each block represents a set of transactions and is cryptographically linked to its previous block, thus forming a chain. A Blockchain is managed by a peer-to-peer network of nodes that validate new blocks using a consensus algorithm. The consensus algorithm ensures that the next block in a blockchain is the one and only version of the truth, thus preventing powerful adversaries from successfully forking the chain.¹ As a result, all nodes of the network contain the same replica of data, eliminating the need for a central trusted authority to manage data. [1]

¹ https://www.coindesk.com/short-guide-blockchain-consensus-protocols/

Blockchain, being a cutting-edge technology and an emerging research field, has numerous applications in several domains, e.g. in cryptocurrencies, Digital Auctions, Digital Supply Chains, IoT and smart cities, Digital Identities, etc. [2] Applications of Blockchain to the healthcare domain have been extensively explored to enable interoperability between several Health Units in a secure and auditable way. [3, 4, 5]

Access of medical research centers to healthcare data stored on Web / Cloud Clinical Platforms can have a positive impact on medical research innovation. In such a case, Medical Researchers can have access to a distributed ‘pool of data’ of medical treatments and healthcare outcomes based on values stored via eHealth and mHealth in Web / Cloud Clinical Platforms. Moreover, by enabling medical researchers to filter out specific features of the data they are looking for, one could facilitate the formation of demographic cohorts and also enhance precision medicine.

That being said, healthcare data are highly sensitive and Data Owners, i.e. Patients, may hesitate to share their data for research purposes despite the positive impact that such sharing can have as outlined above, since an inappropriate disclosure of their data and/or of their identities could have a direct impact on their health, and/or indirect financial or social implications as regards their employers, involved insurance companies, and so on.

To alleviate Patients’ concerns as regards their data sharing, we present our contribution, a blockchain-centric system architecture design which is used to ensure (i) shared data integrity, (ii) patient pseudonymity, (iii) auditing and accountability, and (iv) workflow automation by leveraging inherent properties of the blockchain technology like immutability, auditability, and accountability combined with the usage of smart contracts, a transaction-aware state-machine mechanism which enables a (quasi) Turing-complete, fully programmable logic in the way that the Blockchain state changes; these scripts are automatically executed upon a pre-defined set of rules included within the smart contracts. Moreover, the usage of smart contracts is well suited to our approach, which tackles complex workflows.

The remainder of this paper is structured as follows. In Section 2, we discuss related work. In Section 3, we present our setting (blockchain model proposed, involved entities, incentives of involved entities for system adoption and level of trust among them). In Section 4, the proposed system architecture is presented. Section 5 presents the smart contracts used, while Section 6 gives an overview of the supported use case scenarios and their corresponding workflows. Section 7 outlines the added value of the system. Finally, Section 8 concludes the paper.

II. RELATED WORK

In this section, we discuss works that focus on healthcare data sharing/management leveraging (i) Blockchain infrastructure, (ii) other technologies such as cloud computing and big data.

A. Healthcare data management with Blockchain

Blockchain has been proposed as an appropriate infrastructure for healthcare data sharing by the authors of [4]. With an aim to facilitate healthcare data interoperability between institutions, the authors also introduced a new consensus algorithm, called ‘Proof of Interoperability’, that was based on conformance to the Fast Healthcare Interoperability Resources (FHIR) protocol. In [5], the authors illustrate how to apply blockchain
technology in pervasive social network (PSN)-based healthcare.

Healthcare data is highly sensitive and there is a need to protect it from unwarranted access. Towards this end, the authors of [3] present MedRec, a novel, decentralized record management system to handle Electronic Health Records (EHRs), using blockchain technology. The block content represents data ownership and viewership permissions shared by members of a private, peer-to-peer network. Via smart contracts on an Ethereum blockchain, they log patient-provider relationships that associate a medical record with viewing permissions and data retrieval instructions (essentially data pointers) for execution on external databases. The MedRec architecture further extends its value proposition by empowering medical researchers to mine in the Blockchain network, getting anonymized medical data as mining rewards.

In the medChain project², a federated Blockchain based on the Ethereum platform is used for Patient Data Storage and Retrieval. Patient Data Privacy is ensured using encryption with the Patient’s private key and hashing of electronic Protected Health Information (ePHI) before it is stored on the medChain Blockchain.

² https://www.medchain.us/#

The authors of [6] propose the usage of smart contracts deployed on a private, permissioned Ethereum blockchain to govern Clinical Trial Authorization (CTA) details, and a private IPFS network to store the data structure that holds the clinical trial protocol whenever large file storage is required, with an aim to improve data transparency in clinical trials.

Various design aspects as well as technology requirements and challenges of a blockchain platform architecture for clinical trials and precision medicine have been discussed in [7]. Usage of the public Bitcoin Blockchain network with an aim to enhance transparency and traceability of the Consent given by Patients involved in Clinical Trials is discussed in [8].

B. Healthcare data management without Blockchain

Healthcare data sharing among interested stakeholders (e.g. public health institutions, research institutions, patients, etc.) as regards multi-source, heterogeneous data using Cloud computing and Big Data analytics techniques has been explored in [9]. In the data management layer of their proposed architecture, they propose techniques based on distributed parallel computing and distributed file storage, based also on memory analysis, to cope with scenarios of real-time analysis of big data stored on their infrastructure.

Compared with the techniques proposed in that work, in our design we neither store nor process the collected data on-chain. What we store on-chain is metadata (hashed data, data reference URLs, and permissions) that enable the data sharing in a secure, private and auditable way. As regards the Patient data kept on-chain, their storage cost can be regarded as O(n), where n is the number of data records. This is due to the fact that we store hashed Patient data, which are of a fixed length and thus not affected by the actual size of the data record before hashing. Other than that, due to the distributed nature of the Blockchain, data stored on-chain are replicated to all the nodes of the network, avoiding a single point of failure for the system, i.e. if a node fails, assuming m network nodes, there are still m-1 nodes holding the data.

III. PROPOSED SETTING

The involved entities, their incentives to use the system, the level of trust among them and the Blockchain model assumed are analyzed below:

A. Patients

Patients want to share their healthcare data, acknowledging (i) the benefits of such sharing for boosting medical research in general, (ii) the positive impact on their own healthcare treatment and outcomes in the long run. On the other hand, Patients do not want to compromise the privacy and security of their data when sharing them. Moreover, according to the proposed system design, there is no significant overhead for Patients to share their data to the Blockchain network. Patients use dedicated Web / Cloud Platforms to export their data in the appropriate format. In our setting, it is assumed that Patients are trusted and the data they upload are correct. Patients are also enabled to filter their historical data and check past transactions informing them who accessed their data, when, and what data they accessed.

B. Web / Cloud Platforms

Web / Cloud Platforms have their own local databases that keep Patient healthcare data. They can export the data in an appropriate format for sharing with the upper layers of the system. They do not need to be Blockchain nodes. They can use the upper layers of the system (see the architectural layers of the system in Figure 1 below) as a Blockchain-as-a-Service (BaaS) infrastructure; this decreases capital expenditure for integrating with the system and thus increases the chances of system adoption on their part. In general, Web / Cloud Platform database administrators could be regarded either as honest-but-curious, which means that they will follow the protocol agreed with the Data Owner and return the results of the computations done on their side, but may look at the data they process, or as malicious, which means that they may not follow the agreed protocol and/or may not return the results of the computations. In our setting, the malicious Web / Cloud database administrator threat model is assumed, in which the administrator may see and alter data but not deny access to them; measures to tackle this threat are described in Section 4 below.

C. Medical research centers

Medical research centers want access to the healthcare data stored on Clinical Platforms for research purposes, e.g. they might want to use a common pool of data from which to define demographic cohorts or enhance precision medicine practices. They are not by default trusted
so there is a need for an off-chain verification of their Identity before being accepted as nodes of the Blockchain network.

D. Validators

Validators are a subset of the Blockchain network nodes which assemble new blocks of valid transactions. All verified Entities participating as nodes of the network will/can be Validators.

E. Blockchain model assumed

The Blockchain model assumed is a consortium blockchain in which identities of medical research centers that participate as nodes of the network are assumed to be verified off-chain. Once the medical research centers are verified and allowed to be network nodes, they are considered to be trusted by all the other peers of the network.

IV. SYSTEM ARCHITECTURE

In this section, architectural layers, components and interactions among the components of our proposed system are presented.

A. Layer 1: Web / Cloud Platforms

In this layer, there are multiple Platforms, either Web hosted or provided as Cloud Services, which store Patient healthcare data on their own local databases. An example of such a Web Clinical Platform is myAirCoach³. They can export Patient data in a format appropriate for exchange over RESTful Web Services. The data are hashed before being transferred so as to avoid data leakage during their transfer between layers 1 and 2. The data transfer cost between layers 1 and 2 is also typically decreased due to the hashing.

³ http://myaircoach.eu/myaircoach/

B. Layer 2: Cloud middleware

This is the cloud middleware, which connects multiple VMs that are set up in order to ensure that there is no single point of failure, as opposed to a centralized setting in which one dedicated server hosts the middleware infrastructure. This component connects the Web / Cloud Clinical Platforms located at layer 1 with the consortium Blockchain network located at layer 3. It interacts with layer 1 by receiving the data via a RESTful API over HTTP(s) and then stores them to the Blockchain by calling the dedicated smart contracts at layer 3, using an appropriate API to interact with these smart contracts.

C. Layer 3: Blockchain network

This is the consortium Blockchain network. The smart contracts that administer the data sharing and permission management are deployed in this Blockchain network. Communication between Layers 2 and 3 is achieved via an appropriate API.

A key feature in this architectural design is that Layers 2 and 3 are exposed to the Web / Cloud clinical platforms located at Layer 1 as a Blockchain-as-a-Service (BaaS). This means that Platforms located at Layer 1 do not need to be Blockchain nodes in order to send the data; they only need to export the data in an appropriate format that can be consumed by the Web services exposed at Layer 2. This increases adoption of the system by multiple / heterogeneous Platforms, as it is not bound to their local infrastructure (e.g. which DBMS they use).

The system architecture representing the above described layers, together with the components in each layer and the interactions among them, is shown in Figure 1 below.

Figure 1. System Architecture

V. SMART CONTRACTS

In this section, the Smart Contracts used by our system are analyzed.

A. Registry Contract (RC)

This smart contract acts as the Registry of all the Users of the System. Users can be separated into two different categories, i.e. (i) medical research centers and (ii) Patients. The RC contains a mapping between each system User's uniquely identifying field and the address of a unique smart contract, called the Patient Data Contract (PDC), that corresponds to that Patient's data. This uniquely identifying field should have the following properties: (i) be unique per Patient, (ii) not be able to reveal their identity in a direct or an indirect manner. Note that in the case that the User is not a Patient, the address of the PDC can be a null or empty field that does not point to an existing PDC deployed within the Blockchain.

B. Patient Data Contract (PDC)

This smart contract is unique to each Patient and contains the hashed Patient healthcare data along with a URL pointing to the Patient healthcare data in the Web / Cloud Clinical Platform local database. The hashed copy of the data is used in order for the Entities that want to access the data (medical research centers) to be able to verify the data integrity to tackle with the malicious database administrator
threat model that has been assumed, as already explained in Section 3 above.

C. Permissions Contract (PC)

This smart contract administers the Permission management of Patient Data. In particular, it contains a mapping between the Patient Data Contract address, the uniquely identifying field of the data requesting Entity (medical research center), and a field called ‘Permissions Status’ which contains the Patient's approval to access their data.

The Smart Contracts of the system along with the data they contain and high-level relationships among them are depicted in Figure 2 below.

Figure 2. Smart Contracts along with their data and relationships between them

VI. USE CASE SCENARIOS

In this section, the following use case scenarios of our proposed system are presented: (i) User Registration, (ii) Patient Data Sharing, and (iii) Medical Research centers Request Permissions to access Patient Data.

A. User Registration

Users⁴, using a dedicated application UI, register to the RC by entering a uniquely identifying field generated for them. In case the User to be registered is a Patient (option ticked within their UI), a new PDC is created within the RC and the PDC address is written as a reference back to the RC.

⁴ In our setting, a User can be either a Patient or a medical research centre.

B. Patient Data Sharing

For pseudonymity reasons, Patients can connect with the Web / Cloud Clinical Platform via which they will share their data in two ways: (i) with an account on the Platform, (ii) without any account.

In the case of sharing data with an account on the Clinical Platform, the uniquely identifying field of the Patient should be given by the user via the Patient application UI separately and should not be stored on the database, so that there is no matching in the Clinical Platform database between the field and personal data like name, username, emails, etc. Since the Patient has an account on the Web Platform in this case, some (encrypted) data on the URL stored on the Blockchain can enable data requesting entities to retrieve data back on the Web / Cloud Platform.

In the case of sharing data without an account on the Clinical Platform, the user will upload and store data on the Clinical Platform anonymously (e.g. with two factor authentication, email verification, JWT temporary token login, etc.). The data will be stored there along with the uniquely identifying field of the Patient. In this case, there is no problem with storing the field, since the Patient does not store personal data on the Clinical Platform and thus the uniquely identifying field and personal data cannot be matched; patient pseudonymity, i.e. hiding who shares data via the Platform, is thus preserved. The Clinical Platform is used for exporting data in the correct format for communication with the BaaS (e.g. a RESTful web service with JSON message format exchange that sends POST HTTP(s) requests to the HTTP server located at layer 2) and also for sharing the Blockchain stored URL for the medical researchers to access the data. Since the Patient does not have an account on the Web Platform in this case, the uniquely identifying field is used to match the user with their data.

The workflow of this use case scenario is the following:

1. The Web / Cloud Platform hashes Patient Data locally and sends them along with the uniquely identifying field of the Patient to the cloud HTTP server, which calls the smart contract API.

2. The smart contract API searches the RC for the uniquely identifying field of the Patient. If not found, it adds a new record to the RC for this Patient, then creates a new PDC contract and stores the data of the Patient into this newly created PDC. If already found, it locates the PDC and stores the data in the related fields of the PDC (this is actually a data update).

C. Request Permissions

A key design concept of our proposed system is that the medical Research centers should not know which Patient's data they access. This ensures Patient pseudonymity, i.e. data requesting entities know what the healthcare data are, but they do not know whose data they are. The only prerequisite is that the Patient has been registered with the system, which serves as an implicit consent that they allow their healthcare data to be accessed by requesting Entities in a pseudonymous way.

The workflow of this use case scenario is the following:

1. The data requesting Entity, which in our setting can be a medical research center, calls a dedicated
function within the RC and selects a Patient at random. The reason for random selection is that there should be no difference in the number of times each Patient is selected to be notified to provide access to their data. Consider, e.g., the scenario in which Patients are sequentially selected from the RC to approve access to their data. In such a case, the higher a Patient is in the Registry, the more times they will be notified to provide access to data requesting entities. Note that to implement this functionality, a dedicated function within the RC should exist which distinguishes between Patients and other Users, e.g. by checking the PDC address value; in the case of a non-Patient User this field should have an empty or null value.

2. A temporary permission contract (PC) between the Entity and the Patient with Permission Status ‘Pending’ is created.

VII. SYSTEM VALUE

A. Data Integrity

The data requesting entity, after obtaining permission from the Patient to access their data, can check whether the data downloaded from the Web / Cloud clinical Platform (after being hashed) match the hashed data stored on the PDC Smart Contract. This is significant if we assume a malicious administrator security model on the Web / Cloud clinical platform database side.

B. Patient Pseudonymity

Patient Pseudonymity is ensured since only the uniquely identifying field of the Patient is stored on the RC Smart Contract and none of their personal data are stored there. Patients are selected in a random order so there is no chance to track patients according to their registration order (no information leakage).
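
As an illustration only, the registry lookup, random Patient selection and hash-based integrity check described in this section can be sketched off-chain in plain Python. This is a minimal sketch, not the system's smart contract code: the class and function names (`Registry`, `PatientDataContract`, `hash_record`), the use of SHA-256, the example identifiers and the example URL are all assumptions made here, since the paper does not specify them.

```python
import hashlib
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def hash_record(record: bytes) -> str:
    """Hash a record as Layer 1 would before transfer (SHA-256 is assumed)."""
    return hashlib.sha256(record).hexdigest()


@dataclass
class PatientDataContract:
    """Hypothetical stand-in for a PDC: stores only a hash and a URL."""
    data_hash: str = ""
    data_url: str = ""


@dataclass
class Registry:
    """Hypothetical stand-in for the RC together with the PC state."""
    # pseudonymous user id -> PDC, or None for a non-Patient User
    users: Dict[str, Optional[PatientDataContract]] = field(default_factory=dict)
    # (patient id, requester id) -> Permission Status
    permissions: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def register(self, user_id: str, is_patient: bool) -> None:
        # A Patient gets a fresh PDC; other Users get an empty reference.
        self.users[user_id] = PatientDataContract() if is_patient else None

    def share(self, patient_id: str, data_hash: str, url: str) -> None:
        # Unknown Patient: create the registry record and PDC first.
        if patient_id not in self.users:
            self.register(patient_id, is_patient=True)
        pdc = self.users[patient_id]
        if pdc is None:
            raise ValueError("user is registered as a non-Patient")
        # Known Patient: this overwrites the fields, i.e. a data update.
        pdc.data_hash, pdc.data_url = data_hash, url

    def request_random_patient(self, requester_id: str) -> str:
        # Only Users with a non-empty PDC reference are Patients.
        patients = [uid for uid, pdc in self.users.items() if pdc is not None]
        chosen = secrets.choice(patients)
        self.permissions[(chosen, requester_id)] = "Pending"
        return chosen

    def verify(self, patient_id: str, downloaded: bytes) -> bool:
        # Integrity check: re-hash the download and compare with the PDC.
        pdc = self.users[patient_id]
        return pdc is not None and pdc.data_hash == hash_record(downloaded)
```

A requester that has been granted access would download the record from the stored URL and call `verify`; a record altered by a malicious database administrator produces a different hash, so the check fails.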