Delegated Content Erasure in IPFS: Future Generation Computer Systems June 2020
All content following this page was uploaded by Fran Casino on 11 September 2020.
Article info

Article history:
Received 19 September 2019
Received in revised form 7 May 2020
Accepted 23 June 2020
Available online 29 June 2020

Keywords: IPFS; Content erasure; Decentralized storage; Privacy; Right to be Forgotten; GDPR

Abstract

The InterPlanetary File System (IPFS) is employed extensively nowadays by many blockchain projects to store personal data off-chain to comply with the Right to be Forgotten (RtbF) requirement of the General Data Protection Regulation (GDPR), the new regulatory regime for personal data protection in the EU. In such a way, when a request for content erasure is to be carried out under the RtbF, the onus of removing the actual personal information moves to the IPFS protocol. Nevertheless, enforcing data erasure across the entire IPFS network is not actually feasible, mainly due to its decentralized nature. Consequently, the implementation of a delegation mechanism for handling content erasure requests within the IPFS would be the most conducive way towards aligning the IPFS with the GDPR. To that end, in this work, we propose an anonymous protocol for delegated content erasure requests in the IPFS. The proposed protocol could be smoothly integrated into the IPFS to distribute an erasure request among all the IPFS nodes and, ultimately, to fulfil the erasure requirements foreseen in the RtbF. Furthermore, the protocol complies with the primary principle of the IPFS to prevent censoring; therefore, erasure is only allowed to the original content provider or her delegates. A formal definition and the security proofs are provided, along with a set of experiments that prove the efficacy of the proposed protocol. We demonstrate that the overhead introduced by the proposed protocol does not affect the system’s efficiency. Our experimental results exhibit a robust performance as the average times for generating the content-dependent keys and for spreading the erasure requests do not affect the overall performance of the IPFS.
© 2020 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.future.2020.06.037
0167-739X/© 2020 Elsevier B.V. All rights reserved.
E. Politou, E. Alepis, C. Patsakis et al. / Future Generation Computer Systems 112 (2020) 956–964 957
In numerous cases, the IPFS is used in combination with blockchains to store the actual files off-chain while maintaining in the blockchain only the hash pointers to those files. This is a preferable workaround when personal data are at stake: personal information is generally kept out of blockchain transactions, as blockchains are in some respects incompatible with privacy and data protection requirements [3]. For instance, blockchain’s inherent immutability is found to be in conflict with the new regime for data protection across Europe, the General Data Protection Regulation (GDPR), and in particular with its enshrined “Right to be Forgotten” (RtbF). According to this right, data controllers are obliged to erase any personal information upon request and when certain conditions apply [3–5]. However, storing the actual personal files in the IPFS network does not remove the burden of having to remove them should the RtbF be raised. Instead, similar to all technological developments, it is of utmost importance that users be able to control the dissemination of their data within the IPFS network, especially considering the emergence of novel threats [6,7]. Nevertheless, such a mechanism has not been foreseen by the protocol layer of the IPFS, mainly due to the infeasibility of enforcing data erasure across all peers in a decentralized public network. Yet, given the adverse implications of non-compliance with privacy and data protection rights, the alignment of the IPFS with the RtbF is considered critical. In this regard, we introduce in this work an efficient protocol for anonymous delegated erasure that can securely handle any content erasure requests and can be easily integrated into the IPFS. To the best of our knowledge, this is the first proposal to align the IPFS with the RtbF and to endorse its GDPR compliance. Therefore, we believe that our work adds real value to the IPFS in terms of its privacy enhancement and, consequently, contributes significantly to its future adoption by applications that are processing personal data.

The rest of this work is structured as follows. In Section 2, we provide some background information on decentralized computing, including a brief history of its evolution up to the IPFS network. Next, in Section 3, we present the IPFS protocol and its characteristics, while in Section 4, we describe how data erasure is currently handled by the IPFS. In Section 5, we discuss the need for aligning the IPFS with the RtbF and how this could be achieved, whereas in Section 6 we introduce and formally specify a protocol (based on the current IPFS architecture) for integrating a delegated erasure mechanism for erasure requests into the IPFS protocol. The paper concludes in Section 7 by discussing our contributions in terms of enhancing decentralized technologies such as the IPFS towards privacy and GDPR compliance.

2. Background

As decentralized computing signals a transition away from cloud computing, decentralized storage and file sharing, which works by sharing a file across a p2p network, is commonly discussed as an alternative to cloud storage solutions like Dropbox³ and Google Drive⁴ which rely on large, centralized silos of data. Decentralized storage and file sharing applications offer increased benefits compared to their centralized counterparts, which are vulnerable to a single point of failure or to outside attacks from malicious actors who can compromise the data or leak confidential information. These events could result in adverse privacy implications, especially when the data under consideration pertain to sensitive personal details about individuals. Decentralized networks, however, do not suffer from the security limitations of centralized storage systems. For instance, due to their inherent design that relies on many peers for securing their state, denial of service attacks against decentralized systems are less likely to succeed.

2.1. From Usenet to blockchains

While p2p systems such as Usenet⁵ seemed to be the most natural approach in the early days of the computer revolution, the prevalence of less expensive and less functional desktop PCs in the late ’80s resulted in the predominance of client–server architecture [8]. Yet, the widespread adoption over the past 20 years of content sharing applications and protocols such as Napster⁶ and BitTorrent returned decentralized p2p networks to the spotlight [9]. Meanwhile, the prevalence of grid and distributed computing in the last few decades also relied heavily on p2p foundations to allow the use of spare computing resources to complete high-performance tasks [8].

As of 2009, the boom of blockchain technology with the advent of bitcoin caused the emergence of a new wave of decentralized p2p systems. Blockchain is literally a distributed ledger stored on a network of machines and is formed by a chain of blocks connected to each other using hash codes. Bitcoin, the first application of blockchain technology to decentralize financial transactions by creating a “peer-to-peer electronic cash system” [10], leveraged and improved upon various notions of p2p computing, and introduced, in effect, decentralization as a means to assure the reliability of non-trusted environments such as those of electronic currencies. These days, although blockchain is commonly discussed in the context of cryptocurrencies, blockchain applications go beyond finance to healthcare, supply chain management, and identity management, among others, to distribute information securely, transparently, and immutably [11]. Yet, uploading data to a public blockchain suffers from scalability and privacy issues [3,12]. In fact, putting large amounts of data in a blockchain transaction is not advisable due to the high cost involved as well as the latency problems introduced when full nodes need to download the entire ledger. Furthermore, the immutability and transparency of blockchains, according to which they keep data indefinitely and in plain sight, forbid their use in applications where personal data are at stake. Although current research efforts towards introducing restricted mutability on blockchains are indeed astonishing, modifying or completely removing information held in public blockchains is thus far impossible [3].

2.2. Towards off-chain solutions

To overcome blockchain scalability and privacy issues while maintaining the benefits of decentralization, blockchains are commonly combined with off-chain storage solutions. These involve storing the actual files outside the blockchain and keeping only the timestamps and the pointers to these files in the blockchain. Still, for these workarounds to be fully decentralized they need to be based on decentralized p2p systems such as Storj [13], SIA [14], Filecoin,⁷ IPFS [15], Dat [16] or Swarm,⁸ among others. Storj, SIA and Filecoin are open source platforms for decentralized storage that leverage blockchain technology and cryptocurrencies to incentivize file storage and sharing. IPFS, Dat and Swarm implement their own protocols for efficient decentralized storage and content distribution. Yet, Swarm, which at the time of writing is still under development, is being tightly coupled with the Ethereum⁹ blockchain ecosystem to provide a sufficiently decentralized store for Ethereum’s DApp code and data. Dat is an application protocol that focuses on sharing large files. IPFS, on the other hand,

³ http://www.dropbox.com
⁴ http://www.google.com/drive
⁵ http://www.usenet.com/what-is-usenet
⁶ http://www.britannica.com/topic/Napster
⁷ http://filecoin.io/filecoin.pdf
⁸ https://swarm-guide.readthedocs.io/en/latest/introduction.html#introduction
⁹ http://ethereum.org
implements a lower-level, more generic p2p network protocol, not tied to just one blockchain platform like Ethereum. However, lacking an incentivization mechanism, as it does not support any cryptocurrency, it provides no storage guarantee. Although Filecoin is going to fill this gap by implementing an open-source, public cryptocurrency on top of the IPFS to incentivize users to contribute their unused storage,¹⁰ IPFS’ ambitious goal is to offer the infrastructure for reinventing the Internet and replacing the traditional HTTP protocol¹¹ by connecting all computing devices with the same file system. The IPFS’ substantial impact on linking and searching content online, along with its smooth integration with current blockchain platforms for storing data off-chain, contributed to its wide adoption by many blockchain projects. On top of that, big corporations, like Cloudflare, leveraged IPFS to provide their cloud services.¹² In the following section, we delve into the basic characteristics of the IPFS protocol.

3. The IPFS protocol

IPFS is a p2p protocol and a network designed to create a permanent, decentralized method of efficient and robust data storage and distribution. IPFS seeks to connect all computing devices with the same file system and to create a new way to serve information on the web [15]. Although there have been many attempts in the past to introduce decentralized file systems, IPFS is the first general file system that achieves low latency and decentralized distribution on a global scale and in the infrastructure layer. This is because IPFS provides not only the application to distribute files but also the base protocol to accomplish this. While IPFS synthesizes successful ideas from previous p2p systems, its main contribution lies in simplifying, evolving, and connecting proven techniques into a single cohesive system, greater than the sum of its parts [15]. It combines technologies such as Distributed Hash Tables (DHT) (as implemented in the Kademlia protocol) [17] to coordinate and maintain metadata about p2p systems, and a BitTorrent-inspired communication protocol, BitSwap, to coordinate networks of untrusting peers (swarms) to cooperate in distributing pieces of files to each other [18]. It also uses Self-Certified Filesystems (SFS) techniques to authenticate the server and to establish a secure communication channel to remote filesystems [19]. On top of these, the IPFS builds a cryptographically authenticated data structure, similar to Git,¹³ to support file versioning and efficient distribution: a Merkle Directed Acyclic Graph (DAG)¹⁴ of immutable objects (representing files or other arbitrary data structures) with links to the cryptographic hash of the target object. The central IPFS principles of modelling all data as part of the same Merkle DAG object and addressing contents via their hashes provide useful properties such as content addressability, tamper resistance and deduplication [7]. According to these properties: all contents are always addressed by their cryptographic hashes; they are by default immutable since editing a file results in a new address (hash) of that file (due to the collision-resistant property of the hash function); and duplicate files are only stored once since they always refer to the same hash. IPFS provides two protocols for creating mutable addresses that always reference the latest version of an object: IPNS and DNSLink, with the latter being more efficient [7]. Moreover, the InterPlanetary Linked Data (IPLD)¹⁵ set of standards is implemented in the IPFS to create more flexible, universally addressable and linkable decentralized data structures of different sorts of data.

In the IPFS network, nodes store a collection of objects (hashed files) in local storage, and they connect to each other to transfer objects. Nodes, i.e. users, are not required to store all the data published in the network. Instead, they can choose which data they want to persist. Users who want to retrieve any of those files access an abstraction layer where they simply call the hash of the file they want. IPFS then, after searching through the nodes, takes care of finding the closest peer who has what they need and supplies the users with the file. Accessed resources are cached locally in the IPFS node to make those resources available for upload to other nodes and thereby to help with the load distribution for popular content.

As opposed to the traditional location addressing used by HTTP, where a single server hosts many files and information has to be fetched by accessing this server, content addressability, i.e. looking up the content by its cryptographic hash, ensures the authenticity of content regardless of where it is located. The implications of this property are tremendous as the IPFS could transform the Internet from being location-based to being a content-based distributed file network. First and foremost, IPFS eliminates the HTTP problem of broken links as a given address will always point to the same content added to the IPFS network because even a slight change will result in a different address. As already mentioned, another powerful property of IPFS is its censorship resistance since web content does not depend any more on a single entity. This censorship-resistant nature has already been exploited on many occasions to bypass web policy restrictions and to enable the freedom of speech¹⁶ and the right to information.¹⁷ In this regard, it has been argued that the IPFS could evolve the web and even replace the most successful “distributed system of files” ever deployed, the HTTP [15].

While the IPFS does not provide object-level encryption yet, personal information can be encrypted before being added to the network. Finally, it should be noted that the IPFS does not provide access control at the IPFS connection level to restrict untrusted peers from getting unauthorized data. An overview of how the IPFS works is illustrated in Fig. 1.

4. Deleting content in IPFS

While the IPFS has been widely advertised as the new “permanent web” where stored information remains available regardless of single point of failure attacks or censorship takedowns, the term “permanent” should not be misunderstood to be equivalent to the permanent storage and availability of the uploaded content. Instead, it has been clarified that the term is used to refer to the permanent reference of the content to which an IPFS link points.¹⁸ As previously stated, this is due to the content addressability property, which ensures that all resources are uniquely and permanently addressed by their contents. The permanent availability of resources in the IPFS is most commonly specified as storage persistency and is handled by the functionality of pinning, which excludes an object and its children from being garbage collected within an IPFS node. The garbage collector frequently runs to delete any cached data the IPFS node downloaded when accessing resources in the network. Whether content added in

¹⁰ To this end, it employs two variations of the proof-of-storage consensus mechanism, Proof-of-Replication (PoRep) and Proof-of-Spacetime (PoSt), to publicly verify that a node stores a particular file.
¹¹ https://www.sitepoint.com/ipfs-swarm-decentralized-content-publication-storage/
¹² https://blog.cloudflare.com/distributed-web-gateway/
¹³ http://git-scm.com
¹⁴ http://docs.ipfs.io/guides/concepts/merkle-dag/
¹⁵ https://ipld.io/
¹⁶ http://www.eurekastreet.com.au/article.aspx?aeid=54133
¹⁷ https://observer.com/2017/05/turkey-wikipedia-ipfs/
¹⁸ https://discuss.ipfs.io/t/deleting-content/202
Fig. 1. An overview of the main IPFS operations. On the left, we depict a typical operation when data are stored in IPFS (or other DFS) and only their hashes and
metadata are stored in the blockchain. On the right, a high-level overview of the IPFS data storage and retrieval is depicted. First, a user stores a file f in IPFS. Next,
to retrieve these data, another user performs a request for file f using its corresponding hash value (bottom right). As the nodes are aware of the location of that
file (i.e. due to the use of the DHT), they are able to efficiently retrieve the data from the nodes storing the queried file (orange nodes) and finally deliver it to the user.
(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
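The store-and-retrieve flow of Fig. 1, together with the pinning and garbage-collection behaviour discussed in Section 4, can be illustrated with a small in-memory sketch. This is not how IPFS itself is implemented: the `ToyNode` class and the `address` helper are hypothetical names introduced here, the address is assumed to be a plain SHA-256 hex digest (real IPFS uses multihash-based content identifiers), and peer discovery through the DHT is replaced by a direct reference to the peer.

```python
import hashlib


def address(content: bytes) -> str:
    """Content address; assumed here to be the SHA-256 hex digest of the bytes."""
    return hashlib.sha256(content).hexdigest()


class ToyNode:
    """Hypothetical node: a local object store plus a set of pinned addresses."""

    def __init__(self):
        self.store = {}      # address -> content
        self.pinned = set()  # addresses protected from garbage collection

    def add(self, content: bytes, pin: bool = True) -> str:
        addr = address(content)
        self.store[addr] = content  # identical content lands on the same key
        if pin:
            self.pinned.add(addr)
        return addr

    def get(self, addr: str) -> bytes:
        content = self.store[addr]
        # Tamper check: the bytes must hash back to the requested address.
        if address(content) != addr:
            raise ValueError("content does not match its address")
        return content

    def fetch_from(self, peer: "ToyNode", addr: str) -> bytes:
        """Retrieve from a peer and keep an unpinned cached copy."""
        content = peer.get(addr)
        self.store[addr] = content
        return content

    def gc(self) -> int:
        """Delete every stored object that is not pinned; return the count."""
        stale = [a for a in self.store if a not in self.pinned]
        for a in stale:
            del self.store[a]
        return len(stale)


provider, reader = ToyNode(), ToyNode()
f = b"personal record of Alice"
addr = provider.add(f)

same_addr = provider.add(f)                 # duplicate content, same address
other_addr = address(f + b"!")              # a small change gives a new address
cached = reader.fetch_from(provider, addr)  # reader caches an unpinned copy
removed = reader.gc()                       # the cached copy is collected
still_served = provider.get(addr) == f      # the pinned copy survives
```

The sketch also shows why garbage collection on one node says nothing about other nodes: the reader's cached copy disappears, while the provider's pinned copy remains retrievable under the same address.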
the IPFS network is stored persistently or not depends solely on the users that choose to pin this content so as not to be garbage collected after a given period of time. Otherwise, unpinned content is automatically garbage collected, provided that the garbage collector is enabled to run on a schedule or manually. Obviously, the more IPFS nodes are pinning a specific file, the easier and faster another node can get it.

Yet, even when a file is garbage collected, i.e. deleted, from a node, it is not certain that it has also been deleted from all the other nodes that had previously accessed and thereby cached that file. Moreover, if a node has pinned it, the file remains permanently available to other peers. Therefore, as long as anybody is willing to continue spending energy to maintain an object online, that content would be permanently stored in IPFS. For this reason, Filecoin, when implemented, will incentivize people to share their unused storage for hosting/pinning IPFS files. In a nutshell, a file is preserved in the IPFS network if there is at least one node that is actively sharing it (i.e. by pinning it), and it can only be completely removed from the network if its original host, and all other hosts serving it, delete it and its cached copies throughout the network expire.

As a side note, IPFS provides the option of building private IPFS networks using IPFS Clusters to coordinate connection among peers who share a secret key.¹⁹ In these networks, an entity controls all peers and hence collective unpinning, and thereby deleting a file from all the peers participating in the private network, is possible. Yet, completely removing content previously uploaded to the IPFS public network cannot be guaranteed.

5. The need for complete content removal

As discussed, IPFS is a decentralized public network where nodes do not need to trust each other and with no single point of failure as there is no entity in control of the dissemination (or the very existence) of the information within the network. However, the original vision of the IPFS (and of the Internet in general) did not consider the malevolent uses for promoting and disseminating illegal or copyrighted content, or even cases of infringing on human rights such as the right to privacy. As a result, the IPFS does not support any efficient methods for completely removing any illegal, personal or copyrighted content, and thereby stopping it from being disseminated across its entire network. Arguably, this gap may have adverse implications for IPFS alignment with at least the data protection and privacy laws.

Acknowledging this deficiency, IPFS plans to support blocklists, i.e. lists of illegal content that needs to be blocked from the IPFS network. These blocklists will specify policies for content storage and distribution and therefore will allow subnetworks of peers to agree upon sets of content they would wish to censor. Yet, blocklists, as they have been so far designed, present some limitations. First of all, they cannot be universally applied since what is illegal in one jurisdiction is not necessarily illegal in others; e.g. consider political or even religious-related content. Furthermore, the maintenance and coordination of such lists by the IPFS gateways would prove burdensome given the high demand for links/content to be censored and the continuously increasing size of these lists. Besides, these blocklists can be easily circumvented since changing just a bit of the unwanted file changes the corresponding hash, while the actual information may not be radically affected. In addition to these, as the subscription to these blocklists will most probably be optional, nothing would prevent a node from not subscribing. Last but not least, there must be meticulous processes in place to carefully examine and add links/content to these blocklists in order not to violate any other legal rights, such as the freedom of speech. Most importantly, however, blocklists fall short of meeting the data protection requirement of the RtbF anticipated by the GDPR that mandates the erasure of individuals’ personal data published in the IPFS network should certain conditions apply. In this regard, we examine in the following paragraphs the RtbF in terms of its applicability to the IPFS protocol.

5.1. GDPR RtbF and IPFS

The GDPR enforcement in 2018 provoked considerable scepticism due to its severe impact on the processing of personal data within and outside the EU territory [5]. Of its provisions, the most radical and controversial one that has been subject to heated debates is Article 17 that anticipates the RtbF. In simple terms,

¹⁹ https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#private-networks
the RtbF allows individuals to request the erasure of their personal data from all the available sources to which they have been disseminated when certain conditions are met (Article 17(1)). Beyond any doubt, as explained thoroughly in [5,20], the impact of encompassing the RtbF in contemporary information systems is immense, whereas its integration into the design of new technological advancements is currently disputable. One such advanced technology, which emerged over the long period during which the final GDPR text was being debated and finalized, is blockchain. Considering that a few years ago blockchain technology was not the widespread technological trend that it is nowadays, its compatibility with some of the GDPR provisions is currently challenged [3]. A major inconsistency between blockchain and the RtbF emerges due to the blockchain’s immutability by design. While, as shown in [3], significant research is being carried out for implementing restricted mutability on blockchain environments, allowing data stored in public blockchains to be edited or deleted is still a controversial topic. To overcome this barrier, several blockchain projects are adopting the IPFS network for decentralized storage and distribution of the actual personal files while keeping only their hash addresses in the blockchain. This solution, however, moves the burden of removing the actual information to the IPFS protocol should a request for erasing personal data under the RtbF be raised.

5.2. Towards aligning RtbF with IPFS

Aligning the IPFS with the RtbF is not an easy task. As a matter of fact, any attempts to implement and enforce an erasure request would most probably be futile since IPFS is a trustless network, and as such other nodes can never be trusted to respect a request for content erasure. A node can keep data for as long as it likes, and there is nothing to prevent that from happening because there is no way to verify that data in the IPFS have been removed from the entire network. Above all, IPFS, like HTTP, is at its core a protocol, a base layer foundation that other systems may build upon. Consequently, imposing data manipulation rules on it is technically impossible since application- and data-specific functionality is implemented in a higher architectural layer.

Nevertheless, given the extensive use of IPFS to store personal data and the strict data protection obligations anticipated by the GDPR, providing a kind of erasure request on a protocol level and across the entire IPFS network is highly recommended, if not urgent. According to Article 17(2) of the GDPR:

“the controller, taking account of available technology and the cost of implementation, shall take reasonable steps, including technical measures, to inform controllers which are processing the personal data that the data subject has requested the erasure by such controllers of any links to, or copy or replication of, those personal data”.

In other words, regardless of the feasibility of enforcing an erasure request, there should be at least a method to disseminate the request to all data controllers holding the personal data under consideration. Since by definition every IPFS node fully controls the data that it holds, or put differently, an IPFS node determines on its own the means and purpose for the processing of these data, it essentially acts as a data controller in GDPR terms. Hence, according to the GDPR, each IPFS node should at least be able to “take reasonable steps, including technical measures, to inform” other IPFS nodes holding these data about the removal request.

Based on the above analysis, it is evident that, even though it is not feasible to enforce data erasure across all the IPFS nodes, implementing a delegation mechanism for securely handling erasure requests in the protocol layer would be the most suitable and application-agnostic way towards aligning the IPFS with the RtbF. Thereby, the onus of enforcing the actual erasure is moved to each individual IPFS node, which is a data controller on its own. By all means, the integration of this kind of erasure delegation mechanism into the IPFS protocol will endorse its GDPR compliance considerably, and as a result, it will add real value to its future adoption by any applications that are processing personal data. In this regard, the security and technical details of implementing and integrating such a protocol into the IPFS, along with the formal definition and the proof of concept results, are presented in the following section.

6. The proposed protocol

6.1. Assumptions and Desiderata

The main goal of the proposed protocol is to allow a user to request the erasure of content that she has already shared and that consequently resides in other nodes of the network. To this end, the proposed protocol introduces a “proof-of-ownership” so that the content is linked in an anonymous way with a secret that only a set of designated users have. Therefore, once someone makes an erasure request of the content with the corresponding secret, each node validates that the secret matches the ownership proof and propagates the request to the other nodes. The request may be repeated periodically to cater for nodes which might be offline at the time of the initial erasure request.

In our protocol we consider that the removal is performed on all nodes that conform to the protocol and store the content along with the “proof-of-ownership”. The main motivation behind the introduction of “proof-of-ownership” is to maintain the censorship-resistant nature of the IPFS network. To this end, no one can arbitrarily request content erasure, e.g. for content that was not submitted by herself. Therefore, while the current IPFS model supports sharing of content in the form of, e.g. a file c, we augment it to support the sharing of tuples of the form (c, s) where s is an encrypted string which will be used afterwards to prove the ownership when an erasure request is made for a specific file c. Upon an erasure request, a content-dependent key k is sent to the network along with the hashed version of the file. The content-dependent key k derives from a master key that each user owns, thus avoiding the hassle of having to remember different keys. Note that the proposed extension would allow users to continue sharing content without proofs of ownership. The proofs are only appended if the user considers that she might one day want to delete her shared files.

Obviously, extending the IPFS protocol to embrace the proof-of-ownership s and to support accordingly the management of the corresponding content is a major deviation from its current implementation. Yet, it is a feasible and workable alternative to allow IPFS to handle erasure requests successfully. In what follows, we assume that we have honest nodes that comply with the protocol, and they duly follow its rules.

6.2. Threat model

Since the scope of this work is to allow the content erasure, the attacks that could be launched will target the erasure of a user’s contents without her knowledge or consent. To achieve this, an adversary would try to extract the key, or forge the erasure request to achieve the removal of the content from all IPFS nodes. We assume that attempts to bypass the protocol so that the content is not removed, e.g. not forwarding and not complying with the erasure request, are beyond the scope of this work as they imply a node that does not comply with the
protocol. Such nodes are considered to be misbehaving and can be isolated from the rest of the network.

In our work, we assume probabilistic polynomial time (PPT) passive adversaries that are polynomially bounded and do not have the ability to break the underlying cryptographic primitives used in the protocol, i.e. reverse hash functions or break any secure block cipher. We also assume that an adversary is able to monitor all the traffic exchanged within the protocol execution. We do not consider active attacks; we assume that the messages exchanged in a protocol execution are authenticated, and their integrity is protected. Thus an adversary cannot modify or inject fake messages pretending to originate from another legitimate user.

nodes to enhance performance and scalability by “announcing” the content that nodes have by referring to the corresponding hash h(c).

Algorithm 1: Handling of an erasure request from a node.
1: On receiving a content erasure request d = (h(c), k)
2: if Content c with h(c) = h is stored in the node then
3:     if CheckProof(h, k) == True then
4:         Delete c from local storage.
5:         Forward d to neighbor nodes using DSHT ℓ times every T seconds.
6:     end if
7: end if
6.3. IPFS delegated erasure protocol
In the erasure request phase, Alice (or one of her delegates) de-
Let h : {0, 1}∗ → {0, 1}k be a secure hash function, and two cides to erase content c from IPFS. Hence, she uses the GenDelReq
keyed permutation Ex : {0, 1}k → {0, 1}n , ey : {0, 1}k → {0, 1}ν and sends her request as a tuple d = (h(c), k) to the network. Any
for keys x ∈ {0, 1}λ and y ∈ {0, 1}n , respectively, for security receiving node may now use CheckProof to first locate the content
parameters k, n, ν, λ ∈ N. using h(c), and then to verify whether the erasure request is valid,
The protocol is a set of five polynomial-time algorithms, i.e. s = ek (h(c)). Finally, to minimize the network overhead, the
namely Keygen, ConKeygen, GenProof , GenDelReq and CheckProof receiving node forwards the erasure request d to all the neigh-
and is composed of three phases: initialization, content dissemi- bours holding the file by using DSHT and the file’s corresponding
nation and erasure request. hash h(c). Note that this action is already implemented in IPFS
with functions like ( ipfs dht findprovs ⟨h(c)⟩). Therefore, the
• Keygen(1λ ) → mk: A probabilistic algorithm for generating corresponding modification would be made by simply changing
a personal master key mk ∈ {0, 1}λ which is kept secret by the ⟨h(c)⟩ value to (⟨h(c)⟩, k).
each user. Fig. 2 illustrates a workflow overview of the proposed protocol
• ConKeygen(mk, c) → k: An algorithm for generating a starting from the creation of mk, up to the point where other
content-based key k for the master key mk of the user that nodes check the validity of the erasure request sent by a delegate
wants to submit her content c ∈ {0, 1}∗ . The generated key k for a given content c. Finally, Algorithm 1 outlines the handling
can be shared with the users that should be granted content process of an erasure request from a node. Note that the process
erasure. In this work, we set ConKeygen(mk, c) = Emk (h(c)). is repeated ℓ times every T seconds to accommodate for nodes
• GenProof (k, c) → s: An algorithm for generating a proof of which might be offline. Both of these values are constants and can
ownership s for a content c using a content-based key k. We be set either by each node individually or by computing specific
instantiate GenProof as: GenProof (k, c) = ek (h(c)). parameters for node participation distribution and the probability
• GenDelReq(k, c) → (h, k): An algorithm for generating a of storing a content.
request for erasure of content c from the user that dis-
seminated it. Takes as input the content c and the content 6.4. Security proof
based key k. It outputs the erasure request which consists
of the hash h of the content to facilitate its discovery and In what follows, we provide formal proofs regarding the prop-
the corresponding key k that proves the ownership of the erties of the proposed protocol.
content. We instantiate this algorithm as: GenDelReq(k, c) =
(h(c), k). Theorem 1. Alice’s personal master key is secure against any PPT
• CheckProof (h, k) → {‘‘success’’, ‘‘fail’’} an algorithm executed adversary if the keyed permutation E is secure.
by the recipient of an erasure request to determine whether
she has pinned locally a content c with hash h. If this is the Proof. For the sake of brevity and convenience, we prove the
case, the recipient checks whether the corresponding proof theorem for the worst-case scenario, which is the case of a
s was generated using key k, which is done by simply verify- malicious delegate. Contrary to any other adversaries, a delegate
ing that for the hosted tuple (c , s) it holds that s = ek (h(c)). for erasing any of Alice’s contents is the only one who has some
The algorithm returns success if the check was successful output directly linked to Alice’s personal master key.
and fail otherwise. Let us assume that a malicious delegate of Alice, Malory (from
now on denoted as M) wants to extract the personal master key
During the initialization phase, each user executes Keygen of Alice X . Therefore, we assume that M has access to a set of
to generate her personal master key mk ∈ {0, 1}λ , which is m > 0 delegated keys KCj X = EX (h(cj )), j ∈ {1, 2, . . . , m} for the
kept secret. Next, the content dissemination phase takes place, corresponding contents cj , j ∈ {1, 2, . . . , m}. To extract X , M must
in which Alice wants to submit her content c ∈ {0, 1}∗ to perform a known-plaintext attack to E. Since E is a secure keyed
the IPFS network. Further to than simply storing and sharing c, permutation this is not possible, so X is secure from M. □
Alice executes ConKeygen and GenProof to create the proof of
ownership for her content. More concretely, ConKeygen is realized Theorem 2. A PPT adversary cannot forge an erasure request for
by computing key k = Emk (h(c)), which is subject to her personal any given tuple (c , s) if the keyed permutation e is secure.
master key and the content she wants to share. To generate her
ownership proof, she uses GenProof to compute s = ek (h(c)). Fi- Proof. Let us assume that a PPT adversary M wants to forge an
nally, she commits the tuple (c , s) that is disseminated to the IPFS erasure request for a given tuple (c , s). M needs to find κ ∈ {0, 1}k
network by using the IPFS distributed sloppy hash table (DSHT) such that eκ (h(c)) = s. Since e is a secure keyed permutation, κ
and BitSwap protocol [15]. In particular, the IPFS DSHT is used cannot be computed in probabilistic polynomial time. Therefore,
to store a key–value set which is spread over the participating an erasure request for any tuple (c , s) cannot be forged by M. □
Fig. 2. An overview of the proposed protocol. (1) Alice uses her master key mk to create the content-dependent key k for content c. (2) Alice uses k and c to derive
the proof of ownership s. (3) Tuple (c , s) is pushed to the IPFS network. (4) Alice or one of her delegates use the content-dependent key k to issue a request for
erasure for c by computing (h(c), k). (5) The request is disseminated to the IPFS network, and each node checks the validity of the request.
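Step (5) of this workflow, i.e. the per-node handling listed in Algorithm 1, can be sketched as follows. The `Node` class, its in-memory store and the direct method calls between neighbours are hypothetical simplifications of the DSHT-based forwarding; HMAC-SHA256 stands in for the keyed permutation e (an assumption), and the periodic re-forwarding (ℓ times every T seconds) is reduced to a single synchronous round.

```python
import hashlib
import hmac


def e(k: bytes, block: bytes) -> bytes:
    """ASSUMPTION: HMAC-SHA256 stands in for the keyed permutation e."""
    return hmac.new(k, block, hashlib.sha256).digest()


class Node:
    """Hypothetical IPFS-like node holding tuples (c, s) indexed by h(c)."""

    def __init__(self, name: str):
        self.name = name
        self.store: dict[bytes, tuple[bytes, bytes]] = {}
        self.neighbours: list["Node"] = []

    def add(self, content: bytes, proof: bytes) -> None:
        self.store[hashlib.sha256(content).digest()] = (content, proof)

    def check_proof(self, content_hash: bytes, k: bytes) -> bool:
        """CheckProof: the content is held locally and s == e_k(h(c))."""
        if content_hash not in self.store:
            return False
        _, s = self.store[content_hash]
        return hmac.compare_digest(s, e(k, content_hash))

    def on_erasure_request(self, content_hash: bytes, k: bytes) -> None:
        """Algorithm 1: delete on a valid request, then forward it."""
        if self.check_proof(content_hash, k):
            del self.store[content_hash]        # line 4: delete c locally
            for peer in self.neighbours:        # line 5: forward d = (h(c), k)
                peer.on_erasure_request(content_hash, k)


# Three nodes in a line, all holding the same tuple (c, s).
content, k = b"some content", b"content-based key"
digest = hashlib.sha256(content).digest()
nodes = [Node("a"), Node("b"), Node("c")]
nodes[0].neighbours = [nodes[1]]
nodes[1].neighbours = [nodes[0], nodes[2]]
nodes[2].neighbours = [nodes[1]]
for n in nodes:
    n.add(content, e(k, digest))

nodes[0].on_erasure_request(digest, b"wrong key")  # rejected, nothing deleted
remaining_after_forgery = sum(len(n.store) for n in nodes)
nodes[0].on_erasure_request(digest, k)             # accepted and propagated
remaining_after_request = sum(len(n.store) for n in nodes)
```

Because a node forwards the request only after it has deleted its own copy, a request that comes back to it finds nothing to delete and is dropped, so the propagation terminates even when the neighbour graph contains cycles.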
Proof. The proof of the theorem above follows from the anonymity that IPFS provides and the proof of Theorem 3. □

6.5. Protocol efficiency

The proposed protocol has a minimal footprint as the additional overhead for the proofs-of-ownership in terms of storage space is equal to the length of the encryption of a hash. In terms of creating a proof, one has to perform two encryptions with a symmetric cipher, and two hashes. Similarly, for the validation phase, which checks whether a given content exists and whether an erasure request is valid, one has to perform one hash and one decryption with a symmetric cipher.
To validate the efficacy of our proposal, we implemented the proposed protocol proofs in Python 2.7. We used AES in CBC mode with 256-bit keys and the SHA-256 hash function. Without using parallelization, we generated 1000 random files of 1 MB on a system running with an Intel Core i7-6700K CPU at 4.00 GHz and 16 GB of RAM on Ubuntu 19.04. We then created a master key for the user, and for each file, we computed the corresponding keys, the proof-of-ownership and the verification time. The average times for generating the content-dependent keys, generating the proofs-of-ownership, and verifying the proof are 2.011 ms, 2.023 ms, and 1.99 ms, respectively, which can be considered minimal. With regard to the remaining functionality, we consider only existing functions of the IPFS (see Section 6.3), so that minimal modifications are required.
In Fig. 3, we studied the performance of the proposed protocol using the NS-3 network simulator (http://www.nsnam.org/) to simulate a p2p communication that spreads data packets among the network nodes [21]. In our simulation, we designed a node that disseminates the erasure request to different numbers of participants in cases between 20 and 100 nodes. The size of the packet travelling through the network is typically 64 bytes. The X-axis represents the average time required in each case (i.e., the time required to disseminate the request) with no deviation of the time. The Y-axis represents the number of participating nodes, which had been increased statically to study the time impact when the network expands. In terms of scalability, the time increases as more nodes are added to the network; hence a linear relationship exists with a positive line slope. Extending our previous experiment, we studied the impact of the packet size on the IPFS network time latency to analyse the required time for handling the erasure request. As shown in Fig. 4, we validated our network for cases where the packet size (i.e. user key and the hash of the file) changes, ensuring the scalability of the method and its resilience to future variations of the hash size.

6.6. Limitations and countermeasures

As discussed above, the integration of the proposed erasure mechanism into the IPFS protocol provides several benefits. Nevertheless, some limitations should also be pointed out. A first limitation arises due to the duplication attack according to which a user could claim the ownership of a content that was previously deleted by committing it again with her own key. A simple way to prevent this is to store in a parallel structure (e.g. a Merkle DAG) the erasure requests made by the users. For this purpose, the protocol could be further expanded to check if a file that is going to be added in the IPFS already existed in the system. On
top of this, the IPFS protocol could be updated to prevent users from adding an existing file. This can be achieved either by forcing the execution of the following commands:

    ipfs refs local | grep <hash>

and

    ipfs dht findprovs <key>

or by directly checking if the file exists through one of the gateways. Clearly, this checking procedure has an implicit overhead of several milliseconds [7,22] and therefore, the trade-off between performance and security must be carefully examined and discussed. Besides, to further enhance scalability and performance, the erasure requests could have some specific time to live (TTL). Nonetheless, it is worth noticing that such kinds of attacks may arise in any kind of platform or database and therefore establishing a mitigation strategy against these is very complex. However, with the proposed approach, we partially avoid such kinds of malicious behaviour.

Fig. 4. Simulation of disseminating large-scale nodes communication with different packet sizes in the IPFS network to validate a request.

A second issue closely related to the previous one is the case of a malicious user that uploads content from competitors, or legally owned by other authors, or any other kind of copyrighted files, to claim their ownership dishonestly. As with the duplication attack, this situation is not specific to our proposed protocol nor to the IPFS as it may occur in all available storage systems. A possible solution to this issue would be the implementation of an ownership claim protocol which involves an identity management system supported by a consensus mechanism. Yet, while this implementation could indeed provide a nice-to-have feature, we argue that it is beyond the essence of the IPFS and as such, it is out of the scope of this work.

7. Conclusions

As systems for decentralized file storage and sharing are increasingly adopted to store personal data, their immutability and data persistence properties create great uncertainty as far as their alignment with the privacy and data protection rights is concerned. One particular privacy and data protection right enshrined in the GDPR is considered to be the holy grail for many state-of-the-art decentralized technologies: the RtbF that foresees the erasure of personal data under certain conditions. Beyond any doubt, even though the GDPR has not taken into account emerging technologies such as the blockchain, let alone the IPFS,

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by the European Commission under the Horizon 2020 Programme (H2020), as part of the projects CyberSec4Europe (https://www.cybersec4europe.eu) (Grant Agreement no. 830929) and LOCARD (https://locard.eu) (Grant Agreement no. 832735).
The content of this article does not reflect the official opinion of the European Union. Responsibility for the information and views expressed therein lies entirely with the authors.

References

[1] P. Bardhan, Decentralization of governance and development, J. Econ. Perspect. 16 (4) (2002) 185–205.
[2] P. Seabright, Accountability and decentralisation in government: An incomplete contracts model, Eur. Econ. Rev. 40 (1) (1996) 61–89.
[3] E. Politou, F. Casino, E. Alepis, C. Patsakis, Blockchain mutability: Challenges and proposed solutions, IEEE Trans. Emerg. Top. Comput. (2019) 1, http://dx.doi.org/10.1109/TETC.2019.2949510.
[4] General Data Protection Regulation (GDPR), Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), Offic. J. Eur. Union 119 (4 May 2016) (2016) 1–88, L.
[5] E. Politou, E. Alepis, C. Patsakis, Forgetting personal data and revoking consent under the GDPR: Challenges and proposed solutions, J. Cybersecur. 4 (1) (2018).
[6] F. Casino, E. Politou, E. Alepis, C. Patsakis, Immutability and decentralized storage: An analysis of emerging threats, IEEE Access 8 (2020) 4737–4744.
[7] C. Patsakis, F. Casino, Hydras and IPFS: a decentralised playground for malware, Int. J. Inf. Secur. (2019).
[8] D.S. Milojicic, et al., Peer-to-Peer Computing, Technical Report HPL-2002-57, HP Labs, 2002.
[9] P.G. Lopez, A. Montresor, A. Datta, Please, do not decentralize the Internet with (permissionless) blockchains!, in: 2019 IEEE 39th International Conference on Distributed Computing Systems, ICDCS, IEEE, 2019, pp. 1901–1911.
[10] S. Nakamoto, Bitcoin: A peer-to-peer electronic cash system, 2008.
[11] F. Casino, T.K. Dasaklis, C. Patsakis, A systematic literature review of blockchain-based applications: current status, classification and open issues, Telemat. Inform. (2018).
[12] M. Scherer, Performance and Scalability of Blockchain Networks and Smart Contracts, Umea University, Sweden, 2017.
[13] S. Wilkinson, T. Boshevski, J. Brandoff, V. Buterin, Storj a peer-to-peer cloud storage network, 2014.
[14] D. Vorick, L. Champine, Sia: Simple decentralized storage, 2014, White paper available at https://sia.tech/sia.pdf.
[15] J. Benet, IPFS-content addressed, versioned, P2P file system, 2014, arXiv preprint arXiv:1407.3561.
[16] M. Ogden, Dat-distributed dataset synchronization and versioning, 2017, OSF Preprints.
[17] P. Maymounkov, D. Mazieres, Kademlia: A peer-to-peer information system based on the xor metric, in: International Workshop on Peer-to-Peer Systems, Springer, 2002, pp. 53–65.
[18] B. Cohen, Incentives build robustness in BitTorrent, in: Workshop on Economics of Peer-to-Peer Systems, Vol. 6, 2003, pp. 68–72.
[19] D.D.F. Mazières, Self-Certifying File System (Ph.D. thesis), Massachusetts Institute of Technology, 2000.
[20] E. Politou, A. Michota, E. Alepis, M. Pocs, C. Patsakis, Backups and the right to be forgotten in the GDPR: An uneasy relationship, Comput. Law Secur. Rev. 34 (6) (2018) 1247–1257.
[21] L. Campanile, M. Gribaudo, M. Iacono, F. Marulli, M. Mastroianni, Computer network simulation with ns-3: A systematic literature review, Electronics 9 (2) (2020) 272.
[22] B. Confais, A. Lebre, B. Parrein, Performance analysis of object store systems in a fog/edge computing infrastructures, in: 2016 IEEE International Conference on Cloud Computing Technology and Science, CloudCom, 2016, pp. 294–301, http://dx.doi.org/10.1109/CloudCom.2016.0055.

Eugenia Politou received her Diploma (BSE) and her M.Sc. degrees in electrical and computer engineering both from the Democritus University of Thrace, Xanthi, Greece. She is currently pursuing her Ph.D. in Informatics at the University of Piraeus, Greece. Her current research interests include privacy and data protection in decentralized networks and other state-of-the-art technologies such as mobile computing. She has a long experience in research, security, analysis, and system design under various national and European large-scale IT projects within the private and public sector. She currently works as an Information Security Officer at the Independent Authority for Public Revenue, Greece.

Dr. Euthimios Alepis received his B.Sc. in Informatics in 2002 and his Ph.D. in 2009, both from the Department of Informatics, University of Piraeus (Greece). He is Assistant Professor in the Department of Informatics, University of Piraeus since December 2013. He has authored/co-authored more than 60 scientific papers which have been published in international journals, book chapters and international conferences. His current research interests are in the areas of Object-oriented Programming, Mobile Software Engineering, Human–Computer Interaction, Affective Computing, User Modelling and Educational Software.

Constantinos Patsakis holds a B.Sc. in Mathematics from the University of Athens, Greece and a M.Sc. in Information Security from Royal Holloway, University of London. He obtained his Ph.D. in Cryptography and Malware from the Department of Informatics of University of Piraeus. His main areas of research include cryptography, malware analysis, security, privacy, and data anonymization. He has participated in several national (Greek, Spanish, Catalan and Irish) and European R&D projects and coordinates the H2020 project LOCARD. Additionally, he has worked as researcher at the UNESCO Chair in Data Privacy and as a research fellow at Trinity College, Dublin Ireland. Currently, he is Assistant Professor at University of Piraeus and adjunct researcher of Athena Research and Innovation Center.

Fran Casino is a postdoctoral researcher in the Department of Informatics at Piraeus University (Piraeus, Greece). He obtained his B.Sc. degree in Computer Science in 2010 and his M.Sc. degree in Computer Security and Intelligent Systems in 2013, both from Rovira i Virgili University in Tarragona, Catalonia, Spain. He received a Ph.D. in Computer Science from the Rovira i Virgili University in 2017 with honours (A cum laude) as well as the best dissertation award. He was visiting researcher in ISCTE-IUL (Lisbon-2016). He has participated in several European-, Spanish- and Catalan-funded research projects. His research focuses on pattern recognition, and data management applied to different fields such as privacy and security protection, recommender systems, smart health and blockchain.

Mamoun Alazab is an Associate Professor at the College of Engineering, IT and Environment at Charles Darwin University, Australia. He received his Ph.D. degree in Computer Science from the Federation University of Australia, School of Science, Information Technology and Engineering. He is a cyber security researcher and practitioner with industry and academic experience. Dr Alazab’s research is multidisciplinary that focuses on cyber security and digital forensics of computer systems including current and emerging issues in the cyber environment like cyber–physical systems and internet of things, by taking into consideration the unique challenges present in these environments, with a focus on cybercrime detection and prevention. Alazab’s look into the intersection use of Artificial Intelligence and Machine Learning as essential tools for cybersecurity, for example, detecting attacks, analysing malicious code or uncovering vulnerabilities in software and hardware.
He has more than 150 research papers, two of his papers were selected as the featured articles, and two other papers received the best paper award. He is the recipient of short fellowship from Japan Society for the Promotion of Science (JSPS) based on his nomination from the Australian Academy of Science. Also, two teaching and learning awards.