Daruma: Regaining Trust in Cloud Storage
Doron Shapiro, Michelle Socher, Ray Lei, Sudarshan Muralidhar
University of Pennsylvania SEAS Senior Design, Spring 2016
scale in a dragnet-style surveillance effort or in a targeted manner for an individual investigation.

With this in mind, our primary goal was to protect the confidentiality, integrity, and accessibility of user file contents on computers other than the users' own. As an additional goal, we wanted to be able to apply these guarantees to file metadata where possible.

To do this, we set the following security guidelines:

(1) All network-bound traffic must be encrypted with a key providers do not know (i.e. SSL is insufficient for this purpose).
(2) All encryption must be authenticated to prevent tampering.
(3) It should be difficult for any one non-user actor to hold all the information necessary to reconstruct a key.
(4) It should be difficult for any one non-user actor to, by corrupting or removing access to data, remove access to a user file.

end, we decided to implement a fault recovery algorithm that would go beyond the basics of merely detecting errors and recovering files to also securely fix errors as they came up. To help users understand the overall system health as we did this, we decided to implement a scoring system that would educate users about the behavior of their providers and advise them on how each one was affecting the system. Finally, we decided to make all of our operations atomic with automatic rollback to reduce the number of states our system could be in.

Performance To maintain ease of use, we wanted users of our system to experience the same upload and download speeds that they might see on their existing providers. Similarly, since we expected that many users would have free-tier provider accounts (with relatively low storage capacity), space efficiency was a high priority. We captured both of these constraints by setting as a goal a network cost that increases roughly linearly with plaintext size.
(2) The filesystem should hide file metadata (e.g. directory structure and node names).
(3) The filesystem metadata should have a minimal storage overhead.

We decided on the following structure:

(1) All filesystem metadata would be stored in a single manifest file. This file would map user file paths (e.g. "/Users/alice/documents/foo.txt") to an internal codename (e.g. "5DB62955FBCD4228968A046A873A9236") as well as other metadata, such as encryption keys. A sketch of one manifest entry appears below.
(2) User files would be stored in a flat structure in a single directory. Each user file would be stored under its codename rather than its file path, thus hiding file paths from providers.
(3) To achieve atomicity, we would treat user files as immutable and change the codename references in the manifest after we had confirmed that any given operation had succeeded.

For more details on this implementation, see appendix B.1.
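For concreteness, the following is a minimal sketch of a single manifest entry; the field names shown here are illustrative assumptions, not Daruma's exact on-disk format:

    manifest = {
        # user file path -> internal metadata (illustrative fields)
        "/Users/alice/documents/foo.txt": {
            "codename": "5DB62955FBCD4228968A046A873A9236",
            "key": b"<32-byte random per-file key>",  # see 2.5
        },
    }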
2.5 Authenticated Encryption

We chose Dan Bernstein's NaCl package for authenticated encryption [3]. We selected this package for several reasons:

Authentication This package offers authenticated encryption, which was pivotal for our ability to detect and punish provider misbehavior. This guarantees the integrity of our system because successful decryption of authenticated ciphertext ensures that the data has not been modified.

Vetted Cryptography Dan Bernstein's work is highly respected within the cryptography community, and this package has been vetted by those in that community.

No User Modifications The goal of this package was to provide usable cryptography where all details and decisions are removed from users of the library. It is generally considered poor practice to roll one's own cryptography or even to attempt to navigate the various settings and parameters without strong expertise.

Using this package, we generate a random key for each file. The mapping from files to keys is then stored in the manifest. It was important that each file get its own key (rather than a single master key being used to encrypt all files). Otherwise, our scheme would reduce to e.g. AES-ECB, an AES mode that is known to reveal metadata about the data it encrypts because all identical blocks encrypt to the same ciphertext.
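As a sketch of this per-file flow (assuming the PyNaCl Python binding to NaCl; the helper names are ours, not part of Daruma's API):

    import nacl.secret
    import nacl.utils

    def encrypt_file(plaintext: bytes):
        # Fresh random key per file, stored in the manifest afterwards.
        key = nacl.utils.random(nacl.secret.SecretBox.KEY_SIZE)
        box = nacl.secret.SecretBox(key)
        # encrypt() picks a random nonce and prepends it to the ciphertext;
        # the result is authenticated, so tampering is detected on decrypt.
        return key, box.encrypt(plaintext)

    def decrypt_file(key: bytes, ciphertext: bytes) -> bytes:
        # Raises nacl.exceptions.CryptoError if the ciphertext was modified.
        return nacl.secret.SecretBox(key).decrypt(ciphertext)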
2.6 Erasure Encoding

One of our main goals was to efficiently combine the space available to users through a single and secure interface. We therefore needed to distribute user files in such a way that we could maintain confidentiality, reliability, and availability. This corresponds to our core promise to users: no provider can read, delete, or modify their files.

We therefore needed an overall distribution scheme that would guarantee these properties while being both time and space efficient. In our own research, we found the following primitives and schemes available for such use:

All-or-Nothing Transform (AONT) This is an s-of-s threshold scheme; all shares are required for the reconstruction of the original secret. As a result, the algorithm runs quickly, in O(log(s)) time for a message of length s, and no additional storage costs are incurred. This is often used as a preprocessing step for separable encryption suites such as block ciphers (e.g. AES-CBC) where decryption of a single block of ciphertext results in a single block of plaintext. This scheme ensures that no information is revealed unless all blocks can be decrypted, protecting against certain brute-force attacks. [9]

Rabin's Information Dispersal Algorithm (IDA) This dispersal algorithm guarantees that any m of n shares can be used to reconstruct the secret. It runs in O(m^2) time, and the total space required for dispersing data of size F is (n/m) ∗ F. If the parameters are set so that n = m, then the dispersal incurs no additional storage cost, consistent with AONT. Since each share will be 1/m of its original size, this dispersal achieves maximum space efficiency for its threshold. However, this also implies that the scheme does not make strong confidentiality guarantees; that is, some subset of fewer than m shares may reveal information about the secret. [6]

Shamir Secret Sharing Shamir Secret Sharing is very similar to Rabin's IDA but with some key differences. In particular, it is a k-of-n threshold scheme such that k shares are required for reconstruction and any subset of fewer shares reveals no information. In order to make this guarantee, it is necessary that each share be the same size as the original secret. Therefore, for data of size F, the total stored data for this scheme would be n ∗ F. [10]

Reed-Solomon Encoding This scheme makes the same guarantees as Rabin's IDA with the same space performance. Technically, Reed-Solomon encoding takes a message of length m and extends it to n symbols such that any m symbols can be used for full reconstruction, while the IDA actually generates shares. However, the IDA is discussed primarily in academia, while there are a plethora of Reed-Solomon implementations (which follow the technical behavior of the IDA in terms of share generation). [5], [8]

Secret Sharing Made Short This scheme combines the confidentiality properties of Shamir Secret Sharing with the space efficiency of Rabin's IDA. Specifically, a random key is generated and used to encrypt the message that is meant to be kept confidential. The encrypted data is then distributed with Rabin's IDA, and the key itself is protected and distributed with Shamir Secret Sharing. [4]

We were excited to find in our research that the Secret Sharing Made Short scheme closely mirrored parts of the solution we had independently been discussing. We chose this scheme because it matched the confidentiality guarantees that we needed while still providing optimal space efficiency for user files.

We had to make a few modifications on top of this scheme in order to match our use case. In particular, we implemented a stronger variant of Shamir Secret Sharing (see 2.7), and we used an existing Reed-Solomon encoding library. Additionally, rather than generating a single key in order to encrypt a single file, we had to generate a key for each file and store the mapping in a manifest (see 2.5). The manifest was then encrypted with a randomly generated master key. We then used Reed-Solomon encoding to distribute the encrypted files as well as the encrypted manifest, and distributed the master key with Shamir Secret Sharing.
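As a worked example of the space costs: with n = 4 providers and threshold m = 2, storing a file of size F costs 4F under Shamir Secret Sharing but only (4/2)F = 2F under Rabin's IDA or Reed-Solomon; Secret Sharing Made Short pays 2F for the ciphertext plus four small key shares. The following is a rough sketch of the adapted pipeline under those parameters. It uses PyECLib for Reed-Solomon, as we did, but the shamir_share helper and the parameter values are illustrative stand-ins for our own implementation (see 2.7 and appendix A):

    from pyeclib.ec_iface import ECDriver
    import nacl.secret
    import nacl.utils

    def distribute(plaintext: bytes, total: int = 4, threshold: int = 2):
        # 1. Encrypt the data under a fresh random key (see 2.5).
        key = nacl.utils.random(nacl.secret.SecretBox.KEY_SIZE)
        ciphertext = bytes(nacl.secret.SecretBox(key).encrypt(plaintext))
        # 2. Reed-Solomon: `total` fragments, any `threshold` of which
        #    reconstruct. (PyECLib's k is the data-fragment count and its
        #    m is the parity count.)
        rs = ECDriver(k=threshold, m=total - threshold,
                      ec_type="liberasurecode_rs_vand")
        fragments = rs.encode(ciphertext)
        # 3. Shamir-share the key so that fewer than `threshold` providers
        #    learn nothing about it. shamir_share is a hypothetical helper.
        key_shares = shamir_share(key, k=threshold, n=total)
        # One (fragment, key share) pair per provider.
        return list(zip(fragments, key_shares))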
2.6.1 Dependency Sandboxing. We used the PyECLib library [2], a wrapper around the liberasurecode library [1], to provide our erasure encoding capabilities. During the development of our project, we frequently found that by corrupting the shares provided to the library, we could induce a segfault in it and bring down our entire program alongside it. To mitigate this, we reported the issues upstream to the PyECLib and liberasurecode maintainers, who helpfully developed fixes to the problems we identified. However, since the erasure decoding stage of our pipeline necessarily took raw data from providers as input, we were wary of undiscovered bugs in this library continuing to crash our program. So, we developed a sandbox for this and other library dependencies that would run their stages of our data pipeline in separate processes and report their results back to the main process via inter-process communication channels. With this development, any third-party library we depended on could crash (or be otherwise compromised) without harming our main application logic; any errors would be passed to the resilience subsection (see 2.9).
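The sandbox pattern looks roughly like the following simplified sketch; the k = 2, m = 2 parameters and function names are illustrative, and the real pipeline's process management and channel protocol are more involved:

    import multiprocessing as mp
    import queue

    def _decode_worker(out, fragments):
        # Runs in a child process: a segfault here kills only this process.
        from pyeclib.ec_iface import ECDriver
        driver = ECDriver(k=2, m=2, ec_type="liberasurecode_rs_vand")
        out.put(driver.decode(fragments))

    def sandboxed_decode(fragments, timeout=30.0):
        out = mp.Queue()
        child = mp.Process(target=_decode_worker, args=(out, fragments))
        child.start()
        try:
            # Normal path: the child posts its result before the deadline.
            return out.get(timeout=timeout)
        except queue.Empty:
            # No result: the library segfaulted, raised, or hung in the
            # child; surface it as an ordinary error in the main process.
            raise RuntimeError(
                "decode failed in sandbox (exit code %s)" % child.exitcode)
        finally:
            child.join(timeout=1.0)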
2.7 Robust Secret Sharing

In order to encrypt our manifest without forcing users to remember and protect a new credential, we needed to generate a random master key that could be stored across the providers with confidentiality, integrity, and reliability.

For confidentiality and reliability, we needed a threshold scheme such that, for shares distributed across n providers, reconstruction of the secret would be possible with any k of these providers but no information about the secret would be revealed to any subset of fewer than k providers. These requirements were satisfied by Shamir Secret Sharing (see 2.6, [10]). However, Shamir Secret Sharing does not provide the integrity guarantees that we needed. In particular, Shamir Secret Sharing is tolerant of some subset of missing shares, but it is vulnerable to corrupt shares. We therefore needed to apply a verification wrapper around this scheme that would guarantee the integrity of the reconstructed secret.

Such schemes require metadata to be transmitted as part of the shares. Much of the academic work in this space has been targeted at minimizing the size of the resultant shares as the number of players increases. However, since we only needed to apply this scheme to a single encryption key and our target use case would have no more than 10 providers, we favored simplicity over asymptotic efficiency. We therefore chose the "Verifiable Secret Sharing" scheme presented by Rabin and Ben-Or [7].

For additional information on the inner workings of Shamir Secret Sharing and the Robust Secret Sharing wrapper, see appendix A.

Open-source implementations of Shamir Secret Sharing are available, and we had originally planned to make use of an existing library. We selected a library based on its apparent code quality and hygiene, but when we implemented our own unit tests we found a security flaw within that library. After communicating with the team behind that library, we decided that our best option would be to implement Shamir Secret Sharing ourselves. Furthermore, we were unable to find any Robust Secret Sharing implementations, as this space has been largely academic. As best we know, ours is the first Robust Secret Sharing implementation in the wild.
2.8 Providers

Daruma currently supports four popular customer-facing cloud storage companies: Dropbox, Google Drive, OneDrive, and Box.

We created two flows for creating providers. The first supports providers that implement the OAuth flow, and redirects the user to a link where they can log in on the provider's website. The second supports providers that take a single parameter (for instance, a provider residing on a local disk might require only the provider's path on disk for construction).

After providers are created, all providers share a common interface, so that the internal system can treat them all the same way. All providers support GET, PUT, DELETE, and WIPE operations. These functions each wrap calls to the provider API. A sketch of this interface appears below.

OAuth tokens and credentials for providers are cached in a user credentials JSON file on the user's disk. This allows Daruma to automatically load cached providers on startup, simplifying the user experience.
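A minimal sketch of that common interface; the class and method names are illustrative, and Daruma's actual class hierarchy may differ:

    from abc import ABC, abstractmethod

    class Provider(ABC):
        """Common interface wrapped around each cloud provider's API."""

        @abstractmethod
        def get(self, codename: str) -> bytes:
            """Download the share stored under the given codename."""

        @abstractmethod
        def put(self, codename: str, data: bytes) -> None:
            """Upload a share under the given codename."""

        @abstractmethod
        def delete(self, codename: str) -> None:
            """Remove a single share."""

        @abstractmethod
        def wipe(self) -> None:
            """Remove every share this provider holds."""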
2.9 Resilience

A major concern when implementing Daruma was ensuring that no coalition of providers, through action or inaction, could corrupt the state of the filesystem. This included cases where certain providers strategically go offline during crucial uploads; for instance, if only some providers are online during an upload, it stands to reason that different parts of the system could be out of sync. For this reason, it was necessary to make all write operations atomic. Similarly, reprovisioning (redistributing files and shares upon changing a threshold or adding/removing a provider) was written such that the system did not become corrupted if a provider failed during the operation.

In order to ensure that our guarantees were maintained as various cloud providers failed and came back online, it was necessary to quickly detect and repair errors on providers as soon as possible. Daruma needed to account for the fact that a provider could corrupt files in one time frame and thereafter behave correctly as some other provider failed. Both our major recovery/distribution protocols, Robust Secret Sharing and Reed-Solomon, for master keys and files respectively, were written to provide, upon successful recovery, information about which recovered shares were invalid. The providers with invalid shares (and providers who did not return a share at all) were tagged as failing and, upon recovery, repaired. The repair itself consisted of re-encrypting (with a new key) and redistributing a file (in the case of an invalid Reed-Solomon share) or creating and sharing a new master key and then redistributing the manifest (in the case of an invalid Robust Secret Sharing share). It is important to note that, in order to ensure that information was not leaked, new keys had to be used on all repairs; if not, a malicious provider could pretend to lose shares in order to collect multiple shares of the same plaintext.

In the case of a permanently failing provider, it was important that we have a limit on retries, after which we would decide not to continue repair attempts. It was also crucial that we report to users when providers were failing badly, so that such a provider could be removed. However, we needed a way to differentiate between providers who were experiencing temporary difficulties (and failed several times in a short time span, but resumed normal service afterwards) and providers who exhibited patterns of failure over time. To do this, we used an exponential smoothing update of the form s ← αx + (1 − α)s for a provider score. Here, s is the provider's score and x is a data point: 1 if the provider responded to the most recent request without errors, 0 otherwise. This score represents the fraction of recent requests a provider has served without flaws, and accounts for both past and current behavior, weighting the latter more heavily. Below a certain threshold, a provider is considered "red": retries are no longer attempted, and the user is advised to remove the provider. Below a higher threshold, a provider is considered "yellow": the provider has been experiencing failures, but seems to be mostly okay. Tweaking the thresholds and α enables us to account for a wide variety of provider behaviors, either penalizing harshly or being more tolerant, as necessary.

For more information on the resilience protocols and how the exponential smoothing parameters were chosen, see appendix B.
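Concretely, the score update and classification could look like this sketch; the threshold values and α = 0.2 are illustrative, and appendix B discusses how the real parameters were chosen:

    ALPHA = 0.2             # weight of the newest observation (illustrative)
    RED_THRESHOLD = 0.3     # illustrative
    YELLOW_THRESHOLD = 0.7  # illustrative

    def update_score(score: float, success: bool) -> float:
        # Exponential smoothing: s <- alpha*x + (1 - alpha)*s, where x is
        # 1 for a clean response and 0 for a failure.
        x = 1.0 if success else 0.0
        return ALPHA * x + (1 - ALPHA) * score

    def classify(score: float) -> str:
        if score < RED_THRESHOLD:
            return "red"     # stop retrying; advise the user to remove it
        if score < YELLOW_THRESHOLD:
            return "yellow"  # failing recently, but seems mostly okay
        return "green"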
3. RESULTS AND MEASUREMENT

3.1 User Interface

Our goal was to provide easy integration with a user's existing filesystem. Below is a comparison of our interface with the traditional Dropbox interface:

[Figure: the Daruma interface shown alongside the traditional Dropbox interface]

Uniform This assumes a uniform capacity distribution across providers. In order to match the 32 GB total from the Free Tier category, we assumed here that each of the four providers would have 8 GB.
In an upload operation, Daruma first uploads the file and then uploads an updated manifest. For small files, the extra operation causes Daruma to be slower than other providers, but only by a few seconds (about 2 seconds slower for 100 kB files). As files grow larger, the file size advantage described previously becomes more influential, and Daruma again becomes significantly faster than the slowest provider.

4. ETHICAL AND PRIVACY CONSIDERATIONS

4.1 Social Context

Online security and privacy have gained increased public scrutiny recently, due in part to stories of large corporations being hacked and revelations of mass surveillance by governments around the world. While this is a rapidly evolving situation, several points remain clear. First, regardless of the risks, users and businesses are willingly trusting more and more of their data to the cloud. Secondly, the threat model a security-conscious company must maintain needs to include itself as an adversary, either due to the possibility of a rogue employee or because it may become the target of hackers or governments.

This latter point was recently highlighted in the high-profile legal battle between Apple, Inc. and the Federal Bureau of Investigation. During the case, it was revealed that Apple had implemented two versions of PIN protection in different generations of phones it manufactures, and only in the earlier generation could it provide a tool to bypass the protection. While the case sparked a wide debate over whether the FBI should compel Apple to produce such a tool, the conclusion for Apple was clear: in the earlier generation, its position of trust made it an adversary to complete security.

5. DISCUSSION

Our current product is capable of replacing an application like the Dropbox client on a user's computer for most non-social tasks on a day-to-day basis (e.g. barring collaboration and link sharing). Users can currently log in with their credentials for Dropbox, Google Drive, OneDrive, or Box, in addition to using standard filesystem paths as local providers (e.g. to use a mounted local backup drive). Once logged in, the application will watch a Daruma folder in the user's home directory for changes and synchronize the files it stores online with the Daruma folder state. When providers go down or otherwise remove access to or corrupt files, we properly recover the system if at all possible.

There are still some inherent weaknesses in the system, as well as areas where we see opportunities for improvement. We break these areas into the following categories:

Threat Model Weaknesses Our system has a threat model that is intentionally limited to considering only parties outside the user's computer as potential attackers. While this covers both providers and ourselves, we do not take significant steps to protect sensitive material on the computer system we are running on. If we were to expand our threat model, additional thought might be put into how to sandbox our application from local threats.

Usability Usability was a high priority for us as we developed, so significant effort was put into making interactions feel familiar for a non-technical user. However, there are other friction points that might be considered, such as our requirement that users have many existing cloud provider accounts. To mitigate these issues, future work could include making it easier to sign up for new provider accounts as well as more informative communication regarding system statistics such as capacity utilization.

Sharding When sharing files, Daruma currently builds shares for the entire file at once. For large files, this results in a significant memory cost. This can be avoided by cutting files into many small pieces (shards) and sharing these shards individually. The shards would then be reconstructed on download. If parallelized, a sharding operation would also significantly improve speed, as it would allow Daruma to upload multiple parts of a file at once.
Sharing While our system rivals Dropbox and other cloud providers in usability, Daruma lacks certain features that have become fundamental for other cloud providers. For instance, file sharing (allowing other users to view your files) is a feature commonly used on Dropbox, but unavailable in Daruma. Such a feature could be implemented if, on sharing a file with a secondary user, some public key and manifest information were passed to the secondary user. This feature would further help users transition from individual providers to Daruma.
Cross-Computer Usage Currently, our usage model assumes that users will not have concurrent Daruma sessions on two different computers. If this assumption were broken, we would have to implement several safeguards to ensure the systems do not go out of sync or enter a corrupted state if providers maliciously fail. While hard, these problems are not intractable, and solving them would make Daruma more usable for everyday users.
Cross-Platform Interfaces Currently, Daruma's user interface only works on OS X. Making Daruma's GUI usable on all platforms is a major future goal.
6. ACKNOWLEDGMENTS

We would like to thank our advisors, Nadia Heninger and Boon Thau Loo, as well as CIS senior design instructors Ani Nenkova and Jonathan Smith, for the invaluable technical advice and logistical direction they gave us throughout this process. We would also like to thank Brett Hemenway for his inspiration during our brainstorming process and for his in-depth technical guidance as we implemented the project. Finally, we extend our gratitude to Thuy Le for designing our logo and Melanie Wolff for helping us prepare our demonstrations.
7. REFERENCES

[1] liberasurecode. https://bitbucket.org/tsg-/liberasurecode/.
[2] PyECLib. https://bitbucket.org/kmgreen2/pyeclib.
[3] Dan Bernstein. NaCl: Networking and Cryptography library. https://nacl.cr.yp.to/.
[4] Hugo Krawczyk. Secret sharing made short. In CRYPTO '93: Proceedings of the 13th Annual International Cryptology Conference on Advances in Cryptology, pages 136-146, 1993.
[5] R. J. McEliece and D. V. Sarwate. On sharing secrets and Reed-Solomon codes. Communications of the ACM, 24(9):583-584, September 1981.
[6] Michael O. Rabin. Efficient dispersal of information for security, load balancing, and fault tolerance. Journal of the ACM, 36(2):335-348, April 1989.
[7] Tal Rabin and Michael Ben-Or. Verifiable secret sharing and multiparty protocols with honest majority. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing, pages 73-85, May 1989.
[8] I. S. Reed and G. Solomon. Polynomial codes over certain finite fields. Journal of the Society for Industrial and Applied Mathematics, 8(2):300-304, June 1960.
[9] Ronald L. Rivest. All-or-nothing encryption and the package transform. In Fast Software Encryption, volume 1267 of Lecture Notes in Computer Science, pages 210-218, 1997.
[10] Adi Shamir. How to share a secret. Communications of the ACM, 22(11):612-613, November 1979.
[11] Zooko Wilcox-O'Hearn. Tahoe-LAFS. https://www.tahoe-lafs.org.
APPENDIX

A. IMPLEMENTED CRYPTOGRAPHY

For our project, we implemented two core cryptography schemes.

A.1 Shamir Secret Sharing

This is a k-of-n scheme such that any k shares can be used to reconstruct the secret, but any subset of fewer shares reveals no information. This is fundamentally achieved with polynomial evaluation and interpolation [10].

Sharing begins with the generation of a polynomial P with k − 1 random coefficients and the y-intercept set to the secret to be shared. The polynomial will therefore be of degree k − 1, and n x-values will be selected for evaluation. The resultant (x, P(x)) points are then distributed as the shares associated with the secret [10].

For reconstruction, a minimum of k of these points can be used to reconstruct the polynomial P. By taking P(0), it is straightforward from there to recover the secret [10].

We can demonstrate how this scheme works with a motivating example (Fig. 3: n = 3, k = 2). We consider a case with 3 players where any 2 of them can be used for reconstruction. Our polynomial with the secret as the y-intercept will therefore be of degree 1 (i.e., a line). We see from the diagram that with any two shares (points) we can reconstruct the line and therefore the secret. However, an infinite number of lines could pass through a single point, so with one share no information about the secret is revealed.

[Fig. 3: a degree-1 polynomial through three share points; n = 3, k = 2]

While this scheme protects well against missing shares, it is vulnerable to corrupt shares. In that case, polynomial interpolation will construct from the points a polynomial different from the one originally generated, preventing the recovery of the secret. This problem is addressed by Robust Secret Sharing.
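To make the mechanics concrete, here is a minimal self-contained sketch over a prime field; it is illustrative only (our real implementation differs, and the random module shown here is not a CSPRNG):

    import random

    PRIME = 2**127 - 1  # a Mersenne prime; assumes secrets are integers below it

    def share(secret: int, k: int, n: int):
        """Split `secret` into n points on a random degree k-1 polynomial."""
        # k - 1 random coefficients; the y-intercept is the secret itself.
        coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]

        def evaluate(x):
            return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

        return [(x, evaluate(x)) for x in range(1, n + 1)]

    def reconstruct(shares) -> int:
        """Lagrange interpolation at x = 0 recovers the y-intercept."""
        secret = 0
        for xi, yi in shares:
            num, den = 1, 1
            for xj, _ in shares:
                if xj != xi:
                    num = num * -xj % PRIME
                    den = den * (xi - xj) % PRIME
            # pow(den, PRIME - 2, PRIME) is the modular inverse (Fermat).
            secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
        return secret

    shares = share(424242, k=2, n=3)
    assert reconstruct(shares[:2]) == 424242  # any 2 of the 3 shares suffice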
A.2 Robust Secret Sharing

We used Robust Secret Sharing so that we could tolerate and identify up to k − 1 corrupt shares. The algorithm proceeds as follows. Shares are generated as before through Shamir Secret Sharing. Each of these shares is then used as the message for n generated check vectors, so that a check vector is generated for each pair of providers, and this metadata is sent to the providers along with the generated shares.

When the providers return their shares and metadata, we use that metadata to create lists of verified shares from the perspective of each provider. This allows us to use Shamir Secret Sharing on each such list to reconstruct what each provider would think the secret was if it had access to the shares and metadata. We then apply a voting scheme to select the correct secret. If fewer than k providers returned corrupt shares and we have at least k honest shares, the secret returned from this process is guaranteed to be the one that was originally shared.

This scheme as described varies slightly from the algorithm presented by Rabin and Ben-Or [7]. In particular, their algorithm guarantees that upon reconstruction all players broadcast their information, and the guarantee they make is that each honest player will correctly recover the secret originally shared [7]. However, we do not ever want the providers learning the original secret, so rather than having players broadcast their data to each other upon reconstruction, we request information back from each provider. We then take a vote on the secrets constructed from the view of each provider, and the correctness of this scheme reduces to the guarantees made in that paper.
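A sketch of the voting step; reconstruct is the Shamir sketch from A.1, while verified_shares is a hypothetical helper standing in for the check-vector verification:

    from collections import Counter

    def vote_on_secret(providers, k):
        """Majority vote over per-provider reconstructions (sketch)."""
        candidates = []
        for p in providers:
            # Hypothetical: the shares that pass p's check vectors.
            shares = verified_shares(p)
            if len(shares) >= k:
                candidates.append(reconstruct(shares[:k]))
        # The secret reconstructed by the most providers wins the vote.
        return Counter(candidates).most_common(1)[0][0]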
B. RESILIENCE

B.1 Atomic Algorithms

In order to guarantee that sudden provider failures would not put the system into an unstable state, all algorithms needed to be atomic (all-or-nothing). For put and delete operations, this was achieved by using a manifest update as a "commit" operation.

Algorithm 1 Put file
(1) Share file under a new random name
(2) Update manifest to point file to the random name
(3) If file existed previously, delete files with the old random name

Algorithm 2 Delete file
(1) Update manifest to remove file
(2) Delete files with the random name previously pointed to by file

In each of these operations, a failure before or after the manifest update results in a consistent state across all providers (with perhaps some garbage files that need to be deleted). While manifest failures do have the potential to bring the system out of sync, our threat model assumes that at most a threshold number of providers fail at a time. If this is the case, then the manifest operation is committed, and the manifest will be repaired on later operations as the failing providers become operational. This random-name scheme makes any operation reversible because there is a clear "commit" step, without which all providers are left unfinalized.
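As a code sketch of Algorithm 1 (Provider is the interface sketched in 2.8; make_shares and push_manifest are hypothetical stand-ins for our sharing pipeline, and the uuid-based naming is an assumption consistent with the codenames described earlier):

    import uuid

    def put_file(providers, manifest: dict, path: str, data: bytes) -> None:
        # (1) Share the file under a fresh random codename.
        new_name = uuid.uuid4().hex.upper()
        for provider, fragment in zip(providers, make_shares(data)):
            provider.put(new_name, fragment)
        # (2) Commit: repoint the manifest at the new codename and re-share
        #     the manifest itself across the providers.
        old_name = manifest.get(path)
        manifest[path] = new_name
        push_manifest(providers, manifest)
        # (3) Only after the commit, garbage-collect the superseded shares.
        if old_name is not None:
            for provider in providers:
                provider.delete(old_name)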
B.2 Reprovisioning

As a result of our assumption that providers can go offline or out of business at any time, we needed to make it possible for users to remove providers from the system and replace them with new ones. However, it was crucial that we maintain all guarantees after such a replacement. To do this, all files and the master key needed to be reshared across providers with the new threshold parameters. Atomicity was achieved by using a manifest-commit operation similar to Put and Wipe. As before, failures in steps (1) and (2) result

Algorithm 3 Reprovision
(1) For every file in the system, recover it from the old provider set, and reshare it to the new provider set with a new name
(2) Create a new manifest with information about all new shares, and share it across new providers
(3) Share the new master key across all providers, and broadcast the new manifest name (commit).