Perspective
https://doi.org/10.1038/s42256-020-0186-1

Secure, privacy-preserving and federated machine learning in medical imaging

Georgios A. Kaissis1,2,3, Marcus R. Makowski1, Daniel Rückert2 and Rickmer F. Braren1 ✉

The broad application of artificial intelligence techniques in medicine is currently hindered by limited dataset availability for algorithm training and validation, due to the absence of standardized electronic medical records, and strict legal and ethical requirements to protect patient privacy. In medical imaging, harmonized data exchange formats such as Digital Imaging and Communication in Medicine and electronic data storage are the standard, partially addressing the first issue, but the requirements for privacy preservation are equally strict. To prevent patient privacy compromise while promoting scientific research on large datasets that aims to improve patient care, the implementation of technical solutions to simultaneously address the demands for data protection and utilization is mandatory. Here we present an overview of current and next-generation methods for federated, secure and privacy-preserving artificial intelligence with a focus on medical imaging applications, alongside potential attack vectors and future prospects in medical imaging and beyond.

Artificial intelligence (AI) methods have the potential to revolutionize the domain of medicine, as witnessed, for example, in medical imaging, where the application of computer vision techniques, traditional machine learning1,2 and—more recently—deep neural networks have achieved remarkable successes. This progress can be ascribed to the release of large, curated corpora of images (ImageNet3 perhaps being the best known), giving rise to performant pre-trained algorithms that facilitate transfer learning and led to increasing publications both in oncology—with applications in tumour detection4,5, genomic characterization6,7, tumour subtyping8,9, grading prediction10, outcome risk assessment11 or risk of relapse quantification12—and non-oncologic applications, such as chest X-ray analysis13 and retinal fundus imaging14.

To allow medical imaging AI applications to offer clinical decision support suitable for precision medicine implementations, even larger amounts of imaging and clinical data will be required. Large cross-sectional population studies based solely on volunteer participation, such as the UK Biobank15, cannot fill this gap. Even the largest current imaging studies in the field4,5, demonstrating better-than-human performance in their respective tasks, include considerably less data than, for example, ImageNet3, or the amount of data used to train algorithmic agents in the games of Go or StarCraft16,17, or autonomous vehicles18. Furthermore, such datasets often stem from relatively few institutions, geographic regions or patient demographics, and might therefore contain unquantifiable bias due to their incompleteness with respect to co-variables such as comorbidities, ethnicity, gender and so on19.

However, considering that the sum of the world's patient databases probably contains enough data to answer many significant questions, it becomes clear that the inability to access and leverage this data poses a significant barrier to AI applications in this field.

The lack of standardized, electronic patient records is one reason. Electronic patient data management is expensive20, and hospitals in underprivileged regions might be unable to afford participation in studies requiring it, potentially perpetuating the aforementioned issues of bias and fairness. In the medical imaging field, electronic data management is the standard: Digital Imaging and Communications in Medicine (DICOM)21 is the universally adopted imaging data format, and electronic file storage is the near-global standard of care. Even where non-digital formats are still in use, the archival nature of, for instance, film radiography allows post hoc digitization, seen, for example, in the CBIS-DDSM dataset22, consisting of digitized film breast radiographs. Digital imaging data, easily shareable, permanently storable and remotely accessible in the cloud has driven the aforementioned successes of medical imaging AI.

The second issue representing a stark deterrent from multi-institutional/multi-national AI trials23 is the rigorous regulation of patient data and the requirements for its protection. Both the United States Health Insurance Portability and Accountability Act (HIPAA)24 and the European General Data Protection Regulation (GDPR)25 mandate strict rules regarding the storage and exchange of personally identifiable data and data concerning health, requiring authentication, authorization, accountability and—with GDPR—AI interpretability, sparking considerations on data handling, ownership and AI governance26,27. Ethical, moral and scientific guidelines (soft law28) also prescribe respect towards privacy—that is, the ability to retain full control and secrecy about one's personal information. The term privacy is used in this article to encapsulate both the intention to keep data protected from unintended leakage and from deliberate disclosure attempts (that is, synonymous with 'confidentiality').

AI in medical imaging is a multifaceted field of patients, hospitals, research institutions, algorithm developers, diagnostic equipment vendors, industry and lawmakers. Its high complexity and resulting lack of transparency with respect to stakeholder motives and data usage patterns, alongside the facilitated data sharing enabled by electronic imaging data storage, threaten to diminish the importance of individual privacy and relax the grip on personal data in the name of, at best, scientific development and, at worst, financial interests. The field of secure and privacy-preserving AI offers techniques to help bridge the gap between personal data protection and data utilization for research and clinical routine.

1Department of Diagnostic and Interventional Radiology, Faculty of Medicine, Technical University of Munich, Munich, Germany. 2Department of Computing, Imperial College London, London, UK. 3OpenMined. ✉e-mail: [email protected]


Table 1 | Glossary of terms encountered in the article alongside conceptual examples

Attack vectors: attacks against the dataset
Re-identification attack: determining an individual's identity despite anonymization, based on other information present in the dataset. Example: exploiting similarities to other datasets in which the same individual is contained (linkage).
Dataset reconstruction attack: deriving an individual's characteristics from the results of computations performed on a dataset without having access to the dataset itself (synonyms: feature re-derivation, attribute inference). Example: using multiple aggregate statistics to derive data points corresponding to a single individual.
Tracing attack: determining whether an individual is present in the dataset or not without necessarily determining their exact identity (synonym: membership inference). Example: exploiting repeated, slightly varying dataset queries to 'distil' individual information (set differencing).

Attack vectors: attacks against the algorithm
Adversarial attack: manipulation of the input to an algorithm with the goal of altering its output, most often in a way that makes the manipulation of the input data impossible to detect by humans. Example: compromising the computation result by introducing malicious training examples (model poisoning).
Model-inversion/reconstruction attack: derivation of information about the dataset stored within the algorithm's weights by observing the algorithm's behaviour. Example: using generative algorithms to recreate parts of the training data based on algorithm parameters.

Secure and private AI terminology
Secure by default implementation (synonym: private by design): systems that have been designed from the ground up with privacy in mind and at best require no specialized data handling.
Anonymization: removal of personally identifiable information from a dataset. Example: removing information related to age, gender and so on.
Pseudonymization: replacement of personally identifiable information in a dataset with a dummy/synthetic entry, with separate storage of the linkage record (look-up table). Example: replacing names with randomly generated text.
Secure AI: techniques concerned with protecting the AI algorithms. Example: algorithm encryption.
Privacy-preserving AI: techniques for protecting the input and output data. Example: data encryption, decentralized storage.
Federated machine learning: machine learning system relying on distributing the algorithm to where the data is instead of gathering the data where the algorithm is (decentralized/distributed computation). Example: training of algorithms on hospital computer systems instead of on cloud servers.
Differential privacy: modification or perturbation of a dataset to obfuscate individual data points while retaining the ability of interaction with the data within a certain scope (privacy budget) and of statistical analysis; can also be applied to algorithms. Example: random shuffling of data to remove the association between individuals and their data entries.
Homomorphic encryption: cryptographic technique that preserves the ability to perform mathematical operations on data as if it was unencrypted (plain text). Example: performing neural network computations on encrypted data without first decrypting it.
Secure (multi-party) computation: collection of techniques and protocols enabling two or more parties to split up data among them to perform joint computations in a way that prevents any single party from gaining knowledge of the data but preserving the computational result. Example: determining which patients two hospitals have in common without revealing their respective patient lists (private set intersection).
Hardware security implementation: collection of techniques whereby specialized computer hardware provides guarantees of privacy or security. Example: secure storage or processing enclaves in mobile phones or computers.

Here, we present an overview of current and emerging techniques for privacy preservation with a focus on their applications in medical imaging, discuss their benefits, drawbacks and technical implementations, as well as potential weaknesses and points of attack aimed at compromising privacy. We conclude with an outlook on the current and future developments in the field of medical imaging and beyond, alongside their potential implications.

Definitions and attack vectors
A glossary of the terms presented throughout the article can be found in Table 1, and a visual overview of the field can be found in Fig. 1.

Fig. 1 | Secure and private AI. Schematic overview of the relationships and interactions between data, algorithms, actors and techniques in the field of secure and private AI.

Optimal privacy preservation requires implementations that are secure by default (synonymously privacy by design29). Such systems should require minimal or no data transfer and provide theoretical and/or technical guarantees of privacy.

The term secure AI is used for methods concerned with safeguarding algorithms, and the term privacy-preserving AI for systems allowing data processing without revealing the data itself. Their combination aims to guarantee sovereignty over the input data and the algorithms, integrity of the computational process and its results, and to offer trustworthy and transparently auditable technical implementations (structured transparency). Such systems must resist attacks against the dataset30, for example identity or membership inference/tracing31 (determining whether an individual is present in a given dataset) and feature/attribute re-derivation/re-identification30 (extraction of characteristics of an individual from within the dataset, for example by linkage attacks32). They must also withstand attacks on the algorithm or the computational process—for instance, modification of algorithm parameters (for example, by poisoning33)—or derivation of information about the dataset from them (model-inversion/reconstruction34). Finally, they must protect the data and the algorithms from theft both in storage and when transmitted over networks (asset/integrity protection).
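To make the tracing and dataset reconstruction attacks described above concrete, the following toy sketch (hypothetical cohort and values, standard Python only) shows how two seemingly harmless aggregate queries that differ by a single record leak that record exactly by set differencing.

# Toy illustration of a set-differencing (tracing/reconstruction) attack:
# two aggregate queries that differ by one record reveal that record.
# The cohort and values are synthetic.
cohort = {"patient_A": 63, "patient_B": 58, "patient_C": 71}  # e.g. ages in years

def query_sum(dataset, exclude=None):
    # Aggregate statistic released by the data holder (sum over a cohort).
    return sum(value for name, value in dataset.items() if name != exclude)

full_sum = query_sum(cohort)                        # released aggregate over everyone
sub_sum = query_sum(cohort, exclude="patient_B")    # released aggregate over a sub-cohort

# The attacker never sees individual rows, yet recovers patient_B's value exactly.
leaked_value = full_sum - sub_sum
print(leaked_value)  # 58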
Anonymization, pseudonymization and the risks of re-identification
Anonymization (the removal of private data from a record) and pseudonymization (replacement of sensitive entries with artificially generated ones while still allowing re-attribution using a look-up table)—collectively de-identification—are currently the most widely used privacy preservation techniques for medical datasets. In medical imaging, anonymization requires removing all pertinent DICOM metadata entries (for example, patient name, gender and so on). For pseudonymization, the true entries are replaced by synthetic data (see overview of techniques in ref. 35), and the look-up table safe-kept separately. The main benefit of both approaches is simplicity. Anonymization software is built into most clinical data archiving systems, rendering it the easiest method in practice.
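As a minimal sketch of this kind of metadata removal (assuming the pydicom package is installed; the attribute list below is illustrative and deliberately incomplete), identifying DICOM entries can be blanked before a dataset leaves the clinical system. A production workflow would follow a vetted de-identification profile and, for head imaging, add pixel-level defacing.

# Minimal, non-exhaustive DICOM anonymization sketch using pydicom.
# Real de-identification must follow a vetted profile; this only blanks a few
# obviously identifying attributes and strips vendor-specific private tags.
import pydicom

IDENTIFYING_KEYWORDS = [
    "PatientName", "PatientID", "PatientBirthDate", "PatientSex",
    "InstitutionName", "ReferringPhysicianName", "AccessionNumber",
]

def anonymize(in_path, out_path):
    ds = pydicom.dcmread(in_path)
    for keyword in IDENTIFYING_KEYWORDS:
        if hasattr(ds, keyword):        # only touch attributes that are present
            setattr(ds, keyword, "")    # blank the identifying entry
    ds.remove_private_tags()            # drop vendor-specific private tags
    ds.save_as(out_path)

# anonymize("study.dcm", "study_anon.dcm")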
Pseudonymization poses additional difficulties since it requires data manipulation, not just data deletion, and safekeeping of the look-up tables for reversing the process. The latter can be problematic in the setting of insecure storage, risking data theft36. Furthermore, technical errors can render the protection ineffective and potentially (for example, in case of retaining institution names), an entire dataset identifiable. Moreover, there is substantial discourse regarding the definition of 'sufficient/reasonable' de-identification37 related to the objective/technical difficulty of reversing the process. Different points of view exist in different jurisdictions38, complicating the establishment of international standards. Also, de-identification techniques are usually employed as a preparation to data transfer or sharing. This presents issues in case the patient withdraws their consent, since it uncouples data governance from data ownership (impeding the right to be forgotten, GDPR article 17), or if the legislation changes. Lastly, requirements towards the de-identification process vary according to the type of imaging dataset: a radiograph of a leg is harder to link back to an individual than a computed tomography scan of their head, where the contours of the face can be reconstructed directly from the image.


Such re-identification attacks39 have been shown to yield high success rates both with tabular data40,41 (such as patient records) and medical imaging data42. As a consequence, datasets more prone to identification must be processed more rigorously, for instance by removal of the face or skull region from the images (defacing/skull stripping). This complicates data handling, increasing the probability of errors and constitutes a manipulation of the imaging data, which, at worst, represents an adversarial update to the algorithm43, reducing its performance and robustness. Ultimately, even such processing might not be sufficient for the full de-identification of datasets44. Re-identified patient records are a lucrative target for health insurance companies wishing to reduce their financial risk by discriminating against individuals with certain illnesses. It has been reported that large-scale re-identification attacks and the sale of re-identified medical records have become a business model for data-mining companies45. De-identification by naive anonymization or pseudonymization alone must therefore be viewed as a technically insufficient measure against identity inference.
Decentralized data and federated machine learning
The concept of federated machine learning began gathering significant attention around the year 201546. It belongs to a class of decentralized/distributed systems that rely on the principle of remote execution—that is, distributing copies of a machine learning algorithm to the sites or devices where the data is kept (nodes), performing training iterations locally, and returning the results of the computation (for example, updated neural network weights) to a central repository to update the main algorithm. Its main benefit is the ability of the data to remain with its owner (retention of sovereignty), while still enabling the training of algorithms on the data. The federation topology is flexible (model sharing among the nodes and aggregation at a later time (peer to peer/gossip strategy47) or full decentralization, combined, for example, with contribution tracking/audit trails using blockchains48). Continuous online availability is not required since training can be performed offline and results returned later. Thus, federated learning approaches have arguably become the most widely used next-generation privacy preservation technique, both in industry49 and medical AI applications50.

While federated learning is flexible and resolves data governance and ownership issues, it does not itself guarantee security and privacy unless combined with other methods described below. A lack of encryption can allow attackers to steal personally identifiable data directly from the nodes or interfere with the communication process. This communication requirement can be burdensome for large machine learning models or data volumes. The decentralized nature of the data complicates data curation to ascertain the integrity and quality of the results. Technical research must be performed to determine the optimal method for updating the central model state (distributed optimization, federated averaging). In case the local algorithms are not encrypted, or the updates aren't securely aggregated, data can leak or algorithms can be tampered with51, reconstructed or stolen (parameter inference), which is unacceptable from the viewpoint of intellectual property, patent restrictions or asset protection. Moreover, neural networks represent a form of memory mechanism, with compressed representations of the training data stored within their weights (unintended memorization). It is therefore possible to reconstruct parts of the training data from the algorithm weights themselves on a decentralized node52–54. Such model inversion or reconstruction attacks can cause catastrophic data leakage: it has been shown that images can be reconstructed with impressive accuracy and detail55, allowing visualization of the original training data. Federated learning thus offers an infrastructural approach to privacy and security, but further measures, highlighted below, are required to expand its privacy-preserving scope.
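A minimal sketch of this remote-execution and federated-averaging pattern is given below, with each node reduced to a NumPy function performing one local least-squares gradient step on synthetic data; real systems add secure aggregation, encryption, scheduling and failure handling on top of this loop.

# Federated averaging sketch: each node trains locally on data that never
# leaves it, and only model weights travel to the coordinating server.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1):
    # One local training pass on a node's private data (least-squares gradient step).
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

# Three hospitals, each holding a private (here synthetic) dataset.
nodes = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]
global_weights = np.zeros(3)

for federation_round in range(20):
    # Each node receives the current global model and computes an update locally.
    local_weights = [local_update(global_weights, X, y) for X, y in nodes]
    # The server only ever sees weights, never raw data; updates are averaged
    # weighted by local dataset size (federated averaging).
    sizes = np.array([len(y) for _, y in nodes])
    global_weights = np.average(local_weights, axis=0, weights=sizes)

print(global_weights)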
Differential privacy
Data-perturbation-based privacy approaches operate on the premise that the systematic randomized modification of a dataset or algorithm can reduce information about the single individual while retaining the capability of statistical reasoning about the dataset. The approach of retaining the global statistical distribution of a dataset while reducing individually recognizable information is termed differential privacy56 (DP). Intuitively, a dataset is differentially private if an outside observer is unable to infer whether a specific individual was used for obtaining a result from the dataset. For example, a causal relationship between obesity and cardiac disease can be inferred without knowing the body mass index of the individual patients. DP thus offers resistance to re-identification attacks such as linkage or set differencing within a certain scope of interaction with the dataset (privacy budget56). DP can be applied to the input data (local DP), the computation results (global DP) or the algorithm. Implementations range from simple random shuffling of the input data57 to the introduction of noise to the dataset (Gaussian DP58 with the benefit of better interpretability). DP can also be applied to algorithm updates during training, for instance in neural networks via differentially private stochastic gradient descent59 or private aggregation of teacher ensembles60, or during inference time. Local DP ensures privacy at the source of the data, putting the data owner in control and is thus well suited to healthcare applications61, for instance for federated learning applications in which health data are being collected by smartphones or wearable devices. DP applications to imaging are being actively explored62.

Among the challenges associated with DP, the main one is the perturbation of the dataset itself. Data manipulation can degrade the data, which in an area with access to relatively little data, such as medical imaging research, may prove deleterious to algorithm performance. The technique also poses challenges with respect to plausibility testing, explaining the process to patients—that is, data legibility (human–data interaction63)—regarding algorithm development and implementation, and escalates the requirement for statistical expertise to ascertain data representativeness64. Most importantly, the specifics of implementing DP in imaging data are unclear. Tabular data can be easily shuffled, but the perturbation of images can have unpredictable effects, with research demonstrating this type of manipulation (for example, adversarial noise) both as an attack against algorithms65 and a regularization mechanism leading to increased robustness66 and resilience against inversion attacks. Thus, further research is required before the widespread application of DP in medical imaging.
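To make the noise-calibration idea behind global DP concrete, the sketch below releases a differentially private cohort mean via the Gaussian mechanism; the clipping bound and the (epsilon, delta) budget are arbitrary illustrative choices rather than recommendations for clinical use.

# Global differential privacy sketch: release a cohort mean with noise
# calibrated so that any single patient's record has a bounded influence.
import numpy as np

rng = np.random.default_rng(42)
values = rng.uniform(15, 45, size=500)       # synthetic per-patient values (e.g. BMI)

clip_bound = 50.0                            # assumed upper bound on any single value
epsilon, delta = 0.5, 1e-5                   # illustrative privacy budget

clipped = np.clip(values, 0.0, clip_bound)
sensitivity = clip_bound / len(clipped)      # max change of the mean from one record
# Classic Gaussian-mechanism calibration (valid for epsilon < 1).
sigma = np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / epsilon

private_mean = clipped.mean() + rng.normal(0.0, sigma)
print(clipped.mean(), private_mean)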
Homomorphic encryption
A conceptually simple, albeit technically challenging approach to data or algorithm fortification is cryptography, widely recognized as a gold standard for information security. Current cryptographic algorithms cannot be cracked by brute force67. Encryption is easily explained to and trusted by patients and practitioners. It can be applied both to the algorithm and to the data allowing secure, joint computation.

Homomorphic encryption (HE) is an encryption scheme that allows computation on encrypted data as if it was unencrypted (plain text). Homomorphism is a mathematical concept whereby structure is preserved throughout a computation. Since only certain mathematical operations, such as addition and multiplication, are homomorphic, the application of HE to neural networks requires the operations defined within the algorithm to conform to these limitations and thus standard encryption algorithms like the advanced encryption standard (AES)68 cannot be used. Several implementations of HE algorithms69 with varying levels of efficiency exist, and the application of HE represents an efficiency–security trade-off, with computational performance currently the most notable issue. Nevertheless, HE has successfully been applied
to convolutional neural networks70, and its benefits demonstrated in a 'machine learning as a service' scenario71, whereby data is sent over the network to be processed on an off-site server (cloud computing). It can also be used in federated learning scenarios (with or without additional DP61) to securely aggregate encrypted algorithm updates72.
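The additive property that enables such secure aggregation can be demonstrated with a toy Paillier cryptosystem (tiny hard-coded primes, insecure and for illustration only): multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts, so a server can total encrypted contributions without ever decrypting an individual one.

# Toy Paillier scheme (NOT secure: tiny primes) showing additive homomorphism:
# Enc(a) * Enc(b) mod n^2 decrypts to a + b, which is the basis of secure
# aggregation of (quantized) model updates.
import math
import random

p, q = 17, 19                        # toy primes; real keys use thousands of bits
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1
mu = pow(lam, -1, n)                 # modular inverse of lambda mod n

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    x = pow(c, lam, n2)
    return ((x - 1) // n) * mu % n

# Two clients encrypt their contributions; the server multiplies ciphertexts
# and only the aggregate sum is ever decrypted.
c1, c2 = encrypt(12), encrypt(30)
aggregate = (c1 * c2) % n2
print(decrypt(aggregate))            # 42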
Secure multi-party computation
Secure computation can be extended to multiple parties—secure multi-party computation (SMPC)73—whereby processing is performed on encrypted data shares, split among them in a way that no single party can retrieve the entire data on their own. The computation result can be announced without any party ever having seen the data itself, which can be recovered only by consensus. A conceptual example for SMPC is a ballot, where the result needs to be known, but the individual voter's preference does not. For a technical description of SMPC, we refer to ref. 74. The research interest in SMPC has recently risen, since it allows for 'secret sharing' in semi-trusted and low-trust environments. Notably, SMPC has been used in the setting of genetic sequencing and diagnostics without revealing the patient's genome75.

In the domain of medical imaging, SMPC can be employed to perform analyses on datasets completely in the encrypted domain and without otherwise perturbing the data. It can thus help to increase the effective amount of available data without revealing individual identities or risking information leakage. It can also enable the ethically responsible provision of machine learning services while rendering the commercial use of the data itself impossible, or at least under the control of the individual, and subject to legal regulation after appropriate ethical debate, similar to the debate about organ donation (single-use accountability). For example, machine-learning-assisted medical image analysis services can be provided under the guarantee of data protection from malicious use in case of theft or from unwarranted financial exploitation76. As long as the data and the algorithms are encrypted, they remain unusable unless permission is granted by both parties, yielding a shared governance model. The notable limitations of SMPC are the requirements for continuous data transfer between parties (communication overhead) and for their continuous online availability. The reliability/redundancy and scalability to more than a small number of parties is a concern for SMPC applications77, and computational considerations are a concern beyond small algorithm sizes, with efficient SMPC implementations of state-of-the-art neural network algorithms currently under active development78.
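The ballot intuition maps directly onto additive secret sharing, sketched below for three parties that jointly compute a total without any of them seeing another's input; this toy protocol omits the authentication, communication and malicious-party protections that real SMPC frameworks provide.

# Toy additive secret sharing over a prime field: each hospital splits its
# private value into random shares, and only the joint total is ever revealed.
import random

PRIME = 2_147_483_647                # field modulus (toy choice)

def share(secret, n_parties):
    # Split a secret into n random shares that sum to the secret mod PRIME.
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

# Three hospitals each hold a private count (e.g. eligible patients).
private_values = [112, 86, 203]
all_shares = [share(v, 3) for v in private_values]

# Computing party i receives the i-th share from every hospital and sums them;
# no single party ever sees an individual hospital's value.
partial_sums = [sum(hospital[i] for hospital in all_shares) % PRIME for i in range(3)]

# Combining the partial results reveals only the joint total.
print(sum(partial_sums) % PRIME)     # 401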
Secure hardware implementations
Encryption provides a theoretical/mathematical privacy guarantee. However, privacy guarantees on the hardware level also exist, for example, in the form of secure processors or enclaves implemented in mobile devices79. They can assure data and algorithm privacy, for example, in federated learning workflows, even in the case of operating system kernel breaches. Due to the rising significance of hardware-level deep learning implementations (for example, tensor processing units80 or machine-learning-specific instruction sets81), it is likely that such system-based privacy guarantees (trusted execution environments) built into edge hardware such as mobile phones will become more prevalent.

Outlook
Medical imaging has arguably witnessed among the largest advances in AI applications due to the concurrent developments in computer vision. However, the issues of security and privacy are not limited to medical imaging82, as seen for example in the 2019/2020 SARS-CoV-2 pandemic, which sparked worldwide concern about the implications of setting political, ethical and legal precedents by large-scale automatic contact tracing and movement tracking, creating a demand for their safe and privacy-protecting technical implementation83. All AI applications including sensitive data unfold in a complex, multi-stakeholder tension field of conflicting interests. The unregulated use of private data is likely to be more widespread than assumed, and cases of misuse—especially out of financial interest—will probably increase further. Yet the techniques presented here offer an opportunity to prevent stakeholder interactions from becoming a zero-sum game.

We believe that the widespread adoption of secure and private AI will require targeted multi-disciplinary research and investment in the following areas. (1) Decentralized data storage and federated learning systems, replacing the current paradigm of data sharing and centralized storage, have the greatest potential to enable privacy-preserving cross-institutional research in a breadth of biomedical disciplines in the near future84,85, with results in medical imaging50,86 and genomics87 recently demonstrated. (2) To counteract the drawbacks of the individual techniques already presented, efficient cryptographic and privacy primitives, neural network operations88 based, for example, on functional encryption89, quantization90 and optimization strategies91, and encrypted transfer learning approaches92 must be further developed. (3) The trade-offs between accuracy, interpretability, fairness, bias and privacy (privacy-utility trade-offs) need to be researched. In the field of radiology, for instance, interpretability in the encrypted setting is limited to the evaluation of trained algorithms on new images or inspection of the plain-text input data; however, intermediate outputs might be obfuscated and hard to interpret. Current research about interpretable private algorithms93 can alleviate this issue. (4) Cryptographic expertise is required for the design and implementation of secure and efficient systems that not only resist (or at least reveal) errors due to technical implementation, but are also robust against semi-honest or dishonest participants/adversaries attempting to undermine the system94. (5) Deployed models must be monitored and potentially corrected for temporal instability (that is, statistical drift95), which can be difficult with encrypted data or algorithms. (6) Until fully secure and private solutions are the standard, research has to address the question of how the right to be forgotten (for example, GDPR) can be realized—for example, via machine unlearning96 ('un-training' an algorithm when an individual withdraws consent). (7) The widespread implementation of secure and private AI will hinge on lowering the barrier to entry for researchers and developers by provision of accessible, open-source tools such as open-source extensions to deep learning frameworks, implementations of state-of-the-art algorithms and federated learning solutions, many of which have recently become available97,98. (8) The development of auditable and objectively trustworthy systems99 (that is, not relying on subjective assertions—for example, by governments) will promote the universal acceptance of secure and private AI solutions by individuals and policymakers. (9) The technical ability offered by secure and private AI solutions to retain sovereignty over one's identity100 and new techniques to quantify and track the added value of individual datasets with respect to algorithm performance will strengthen the notion of private data as a scarce and valuable resource within an evolving data economy101 currently experiencing oversupply102. (10) Lastly, we view both the education of patients, physicians, researchers and policymakers, and the open scientific, public and political discourse about privacy, current risks and technical possibilities as paramount for reinforcing the cultural value of privacy and cultivating a sustainable attitude of trust and value-aligned cooperation both in science and society.

Received: 26 February 2020; Accepted: 7 May 2020; Published online: 8 June 2020


References
1. Aerts, H. J. W. L. et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat. Commun. 5, 4006 (2014).
2. Lambin, P. et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat. Rev. Clin. Oncol. 14, 749–762 (2017).
3. Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comp. Vision 115, 211–252 (2015).
4. McKinney, S. M. et al. International evaluation of an AI system for breast cancer screening. Nature 577, 89–94 (2020).
5. Ardila, D. et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. Med. 25, 954–961 (2019).
6. Pinker, K., Chin, J., Melsaether, A. N., Morris, E. A. & Moy, L. Precision medicine and radiogenomics in breast cancer: new approaches toward diagnosis and treatment. Radiology 287, 732–747 (2018).
7. Lu, H. et al. A mathematical-descriptor of tumor-mesoscopic-structure from computed-tomography images annotates prognostic- and molecular-phenotypes of epithelial ovarian cancer. Nat. Commun. 10, 764 (2019).
8. Kaissis, G. et al. A machine learning model for the prediction of survival and tumor subtype in pancreatic ductal adenocarcinoma from preoperative diffusion-weighted imaging. Eur. Radiol. Exp. 3, 41 (2019).
9. Kaissis, G. et al. A machine learning algorithm predicts molecular subtypes in pancreatic ductal adenocarcinoma with differential response to gemcitabine-based versus FOLFIRINOX chemotherapy. PLoS ONE 14, e0218642 (2019).
10. Cui, E. et al. Predicting the ISUP grade of clear cell renal cell carcinoma with multiparametric MR and multiphase CT radiomics. Eur. Radiol. 30, 2912–2921 (2020).
11. Varghese, B. et al. Objective risk stratification of prostate cancer using machine learning and radiomics applied to multiparametric magnetic resonance images. Sci. Rep. 9, 1570 (2019).
12. Elshafeey, N. et al. Multicenter study demonstrates radiomic features derived from magnetic resonance perfusion images identify pseudoprogression in glioblastoma. Nat. Commun. 10, 3170 (2019).
13. Rajpurkar, P. et al. CheXNet: radiologist-level pneumonia detection on chest X-rays with deep learning. Preprint at https://arxiv.org/abs/1711.05225 (2017).
14. Poplin, R. et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat. Biomed. Eng. 2, 158–164 (2018).
15. Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
16. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
17. Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).
18. Fridman, L. et al. MIT advanced vehicle technology study: large-scale naturalistic driving study of driver behavior and interaction with automation. IEEE Access 7, 102021–102038 (2019).
19. Obermeyer, Z. & Mullainathan, S. Dissecting racial bias in an algorithm that guides health decisions for 70 million people. In Proc. Conf. Fairness, Accountability, and Transparency 89 (ACM, 2019).
20. Wang, S. J. et al. A cost-benefit analysis of electronic medical records in primary care. Am. J. Med. 114, 397–403 (2003).
21. DICOM reference guide. Health Dev. 30, 5–30 (2001).
22. Lee, R. S. et al. A curated mammography data set for use in computer-aided detection and diagnosis research. Sci. Data 4, 170177 (2017).
23. Price, W. N. & Cohen, I. G. Privacy in the age of medical big data. Nat. Med. 25, 37–43 (2019).
24. HIPAA. US Department of Health and Human Services https://www.hhs.gov/hipaa/index.html (2020).
25. GDPR. Intersoft Consulting https://gdpr-info.eu (2016).
26. Cath, C. Governing artificial intelligence: ethical, legal and technical opportunities and challenges. Philos. Trans. R. Soc. A 376, 20180080 (2018).
27. Theodorou, A. & Dignum, V. Towards ethical and socio-legal governance in AI. Nat. Mach. Intell. 2, 10–12 (2020).
28. Jobin, A., Ienca, M. & Vayena, E. The global landscape of AI ethics guidelines. Nat. Mach. Intell. 1, 389–399 (2019).
29. Cavoukian, A. Privacy by Design (Information and Privacy Commissioner of Ontario, 2011).
30. Dwork, C., Smith, A., Steinke, T. & Ullman, J. Exposed! A survey of attacks on private data. Annu. Rev. Stat. Appl. 4, 61–84 (2017).
31. Shokri, R., Stronati, M., Song, C. & Shmatikov, V. Membership inference attacks against machine learning models. In Proc. 38th IEEE Symp. Security and Privacy https://doi.org/10.1109/SP.2017.41 (IEEE, 2017).
32. Bindschaedler, V., Grubbs, P., Cash, D., Ristenpart, T. & Shmatikov, V. The tao of inference in privacy-protected databases. In Proc. VLDB Endowment 11, 1715–1728 (ACM, 2018).
33. Kurita, K., Michel, P. & Neubig, G. Weight poisoning attacks on pre-trained models. Preprint at https://arxiv.org/abs/2004.06660 (2020).
34. Al-Rubaie, M. & Chang, J. M. Privacy preserving machine learning: threats and solutions. IEEE Secur. Priv. 17, 49–58 (2019).
35. Surendra, H. & Mohan, H. S. A review of synthetic data generation methods for privacy preserving data publishing. Int. J. Sci. Technol. Res. 6, 95–101 (2017).
36. Jiang, J. X. & Bai, G. Types of information compromised in breaches of protected health information. Ann. Intern. Med. 172, 159–160 (2019).
37. Taylor, M. J. & Wilson, J. Reasonable expectations of privacy and disclosure of health data. Med. Law Rev. 27, 432–460 (2019).
38. General Data Protection Regulation: NHS European Office Position Paper (NHS Confederation, 2012).
39. El Emam, K., Jonker, E., Arbuckle, L. & Malin, B. A systematic review of re-identification attacks on health data. PLoS ONE 6, e28071 (2011).
40. Narayanan, A. & Shmatikov, V. Robust de-anonymization of large sparse datasets. In 2008 IEEE Symp. Security and Privacy 111–125 (IEEE, 2008).
41. de Montjoye, Y. A., Radaelli, L., Singh, V. K. & Pentland, A. S. Identity and privacy. Unique in the shopping mall: on the reidentifiability of credit card metadata. Science 347, 536–539 (2015).
42. Schwarz, C. G. et al. Identification of anonymous MRI research participants with face-recognition software. New Engl. J. Med. 381, 1684–1686 (2019).
43. Ma, X. et al. Understanding adversarial attacks on deep learning based medical image analysis systems. Pattern Recognit. https://doi.org/10.1016/j.patcog.2020.107332 (2020).
44. Abramian, D. & Eklund, A. Refacing: reconstructing anonymized facial features using GANs. In 2019 IEEE 16th International Symp. Biomedical Imaging https://doi.org/10.1109/ISBI.2019.8759515 (IEEE, 2019).
45. Tanner, A. Our Bodies, Our Data: How Companies Make Billions Selling Our Medical Records (Beacon, 2017).
46. Konečný, J., McMahan, B. & Ramage, D. Federated optimization: distributed optimization beyond the datacenter. Preprint at https://arxiv.org/abs/1511.03575 (2015).
47. Hu, C., Jiang, J. & Wang, Z. Decentralized federated learning: a segmented gossip approach. Preprint at https://arxiv.org/abs/1908.07782 (2019).
48. Passerat-Palmbach, J. et al. A blockchain-orchestrated federated learning architecture for healthcare consortia. Preprint at https://arxiv.org/abs/1910.12603 (2019).
49. Konečný, J. et al. Federated learning: strategies for improving communication efficiency. Preprint at https://arxiv.org/abs/1610.05492 (2016).
50. Rieke, N. et al. The future of digital health with federated learning. Preprint at https://arxiv.org/abs/2003.08119 (2020).
51. Tomsett, R., Chan, K. & Chakraborty, S. Model poisoning attacks against distributed machine learning systems. Proc. SPIE 11006, 110061D (2019).
52. Carlini, N., Liu, C., Erlingsson, Ú., Kos, J. & Song, D. The secret sharer: evaluating and testing unintended memorization in neural networks. In Proc. 28th USENIX Security Symp. 267–284 (USENIX Association, 2019).
53. Zhang, Y. et al. The secret revealer: generative model-inversion attacks against deep neural networks. Preprint at https://arxiv.org/abs/1911.07135 (2019).
54. Hitaj, B., Ateniese, G. & Perez-Cruz, F. Deep models under the GAN: information leakage from collaborative deep learning. In Proc. 2017 ACM SIGSAC Conf. Computer and Communications Security 603–618 (ACM, 2017).
55. Fredrikson, M., Jha, S. & Ristenpart, T. Model inversion attacks that exploit confidence information and basic countermeasures. In Proc. 22nd ACM SIGSAC Conf. Computer and Communications Security 1322–1333 (ACM, 2015).
56. Roth, A. & Dwork, C. The algorithmic foundations of differential privacy. Found. Trends Theoretical Comp. Sci. 9, 211–407 (2013).
57. Cheu, A., Smith, A., Ullman, J., Zeber, D. & Zhilyaev, M. Distributed differential privacy via shuffling. In Annual Int. Conf. Theory and Applications of Cryptographic Techniques 375–403 (Springer, 2018).
58. Dong, J., Roth, A. & Su, W. J. Gaussian differential privacy. Preprint at https://arxiv.org/abs/1905.02383 (2019).
59. Rajkumar, A. & Agarwal, S. A differentially private stochastic gradient descent algorithm for multiparty classification. In Proc. Fifteenth Int. Conf. Artificial Intelligence and Statistics 22, 933–941 (PMLR, 2012).
60. Papernot, N. et al. Scalable private learning with PATE. In Proc. 6th Int. Conf. Learning Representations (ICLR, 2018).
61. Kim, J. W., Jang, B. & Yoo, H. Privacy-preserving aggregation of personal health data streams. PLoS ONE 13, e0207639 (2018).
62. Mireshghallah, F. et al. A principled approach to learning stochastic representations for privacy in deep neural inference. Preprint at https://arxiv.org/abs/2003.12154 (2020).
63. Mortier, R., Haddadi, H., Henderson, T., McAuley, D. & Crowcroft, J. Human-data interaction: the human face of the data-driven society. Preprint at https://arxiv.org/abs/1412.6159 (2014).
64. Garfinkel, S. L., Abowd, J. M. & Powazek, S. Issues encountered deploying differential privacy. In Proc. 2018 Workshop on Privacy in the Electronic Society 133–137 (ACM, 2018).
65. Goodfellow, I. J., Shlens, J. & Szegedy, C. Explaining and harnessing adversarial examples. Preprint at https://arxiv.org/abs/1412.6572 (2014).
66. You, Z., Ye, J., Li, K., Xu, Z. & Wang, P. Adversarial noise layer: regularize neural network by adding noise. In 2019 IEEE Int. Conf. Image Processing https://doi.org/10.1109/ICIP.2019.8803055 (IEEE, 2019).
67. Schneier, B. & Sutherland, P. Applied Cryptography: Protocols, Algorithms, and Source Code in C 157–158 (Wiley, 1995).
68. Daemen, J. & Rijmen, V. The Design of Rijndael: AES - The Advanced Encryption Standard (Springer, 2013).
69. Acar, A., Aksu, H., Selcuk Uluagac, A. & Conti, M. A survey on homomorphic encryption schemes: theory and implementation. ACM Comput. Surv. 51, 79 (2018).
70. Hesamifard, E., Takabi, H. & Ghasemi, M. CryptoDL: deep neural networks over encrypted data. Preprint at https://arxiv.org/abs/1711.05189 (2017).
71. Dowlin, N. et al. CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In Proc. 33rd Int. Conf. Machine Learning Vol. 48 201–210 (PMLR, 2016).
72. Li, X., Chen, D., Li, C. & Wang, L. Secure data aggregation with fully homomorphic encryption in large-scale wireless sensor networks. Sensors 15, 15952–15973 (2015).
73. Zhao, C. et al. Secure multi-party computation: theory, practice and applications. Inform. Sci. 476, 357–372 (2019).
74. Evans, D., Kolesnikov, V. & Rosulek, M. A Pragmatic Introduction to Secure Multi-Party Computation (NOW, 2018).
75. Jagadeesh, K. A., Wu, D. J., Birgmeier, J. A., Boneh, D. & Bejerano, G. Deriving genomic diagnoses without revealing patient genomes. Science 357, 692–695 (2017).
76. Helm, T. Patient data from GP surgeries sold to US companies. The Guardian https://www.theguardian.com/politics/2019/dec/07/nhs-medical-data-sales-american-pharma-lack-transparency (2019).
77. Tkachenko, O., Weinert, C., Schneider, T. & Hamacher, K. Large-scale privacy-preserving statistical computations for distributed genome-wide association studies. In Proc. 2018 on Asia Conf. Computer and Communications Security 221–235 (2018).
78. Kumar, N. et al. CrypTFlow: secure tensorflow inference. In Proc. 41st IEEE Symp. Security and Privacy (IEEE, 2020).
79. Secure enclave overview. Apple Platform Security https://support.apple.com/guide/security/secure-enclave-overview-sec59b0b31ff/web (2020).
80. Cloud TPU. Google https://cloud.google.com/tpu/ (2020).
81. Chen, A. Y. et al. An instruction set architecture for machine learning. ACM Trans. Comput. Syst. 36, 9 (2019).
82. Qayyum, A., Qadir, J., Bilal, M. & Al-Fuqaha, A. Secure and robust machine learning for healthcare: a survey. Preprint at https://arxiv.org/abs/2001.08103 (2020).
83. Pandemic data challenges. Nat. Mach. Intell. 2, 193 (2020).
84. Son, J. et al. Privacy-preserving electrocardiogram monitoring for intelligent arrhythmia detection. Sensors 17, 1360 (2017).
85. Mudgal, K. S. & Das, N. The ethical adoption of artificial intelligence in radiology. BJR Open 2, 20190020 (2020).
86. Li, W. et al. Privacy-preserving federated brain tumour segmentation. In Proc. 10th Int. Workshop on Machine Learning in Medical Imaging 133–141 (Springer, 2019).
87. Grishin, D., Obbad, K. & Church, G. M. Data privacy in the age of personal genomics. Nat. Biotechnol. 37, 1115–1117 (2019).
88. Takabi, D., Podschwadt, R., Druce, J., Wu, C. & Procopio, K. Privacy preserving neural network inference on encrypted data with GPUs. Preprint at https://arxiv.org/abs/1911.11377 (2019).
89. Ryffel, T., Dufour-Sans, E., Gay, R., Bach, F. & Pointcheval, D. Partially encrypted machine learning using functional encryption. In Proc. 33rd Conf. Neural Information Processing Systems (NeurIPS, 2019).
90. Chou, E. et al. Faster CryptoNets: leveraging sparsity for real-world encrypted inference. Preprint at https://arxiv.org/abs/1811.09953 (2018).
91. Dathathri, R. et al. CHET: an optimizing compiler for fully-homomorphic neural-network inferencing. In Proc. 40th ACM SIGPLAN Conf. Programming Language Design and Implementation 142–156 (ACM, 2019).
92. Salem, M., Taheri, S. & Yuan, J.-S. Utilizing transfer learning and homomorphic encryption in a privacy preserving and secure biometric recognition system. Computers 8, 3 (2019).
93. Harder, F., Bauer, M. & Park, M. Interpretable and differentially private predictions. In Proc. Thirty-Fourth AAAI Conf. Artificial Intelligence (AAAI, 2020).
94. Xu, Z., Li, C. & Jegelka, S. Robust GANs against dishonest adversaries. Preprint at https://arxiv.org/abs/1802.09700 (2018).
95. Nelson, K. et al. Evaluating model drift in machine learning algorithms. In 2015 IEEE Symp. Computational Intelligence for Security and Defense Applications https://doi.org/10.1109/CISDA.2015.7208643 (IEEE, 2015).
96. Bourtoule, L. et al. Machine unlearning. Preprint at https://arxiv.org/abs/1912.03817 (2019).
97. Ryffel, T. et al. A generic framework for privacy preserving deep learning. Preprint at https://arxiv.org/abs/1811.04017 (2018).
98. Dahl, M. et al. Private machine learning in TensorFlow using secure computation. In Privacy Preserving Machine Learning, NeurIPS 2018 Workshop. Preprint at https://arxiv.org/abs/1810.08130 (2018).
99. Brundage, M. et al. Toward trustworthy AI development: mechanisms for supporting verifiable claims. Preprint at https://arxiv.org/abs/2004.07213 (2020).
100. Tobin, A. & Reed, D. The Inevitable Rise of Self-Sovereign Identity (Sovrin Foundation, 2016).
101. Ghorbani, A. & Zou, J. Data Shapley: equitable valuation of data for machine learning. In Proc. 36th Int. Conf. Machine Learning (PMLR, 2019).
102. Elvy, S.-A. Paying for privacy and the personal data economy. Colum. L. Rev. 117, 1369 (2017).

Acknowledgements
We thank A. Trask, J. Passerat-Palmbach and the OpenMined project members for their support and critical appraisal, and B. Farkas for creating the article's illustration.

Competing interests
The authors declare no competing interests.

Additional information
Correspondence should be addressed to R.F.B.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© The Author(s), under exclusive licence to Springer Nature Limited 2020
