Sadat et al.

BMC Medical Informatics and Decision Making (2019) 19:183

https://doi.org/10.1186/s12911-019-0867-z

SOFTWARE Open Access

A privacy-preserving distributed filtering framework for NLP artifacts
Md Nazmus Sadat1,2* , Md Momin Al Aziz1,2, Noman Mohammed1, Serguei Pakhomov3, Hongfang Liu4 and
Xiaoqian Jiang5

Abstract
Background: Medical data sharing is a major challenge in biomedicine and often hinders collaborative research.
Because of privacy concerns, clinical notes cannot be shared directly. Considerable effort has been devoted to de-identifying
clinical notes, but it remains very difficult to locate and scrub all sensitive elements from notes automatically and
accurately. An alternative approach is to remove sentences that might contain sensitive terms related to personal information.
Methods: A previous study introduced a frequency-based filtering approach that removes sentences containing low
frequency bigrams to improve the privacy protection without significantly decreasing the utility. Our work extends this
method to consider clinical notes from distributed sources with security and privacy considerations. We developed a
novel secure protocol based on private set intersection and secure thresholding to identify uncommon and low-frequency
terms, which can be used to guide sentence filtering.
Results: As the computational cost of our proposed framework mostly depends on the cardinality of the intersection of
the sets and the number of data owners, we evaluated the framework in terms of these two factors. Experimental results
demonstrate that our proposed method is scalable in various experimental settings. In addition, we evaluated our framework
in terms of data utility. This evaluation shows that the proposed method is able to retain enough information for data analysis.
Conclusion: This work demonstrates the feasibility of using homomorphic encryption to develop a secure and efficient
multi-party protocol.
Keywords: Biomedical data security and privacy, Clinical notes de-identification, Homomorphic encryption

* Correspondence: [email protected]
1 Department of Computer Science, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
2 Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
Full list of author information is available at the end of the article

© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background
Clinical notes represent an indispensable component of electronic health records (EHRs), which contain important information (such as symptoms and medical history) that structured data might not cover. Sharing clinical notes can promote research, improve healthcare services, and contribute to clinical decision support [1]. However, it has been a very challenging task to de-identify such data to mitigate the privacy risks. Due to the unstructured nature of notes, de-identification is not as straightforward as for structured data. To satisfy the privacy regulations of the Health Insurance Portability and Accountability Act (HIPAA), we can remove the Protected Health Information (PHI) defined in the HIPAA Safe Harbor method. Traditionally, this is done through the detection and scrubbing of 18 specific categories of PHI, including name, social security number, dates, etc.

Many efforts have been devoted in this direction, including both manual and automatic approaches. Manual approaches to identifying PHI are prone to mistakes (Neamatullah et al. [2] show that the recall of 14 clinicians detecting PHI in 130 clinical notes varied from 0.63 to 0.94) and they are also expensive (e.g., about $50 per hour to read and label 20 k words per hour when de-identifying the MIMIC II database [3]). Automated algorithms can save time and reduce the human review effort. Early systems used rule-based or template-based approaches to match and detect PHI [4]. Berman [5] developed a concept matching algorithm that steps through confidential pathology text to replace medical terms matching standard nomenclature code with a synonymous term while keeping the high
frequency “stop words” intact. However, the system blocks too much and has a high false positive rate, making the outputs hard to read [2]. Finley et al. proposed a similar method, which was applied to de-identify distributed semantic models [6]. The Scrub system [7] used a template-based approach to match components of high privacy risk, which are then removed, generalized, or replaced with made-up ones. This method can get rid of explicit personally identifiable information, but it does not handle combinations of fields, and the results might still be matched or linked to the identities of individuals [8].

Other researchers treated text de-identification as a classic Named Entity Recognition (NER) problem and tried to solve it with machine learning models [9]. Szarvas et al. used decision trees that take various features (length, frequency, etc.) into consideration to detect PHI [10]. Several research groups [2, 11] developed methods based on Support Vector Machines (SVM) to classify sensitive attributes based on part-of-speech (POS) inputs. Another popular framework utilizes conditional random fields (CRF), an extension of logistic regression that considers correlations in the sentence to predict PHI [12, 13]. The latest methods in this direction [14], which use deep learning, reported improved performance in detecting PHI, but the models require careful tuning of parameters for each dataset, which makes them hard to port for collaborative research.

A recent method was proposed by Li et al. [15] to filter out rare sentences (frequency < 3) and sentences containing bigrams under a certain frequency threshold (frequency < 256). This method demonstrated good performance in obtaining sentences with almost no PHI (evaluated by a manual review of sampled outputs) while preserving a Type Unique Identifier (TUI) distribution similar to that of the original data, providing an alternative and generalizable way to obtain useful data with mitigated privacy risks. However, the method is only designed to anonymize data from a single source. In reality, collaborative research often involves more than one party, which poses new challenges for conducting filtering in a global manner. In this paper, we propose a distributed and privacy-preserving method as an extension of the single-source model [15]. Our criterion for bigram filtering is stricter than that of previous work [15] because it takes distributional differences of local sites into consideration. We only keep sentences containing bigrams observed at all collaboration sites and with sufficient global frequency. Our proposed method can be easily generalized to cover other NLP artifacts, including unigrams, trigrams, and n-grams. To develop such a global bigram-based filtering method, appropriate protection needs to be enforced on private set intersection, secure count aggregation, and thresholding to ensure data confidentiality during the process.

Existing works and their limitations
A critical step for our distributed bigram filtering model is to find the bigrams that are common to all collaborating sites in a privacy-preserving manner. Although there are several studies on 2-party private set intersection [16, 17], only a few works have addressed the multi-party private set intersection (MPSI) problem. Earlier approaches to MPSI have some limitations. In [18], the dataset size of each party must be equal. Another approach suffers from approximation errors [19]. A recent work has shown the feasibility of handling n > 2 parties [20]. In this work [20], each data owner constructs a Bloom filter from their data (using only the words or bigrams, not the counts associated with them). Data owners send the encrypted Bloom filter (using the exponential ElGamal encryption scheme) to a service provider. All encrypted Bloom filters are securely added by the service provider without decryption, which results in an encrypted Integrated Bloom Filter (IBF). Then, the service provider constructs a randomized n-subtraction of the IBF (encrypted), where n is the number of parties. The service provider broadcasts this encrypted randomized n-subtraction of the IBF to all the data owners. Finally, all data owners jointly decrypt it and compute the set intersection: if an element x is in the set intersection, each array location in the encrypted randomized n-subtraction of the IBF to which x is mapped by the k hash functions is an encryption of 0; otherwise, it is an encryption of a random integer. Their approach [20] demonstrated good performance for set sizes ranging from 64 to 16,384. However, this approach may not scale well to millions of records, which are common in real-world applications. With a much larger set, to reduce the probability of false positives, the size of the Bloom filter should be large enough compared to the number of items to be inserted into it. In their approach, runtime is dominated by the encryption and decryption of the Bloom filter. Constructing, encrypting, and transferring such large Bloom filters (that can deal with millions of records with a minimal probability of false positives) will introduce huge computation and communication overhead.

Our problem specification is different from the works on private set intersection mentioned here, which do not involve any secure thresholding operations. We describe these works only to give an overview of state-of-the-art solutions to related problems. To the best of our knowledge, there is no secure protocol for sensitive information filtering that combines private set intersection and secure thresholding.

The major contributions of this article are summarized as follows:

1. We propose a novel framework based on private set intersection and secure thresholding to identify
uncommon and low-frequency bigrams, which can be used to remove sentences from clinical notes that might contain privacy-sensitive terms. The proposed framework takes into consideration distributional differences of local sites. In addition, the framework is highly generalizable: it can be used for any other type of NLP artifact.
2. The proposed framework demonstrates the feasibility of using homomorphic encryption to develop a secure and efficient multi-party protocol. For the homomorphic operations, we leverage a Single Instruction, Multiple Data (SIMD) scheme that significantly boosts the performance of the proposed framework.
3. Our proposed method can simultaneously guarantee data privacy and preserve data utility, retaining enough information for data analysis.

To the best of our knowledge, this is the first privacy-preserving work to de-identify clinical notes from distributed sources.

Implementation
System overview
We developed a secure and privacy-preserving framework for bigram-based filtering to simultaneously meet two goals: multiparty private set intersection and secure thresholding.

Architecture and entities
There are three types of entities in our system. Figure 1 represents the system architecture of our proposed framework.

Data owner: Data owners might be any hospital, clinical research facility, or federal (or provincial) health science institute that possesses clinical datasets. Our proposed system supports any number of data owners.

Crypto Service Provider (CSP): The CSP manages public and private keys. The CSP also manages the salt for hashing (refer to Security Analysis, Security of Hashing for more details). Each data owner receives a public key, a private key, and an evaluation key from the CSP. Data owners use the public key to encrypt their data (counts of bigrams) and use the private key to decrypt the encrypted response from the central server.

Central Server: The central server coordinates the system protocol. It maintains communications with all other entities of the system. It receives data (hashes and encrypted counts of bigrams) from the data owners, performs computations locally, and finally sends the encrypted result to the data owners.

Threat model
In this work, our goal is to ensure that each data owner learns the thresholded set intersection as a result of the protocol. Data owners should not learn the elements of other data owners’ datasets (elements that are not in the intersection). We consider the central server a semi-honest party (also known as honest-but-curious). It follows the protocol but may attempt to glean additional information from the server logs or received messages. We also assume that the data owners do not collude. These assumptions are standard and have been adopted by several earlier works [21, 22].

Problem specification
The objective of this study is to identify the globally infrequent common bigrams of participating parties based on a threshold value. In the first phase of the system protocol, all the parties jointly identify the common bigrams. Then, data owners send counts of the common bigrams to the central server. Consider the example of Table 1. Here, data owner A sends E(count of bigram Flu-fever = 10), E(count of bigram Cancer-pain = 15), and E(count of bigram Diabetes-glaucoma = 20), where E denotes an encryption algorithm. After receiving counts from all data owners, the central server performs addition over the bigram counts. If the total count for a specific bigram is less than a predetermined threshold, then that bigram is considered privacy-sensitive, and this information can be used to guide sentence filtering of clinical notes. The intuition behind this filtering is: the more potentially identifying a bigram is, the rarer it will be.

Table 1 Identification of globally infrequent bigrams

| Data Owner | Frequency of the bigram Flu-fever | Frequency of the bigram Cancer-pain | Frequency of the bigram Diabetes-glaucoma |
|---|---|---|---|
| A | 10 | 15 | 20 |
| B | 20 | 15 | 10 |
| C | 5 | 15 | 25 |
| Total | 35 | 45 | 55 |

Consider the data of the above table and assume the threshold value is 40. Since the total count of Flu-fever (35) is less than the threshold value (40), it is considered privacy-sensitive.

Preliminaries
Homomorphic encryption
The concept of an encryption scheme that can perform arbitrary computation on encrypted data was first proposed by Rivest et al. [23] in 1978. Many traditional homomorphic encryption schemes are either additively homomorphic (Paillier [24]) or multiplicatively homomorphic (ElGamal [25]). However, such
restriction to one single algebraic operation is very inconvenient for general-purpose applications. Lately, researchers have been adopting lattice cryptosystems, which leverage ring homomorphism (addition and multiplication) [26, 27]. The cryptosystem in [28] is a Somewhat Homomorphic Encryption (SWHE) scheme that can compute a bounded number of homomorphic functions. Other recent RLWE-based SWHE cryptosystems include BGV [29], FV [30], and YASHE [31]. While these systems are intrinsically similar, there are differences and trade-offs. Interested readers can refer to [32] for more details.

Fig. 1 Block diagram of the system architecture. Only encrypted summary statistics are delegated to the central server to conduct the bigram filtering. The central server returns to the individual data owners the encrypted bigrams (those that are both common and frequent enough in a global manner). This block diagram was drawn by the authors

In this work, we used the FV cryptosystem (other RLWE-based systems will work in a similar manner), which consists of the following functionalities:

KeyGen(params): Given the system parameters params as input, KeyGen generates a public-private key pair and an evaluation key (pk, sk, evk).
Enc(pk, m): The encryption algorithm encrypts a plaintext message m using the public key pk.
Dec(sk, c): Let c be the encryption of a plaintext m. The decryption algorithm outputs m, given the private key sk and the ciphertext c as input.
Add(c1, c2): Let c1, c2 be the ciphertexts for messages m1, m2 respectively. Given c1, c2 as input, the homomorphic addition operation Add computes the encrypted sum of m1, m2.
Mult(c1, c2): Let c1, c2 be the ciphertexts for messages m1, m2 respectively. Given c1, c2 as input, the homomorphic multiplication operation Mult computes the encrypted product of m1, m2.
ReLin(cmult, evk): The objective of the relinearization operation ReLin is to reduce the size of a given ciphertext cmult back to (at least) 2. Relinearization is performed when the size of the ciphertext increases substantially because of multiplication operations. The relinearization operation requires the evaluation key evk.

There is a recent application of homomorphic encryption, which can securely perform genome search on a semi-honest cloud server [33].

Ciphertext packing
The considerable computational overhead of homomorphic encryption results from the large ciphertexts. As homomorphic operations have to operate on these large ciphertexts, they can be quite slow. The primary solution to this issue is to work with packed ciphertexts, that is, ciphertexts that encrypt a vector of plaintext values [34, 35]. Homomorphic operations can be performed on these vectors component-wise in a Single Instruction,
Multiple Data (SIMD) manner. Depending on the memory allowance, this mechanism can significantly boost the performance due to parallelization.

Consider the plaintext elements in a polynomial quotient ring $m \in R_t = \mathbb{Z}_t[X]/(X^n + 1)$ and ciphertext elements in $R_q = \mathbb{Z}_q[X]/(X^n + 1)$. Here, q and t are positive integers (q > t, q > 1, see [30]), $\mathbb{Z}_q$ represents the set of integers in $(-q/2, q/2]$, and $X^n + 1$ is an irreducible polynomial of degree n. Using ciphertext packing, we can encrypt n plaintext values in a single ciphertext for a single instruction execution.

Since a packed ciphertext is essentially the same as a standard ciphertext, the basic homomorphic operations still work, for instance, homomorphic addition by adding ciphertexts. Ciphertext packing thus facilitates SIMD-type homomorphic computation, which is capable of computing the same function over many inputs at once. The usage of ciphertext packing in our proposed framework is elaborated in Detailed System Protocol.

We apply ciphertext packing to minimize both computational and communication overhead. The data owners group their counts of bigrams into vectors of length n, encrypt them, and send (cardinality of the intersection of sets)/n ciphertexts to the central server (see Detailed System Protocol). The packing mechanism then allows the central server to perform computation on n items simultaneously, which results in an n-fold improvement in both computation and communication. In our case, n equals 4096, which leads to a significant reduction in time cost over the naive homomorphic encryption method.

Hash functions
Hash functions are one of the fundamental cryptographic primitives. A hash function computes a digest of a given message, which is a fixed-length bit string. For a given message, the message digest (also known as hash value or hash) can be considered a unique representation of that message. In this work, we have used SHA-256, which is a member of the Secure Hash Algorithm (SHA) family. The length of the message digest for SHA-256 is 256 bits [36]. Security of hashing is discussed in detail in Security Analysis, Security of Hashing.

Detailed system protocol
At the system initialization phase, data owners receive public and private keys from the CSP. The central server receives only the public key. Then, each data owner sends the hashes of its bigrams to the central server. After receiving the hashes from each data owner, the central server computes the intersection of the hashes. Then, the central server sends the elements of this intersection to the data owners. Figure 2 shows the flow diagram of our protocol.

Upon receiving the intersection of the hashes from the central server, data owners encrypt the local frequencies of the intersected bigrams using the ciphertext packing technique. To do so, they follow the order received from the central server. Figure 3 illustrates this technique for a data owner and indicates the difference from the naive homomorphic encryption approach. After encrypting the counts, data owners send the packed ciphertexts to the central server, where the encrypted global frequency will be computed.

After receiving the ciphertexts, the central server performs the homomorphic addition operation on these packed ciphertexts. At the end of this addition process, the resulting output looks like Table 2. Here, E represents the encryption function.

In Table 2, E(C11) denotes the encrypted count of bigram B1 contributed by data owner 1, E(C12) denotes the encrypted count of B1 contributed by data owner 2, E(C13) denotes the encrypted count of B1 contributed by data owner 3, and so on.

Fig. 2 Flow diagram for the proposed system protocol. The order of the execution runs in a top down manner in key distribution and computation phases
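
The hashing and intersection phase described above can be illustrated with a minimal Python sketch. It is not the authors' C++ implementation: the function names, the salt value, the way the salt is combined with a bigram (simple prefixing), and the sorted order returned by the server are assumptions made for this example. The common bigrams are those of Table 1; the extra site-specific bigram of each owner is hypothetical.

```python
import hashlib

def salted_hash(bigram: str, salt: bytes) -> str:
    """SHA-256 digest of salt + bigram (prefixing the salt is an assumption)."""
    return hashlib.sha256(salt + bigram.encode("utf-8")).hexdigest()

def hash_bigrams(counts: dict, salt: bytes) -> dict:
    """Data owner side: map each salted hash back to its local plaintext bigram."""
    return {salted_hash(bigram, salt): bigram for bigram in counts}

def intersect_hashes(hash_sets: list) -> list:
    """Central server side: intersect all hash sets, then sort them so that
    every data owner later packs its counts in the same order."""
    common = set(hash_sets[0])
    for hashes in hash_sets[1:]:
        common &= set(hashes)
    return sorted(common)

salt = b"salt-distributed-by-the-csp"  # hypothetical value
owners = {
    "A": {"Flu-fever": 10, "Cancer-pain": 15, "Diabetes-glaucoma": 20, "only-at-A": 1},
    "B": {"Flu-fever": 20, "Cancer-pain": 15, "Diabetes-glaucoma": 10, "only-at-B": 2},
    "C": {"Flu-fever": 5, "Cancer-pain": 15, "Diabetes-glaucoma": 25, "only-at-C": 3},
}

# Each owner hashes locally and sends only the hashes to the central server.
local_maps = {name: hash_bigrams(counts, salt) for name, counts in owners.items()}
common_hashes = intersect_hashes([set(m) for m in local_maps.values()])

# An owner can translate the returned hashes back to its own bigrams.
common_bigrams = sorted(local_maps["A"][h] for h in common_hashes)
print(common_bigrams)
```

In this sketch the server only ever sees digests, and bigrams that are not shared by every owner drop out of the intersection.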

Fig. 3 Usage of ciphertext packing in our proposed method. Here, n is the degree of the polynomial, which indicates the number of slots for parallel computing
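
To make the slot layout concrete, the following Python sketch mimics packing on plain integers. No encryption takes place: plain lists stand in for packed FV ciphertexts, and the helper names (`pack`, `add_packed`, `ciphertexts_needed`) are hypothetical. Only the value n = 4096 and the intersection cardinality 1,515,520 come from the paper.

```python
import math

N_SLOTS = 4096  # n, the polynomial degree used in the paper

def pack(counts, n=N_SLOTS):
    """Split a list of counts into vectors of length n, zero-padding the last one."""
    vectors = []
    for start in range(0, len(counts), n):
        chunk = list(counts[start:start + n])
        chunk += [0] * (n - len(chunk))
        vectors.append(chunk)
    return vectors

def add_packed(a, b):
    """Slot-wise addition; a plaintext stand-in for homomorphic Add on packed ciphertexts."""
    return [[x + y for x, y in zip(va, vb)] for va, vb in zip(a, b)]

def ciphertexts_needed(cardinality, n=N_SLOTS):
    """Number of packed ciphertexts one data owner sends for a given intersection size."""
    return math.ceil(cardinality / n)

# Two hypothetical owners with 5000 common bigrams each.
owner_a = pack([1] * 5000)
owner_b = pack([2] * 5000)
total = add_packed(owner_a, owner_b)

print(len(total))                      # number of packed vectors
print(ciphertexts_needed(1_515_520))   # smallest setting of the cardinality experiment
```

With the naive approach each count would need its own ciphertext; with packing, one addition on a packed vector updates up to n counts at once.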

Table 2 Secure count aggregation at central server

| Bigram | Encrypted Global Frequency |
|---|---|
| B1 | E(C11) + E(C12) + E(C13) + ⋯ |
| B2 | E(C21) + E(C22) + E(C23) + ⋯ |
| B3 | E(C31) + E(C32) + E(C33) + ⋯ |
| ⋮ | ⋮ |

Now, we need to meet the thresholding requirement for the sum of homomorphically encrypted counts. For each of the records, we check the following inequality.

E(C11) + E(C12) + E(C13) + ⋯ > threshold

Solving this problem involves both addition and comparison. It is known that in arithmetic circuits, addition is cheap but comparison is not trivial. To avoid the comparison operation in the arithmetic circuit, we formulate the problem in the following way,

E(C11) + E(C12) + E(C13) + ⋯ − threshold

After performing the above-mentioned homomorphic operation, the central server sends to the data owners r*(E(C11) + E(C12) + E(C13) + ⋯ − threshold), where r is a random number drawn by the central server. After decrypting it, if a data owner gets a random negative number (or zero), she will understand that the sum of counts of the corresponding record is less than (or equal to) the threshold. Similarly, if a data owner gets a random positive number, she will understand that the sum of counts of the corresponding record is greater than the threshold. Multiplying every coefficient of the resulting ciphertext by the same random number may expose some additional information about other data owners’ counts. So, we multiply the resulting ciphertext with a random polynomial, all of whose coefficients are randomly generated.

Although polynomial addition and subtraction are coefficient-wise by nature, polynomial multiplication in Rt (and Rq) is a convolution product of the coefficients. An effective technique to transform a convolution product into a coefficient-wise product in a polynomial ring is the Number-Theoretic Transform (NTT), a specialization of the Fourier transform for finite rings. One important property of the NTT is that it works in the same ring as lattice cryptosystems do. Therefore, the NTT can be used to improve the efficiency of the polynomial operations [37]. To ensure that products in the ciphertext space are translated into coefficient-wise products in the plaintext space, we perform an inverse-NTT operation on the plaintext before encryption and an NTT operation after decryption.
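
The masked thresholding step can be illustrated on plain integers. The Python sketch below is only a simulation of the arithmetic: nothing is encrypted, the function names and the bound on the random factor are hypothetical, and we assume each random factor is strictly positive so that the sign of the difference survives the multiplication. The counts and the threshold of 40 are those of Table 1.

```python
import secrets

def masked_difference(counts_per_owner, threshold, max_r=2**16):
    """Central server: sum the counts per bigram, subtract the threshold, and
    multiply each slot by its own positive random factor (assumed positive)."""
    totals = [sum(column) for column in zip(*counts_per_owner)]
    return [(1 + secrets.randbelow(max_r)) * (total - threshold) for total in totals]

def frequent_mask(decrypted):
    """Data owner: a positive value means the global count exceeds the threshold."""
    return [value > 0 for value in decrypted]

bigrams = ["Flu-fever", "Cancer-pain", "Diabetes-glaucoma"]
counts_per_owner = [
    [10, 15, 20],  # data owner A
    [20, 15, 10],  # data owner B
    [5, 15, 25],   # data owner C
]

masked = masked_difference(counts_per_owner, threshold=40)
keep = frequent_mask(masked)
sensitive = [b for b, k in zip(bigrams, keep) if not k]
print(sensitive)
```

Because every slot gets an independent random factor, the magnitudes seen by a data owner carry no usable information about the exact global count; only the sign is meaningful.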
Results
Experimental settings
Dataset
We used MIMIC-III (Medical Information Mart for Intensive Care), an openly available dataset comprising de-identified health data associated with ~40 k critical care patients [38]. Specifically, we used the NOTEEVENTS table of this database, which contains de-identified clinical notes including nursing and physician notes, and reports on ECG, radiology, and discharge summaries. There are 2,083,180 rows in the NOTEEVENTS table.

Dataset preprocessing
The text column of the NOTEEVENTS table holds the contents of the clinical notes. First, we removed the stop words from the entries of this column. We also removed any standalone symbols or characters and numerical values, including temporal expressions (e.g., 4:10 AM, 9:50 PM).

Evaluation environment
Experiments were performed on Google Compute Engine (GCE) and an Amazon EC2 cloud server. GCE is a cloud computing service that provides virtual machines running in Google’s data centers.

In GCE, we used an n1-standard-8 machine with Ubuntu 16.04.3 LTS. For Amazon EC2, the configuration was r3.xlarge with Ubuntu 16.04.2 LTS. The central server was hosted in Amazon EC2, and the CSP and the data owners were hosted in GCE. Each entity of the system architecture communicated with the others through TCP (Transmission Control Protocol).

Implementation
To hash the words, SHA-256 (OpenSSL version 1.0.2g) was used. To encrypt the bigram counts, we use the FV scheme [30]. For the FV implementation, we chose NFLlib [39], an efficient and scalable C++ library for ideal lattice cryptography. In our implementation, the computation and communication tasks are processed in parallel whenever possible. We used OpenMP for this purpose. An open-source implementation of our proposed framework is available at GitHub.

Experimental results
It is evident from the description of our proposed method that the runtime mostly depends on the cardinality of the intersection of the sets and the number of data owners. We evaluated our proposed method in terms of these two factors. Tables 3 and 4 show the experimental results. These tables report computation time for intersecting hashes, encryption, homomorphic operation, and decryption, as well as network communication costs. However, the total time reported here does not include the cost of system initialization, for instance, reading and parsing the configuration file, reading the input data file, and TCP socket setup and shutdown.

Table 3 Experimental results for different cardinality of intersection of sets. In the five different settings, cardinality is increased by 1% of the entire dataset. The number of data owners is held constant at 3. All times are in seconds

| Cardinality of Intersection | Intersecting Hashes (s) | Encryption (s) | Homomorphic Operation (s) | Decryption (s) | Network Comm. (s) | Total Time (s) |
|---|---|---|---|---|---|---|
| 1,515,520 (~10%) | 4.63 | 8.11 | 55.43 | 6.73 | 0.48 | 75.38 |
| 1,667,072 (~11%) | 4.69 | 8.92 | 61.19 | 7.06 | 0.52 | 82.38 |
| 1,818,624 (~12%) | 4.98 | 9.70 | 66.63 | 7.88 | 0.54 | 89.73 |
| 1,970,176 (~13%) | 5.07 | 10.97 | 72.21 | 8.49 | 0.59 | 97.33 |
| 2,121,728 (~14%) | 5.20 | 11.32 | 77.65 | 9.34 | 0.60 | 104.11 |

Table 4 Experimental results for different number of data owners. The cardinality of intersection of sets is fixed at 1,515,520. All times are in seconds

| Number of Data Owners | Intersecting Hashes (s) | Encryption (s) | Homomorphic Operation (s) | Decryption (s) | Network Comm. (s) | Total Time (s) |
|---|---|---|---|---|---|---|
| 2 | 1.69 | 8.17 | 54.72 | 6.29 | 0.32 | 71.19 |
| 3 | 2.72 | 8.19 | 55.49 | 6.33 | 0.39 | 73.12 |
| 4 | 3.53 | 8.28 | 55.51 | 6.60 | 0.46 | 74.38 |
| 5 | 4.63 | 8.22 | 56.36 | 6.67 | 0.53 | 76.41 |
| 6 | 5.36 | 8.24 | 58.01 | 7.11 | 0.60 | 79.32 |

Communication cost
The total number of bigrams was about 15 million. These were equally distributed among three data owners for the experiments shown in Table 3. Each data owner was given 4 million bigrams along with the common ones, as shown in the first column of Table 3. For the five different settings, the sizes of the encrypted data for each data owner were 46.3, 51, 55.6, 60.2, and 64.8 MB respectively. The sizes of the files containing hashes (for each data owner) were 341, 351, 360, 370, and 379 MB respectively. For the experiments shown in Table 4, bigrams were distributed equally among the six data owners (3,518,464 each). The size of the encrypted data for each data owner was 46.3 MB. The size of the file containing hashes (for each data owner) was 218 MB.

Discussion
Concept distribution analysis
We now show that the proposed method is able to retain enough information for data analysis. We compare the concept distribution of the clinical notes with that of the sanitized sentence repository constructed by eliminating the sentences of the clinical notes that contain low-frequency bigrams (frequency less than or equal to a specified threshold). Due to the significant computations involved, we sampled 800 clinical notes for this experiment. The results of the concept distribution analysis are reported in Table 5. Each concept is expressed as a Type Unique Identifier (TUI) defined by UMLS [40]. The difference of the TUI distribution is not too large when the threshold
is small, but it grows as the threshold increases. However, this is not a critical issue because the original distribution can be maintained by oversampling the filtered corpus using sentences that contain one or more TUIs. This is a standard combinatorial optimization problem, but we do not explore it in this paper.

Table 5 Comparison of TUI proportion distribution
TUI   Original clinical note  Threshold = 1  Threshold = 2  Threshold = 4  Threshold = 8  Threshold = 16
T007  0.2627                  0.2012         0.1601         0.1421         0.0922         0.0428
T023  5.8168                  4.4492         3.5281         2.9490         2.5213         2.1758
T033  7.7646                  5.3959         4.8470         3.6402         3.1259         2.5570
T047  7.6978                  5.4338         4.8742         3.7598         3.3876         2.8825
T060  2.5509                  1.8672         1.6446         1.4018         1.1242         0.9680
T074  1.5871                  1.2046         1.0991         0.9302         0.8257         0.6724
T093  0.9824                  0.7123         0.6594         0.5846         0.5197         0.4925
T109  4.1908                  2.8163         2.7084         2.8069         2.6024         1.6447
T121  1.2840                  0.8898         0.8983         0.7719         0.5971         0.6253
T170  0.7523                  0.5182         0.4450         0.3165         0.2764         0.1284
T184  3.5566                  2.4968         2.2498         1.8443         1.4265         0.6895
T201  1.8249                  1.1075         0.9960         0.9173         0.8441         0.8437

Security analysis
In this section, we analyze the security of our proposed framework.

Security of encryption
To evaluate the security of a lattice cryptosystem, a widely used measure is the root-Hermite factor δ. Lindner and Peikert showed a mathematical relationship between the root-Hermite factor and the security level λ (in bits) [41]:

$$\lambda = \frac{1.8}{\log_2 \delta} - 110.$$

δ is given by

$$\log_2 \delta = \frac{\log_2^2 (c\,q/s)}{4\,n\,\log_2 q},$$

where $c \approx \sqrt{\ln(1/\epsilon)/\pi}$ and $s = \sigma\sqrt{2\pi}$. n, q, and s represent the degree of the polynomial ring, the ciphertext modulus, and the scale parameter of the error distribution, respectively. σ denotes the standard deviation of the error distribution, and ϵ is the attacker advantage. For our experiments, we choose $n = 2^{12}$, $q = 2^{120}$, σ = 3, and $\epsilon = 2^{-32}$. With these values, $\log_2(c\,q/s) \approx 118.5$, so $\log_2 \delta \approx 0.00714$ and $\lambda \approx 252 - 110 = 142$. According to the root-Hermite factor measure, our proposed method therefore guarantees 142-bit security.

Security of hashing
One of the primary security requirements of a hash function is one-wayness: given a hash output h, it must be computationally infeasible to find an input m such that h = H(m). In other words, given a message digest, an adversary cannot recover a matching message $m = H^{-1}(h)$. There exist cryptanalytic attacks against one-way hashing that try to break the security properties of the hash function. A brute-force attack (also known as exhaustive search) is one such attack. Let (m, h) denote a pair of input message and output hash value, and let $M = \{m_1, m_2, \ldots, m_k\}$ be the message space of all possible messages $m_i$. Such an attack checks, for every element of M, whether $H(m_i) = h$. If the equality holds, a possible input message is found. This type of attack is impractical for a large message space. A similar attack is the dictionary attack, which tries all the input messages in a pre-arranged listing, generally derived from a list of words such as in a dictionary (hence the term dictionary attack), and therefore has a smaller space to search. A variant of the dictionary attack, known as the rainbow table attack [42], uses a precomputed table.
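The exhaustive search over a message space M can be sketched in a few lines. The following Python example is a minimal illustration only, assuming an unsalted SHA-256 hash and a tiny hypothetical message space of bigrams; the function names and sample strings are assumptions made for this sketch and are not taken from the framework's C++ code.

```python
import hashlib


def sha256_hex(message: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as a hex string."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def exhaustive_search(target_digest, message_space):
    """Check every candidate m_i in M; return the first one with H(m_i) == h, else None."""
    for candidate in message_space:
        if sha256_hex(candidate) == target_digest:
            return candidate
    return None


# Hypothetical message space M of candidate bigrams (illustrative values only).
message_space = ["chest pain", "blood pressure", "john smith", "heart rate"]

# The attacker only sees the digest h of an unknown bigram.
target = sha256_hex("john smith")

recovered = exhaustive_search(target, message_space)          # candidate is in M
miss = exhaustive_search(sha256_hex("jane doe"), message_space)  # candidate is not in M
print(recovered, miss)
```

The cost of the search grows linearly with the size of M, which is why the attack is only practical when the candidate list is small or predictable.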

A rainbow table [42] contains elements up to a certain length consisting of a limited set of characters and is used for reversing hash functions. This attack requires less computation time but more storage than a brute-force attack. To address the above-mentioned attacks, we used a salt to randomize the hashing. In cryptography, a salt is random data that is used as an additional input to a hash function. The salt was generated by the CSP and provided to the data owners before each hashing process, making these attacks computationally infeasible.

Another desirable property of a hash function is collision resistance. A hash function is said to be collision resistant if it is computationally infeasible to find two different inputs $m_1 \neq m_2$ with $H(m_1) = H(m_2)$. It might seem that, for a hash function with an output length of b bits, we would have to check about $2^b$ messages. However, it turns out that an attacker needs only about $2^{b/2}$ messages. This quite surprising result is due to the birthday attack. This attack is based on the birthday paradox, a powerful tool that is often used in cryptanalysis. Collision search for a hash function H() is exactly the same problem as finding birthday collisions among party attendees: how many people are required at a birthday party such that there is a significant chance that at least two attendees have the same date of birth? The question is how many messages ($m_1, m_2, \ldots, m_k$) an attacker needs to hash until he has a reasonable chance of finding $H(m_i) = H(m_j)$ for some $m_i$ and $m_j$ that he chooses. The most significant consequence of the birthday attack is that the number of messages that must be hashed to find a collision is approximately equal to the square root of the number of possible output values, i.e., about $\sqrt{2^b} = 2^{b/2}$. Hence, for a security level of u bits, the hash function needs to have an output length of 2u bits. In order to prevent collision attacks based on the birthday paradox, the output length of a hash function must be at least 128 bits [36]. As mentioned previously, we use SHA-256 in this work, which has an output length of 256 bits and therefore requires about $2^{128}$ hash evaluations for a birthday attack.

In 2004, collision-finding attacks against MD5 and SHA-0 were demonstrated by Xiaoyun Wang [43]. One year later, it was claimed that the attack could be extended to SHA-1 and that a collision search would take $2^{63}$ steps, which is considerably fewer than the $2^{80}$ steps required by the birthday attack (the output width in this case is 160 bits). In this work, we use SHA-2 (precisely, SHA-256), against which no collision attacks are known to date.

Conclusion
In this article, we proposed a novel protocol to achieve the joint mission of private set intersection and secure thresholding for a distributed data de-identification task. We extended a previous filtering-based method to cover data from distributed sources and demonstrated the feasibility of using homomorphic encryption to develop an efficient multi-party protocol for distributed data de-identification. Experimental results show that our proposed method can simultaneously guarantee data privacy and preserve data utility for analysis.
To the best of our knowledge, this is one of the pioneering privacy-preserving initiatives to de-identify clinical notes in a distributed environment. We have open sourced our code on GitHub under the GNU General Public License, along with a software manual for compiling and running it.

Availability and requirements
Project name: A Privacy-preserving Distributed Filtering Framework for NLP Artifacts.
Project home page: https://github.com/Nazmus-Sadat/th_mpsi
Operating system: Linux.
Programming language: C++.
License: GNU General Public License.

Abbreviations
CSP: Crypto Service Provider; EHR: Electronic Health Record; GCE: Google Compute Engine; HIPAA: Health Insurance Portability and Accountability Act; IBF: Integrated Bloom Filter; MIMIC-III: Medical Information Mart for Intensive Care; MPSI: Multi-party private set intersection; NER: Named Entity Recognition; NTT: Number-Theoretic Transform; PHI: Protected Health Information; SHA: Secure Hash Algorithm; SIMD: Single Instruction, Multiple Data; SWHE: Somewhat Homomorphic Encryption; TCP: Transmission Control Protocol; TUI: Type Unique Identity; UMLS: Unified Medical Language System

Acknowledgements
Not applicable.

Authors' contributions
All authors approved the final manuscript. MNS, MMA, and XJ designed the method. MNS implemented the protocol and devised experiments. MNS and XJ wrote the majority of the manuscript. NM, SP, HL, and XJ provided detailed edits and critical suggestions.

Funding
This work was funded in part by NIBIB U01 EB023685, NSERC Discovery Grants (RGPIN-2015-04147), NIH U01TR002062, and the University Research Grants Program (URGP) from the University of Manitoba.
Xiaoqian Jiang was supported in part by the CPRIT RR180012, UT Stars award, and the National Institute of Health (NIH) under award numbers U01TR002062, R01GM114612, R01GM118574, and R01GM124111.

Availability of data and materials
The clinical notes used in the experiment are available from MIMIC-III (Medical Information Mart for Intensive Care), an openly available dataset [38].

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Author details
1 Department of Computer Science, University of Manitoba, Winnipeg, MB R3T 2N2, Canada.
2 Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA.
3 Department of Pharmaceutical Care & Health Systems, University of Minnesota, Minneapolis, MN, USA.
4 Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN, USA.
5 School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.

Received: 2 December 2018 Accepted: 4 July 2019

References
1. Demner-Fushman D, Chapman WW, McDonald CJ. What can natural language processing do for clinical decision support? J Biomed Inform. 2009;42:760–72.
2. Neamatullah I, Douglass MM, Lehman L-WH, Reisner A, Villarroel M, Long WJ, et al. Automated de-identification of free-text medical records. BMC Med Inform Decis Mak. 2008;8:32.
3. Douglass M, Clifford GD, Reisner A, Moody GB, Mark RG. Computer-assisted de-identification of free text in the MIMIC II database. Comput Cardiol. 2004;2004:341–4.
4. Beckwith BA, Mahaadevan R, Balis UJ, Kuo F. Development and evaluation of an open source software tool for deidentification of pathology reports. BMC Med Inform Decis Mak. 2006;6:12.
5. Berman JJ. Concept-match medical data scrubbing. How pathology text can be used in research. Arch Pathol Lab Med. 2003;127:680–6.
6. Finley GP, Pakhomov SVS, Melton GB. Automated De-Identification of Distributional Semantic Models: AMIA Annual Symposium; 2016.
7. Sweeney L. Replacing personally-identifying information in medical records, the scrub system. Proc AMIA Annu Fall Symp. 1996:333–7.
8. Sweeney L. Guaranteeing anonymity when sharing medical data, the Datafly system. Proc AMIA Annu Fall Symp. 1997:51–5.
9. Meystre SM, Friedlin FJ, South BR, Shen S, Samore MH. Automatic de-identification of textual documents in the electronic health record: a review of recent research. BMC Med Res Methodol. 2010;10:70.
10. Szarvas G, Farkas R, Busa-Fekete R. State-of-the-art anonymization of medical records using an iterative machine learning framework. J Am Med Inform Assoc. 2007;14:574–580.
11. Guo Y, Gaizauskas R. Identifying personal health information using support vector machines. i2b2 workshop on …. 2006; Available: ftp://ftp.dcs.shef.ac.uk/home/robertg/papers/amia06-deident.pdf
12. Gardner J, Xiong L. HIDE: An Integrated System for Health Information DE-identification: EDBT. IEEE; 2008. p. 254–9.
13. Wellner B, Huyck M, Mardis S, Aberdeen J, Morgan A, Peshkin L, et al. Rapidly retargetable approaches to de-identification in medical records. J Am Med Inform Assoc. 2007;14:564–73.
14. Dernoncourt F, Lee JY, Uzuner O, Szolovits P. De-identification of patient notes with recurrent neural networks. J Am Med Inform Assoc. 2017;24:596–606.
15. Li D, Rastegar-Mojarad M, Elayavilli RK, Wang Y, Mehrabi S, Yu Y, et al. A frequency-filtering strategy of obtaining PHI-free sentences from clinical data repository. Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics. ACM; 2015. pp. 315–324.
16. Wang XA, Xhafa F, Luo X, Zhang S, Ding Y. A privacy-preserving fuzzy interest matching protocol for friends finding in social networks. Soft Computing. 2018. pp. 2517–2526. doi: https://doi.org/10.1007/s00500-017-2506-x
17. Chen H, Laine K, Rindal P. Fast Private Set Intersection from Homomorphic Encryption. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security - CCS '17; 2017. https://doi.org/10.1145/3133956.3134061.
18. Kissner L, Song D. Privacy-preserving set operations. Crypto 2005: Springer; 2005. Available: http://link.springer.com/content/pdf/10.1007/11535218.pdf#page=251
19. Egert R, Fischlin M, Gens D, Jacob S, Senker M, Tillmanns J. Privately Computing Set-Union and Set-Intersection Cardinality via Bloom Filters. Information Security and Privacy. Springer, Cham; 2015. pp. 413–430.
20. Miyaji A, Nakasho K, Nishida S. Privacy-Preserving Integration of Medical Data. J Med Syst. Springer US. 2017;41:37.
21. Nikolaenko V, Weinsberg U, Ioannidis S, Joye M, Boneh D, Taft N. Privacy-preserving ridge regression on hundreds of millions of records. Security and Privacy (SP), 2013 IEEE Symposium on. IEEE; 2013. p. 334–48.
22. Sadat MN, Aziz MMA, Mohammed N, Chen F, Jiang X, Wang S. SAFETY: secure gwAs in federated environment through a hYbrid solution. IEEE/ACM Trans Comput Biol Bioinform. 2018. https://doi.org/10.1109/TCBB.2018.2829760.
23. Rivest RL, Adleman L, Dertouzos ML. On data banks and privacy homomorphisms. Foundations of secure computation. 1978;4:169–80.
24. Paillier P. Public-key cryptosystems based on composite degree residuosity classes. Advances in cryptology - EUROCRYPT'99. Springer; 1999. pp. 223–238.
25. ElGamal T. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans Inf Theory IEEE. 1985;31:469–72.
26. Melchor CA, Barrier J, Fousse L. XPIR: Private information retrieval for everyone. on Privacy Enhancing; 2016; Available: https://hal.archives-ouvertes.fr/hal-01396142/. hal.archives-ouvertes.fr
27. Dowlin N, Gilad-Bachrach R, Laine K, Lauter K, Naehrig M, Wernsing J. Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy: International Conference on Machine Learning ICML; 2016. p. 201–10.
28. Naehrig M, Lauter K, Vaikuntanathan V. Can homomorphic encryption be practical? Proceedings of the 3rd ACM workshop on Cloud computing security workshop: ACM; 2011. p. 113–24.
29. Brakerski Z, Gentry C, Vaikuntanathan V. (Leveled) fully homomorphic encryption without bootstrapping. Proceedings of the 3rd Innovations in Theoretical Computer Science Conference on - ITCS '12. New York: ACM Press; 2012. pp. 309–325.
30. Fan J, Vercauteren F. Somewhat Practical Fully Homomorphic Encryption. IACR Cryptology ePrint Archive. 2012;2012:144.
31. Bos JW, Lauter KE, Loftus J, Naehrig M. Improved Security for a Ring-Based Fully Homomorphic Encryption Scheme: IMA Int Conf. Springer; 2013. p. 45–64.
32. Acar A, Aksu H, Selcuk Uluagac A, Conti M. A Survey on Homomorphic Encryption Schemes: Theory and Implementation. arXiv. 2017; Available: http://arxiv.org/abs/1704.03578. Accessed 21 Jan 2018
33. Zhou TP, Li NB, Yang XY, Lv LQ, Ding YT, Wang XA. Secure Testing for Genetic Diseases on Encrypted Genomes with Homomorphic Encryption Scheme. Secur Commun Netw. 2018. pp. 1–12. doi: https://doi.org/10.1155/2018/4635715
34. Smart NP, Vercauteren F. Fully homomorphic SIMD operations. Des Codes Cryptogr Springer US. 2014;71:57–81.
35. Brakerski Z, Gentry C, Halevi S. Packed Ciphertexts in LWE-Based Homomorphic Encryption. Public-Key Cryptography – PKC 2013. Berlin: Springer; 2013. p. 1–13.
36. Paar C, Pelzl J. Understanding Cryptography: A Textbook for Students and Practitioners: Springer Science & Business Media; 2009.
37. Chen DD, Mentens N, Vercauteren F, Roy SS, Cheung RCC, Pao D, et al. High-speed polynomial multiplication architecture for ring-LWE and SHE cryptosystems. IEEE Trans Circuits Syst I Regul Pap. 2015;62:157–66.
38. Johnson AEW, Pollard TJ, Shen L, Lehman L-WH, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3:160035.
39. Aguilar-Melchor C, Barrier J, Guelton S, Guinet A, Killijian M-O, Lepoint T. NFLlib: NTT-Based Fast Lattice Library. Topics in Cryptology - CT-RSA 2016. Cham: Springer; 2016. p. 341–56.
40. Volk M, Ripplinger B, Vintar S, Buitelaar P, Raileanu D, Sacaleanu B. Semantic annotation for concept-based cross-language medical information retrieval. Int J Med Inform. 2002;67:97–112.
41. Lindner R, Peikert C. Better key sizes (and attacks) for LWE-based encryption. CT-RSA: Springer; 2011. Available: http://link.springer.com/content/pdf/10.1007/978-3-642-19074-2.pdf#page=330
42. Oechslin P. Making a Faster Cryptanalytic Time-Memory Trade-Off. Advances in Cryptology - CRYPTO 2003. Berlin: Springer; 2003. p. 617–30.
43. Wang X, Feng D, Lai X, Yu H. Collisions for hash functions MD4, MD5, HAVAL-128 and RIPEMD. IACR Cryptology ePrint Archive. 2004;2004:199.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
