Database Security - What Students Need To Know
Ueli Maurer
Department of Computer Science
ETH Zurich
CH-8092 Zurich, Switzerland
1. INTRODUCTION

1.1 Scope of this Article
Classical database security (see e.g. [3]) relies on many different mechanisms and techniques, including access control, information flow control, operating system and network security, prevention of statistical inference, data and user authentication, encryption, time-stamping, digital signatures, and other cryptographic mechanisms and protocols.
It seems desirable to develop a systematic understanding of database security problems and their solutions and to come up with a framework. Ideally, such a framework should give some assurance that all relevant security problems have been addressed, and it can possibly point out new security issues not previously considered. It is a goal of this extended abstract and the corresponding talk to contribute to developing such a framework and identifying new research directions for fruitful collaborations of the database, the in-

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGMOD 2004, June 13–18, 2004, Paris, France.
Copyright 2004 ACM 1-58113-859-8/04/06 ...$5.00.

1.3 Outline
In Section 2 we briefly discuss information security from a very high-level perspective and compare the design of secure systems with the design of correct systems. We also distinguish between unilateral and multilateral security. In Sections 3 and 4 we discuss unilateral database security, first assuming the database, then the user to be trustworthy, where protection must be achieved against the other party’s potential cheating. In Section 5 we sketch the bilateral security problem, where the database must be protected against malicious users and vice versa. A general paradigm for building multilaterally secure systems, called secure multi-party computation, is reviewed in Section 6. Section 7 briefly discusses the secure aggregation of several databases as an application of secure multi-party computation.

2. INFORMATION SECURITY
One of the main paradigm shifts of the emerging information society is that information is becoming a crucial if not the most important resource. Information differs radically from other resources; for instance, it can be copied without cost, and it can be erased without leaving traces. Protecting the new resource information is a major issue in the information economy.
¹ We use the term “database” in the most general sense, including more general information systems than simple traditional databases.
Information security has proven to be a notoriously difficult topic. To understand and define security one must have a clear understanding of what a system is supposed to accomplish and in which ways a potential attacker can try to prevent the system from operating correctly. In this section we address these issues at a very general level. It serves as a basis to address database security in the later sections.

2.1 Unilateral Security
In many security-relevant applications, security is seen as a unilateral problem: Some system (or entity, or collection of entities) must be protected against a malicious outsider, often called an attacker. The system is secure if no attacker (with certain capabilities) can cause any (significant) deviation of the system from the specified behavior. This includes, for example, that the attacker cannot extract secret information.
In order to define security, one must hence define the system specification, i.e., what the system is supposed to do under normal circumstances, as well as the adversary’s capabilities. Such a specification of capabilities can include the available computing power, access to side information, etc.
Typical examples of unilateral security problems are the protection of a computer system by security mechanisms of the operating system, as well as the protection of an organization’s internal network against hackers, for instance by firewalls and intrusion detection technology.
Another, perhaps less obvious example of unilateral security is the protection of the communication between two parties against an eavesdropper, for example by encryption. Although both parties must be protected, the situation is nevertheless a unilateral one because the required protection of the two parties is (jointly) against an external eavesdropper, not against each other.
Database security is often seen as a unilateral security problem: The database system must be protected against outsiders and possibly also against potentially malicious users, but is itself assumed to be trustworthy.²
² Note that one need not distinguish between users and outsiders. The outsider could be defined as a user with no privileges.

2.2 Multilateral Security
In contrast to unilateral security, many security-relevant applications require the protection of several parties, each against the potential misbehavior of some other parties, possibly against all other parties.
A simple example of bilateral security is on-line transactions, where both the customer and the vendor want to be protected against malicious behavior by the other. In practice, such bilateral security issues are often not really addressed and instead “solved” by assuming that one of the parties (e.g. the vendor) is trustworthy.
Another such example, which needs some more explanation, is the classical software piracy problem. The software vendor has developed some useful functionality (e.g. a collection of statistical tools), while the customer wants to apply the functionality to his data. The (idealized) specification is that a customer who pays is allowed to use the functionality. This specification is implemented by the vendor by encoding the functionality in a software package, sending the software to the customer, and the customer running the software on his machine. However, this achieves more than the specification, in an undesirable way: The user can not only apply the functionality to his data, he can also pass this capability on to other parties (software piracy). This problem is sometimes addressed by extra hardware mechanisms, but usually it is addressed only by legal deterrence, which means that from a purely technical viewpoint (which we take in this paper) the user is assumed to be trusted.
There is a dual (for now quite theoretical) solution to this problem. Instead of having the vendor send the software to the user, the user could send his data to the vendor for processing. More precisely, the vendor could run a web service which manages, stores, and processes a customer’s data. In this case, the customer need not receive the software but must of course fully trust the service provider.
Ideally, one would like to solve the software piracy problem in a fair and symmetric manner such that neither the vendor must send the software to the user nor must the user send the data to the vendor. This is a typical example of a specification which could easily be implemented if a mutually trusted party were available. This party could obtain the software and the user’s data and perform all operations for the user, giving only him access to the result of the computation. We will discuss in Section 6 that many security solutions can be seen as the simulation of a virtual trusted party by the actual involved parties.
Another example of multilateral security, involving three entities, is on-line auctions. The auctioneer, the bidder, and the party offering an object need to be protected against possible fraud by another party (and, of course, also against external attackers). An even higher level of security is achieved if each party is protected against the other two parties cheating collectively with a joint strategy.
Yet another example of multilateral security is e-voting, discussed below.

2.3 Defining Security
As mentioned above, in order to define security one must define the system specification, i.e., what the system is supposed to do under normal circumstances, as well as the adversary’s capabilities. But how should one model the adversary if one wants to achieve security simultaneously for different groups of potential cheaters?
Let us address the system specification first. In the above software example, the specification is simply that the user should obtain the result of applying the software to his data. However, both discussed implementations achieve more, actually too much from a security point of view. The specification can be seen as an idealized system which performs exactly (and only) the desired operations.
As another example, a secure communication channel is an idealized system with three parties: the sender, the receiver, and the eavesdropper. The specification is such that the sender can choose a message, the receiver gets it, and the eavesdropper gets nothing. Encryption is a secure implementation of this specification if one can argue that the ciphertext seen by the eavesdropper gives no information (in a computational sense) about the message.
For e-voting the specification is that each voter can choose his vote and that all voters should learn the correct outcome of the vote evaluated according to the rules. The fact that votes need to be communicated is not part of the actual specification, but of course some form of communication is
unavoidable. Like many other specifications, the e-voting specification can be easily implemented using a trusted party (the voting authority) who receives (securely) all votes from the voters, counts them, and announces the result. But it is doubtful whether in practice such a trusted system, secure beyond any doubt (also in case of a highly unexpected outcome), can be implemented. Secure multi-party computation, discussed in Section 6, makes it possible to simulate such a trusted party.
In a general setting with a set P of parties, the security requirements must specify for which sets S ⊆ P of parties their collective cheating must be tolerated, meaning that for the remaining parties (P \ S) the specification is still achieved. A security requirement involves a whole collection Δ = {S_1, S_2, ..., S_k} of such sets, which typically overlap. In case of unilateral security, where a system S must be protected, S is not contained in any of the S_i. In a sense, unilateral security can be seen as a setting where the union of the sets in Δ is strictly smaller than the complete set P of parties.
One can model such a security requirement by assuming a central adversary who can choose one set S ∈ Δ and corrupt the players in that set, where corruption means that the adversary takes full control of these players. Security means that for the remaining parties there is no essential difference whether or not the adversary is present. More precisely, for any set S ∈ Δ corrupted by the adversary, the security for the remaining parties is guaranteed. To make this more formal is beyond the scope of this extended abstract.
When there are many parties (say n), a typical setting is that one wants to tolerate the cheating of any t parties, for some t < n. In this case, Δ = {S ⊆ P : |S| ≤ t}.

2.4 Correctness vs. Unilateral Security
Let us briefly compare the two problems of constructing a correct system and constructing a (unilaterally) secure system to see that correctness and unilateral security are conceptually the same.
A system is a correct implementation of a specification if it behaves as specified. Correctness is relative to a specification defining the desired functionality. Ideally, a specification defines all interfaces to the system (i.e., all methods for interacting with the system) and the complete input-output behavior (for all possible parameter ranges and for all possible ways of interacting with the system).
Similarly, a system is secure if it behaves as specified, even in the presence of an adversary with certain well-defined capabilities. Like correctness, security is relative to a specification (which defines what security in the given context means). Again, a specification defines all interfaces to the system (i.e., all methods for interacting with the system) and the complete input-output behavior. But in contrast to correctness, one also considers an adversary with a certain interface to the system, and the specification must be met for all admissible adversary strategies.
From such a high-level point of view, correctness and unilateral security are essentially the same. In both cases, the system must behave according to the specification, where in the case of correctness one quantifies over all parameter ranges etc., and in the case of security one quantifies over all adversary strategies. This is why multilateral security is perhaps more fascinating (at least as a research topic) and more paradoxical than unilateral security.

3. UNILATERAL DATABASE SECURITY
As mentioned before, in a traditional model of database security, the database is (usually implicitly) assumed to be trustworthy, while outsiders and possibly also the users are considered to be potentially malicious.
It is unnecessary to give a precise specification of a database system. It suffices to consider a general system with a state space Σ and a set Q of operations (queries). Each query q ∈ Q is specified by a state update function f_q : Σ → Σ and an output function g_q : Σ → B, where B is the range of all possible replies a query can produce. There is a set U of users, and each user u ∈ U has certain privileges, i.e., is allowed to perform a certain subset Q_u ⊆ Q of queries.³ In this abstract view, the privilege management subsystem and the logging subsystem are also seen as components of the database. Any query that is logged hence also changes the state, even if it does not modify actual data.
³ There are usually different categories of users, including system administrators, different types of internal users, and possibly a category of (unspecified) external users with restricted privileges.
We briefly discuss the most important security techniques relevant in a unilateral context. But we point out again that the unilateral database security problems can be seen as problems of the correct implementation of a specification rather than as actual security problems, although this view is quite unconventional.

3.1 User Authentication and Secure Communication
The most basic unilateral security problem is to establish a secure connection between any user and the database. The term “secure connection” captures both secrecy and authentication of the communication. User authentication can also be viewed as being implied by the authentication of the connection. Of course, a concrete protocol for establishing a secure connection might involve subprotocols at different layers of the communication stack, and a user authentication step may be involved.⁴ If one assumes a public-key infrastructure (PKI) to be in place, then establishing secure connections can be achieved by standard cryptographic mechanisms and protocols. We refer to [10, 12, 8] for a general discussion of cryptography.
⁴ For example, one could establish an SSL connection (without client authentication) between client and server and then authenticate the user at the application level.

3.2 System and Network Security
Like any information system, a database system must run on a clean operating system and trustworthy hardware, and it must be protected against attacks over the network. But both these issues, while affecting a database’s security, are not database security issues and should probably not be considered as such.

3.3 Access Control
Access control is the most classical database security topic. The access control system is the database component that checks all database requests and grants or denies a user’s request based on his or her privileges. (Here we assume that the user has been authenticated.)
Research in access control is concerned with developing policies and languages for specifying privileges, and with
software components implementing a given policy. A subtle aspect of access control is that rights can be seen like any other data item. This also includes the right to grant rights, which is potentially recursive. In many commercial databases, however, access control is quite simple. Access control can also be content-based, meaning that the decision is based not only on which data records are requested, but also on their content (e.g. on keywords), or it can be based on the history of the user’s previous requests.

3.4 Preventing Inference
The access control problem becomes even more subtle when the possibility of inference is taken into account, i.e., if one is concerned that a user might, from a set of legitimate queries, be able to infer further information he is not supposed to obtain. A well-known example is statistical inference, where several statistical queries can be combined to obtain information about individual entries, by carefully specifying the populations for the individual queries.
This problem has no clean solution since it is, ultimately, an artificial intelligence problem. If one sees access control as a database specification (not a security) problem, the inference problem illustrates that a complete specification of a database’s functionality is highly involved and probably impossible.

4. UNILATERAL SECURITY FOR THE USER

4.1 The Problems
Let us first discuss a few examples to see what the users’ concerns might be and why one might want to protect users from a malicious database.
Example 1. Consider a patent database, for example owned by company A and open to the public. A competing company B would probably not want company A to learn which patents company B is searching, as this might leak information about company B’s projects and strategy.
Example 2. Consider a health information system storing personal data of users, which is accessible to various authorized parties (the user, doctors, hospitals, the user’s health insurance company, the user’s employer, etc.), each with specified and sufficiently restricted privileges. Users might have no choice but to join such a system. It is obvious that the users might be concerned about possible misuse, for instance when certain information is leaked to the insurance company or to the user’s employer. Ideally, the user would like to hide the information from the organization running the database, but it seems that he has no choice but to trust the database.
Example 3. In the above example, any of the entities accessing the database might be concerned about the correctness of the data received from it. It seems that the users have to trust the database that it answers queries correctly and that it maintains the database correctly.

4.2 Private Information Retrieval
Let us discuss the problem of hiding the users’ queries from the database (see Example 1). Private information retrieval (PIR), proposed in several papers and formalized in [4], achieves this. This is at first paradoxical, but one trivial (though impractical) solution would be for the database owner to send the entire database to the user so that the user could evaluate the query himself (and neglect the rest of the data). This shows that PIR is possible in principle. PIR takes place in a setting where the user is trusted, i.e., there is no information to be hidden from the user. The goal of PIR protocols is to reduce the necessary communication. We do not discuss the known results and protocols.

4.3 Computing with Encrypted Data
Let us now address the problem of keeping the data stored in the database (not the query) secret.
Encryption is the usual technique used to protect the confidentiality of data. If the users encrypt the information stored in a database in order to prevent the database from seeing the data, then the database queries are restricted to simple storage and retrieval operations, which is not very useful.
When the database is supposed to answer queries involving several encrypted fields, it seems that it must decrypt the data before evaluating the query. However, a technique called computing with encrypted data makes it possible to solve this problem. The idea is that every data unit (e.g. every bit) is stored in encrypted form, where the key is not known to the database. The database performs the logical bit-operations specified by the query on the encrypted bits, thereby obtaining the encrypted result of the query, which it returns.
Current solutions to this require substantial communication for every logical gate to be evaluated, but if one could find a probabilistic bit-encryption scheme that is homomorphic with respect to a complete set of logical operations (e.g. the NAND gate, or the EXOR and the AND gates), then non-interactive computation with encrypted data would be possible.⁵ This remains one of the most intriguing open problems in cryptography.
⁵ This could potentially also solve the software piracy problem: The user could send the data in encrypted form to the software vendor.
By using a universal circuit, which takes a description of a circuit as input, one could even hide both the query and the data from the database.

5. BILATERAL DATABASE SECURITY
In this section we consider bilateral security, where the database must be protected against malicious users and the users must be protected against a potentially malicious database.
This can be achieved by so-called secure two-party computation, proposed originally in [13]. This technique allows two parties to compute any specification, including the specification of a user interacting with a database. A typical illustrative example is the so-called millionaires’ problem: Two millionaires want to find out who is richer, without telling each other how wealthy they are. Using a trusted party, this task can easily be solved, but using cryptography one can solve this task even without a trusted party.
⁶ Such a restricted type of cheating is often called passive or semi-honest cheating, in contrast to full-fledged cheating, which is called active cheating.
Unfortunately, it is impossible to achieve full security for both parties. Rather, one must assume that each party follows the protocol and only afterwards may try to find out more about the other party’s inputs and data than provided by the protocol’s output.⁶ This is for example justified in
case of the millionaires’ problem if the two gentlemen honestly and fairly perform a protocol such that neither of them can (or must) see the other party’s input (i.e., wealth).
Full security against active cheating can be obtained by involving several parties, using a technique called secure multi-party computation.

6. SECURE MULTI-PARTY COMPUTATION

6.1 The Paradigm
Secure function evaluation, as introduced by Yao [13], allows a set P = {p_1, ..., p_n} of n players to compute an arbitrary agreed function of their private inputs, even if some of the players deviate arbitrarily from the protocol. More generally, secure multi-party computation (MPC) allows the players to perform an arbitrary on-going computation with new inputs being given into the computation and new outputs being generated. This corresponds to the simulation of a trusted party [6, 7].
Security in MPC means that the players’ inputs remain secret (except for what is revealed by the intended outputs of the computation) and that the results of the computation are guaranteed to be correct. More precisely, security is defined relative to an ideal-world specification involving a trusted party: anything the adversary can achieve in the real world (where the protocol is executed) he can also achieve in the ideal world [2, 11].
Many distributed cryptographic protocols can be seen as special cases of a secure MPC. For specific tasks like collective contract signing, on-line auctions, or voting, there exist very efficient protocols. Here we consider general secure MPC protocols, where general means that any given specification involving a trusted party can be computed securely without the trusted party. General MPC protocols tend to be less efficient than special-purpose protocols. We refer to [9] for a simple explanation of the MPC paradigm.

6.2 Specifying the Adversary’s Capabilities
The potential misbehavior of some of the players is usually modeled by considering a central adversary with an overall cheating strategy who can corrupt some of the players. Two different notions of corruption, passive and active corruption, are usually considered. Passive corruption means that the adversary learns the entire internal information of the corrupted player, but the player continues to perform the protocol correctly. Active corruption means that the adversary can take full control of the corrupted player and can make him deviate arbitrarily from the protocol. If no active corruptions are considered, then the only security issue is the secrecy of the players’ inputs.
One distinguishes between two types of security. Information-theoretic security means that even an adversary with unrestricted computing power cannot cheat or violate secrecy, while cryptographic security relies on an assumed restriction on the adversary’s computing power and on certain unproven assumptions about the hardness of some computational problem, like factoring large integers.

6.3 Some Known Results
In the original papers solving the general secure MPC problem, the adversary is specified by a single corruption type (active or passive) and a threshold t on the tolerated number of corrupted players. Goldreich, Micali, and Wigderson [6] proved that, based on cryptographic intractability assumptions, general secure MPC is possible if and only if t < n/2 players are actively corrupted. The threshold for passive corruption is t < n. (The case n = 2 and t = 1 with passive security was already discussed in the context of the millionaires’ problem.) In the information-theoretic model, where bilateral secure channels between every pair of players are assumed, Ben-Or, Goldwasser, and Wigderson [1] and Chaum, Crépeau, and Damgård [5] proved that perfect security is possible if and only if t < n/3 for active corruption, and if and only if t < n/2 for passive corruption.
More generally, the adversary’s corruption capability could be specified by a so-called adversary structure [7], i.e., a set of potentially corruptible subsets of players.

7. SECURE AGGREGATION OF DATABASES FROM MUTUALLY MISTRUSTING ENVIRONMENTS
Consider, as an example scenario, that it has been agreed that the National Statistical Office (NSO) of a country should publish detailed weekly (or even daily) statistics about the country’s economic situation, involving detailed internal data of all companies. This requires the cooperation of the companies, which must provide their data to the NSO. But this is in conflict with the companies’ interest to keep such data confidential, at least until published in an official company report. If the NSO were fully trusted, this task could easily be solved in the obvious manner.
One can view the collection of all databases as an aggregated database to which only the NSO has some privileged access for statistical queries (and not more). This is the specification that should be implemented. In particular, the NSO should not learn any individual company data.
Secure multi-party computation makes it possible to solve this problem, i.e., to implement this specification. Each database plays the role of a player in a secure MPC protocol. More generally, one can define arbitrary queries on such a virtually aggregated database, and they can be evaluated without any other information leaking from the individual databases.

Acknowledgement
I would like to thank Gerhard Weikum and the program committee for the invitation to speak at this SIGMOD conference, Martin Hirt, Yuval Ishai, and Renato Renner for very helpful discussions on database security and related issues, and Christian Konig for his support with preparing this manuscript.

8. REFERENCES
[1] M. Ben-Or, S. Goldwasser, and A. Wigderson.
Completeness theorems for non-cryptographic
fault-tolerant distributed computation. In Proc. 20th
ACM Symposium on the Theory of Computing
(STOC), pp. 1–10, 1988.
[2] R. Canetti. Security and composition of multi-party
cryptographic protocols. Journal of Cryptology, vol.
13, no. 1, pp. 143–202, 2000.
[3] S. Castano, M. Fugini, G. Martella, and P. Samarati,
Database Security, Addison-Wesley, 1995.
[4] B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan.
Private information retrieval. In Proc. 36th IEEE
Symp. on Foundations of Computer Science (FOCS),
1995.
[5] D. Chaum, C. Crépeau, and I. Damgård. Multi-party
unconditionally secure protocols. In Proc. 20th ACM
Symposium on the Theory of Computing (STOC), pp.
11–19, 1988.
[6] O. Goldreich, S. Micali, and A. Wigderson. How to
play any mental game — a completeness theorem for
protocols with honest majority. In Proc. 19th ACM
Symposium on the Theory of Computing (STOC), pp.
218–229, 1987.
[7] M. Hirt and U. Maurer. Player simulation and general
adversary structures in perfect multi-party
computation. Journal of Cryptology, vol. 13, no. 1, pp.
31–60, 2000.
[8] U. Maurer. Cryptography 2000 ± 10. R. Wilhelm
(Ed.), Lecture Notes in Computer Science,
Springer-Verlag, vol. 2000, pp. 63–85, 2000.
[9] U. Maurer. Secure multi-party computation made
simple. Security in Communication Networks
(SCN’02), G. Persiano (Ed.), Lecture Notes in
Computer Science, Springer-Verlag, vol. 2576,
pp. 14–28, 2003.
[10] A.J. Menezes, P.C. van Oorschot, and S.A. Vanstone.
Handbook of Applied Cryptography. Boca Raton: CRC
Press, 1997.
[11] B. Pfitzmann, M. Schunter, and M. Waidner. Secure
Reactive Systems. IBM Research Report RZ 3206,
Feb. 14, 2000.
[12] B. Schneier. Applied Cryptography. Wiley, 2nd edition,
1996.
[13] A. C. Yao. Protocols for secure computations.
Proc. 23rd IEEE Symposium on the Foundations of
Computer Science (FOCS), pp. 160–164. IEEE, 1982.
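
As an illustration of the threshold setting in Sections 2.3 and 6.3, the following Python sketch enumerates the collection Δ = {S ⊆ P : |S| ≤ t} and computes the largest t permitted by the four bounds quoted in Section 6.3. The function names `threshold_structure` and `max_tolerated` and the string labels for corruption type and model are assumptions made for this example only; they do not come from the cited papers.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def threshold_structure(players: Iterable[str], t: int) -> Set[FrozenSet[str]]:
    """Return Delta = {S subset of P : |S| <= t} as a set of frozensets."""
    pool = sorted(players)
    return {frozenset(c) for k in range(t + 1) for c in combinations(pool, k)}


def max_tolerated(n: int, corruption: str, model: str) -> int:
    """Largest t allowed by the bounds quoted in Section 6.3 (labels are hypothetical)."""
    divisors = {
        ("active", "cryptographic"): 2,           # t < n/2
        ("passive", "cryptographic"): 1,          # t < n
        ("active", "information-theoretic"): 3,   # t < n/3
        ("passive", "information-theoretic"): 2,  # t < n/2
    }
    d = divisors[(corruption, model)]
    # The largest integer t with t < n/d, i.e. t*d < n, is (n - 1) // d.
    return (n - 1) // d
```

For example, with seven players in the information-theoretic model, `max_tolerated(7, "active", "information-theoretic")` evaluates the bound t < 7/3, and `threshold_structure` lists every subset of players that the adversary is allowed to corrupt for that t.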
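
The abstract database model of Section 3 (state space Σ, queries q with update function f_q and output function g_q, and per-user query sets Q_u) can be sketched in Python roughly as follows. This is only a minimal illustration: the class names `Query` and `Database`, the dictionary-based state, and the choice to evaluate g_q on the state before applying f_q are assumptions of this sketch, since the text does not fix these details.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set

State = Dict[str, Any]  # one element of the state space Sigma (assumed representation)


@dataclass(frozen=True)
class Query:
    """A query q, given by its update function f_q and its output function g_q."""
    name: str
    update: Callable[[State], State]  # f_q : Sigma -> Sigma
    output: Callable[[State], Any]    # g_q : Sigma -> B


@dataclass
class Database:
    state: State
    privileges: Dict[str, Set[str]]   # user u -> names of the queries in Q_u
    # The log is stored next to `state` here, but conceptually it is part of Sigma.
    log: List[str] = field(default_factory=list)

    def run(self, user: str, q: Query) -> Any:
        # Access control: user u may only perform queries in Q_u.
        if q.name not in self.privileges.get(user, set()):
            raise PermissionError(f"{user} may not run {q.name}")
        reply = q.output(self.state)         # g_q on the current state (assumed order)
        self.state = q.update(self.state)    # the state then becomes f_q(state)
        self.log.append(f"{user}:{q.name}")  # logging changes the state even for reads
        return reply


# Two hypothetical queries: a pure read and a write.
read_balance = Query("read_balance", update=lambda s: s, output=lambda s: s["balance"])
deposit_10 = Query(
    "deposit_10",
    update=lambda s: {**s, "balance": s["balance"] + 10},
    output=lambda s: None,
)
```

Note that even `read_balance`, whose update function is the identity on the data, extends the log when it is executed, which mirrors the remark that any logged query changes the state.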
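
The statistical inference problem of Section 3.4 can be made concrete with a small Python sketch. It assumes a toy table and an interface that answers only sum queries over a population chosen by the user; the table contents and the names `EMPLOYEES`, `sum_salary`, and `infer_salary` are made up for illustration. Two individually legitimate queries whose populations differ by exactly one person reveal that person's entry.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical toy table; names and salaries are invented for this example.
EMPLOYEES: List[Record] = [
    {"name": "Ann", "dept": "R&D", "salary": 120},
    {"name": "Bob", "dept": "R&D", "salary": 95},
    {"name": "Cleo", "dept": "R&D", "salary": 110},
    {"name": "Dan", "dept": "Sales", "salary": 80},
]


def sum_salary(population: Callable[[Record], bool]) -> int:
    """A statistical query: only the sum over the selected population is returned."""
    return sum(r["salary"] for r in EMPLOYEES if population(r))


def infer_salary(target: str) -> int:
    """Combine two aggregate queries whose populations differ by one person."""
    whole_dept = sum_salary(lambda r: r["dept"] == "R&D")
    dept_without_target = sum_salary(
        lambda r: r["dept"] == "R&D" and r["name"] != target
    )
    return whole_dept - dept_without_target
```

Neither query returns an individual record, yet their difference does, which is why access control on single queries alone cannot rule out inference.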
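
For the aggregation scenario of Section 7, one simple special case is computing only the sum of the companies' private values. The Python sketch below uses additive secret sharing, a standard technique that is swapped in here for illustration and is not a protocol described in this paper. It assumes passive (semi-honest) players, secure pairwise channels, and inputs whose total is smaller than the public modulus; the names `MODULUS`, `share`, and `secure_sum` are hypothetical, and all players are simulated inside one process.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # public modulus; inputs and their total are assumed to be below it


def share(value: int, n: int) -> List[int]:
    """Split value into n additive shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs: List[int]) -> int:
    """Simulate the players locally; in a real run each list row lives at one player."""
    n = len(private_inputs)
    # dealt[i][j] is the share that player i sends to player j over a secure channel.
    dealt = [share(x, n) for x in private_inputs]
    # Player j adds up the shares it received and publishes only this partial sum.
    partial_sums = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Anyone, e.g. the NSO, can add the published partial sums to obtain the total.
    return sum(partial_sums) % MODULUS
```

In this sketch the party that adds up the published partial sums learns the total but sees no individual input, since each input is hidden behind uniformly random shares. Protection against players that deviate from the protocol would require the general techniques reviewed in Section 6.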