Azure SQL Database Always Encrypted
Industry 3: Cloud and Distributed Databases SIGMOD ’20, June 14–19, 2020, Portland, OR, USA
following, unless qualified otherwise, when we refer to SQL Server, we refer to both offerings.

1.1 Always Encrypted

Always Encrypted (AE) endows the database system with cryptographic data protection using encryption. AE allows data owners to use encryption at a column granularity to outsource database administration while keeping data confidential from the administrators, including cloud operators. Operationally, these security guarantees are achieved by keeping data identified as sensitive encrypted at all times: at rest when stored on disk, in SQL Server’s internal memory while in use (except within the memory of Trusted Execution Environments (TEEs), as we will see), and in transit during backups and result communication. SQL Server is untrusted for these guarantees, and they continue to hold if the SQL Server instance is compromised. Encryption keys are generated by clients in a “bring-your-own-key” model and are never exposed to SQL Server. This property of AE is the central difference compared to traditional data protection mechanisms such as transparent data encryption (TDE) [22, 27, 30, 31] and role-based access controls [23, 29], where the database system needs to be trusted for the protections. TDE, for example, keeps data encrypted at rest but decrypts it when loaded into memory during query processing. The TDE design requires the database system to hold encryption keys, so a malicious administrator can recover the keys and data using simple memory scraping attacks.

The fundamental challenge introduced by data encryption is computation over ciphertext. One approach to address this challenge is to use specialized encryption schemes that allow computation over ciphertext. The first version of AE (AEv1) uses deterministic encryption which, as the name suggests, deterministically produces the same ciphertext for a given plaintext. This property allows equality operations over ciphertext, and AEv1 relies on this to support database operations based on equality such as point lookups, equi-joins, and equality-based grouping. Functionality restrictions aside, this approach suffers from a serious usability pain point: since SQL Server does not have the encryption key, turning on encryption for the first time (initial encryption) and rotating encryption keys both require a roundtrip to client systems possessing the encryption key(s), which can be prohibitively expensive. For terabyte-sized databases, this roundtrip can result in latencies as long as a week for initial encryption and key rotation, which was a nonstarter for many customers with such large databases.

The main innovation of the second version of Always Encrypted (AEv2), released as part of SQL Server 2019 (and soon on Azure SQL Database), is to use a trusted execution environment (TEE) to address some of the functionality and usability restrictions of AEv1. A TEE is an emerging security technology that provides a way for a small amount of trusted code called an enclave to be run as part of a larger untrusted host process. The TEE hides the enclave computation and state from the host process and host OS, and therefore from administrators of the host system. AE currently supports Windows Virtualization-based Security (VBS) enclaves [35]. We are also working on supporting Intel SGX enclaves [12]. (In the following, unless qualified otherwise, AE refers to the latest v2 version, which also includes all of the v1 functionality as discussed in Section 2.)

AE uses the TEE to temporarily store encryption keys and perform computations on decrypted, plaintext data. While conceptually simple, this approach introduces significant engineering and technical challenges: (1) We need additional services to attest the trustworthiness of the enclave code, and such attestation services need to work for a variety of SQL Server deployment scenarios (on-premises, Azure, other clouds); (2) The TEE raises the question of how query processing is divided between the trusted enclave and untrusted SQL Server. The simple strawman of pushing all of SQL Server querying functionality into the enclave inherits any vulnerabilities in the large SQL Server code base. There are also subtle information leakage attacks where an attacker can learn plaintext information from data movement patterns to and from the enclave; (3) There are devops challenges such as how to handle failures within the enclave and how to debug customer errors originating within the enclave while respecting customer data confidentiality requirements.

While we subscribe to a northstar goal of supporting most of SQL functionality, AEv2 represents a first step towards richer querying using TEEs. In AEv2, we support general comparisons (beyond equality) and string pattern matching operations. In addition, initial encryptions and key rotations go through the TEE and avoid the roundtrip to client systems. Accordingly, AE today is not designed to be applicable to the entire database, but to high-sensitivity columns, e.g. personally identifiable information such as social security numbers, credit card numbers, names, and addresses.

1.2 Customer Impact and Experiences

Despite its functionality limitations, AEv1 is used by a wide variety of customers ranging from financial institutions (e.g., Financial Fabric, Produbanco) to insurance companies (e.g., Progressive Insurance) and health care organizations (e.g., Fullerton Health Care). These customers use AE mostly for OLTP applications and encrypt only personally identifiable information (PII) columns such as SSNs, names, email addresses, and credit card numbers.

The AEv2 feature was designed based on feedback from these and other customers. Customer applications suggest
that many types of sensitive information such as names, phone numbers, and location data require richer operations going beyond equality. Further, the data sizes are large enough to make the client-based initial encryption and key rotation impractical. Although only recently released, we are seeing an increase in interest from a broader set of customers, including those in manufacturing and retail sectors who feel that Always Encrypted will make it easier for them to meet regulatory compliance requirements, such as the EU General Data Protection Regulation (GDPR).

1.3 Related Work

AE builds on a rich body of research in the area of encrypted database systems. Without this foundational work, it would not have been possible to build AE. While all that work is relevant, it took many additional innovations to build AE as a full-fledged commercial relational database system with end-to-end data protection guarantees using encryption. AE is also one of the first commercial server-side platforms to leverage emerging TEE technology.

The idea of using encryption for data protection for cloud outsourcing was first proposed by Hacigumus et al. [16]. The main challenge of using encryption is query processing over data obscured by encryption. All prior research prototypes [4] use some combination of homomorphic encryption that allows computation over ciphertext and TEE-based processing.

Fully homomorphic encryption (FHE) schemes that allow arbitrary computation are inefficient for database querying, significant advances [10, 17, 28] in recent years notwithstanding. FHE also works with fixed-size inputs and outputs and therefore suffers from an abstraction mismatch with database querying, where the input and output sizes can vary arbitrarily. Prior systems [15, 24, 26, 33] have relied on specialized encryption schemes such as property-preserving encryption that preserves some plaintext property such as order when encrypted, partial homomorphic encryption that supports limited operations such as addition over ciphertext but with better performance characteristics than FHE, and garbled circuits. Many industrial NoSQL systems such as Google Encrypted BigQuery [15] and Ciphercloud [11] rely on such specialized encryption schemes as well.

The first research prototype based on TEEs was TrustedDB [7], which uses a coarse-grained approach of running a full query processing stack within the TEE. The fine-grained architecture of Always Encrypted was first described in [2, 3]. AE adopted and significantly expanded that design and showed for the first time that it can be deployed commercially and at scale in the cloud and on premises. Recent work [13, 36] has focused on expensive Oblivious RAM techniques for TEEs to hide access patterns, going beyond the operational security provided by AE.

2 OVERVIEW

In this section, we discuss how Always Encrypted is configured and the query functionality that it provides, and introduce the notion of an enclave.

2.1 Enclave

An enclave is a part of the virtual address space of a process and includes code and data that is shielded from the rest of the process and the operating system (OS), and hence from actors with administrative privileges to the machine. We use the term host to refer to the part of the process outside the enclave. The host initializes the enclave by loading a specially compiled dynamically linked library (DLL) and invokes the enclave code using function calls. The enclave code can access the entire address space of the process, while by design the host cannot access the enclave memory.

AE currently supports software-based enclaves using Windows Virtualization-based Security (VBS) [35]. We are in the process of adding support for Intel SGX [12]. For SGX, the enclave is protected by the CPU, while for VBS, the protection comes from the hypervisor (Windows Hyper-V). This implies that the Intel processor is a trusted component for SGX enclaves, and Hyper-V (and the underlying processor) is a trusted component for VBS enclaves. A detailed discussion of enclaves is beyond the scope of this paper and we refer the reader to these papers [12, 35].

Enclave platforms support a protocol called attestation, with which a remote system (e.g., a client) can verify the authenticity of the initial code and data in an enclave. This protocol relies on a trusted external service called the attestation service. We describe the details of attestation as used in Always Encrypted in Section 4.2.

2.2 Encryption Keys

AE uses a two-level key hierarchy to encrypt data. Data is encrypted using symmetric encryption based on a column encryption key (CEK), a 32-byte AES key. A CEK is stored within the database encrypted using a second-level key called the column master key (CMK). A CMK is stored in a separate key provider, and SQL Server AE stores only a URI reference to the key in the key provider, without having access to the key material. Since all key metadata is stored in SQL Server, except for the CMK, this ensures that the database is the single source of truth, and that key metadata is replicated and backed up along with SQL Server data. The client controls and configures the key provider(s) used in an AE instance. We support the following key providers out of the box: Azure Key Vault [6], Windows Certificate store, Java Key Store, and
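
The two-level key hierarchy of Section 2.2 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `toy_wrap`/`toy_unwrap` functions are a toy SHA-256 keystream cipher standing in for real key encryption (not the algorithm used by AE and not secure), the `CekMetadata` record and the vault URI are hypothetical, and only the names MyCEK and MyCMK come from the running example. The point it shows is that the database holds the wrapped CEK plus a URI, while the CMK itself stays in the client-controlled key provider.

```python
import hashlib
import secrets
from dataclasses import dataclass


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """SHA-256 in counter mode; a toy stand-in for a real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_wrap(wrapping_key: bytes, payload: bytes) -> bytes:
    """Encrypt payload under wrapping_key (illustration only, not secure)."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(wrapping_key, nonce, len(payload))
    return nonce + bytes(p ^ s for p, s in zip(payload, stream))


def toy_unwrap(wrapping_key: bytes, blob: bytes) -> bytes:
    """Invert toy_wrap using the nonce stored in the first 16 bytes."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(wrapping_key, nonce, len(body))
    return bytes(b ^ s for b, s in zip(body, stream))


@dataclass(frozen=True)
class CekMetadata:
    """Hypothetical shape of what the database stores: no plaintext keys."""
    cek_name: str
    cmk_uri: str          # reference into the client's key provider
    encrypted_cek: bytes  # CEK wrapped under the CMK


# Client side: the key provider holds the CMK. The URI is a made-up example.
cmk_uri = "https://example-vault.invalid/keys/MyCMK"
key_provider = {cmk_uri: secrets.token_bytes(32)}

# Provisioning: generate a 32-byte CEK and store only its wrapped form.
cek = secrets.token_bytes(32)
metadata = CekMetadata("MyCEK", cmk_uri, toy_wrap(key_provider[cmk_uri], cek))

# Later, a driver reads the metadata from the database and recovers the CEK
# by resolving the URI in the key provider it controls.
recovered = toy_unwrap(key_provider[metadata.cmk_uri], metadata.encrypted_cek)
```

Because the data is encrypted under the CEK rather than the CMK, replacing the CMK in such a scheme only requires re-wrapping `encrypted_cek`, not touching the column data.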
table on the left is the plaintext, and the one on the right is the corresponding encrypted table using a policy that specifies that the AcctID column is stored in plaintext, the AcctBal column is encrypted using randomized encryption, and the Branch column is deterministically encrypted.

2.4 Functionality

AE maintains encryption end-to-end. The client issues encrypted queries, i.e. queries where the input parameters are encrypted, and SQL returns encrypted results (we discuss application transparency in Section 2.5).

2.4.1 Key Provisioning. While we provide DDL to create key metadata, our DDL expects clients to configure the CMK and compute the encrypted value of CEKs. In order to ease the burden for clients, we automate the above steps in our tools.

2.4.2 Data Encryption. To turn on encryption (initial encryption) for a column using an enclave-enabled CEK, we rely on an ALTER TABLE ALTER COLUMN DDL statement. To encrypt columns using an enclave-disabled CEK, we need a roundtrip to a client system, and we provide client-side tools for this purpose.

An important operation with encryption is key rotation. A CMK rotation does not require re-encrypting data, but merely the CEKs encrypted with it. To prevent downtime, we allow CEKs to be encrypted with two CMKs temporarily for the duration of the CMK rotation. A CEK rotation does require re-encrypting data. Just like initial encryption, we use an ALTER TABLE ALTER COLUMN DDL statement when both the prior and the new CEKs are enclave-enabled; otherwise, we rely on client-side tools to manage the client roundtrip. All operations discussed here are online, meaning the client sees no downtime during key rotation or initial encryption.

2.4.3 Select and Update Queries. AE restricts querying functionality for encrypted columns. For columns encrypted with enclave-disabled keys, AE supports only equality operations on DET columns, i.e. point lookups, equi-joins and equality grouping, and no scalar operations on RND columns. For columns with enclave-enabled keys, AE uses the enclave to support equality, range comparisons, and LIKE pattern-matching predicates, even for RND columns.

2.4.4 Indexing. On DET columns, we support point indexing. On RND columns with enclave-enabled keys, we also support range indexing using SQL Server’s B+-Trees. Range indexing is not supported on deterministically encrypted columns, where enclave-enabled keys can only be used for in-place encryption and key rotation; we see deterministic encryption strictly as a way to support equality-based comparisons.

2.5 Application Transparency

Always Encrypted is designed for application transparency, meaning applications do not need to be modified to use AE functionality, modulo the functionality restrictions above and some fine print discussed next. In order to achieve transparency, we only support parameterized queries (including stored procedures and functions). This is not a serious functional limitation since good programming practices already recommend parameterization, and any ad hoc query can be rewritten to be parameterized. We achieve transparency by enhancing the SQL client drivers to be aware of AE, such that (a) query parameters are encrypted on their way to SQL, and (b) results are decrypted when returned to the application. When a query requires computations in the enclave, the driver also transparently sends CEKs to the enclave.

2.6 Threat Model

Our goal is to ensure data confidentiality from entities with privileged OS and database access. To characterize this threat, we introduce the strong adversary, who has unbounded power over the SQL Server process and can not only view the contents of the server’s memory/disk at every instant, along with all external and internal communication, but also tamper with it, for instance by attaching a debugger to SQL. A strong adversary, however, cannot observe state or computations within the enclave because it is specifically designed for this purpose. As an emerging technology, TEEs are undergoing an arms race between side-channel attacks and the corresponding patches [34]. A similar dynamic of hardware attacks has been observed in the past and continues to be observed in the development of Hardware Security Modules [19]. Current enclave side-channel attacks are specific to the TEE implementations, not the promised security goals. The design of AE is not dependent on a specific TEE implementation, allowing us to transition to a more secure implementation if necessary. We therefore exclude enclave side-channel attacks from our scope. Our adversary is stronger than the honest-but-curious adversary assumed in most prior work [3, 7, 15, 24, 26, 33], who can only observe, but not tamper with, the processing.

3 ARCHITECTURE

Figure 3 describes the architecture of Always Encrypted. We show the SQL Server component shaded to illustrate that it is untrusted. All other components (the SQL client, the enclave, and the attestation service) are trusted. The keys visible in the respective trusted components are also illustrated: the key provider stores the CMK, and the CEKs are visible in the client driver and the enclave. Data is stored encrypted at column granularity within SQL Server; Figure 3 shows encrypted data corresponding to the schema introduced in
Figure 1. SQL Server also stores CEKs encrypted with the corresponding CMKs.

[Figure 3: Architecture of Always Encrypted. Components shown: Application, Driver, SQL Server, and Enclave; {CEK} appears in the driver and in the enclave; the example table has columns ID (plaintext) and Value (encrypted).]

Since encryption is transparent, the application issues a parameterized query with plaintext parameters and expects results in plaintext. The driver therefore needs to deduce the encryption “type” information and the encryption keys needed to encrypt parameters and decrypt results. Implementing this functionality entirely within the driver is a substantial engineering challenge since it involves duplicating full SQL Server parsing in the driver. Further, this step also requires access to metadata stored in SQL Server for “binding”, i.e., attaching semantic interpretation to the parsed query. Our design instead relies on implementing the encryption type deduction functionality within SQL Server and making it available to the driver using a new API called sp_describe_parameter_encryption. The output of this call for a parameterized query contains the encryption type information (CEK) for each parameter; if the evaluation of the query requires an enclave, the output also contains the set of CEKs required within the enclave. For each CEK above, the output contains the encrypted CEK and the CMK metadata. Since SQL Server is untrusted, it could return incorrect output for the sp_describe_parameter_encryption call and undermine security. We discuss client controls that mitigate this risk in Section 4.1.

If the query requires enclave computations, then SQL Server also makes a call to a trusted attestation service and returns attestation information to the client, which is included in the above output. The attestation information, if returned, is used to establish a shared secret between the driver and the enclave. Overall, in our approach, SQL Server acts as the untrusted “man-in-the-middle” mediating communication between the driver and the enclave.

The driver uses the output of sp_describe_parameter_encryption to obtain CMKs, decrypt CEKs, and issue a query to SQL Server with encrypted parameters. If the query evaluation requires an enclave, the driver installs the necessary (decrypted) CEKs in the enclave using the secure channel based on the shared secret. SQL executes the query (using the enclave if needed) and returns encrypted results to the client, along with key metadata needed to decrypt the results. The driver decrypts the results and presents them in the clear to the application.

We now briefly outline how query execution inside SQL Server uses the enclave. The SQL Server engine implementation of AE inherits many of the design elements from Cipherbase [2]. Our design relies on the observation that most components of a database engine do not directly compute on column-granularity data values, but rather move or copy data values between different locations (disk, buffer pool, log, and lock table). Their functionality is unaffected whether the values are encrypted or in plaintext. SQL Server code localizes all computations on columnar data values to a module called expression services (ES). Our enclave runs a subset of ES needed to implement the functionality that we support. While we have made changes outside the enclave, those changes are not extensive.

For example, consider the data shown in Figure 3. It shows an instance of Table T whose schema is defined in Figure 1. Column value is encrypted using randomized encryption with an enclave-enabled key. Suppose that the application issues the (parameterized) query select * from T where value = @v; the driver encrypts the value of parameter @v before forwarding the query to SQL Server. Suppose that the filter predicate is evaluated using a table scan. In our architecture, data would be fetched from the storage engine encrypted, into the filter operator which would, for each row, invoke the enclave to evaluate the filter. Inside the enclave, values are decrypted and the filter is evaluated on the corresponding plaintext. The boolean result of the
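
The per-row enclave invocation in the table-scan example can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SQL Server implementation: the `Enclave` class, `install_cek`, `eval_equals`, and `filter_scan` are hypothetical names, the CEK is installed by a direct call rather than over the attested secure channel, and `rnd_encrypt` is a toy randomized cipher built from a SHA-256 keystream (not the encryption used by AE and not secure). It shows that the untrusted filter operator only ever handles ciphertext and a boolean, while decryption and comparison happen inside the enclave.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """SHA-256 in counter mode; a toy stand-in for a real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def rnd_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy randomized encryption: a fresh nonce makes each ciphertext differ."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def rnd_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(b ^ s for b, s in zip(body, stream))


class Enclave:
    """Trusted side: holds CEKs and sees plaintext (hypothetical interface)."""

    def __init__(self) -> None:
        self._ceks = {}

    def install_cek(self, name: str, cek: bytes) -> None:
        # In AE the driver sends CEKs over the shared-secret channel;
        # here it is a direct call for brevity.
        self._ceks[name] = cek

    def eval_equals(self, cek_name: str, left: bytes, right: bytes) -> bool:
        key = self._ceks[cek_name]
        return rnd_decrypt(key, left) == rnd_decrypt(key, right)


def filter_scan(rows, encrypted_param, enclave, cek_name):
    """Untrusted side: scans encrypted rows, learns only the boolean result."""
    return [row for row in rows
            if enclave.eval_equals(cek_name, row["value"], encrypted_param)]


# Client side: encrypt the column values and the parameter @v with the CEK.
cek = secrets.token_bytes(32)
enclave = Enclave()
enclave.install_cek("MyCEK", cek)

table_t = [
    {"id": 1, "value": rnd_encrypt(cek, b"alice")},
    {"id": 2, "value": rnd_encrypt(cek, b"bob")},
    {"id": 3, "value": rnd_encrypt(cek, b"alice")},
]
param_v = rnd_encrypt(cek, b"alice")

matches = filter_scan(table_t, param_v, enclave, "MyCEK")
matching_ids = [row["id"] for row in matches]
```

Rows 1 and 3 hold the same plaintext but different ciphertexts, so the host could not have evaluated the equality predicate on its own; this is why RND columns need the enclave even for equality.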
of values share a prefix. However, even if we did not support LIKE predicates, a client that reduces LIKE predicates to range predicates would reveal the same information to an adversary that has the background knowledge that the incoming range queries emanate from prefix matches.

Secure Compilation For AE DDL. Always Encrypted uses DDL (ALTER TABLE ALTER COLUMN) statements to rotate keys and initialize encryption. When the CEKs involved in the operation are enclave-enabled, AE uses the enclave to avoid a client roundtrip. In particular, initial encryption requires the enclave to encrypt plaintext values. We want to avoid exposing such an encryption oracle at the enclave, where an adversary can generate ciphertexts for plaintexts of its choosing. Accordingly, we only allow the enclave Encrypt function to be called if the client explicitly authorizes it. We check for client authorization by having the driver sign the query text using the session secret; we compute a SHA-256 hash of the query text and include it along with the CEKs that are encrypted with the shared secret. The enclave seeks a proof of client authorization when SQL Server requests access to the Encrypt function¹. SQL Server supplies a proof using the parse tree of the ALTER TABLE ALTER COLUMN query to the enclave, which uses the parse tree, the raw query text, and its SHA-256 hash to validate that the client is authorizing the type conversion needed for the DDL.

¹ Our authorization check is generalized to all type conversions using the enclave. We use Encrypt as an illustrative example.

3.3 Design Alternatives

Always Encrypted uses the enclave only for expression evaluation. An alternate design explored in prior work [7, 8] is to run the entire database engine inside the enclave. We decided against the latter design due to considerations in Azure SQL Database, the Platform-as-a-Service (PaaS) cloud offering of SQL Server. In a PaaS setting, the cloud organization is tasked with administrative tasks such as configuring backups, replication for high availability, and troubleshooting in the event of failures. To perform these tasks, cloud operators need to be able to perform operations such as examining query plans to troubleshoot performance-related problems, collecting dumps during a crash, connecting to the database server running in production to query the system runtime, e.g. the transaction conflict graph, and in rare scenarios, attaching a debugger. Supporting such operations requires full access to the SQL Server process, and running the process inside an enclave does not add any security.

In contrast, with the design of AE described above, all management tasks can be carried out with full access to the SQL Server process, but without access to the enclave memory. Enclave memory is automatically stripped from crash dumps, but since the amount of code running inside the enclave is small, dump information inside the enclave is almost never necessary. Furthermore, connecting to the database returns only encrypted data. Since cloud administrators cannot access the CEKs, they cannot decrypt the data. We see the above advantages as a side effect of having a small Trusted Computing Base (TCB). While SQL Server is a complex system with millions of lines of code, our enclave, on which the security of AE relies, is a tiny fraction of SQL Server’s code base. The benefits of having a small TCB are widely recognized [2].

4 IMPLEMENTATION

This section details the implementation of the AE feature.

4.1 Client Driver

We updated recent versions of various SQL Server drivers, including the ADO.NET, ODBC, and JDBC drivers, to include client-side AE functionality. As described in Section 3, when the application issues a parameterized query to the driver, the driver invokes a SQL Server API, sp_describe_parameter_encryption, to retrieve encryption type information to encrypt query parameters and install enclave CEKs.

Example 4.1. Consider the parameterized query select * from T where value = @v over the running example table in Figures 1 and 3. The value column of Table T uses RND encryption with an enclave-enabled key. The output of the call to sp_describe_parameter_encryption indicates that: (1) parameter @v should be encrypted with randomized encryption with CEK MyCEK, and (2) the CEK MyCEK should be sent to the enclave for evaluating the query. It further contains the metadata for CEK MyCEK and its corresponding CMK, MyCMK, shown in the DDL in Figure 1. Since the query requires enclave computations, SQL Server also makes a call to the attestation service and returns attestation information, which is included in the above output.

The driver constructs an encrypted query by encrypting parameters and issues it to SQL for execution. If CEKs are needed in the enclave, it first checks that the CEKs are authorized to be sent to the enclave using the corresponding CMK signature, and then encrypts them with the shared secret and sends them to SQL Server along with the query.

In order to avoid frequent CEK decryptions, which in the case of external key providers like Azure Key Vault could involve a network call, the driver caches the decrypted CEKs for a duration that can be controlled by clients. Further, the shared secret obtained as an outcome of attestation is cached in the driver in order to avoid frequent invocations of the attestation protocol. All of the above caches are shared across the entire client process. Note that the above architecture incurs two roundtrips to SQL. In order to not force every application to incur two roundtrips, we add a property to the
connection string indicating its use for AE. In the absence of the property, the driver does not invoke the special API described above.

The fact that SQL is the source of truth for type deduction and key metadata introduces the following vulnerabilities in the system. First, SQL could maliciously alter the output of sp_describe_parameter_encryption to claim that parameters corresponding to encrypted columns are not encrypted. In order to address this issue, we allow an application to explicitly force a parameter to be encrypted. Second, instead of returning the key metadata for keys provisioned by the client, SQL could return metadata corresponding to keys provisioned maliciously. In order to prevent this attack, we allow the application to restrict the key paths of the CMK to a list of trusted paths. The driver checks that the CMK metadata returned by SQL belongs to the list of trusted paths.

4.2 Attestation and Shared Secret

The goal of attestation is for the client to check the health of the enclave before releasing keys. The attestation protocol is invoked at query time on a signal from the client during the call to sp_describe_parameter_encryption (and only when needed, i.e., when there is enclave computation involved). We build upon attestation to establish a shared secret between the driver and the enclave.

Attestation breaks down into two portions: the health of the enclave platform, and the health of the code running inside the enclave. The details are specific to the enclave platform. We describe the details for the VBS enclave. The attestation service we support is a feature in Windows Server known as the Host Guardian Service (HGS) [18]. HGS measures the health of a host machine using Trusted Platform Module (TPM) measurements. TPMs measure the boot sequence of a host, and the measurement is returned in the form of a log called the TCG log. In an offline step, the TCG log obtained from the machine hosting SQL is registered with the HGS service to be included in its whitelist. For VBS enclaves, we only trust the hypervisor, but not the host kernel. Therefore, we are only interested in the measurement of the boot sequence until the hypervisor is loaded. In order to attest the VBS enclave, SQL invokes Windows to send the current TCG log to HGS, which looks up its whitelist and responds with a health certificate in the event of a match in the whitelist. The health certificate is signed by a signing key possessed by HGS, which we refer to as the HGS signing key, and contains a signing key possessed by the host hypervisor, which we refer to as the host signing key.

SQL then issues a call to Windows to measure the enclave. The measurement is called an enclave report, and it contains attributes of the enclave including the author ID that refers to the signing key used to sign the enclave binary, the hash of the enclave binary, and version numbers of the enclave and the host hypervisor. Our VBS enclave creates an RSA public/private key pair when it is loaded. In addition, the enclave report contains a hash of the enclave’s public key.

We use Diffie-Hellman (DH) key exchange to establish a shared secret between enclave and driver, and fold it into the attestation protocol to save client-server roundtrips. When the attestation protocol is invoked (as part of the call to sp_describe_parameter_encryption), the client passes its DH public key. As part of its attestation information, SQL returns:
(1) The host health certificate containing the host signing key.
(2) The enclave report signed by the host signing key.
(3) The enclave’s public key and the enclave’s DH public key, signed with the private key corresponding to the enclave’s public key. Since the client sends its DH public key as input, at this point, it follows from the DH protocol that the enclave already holds the shared secret.

On receipt of the above information, the client checks the chain of trust as follows.
(1) Check that the health certificate is signed by the HGS signing key. The client obtains the HGS signing key by querying HGS (all HGS APIs are exposed using http(s)).
(2) Check that the enclave report is signed by the host signing key embedded in the health certificate.
(3) Check that the enclave is healthy. In our current implementation, we base this check on: (a) the signing key; we build the enclave binary using a specially provisioned signing key, and use it to check the enclave health. Using the binary hash is a possibility, but would break even with minor modifications to the enclave code; and (b) version numbers; in the event of a security update to our enclave, we build the enclave with an updated version number and would release a client that checks for the updated version number.
(4) Check that the enclave public key returned is consistent with the hash embedded in the report, and that the signature on the enclave DH public key verifies under the enclave public key. At this point, the attestation process is complete, and the client can derive the shared secret.

As noted above, the shared secret is used by the driver to encrypt CEKs, which are sent on the TDS stream when executing the query. We note that one limitation of using the shared secret as-is is that SQL could replay the TDS stream to send keys to the enclave. In order to address replay attacks, the driver adds a nonce to the encrypted CEKs being sent to the enclave. The enclave checks that the nonce is not repeated (for a given session, i.e. shared secret value). A simple strawman for nonce checking is to use a counter at the driver to generate nonces. The enclave checks if a new nonce
is greater than the most recent previous nonce. While this strawman requires O(1) enclave state per session, it does not work correctly when driver-enclave communication is out of order, which is possible since both the client application and SQL Server are multi-threaded.
Our implementation of nonce checking is a generalization of the above idea: the driver still uses a counter to generate nonces, however the enclave now tracks all historical nonces. The enclave encodes all historical nonces using compact ranges. For example, the contiguous set of nonce values 0, ..., 100 is encoded using a range [0, 100]. The overall idea behind this design is that, since the driver generates sequential nonce values, the sequence of nonce values that the enclave sees is still close to sequential with some local reorderings, which translates to a very compact encoding of historical nonces.
4.3 SQL Engine: Metadata and Type System
Figure 6: Encryption Type Lattice (nodes, top to bottom: Randomized, Deterministic, Plaintext)
SQL stores information about encryption in its metadata. This includes key metadata in new system tables we introduce, and also encryption information associated with each column. We enhance the SQL type system to reason about encryption. Encryption information is incorporated as an additional attribute of SQL types; for instance, there is an encrypted integer, an encrypted string, an encrypted datetime, etc. The type information of a column, parameter, or variable includes not only the encryption type, but also the identifier of the corresponding CEK. Type deduction on a query consists of checking not only plaintext types (e.g. can a string be used to look up an integer column) but also encryption types. Therefore, we introduce an additional phase of type deduction in SQL, namely encryption type deduction. Furthermore, unlike plaintext types that are fully declared, owing to our transparent API, encryption types are not declared in the input query. Hence, encryption type deduction needs to operate on unknown types.
For this purpose, we use the observation that our encryption types form a lattice and we set up a constraint solving system using the lattice to infer encryption types. For ease of exposition, we describe the lattice structure without enclaves in the picture. The extension to include enclaves is straightforward. In the absence of enclaves, there are three generalized encryption types (Plaintext, Deterministic and Randomized) that form a lattice shown in Figure 6 (with enclaves, there are more generalized types, but they still maintain a lattice structure). Operations decrease strictly as we go from Plaintext to Deterministic to Randomized. The arrows in the figure indicate lattice order. We note that the above are generalized types since they do not refer to any specific CEK. As we compile the query, we add constraints on the encryption type of each literal (parameter/variable) and solve for them. We illustrate with an example.
Example 4.2. Consider the data shown in Figure 3 and the query select * from T where value = @v. Since this discussion does not include enclaves, assume that column value is encrypted with Deterministic encryption. As we compile the query, when we encounter parameter @v, we add it to our constraint system with an unknown encryption type τ with the constraint τ ≤ Randomized, where ≤ is a reference to the lattice order. When we encounter the predicate value = @v, we add two constraints: (1) τ ≤ Deterministic since equality is not allowed on Randomized encryption (without enclaves), and (2) τ = EncryptionType(value) since equality is only allowed if both operands have the same encryption type (this is also true with enclaves). Since the encryption type of the column value is known, solving for the above system yields the result that the encryption type of parameter @v is the same as the encryption type of the column value.
In our implementation, we do not explicitly use a constraint solver. We use a Union-Find algorithm to solve the constraints implicitly. We have equivalence classes to represent all operands with the same (potentially unknown) encryption type, merging them when we encounter an equality constraint if no constraint violation is introduced. Inequality constraints are processed by adjusting the encryption type of an equivalence class to account for the inequality (e.g., from Randomized to Deterministic in the above example), again only if no constraint is violated in doing so. The above encryption type deduction also tracks all CEKs needed in the enclave for query processing. There could be cases where we have multiple solutions to the constraint system. In such cases, our preference is to solve using the Plaintext type.
The above type deduction is invoked during the call to sp_describe_parameter_encryption, but it is also invoked during query execution as part of normal type deduction. The results of type deduction are cached in the plan cache with the query plan, to avoid recomputing them on every execution.
4.4 SQL Engine: Expression Services
As noted in Section 3, we run a subset of expression services (ES) inside the enclave. In SQL, ES is implemented as a stack machine. All expressions required for query processing
are compiled into ES objects of a class called CEsComp and stored in the plan cache. At runtime, SQL generates an executable version of CEsComp, called CEsExec, which exposes an Eval() method to run the stack program on provided inputs. All scalar operations of query operators in SQL Server are encapsulated within CEsExec objects they own: evaluating filter predicates, computing hashes for hash join probing, and checking join equality all translate to Eval() calls of these objects.
Figure 7: Illustration of Expression Compilation (panels: the expression as a tree, with a CScaOp_Comp node over two CScaOp_Identifier nodes; stack code for CEsComp (Host); stack code for CEsComp (Enclave), stored as a serialized byte stream in CEsComp (Host); instructions shown: GetData, TMEval, SetData, Comp)
When running ES in the enclave, we faced the following challenge. Like all other components in SQL, ES does not call the operating system (OS) directly for resource management, and instead uses SQL’s own internal OS abstraction called SQL OS. SQL OS itself calls the OS through a platform abstraction layer that lets SQL run on operating systems other than Windows. However, the enclave runtime explicitly excludes the OS, since the OS is untrusted. The enclave runtime does support all the resource management that an OS provides (memory, threading and synchronization, exception handling) but it is provided through restricted non-OS interfaces.
Given the above challenge, we faced three options for running ES within the enclave: (1) reimplement ES, (2) port ES with SQL OS, and (3) port ES without SQL OS. We rejected the option of reimplementing ES (a departure from the Cipherbase project); we wanted to inherit all the benefits of ES, e.g. its handling of strings, specifically collations, NULL values and exceptions. Since our goal is to eventually support a larger fraction of SQL, we chose to port ES to run inside the enclave. We also rejected the option of porting SQL OS, since SQL OS is a large component written to support all of SQL, whereas ES requires only a small subset of SQL OS. Therefore, we wrote a small SQL OS layer (that we will refer to as the enclave SQL OS to distinguish it from the host SQL OS) that supports the SQL OS abstractions needed for ES in addition to cryptographic operations needed within the enclave, and is implemented on top of the enclave runtime. The above layering also lets us easily port our enclave code between enclave platforms; by re-implementing the enclave SQL OS against different enclave runtimes, we let most of the enclave code remain unchanged.
Under the above approach, we compile the source code of ES to generate two binaries: the standard ES binary running outside the enclave, and the enclave binary. As noted in Section 4.2, we sign the enclave binary with a specially provisioned key.
Expressions in SQL are compiled from their tree representation to a stack machine, specifically the CEsComp object referred to above. We add a new instruction called TMEval to the ES stack machine instruction set. This instruction invokes an enclave computation. We represent the enclave computation using another CEsComp object, one that is evaluated within the enclave. The expression object running in the enclave is serialized and stored inline in the host object. We serialize the object as a way to implement a deep copy. During execution, the entire CEsComp object is reconstructed inside the enclave. We reconstruct the object since the compile-time CEsComp object is used at execution time as well, and having the enclave reference an object stored in the host would introduce an attack vector where SQL could interfere with the enclave evaluation by tampering with the CEsComp object.
We note that the TMEval instruction is used only for enclave computations. Equality operations on deterministically encrypted columns are simply treated as VARBINARY or BINARY equality operations.
Example 4.3. Consider our running example query from Example 4.2, select * from T where value = @v. Figure 7 shows the tree representation of the comparison, and the output of compiling it to a stack machine. We generate two CEsComp objects, one running at the host with a stub TMEval for the enclave call, and another running in the enclave.
4.4.1 Expression Evaluation within Enclave. The enclave exposes an interface Eval(expr, inputs, outputs) to evaluate a scalar expression within the enclave. The parameter expr in the Eval call is a serialized representation which deserializes to a CEsExec object within the enclave. The parameter inputs is an array of data values that form the input to the expression evaluation; the parameter outputs is an array of data value buffers for storing the outputs of the evaluation.
The enclave Eval method is called from within the TMEval stack instruction with inputs popped from the stack during host-side expression evaluation. The enclave enforces security checks that ensure, for instance, that encrypted and plaintext values cannot be compared.
We now describe how encryption and decryption are handled during expression evaluation within the enclave. The ES stack instruction set contains two instructions GetData and SetData to move data to and from the stack, respectively. For enclave expression evaluations, the source of any
GetData instruction is one of the values in the inputs parameter, and the destination of any SetData instruction is one of the buffers in the outputs parameter.
These instructions are annotated with the type of data, which includes the CEK identifier and encryption scheme when the data is encrypted. Using this information, the GetData instruction automatically decrypts any encrypted data before it is placed on the stack; similarly, the SetData instruction automatically encrypts the data before moving it off the stack if its type information indicates it should be encrypted. In other words, all decryptions and encryptions happen at ingress and egress points, and the stack program evaluation itself is oblivious to the encryption details.
Enclave memory is stripped from dumps and not visible in the debugger. This does not affect our supportability since the enclave code is small. We leverage structured exception handling to obtain coarse-grained information in the case of hardware faults such as access violations.
4.5 Indexing
The core idea behind indexing was presented in Section 3. One of the main challenges we encounter with indexing is recovery. In SQL Server, redo recovery is physical, but undo recovery of indexes is logical; for instance, aborted inserts are undone by navigating the B+-Tree and deleting the record. This poses a problem for indexes on encrypted columns (henceforth referred to as encrypted indexes). Encrypted indexes require keys in the enclave, and the client only sends keys when running queries. Hence, we have to consider the possibility that the client never runs any query using the encrypted index, which could potentially block recovery. In order to address this problem, we mark any recovery transaction that finds the key missing to be deferred, leveraging a pre-existing mechanism of deferred transactions available in SQL. When the client connects and sends keys to the enclave, the deferred transactions are resolved. Since deferred transactions hold locks, the above approach could lead to large parts of the database being unavailable. For instance, if SQL crashes during a bulk load on a table with an encrypted index, then the deferred transactions could lock up a large portion of the table; while clients can connect to the database, they would be unable to perform update operations on most of the table.
To mitigate the above problem, we leverage the constant-time recovery (CTR) feature also available in SQL Server 2019 [1]. Briefly, CTR makes the database fully available with all locks released in constant time in the event of a crash. It does so by persisting versions of the data; when the database recovers from a crash, clients only get access to the latest committed version (with all locks released), while uncommitted versions are cleaned in the background. In the presence of an encrypted index, the database is fully available for clients, but the version cleaner that performs index traversals could potentially not find keys in the enclave, in which case it keeps retrying. When the client connects to provide keys, the version cleaner completes successfully. When the database is configured to use CTR, then even though the above availability issue is not eliminated (there are corner cases where we could end up with deferred transactions), the overall database availability is improved.
The above approach ensures database availability without relying on the client to supply keys. However, if the client never supplies keys, then other database administration tasks such as log truncation are blocked. The same issue also arises if we are restoring a database backup in a machine that has no enclave configured. In order to address such problems, we introduce the mechanism of forcing resolution of deferred transactions by skipping recovery of index pages and marking the index as invalid in the metadata. If an enclave is not configured, then index invalidation is automatic, whereas if an enclave is configured, then index invalidation could be initiated using explicit policies based on timeout or resource consumption, e.g. log space consumption. Since invalidating a clustered index can lead to data loss, we do not support clustered indexes on encrypted data.
4.6 Performance Optimizations
Section 4.1 lists performance optimizations relevant to the driver such as caching of CEKs. We now discuss optimizations in the SQL engine. Our optimizations focus on the use of the enclave. Calling the enclave incurs an overhead since it resides in a different security boundary; and since expression evaluation constitutes the inner loop of query processing, by moving it into the enclave, we make the inner loop expensive. Instead of calling the enclave as a function, i.e. synchronous execution, we spool up an enclave worker thread and pin it to a core. Host workers submit work to the enclave using a queue, and the enclave worker consumes and performs the work. After completing its work, the enclave worker spins for a fixed duration polling for work before exiting the enclave and going to sleep. In this way, if the system is making heavy use of the enclave, then the expectation is that host workers are constantly feeding the enclave work, keeping it busy, owing to which we avoid the enclave context switch cost. On the other hand, if the workload does not heavily use the enclave, then the enclave goes to sleep, freeing up resources for the SQL Server host.
Further, the enclave is multi-threaded and each enclave thread processes ES requests as described in Section 4.4.1. To simplify synchronization issues, all state changes such as adding CEKs are handled by a single enclave thread. The other threads only read the current state. As state changes
are rare operations, this design allows for efficient scaling of enclave resources. Our performance optimizations are still work in progress and we are in the process of implementing some of the optimizations identified in [3].
5 PERFORMANCE EVALUATION
This section contains results of preliminary performance experiments with AE using the TPC-C benchmark [32].
5.1 Hardware Configuration
Our experiments were run on virtual machines on Microsoft Azure [21]. The SQL Server AE instance was run on a standard DS15 v2 virtual machine with 20 cores and 140 GB of main memory. The VM was equipped with two separate P30 premium SSD disks to store data and log files on separate drives. We used VBS enclaves [35] and we allocated four threads to run the enclave.
We used a Microsoft internal SQL Server TPC-C driver called Benchcraft to run the benchmark. Benchcraft was run on a standard D4 v2 virtual machine with 8 cores and 28 GB of main memory. The Host Guardian Service (Section 4.2) was run on a standard DC4s VM with 4 cores and 16 GB of main memory to attest the VBS enclave.
5.2 Systems Compared
We compared the performance of three SQL Server configurations described below.
1. SQL-PT: This configuration runs SQL Server on TPC-C data with no encryption. Further, the TPC-C client driver connects to SQL Server using a non-AE connection string. This configuration serves as the baseline to measure various AE-related overheads.
2. SQL-PT-AEConn: This configuration again runs SQL Server with no data encryption. However, the client driver now connects using an AE connection string. While basic transaction processing remains unchanged, this configuration introduces one additional client-server roundtrip for sp_describe_parameter_encryption as described in Section 4.1.
3. SQL-AE: This configuration runs SQL Server with data encryption and therefore relies on Always Encrypted for transaction processing. By default, we use RND encryption with enclave-enabled keys. We also vary the number of enclave threads. SQL-AE-RND-1 and SQL-AE-RND-4 specify 1 and 4 enclave threads respectively. We also consider a variant of this configuration with DET encryption and non-enclave-enabled keys, which does not use the enclave. We label these variants SQL-AE-RND-1, SQL-AE-RND-4 and SQL-AE-DET; SQL-AE refers to SQL-AE-RND-4 unless explicitly qualified.
5.3 Benchmark and Encryption Configuration
We use the TPC-C benchmark [32] with some minor changes for our performance evaluation. The benchmark consists of nine tables and five types of transactions over these tables that simulate the business activities of a wholesale supplier.
We encrypt the personally identifiable columns of the benchmark, which are the C_FIRST, C_LAST, C_STREET_1, C_STREET_2, C_CITY, and C_STATE columns in the Customer table. As noted above, we use RND encryption with enclave-enabled keys for SQL-AE-RND configurations, and DET with enclave-disabled keys for SQL-AE-DET. Since the column to CEK mapping does not impact performance, we choose the simplest configuration of using the same CEK for all encrypted columns. All other columns in the database remain unencrypted.
We made minor changes to the TPC-C stored procedures to reflect current functionality limitations of Always Encrypted. We modify the Payment and Order Status transactions to remove the ORDER BY on C_FIRST (since we do not, at this point, support ORDER BY using enclaves); both transactions select a subset of customers using a filter predicate, order these customers by their first names (C_FIRST) and use this ordered list to identify the median customer. We also create a NONCLUSTERED (non-unique) index CUSTOMER_NC1 on CUSTOMER(C_W_ID, C_D_ID, C_LAST, C_FIRST, C_ID), deviating from the benchmark specification, which requires a unique constraint on these columns.
With this encryption configuration, the only scalar operation over encrypted data is the equality C_LAST = @c_last of values in the C_LAST column against a provided parameter @c_last, and this operation is used by both Payment and Order Status transactions. Together, these two transaction types account for around 47% of the workload; in terms of expressions evaluated, the equality predicate is invoked for 60% of transactions of each type (the other 40% involve an equality of C_ID and not over C_LAST). In summary, around 28% of the workload (in terms of expression evaluation) involves computation on encrypted data.
The benchmark includes a scaling factor W representing the number of warehouses. For our experiments, we used W = 800; consistent with the benchmark specification, this is the smallest scaling factor that maximized CPU usage for SQL-PT, our baseline configuration. At W = 800 warehouses the CUSTOMER table has 24 million rows.
5.4 Results
5.4.1 AE vs. Baselines. Figure 8 shows the relative (normalized) performance of the three configurations. We varied the number of TPC-C client driver threads (shown on the horizontal axis) and for each setting show the normalized
Figure 8: Normalized TPCC benchmark transaction processing rates for the three systems compared for different number of TPCC client driver threads. The benchmark scaling factor was W = 800. (Plot: normalized throughput, axis range 0 to 1.2, against 50, 100, 150 and 200 client driver threads; series SQL-PT, SQL-PT-AEConn, SQL-AE-RND-4.)
Figure 9: Normalized TPCC benchmark transaction processing rates comparing enclave-based Always Encrypted processing and non-enclave-based processing using DET with 100 client driver threads and W = 800. (Plot: normalized throughput, axis range 0 to 1.2; series SQL-PT, SQL-PT-AEConn, SQL-AE-DET, SQL-AE-RND-4, SQL-AE-RND-1.)
throughput for the three configurations. At 100 client driver threads the throughput on all three configurations is close to or at their respective maximums. Under this load, AE currently achieves roughly half the throughput of the baseline plaintext SQL Server.
The SQL-PT-AEConn system that runs SQL Server with no encryption but with an AE connection achieves 64% of the throughput of the plaintext baseline. This suggests that the bulk of the drop of performance happens due to the additional roundtrip introduced by the sp_describe_parameter_encryption call to provide transparency. However, we believe this overhead is not fundamental and can be reduced with client-side caching of the results of sp_describe_parameter_encryption.
5.4.2 Enclave vs. Deterministic Encryption. Figure 9 compares the performance of enclave-based processing with RND encryption (SQL-AE-RND) against non-enclave based processing with DET encryption for 100 client driver threads and 800 warehouses.
The performance of SQL-AE-DET is roughly in between those of SQL-PT-AEConn and SQL-AE-RND. With the optimizations we currently support, enclave based computation is 12.3% slower than DET encryption. We expect this gap to narrow as we add further optimizations. Since enclaves provide further functionality, we view this result as promising and an acceptable overhead for the additional functionality.
6 CONCLUSIONS AND FUTURE DIRECTIONS
This paper presented Always Encrypted, a feature of Microsoft SQL Server that offers end-to-end data confidentiality guarantees using encryption. The main challenge of computing on encrypted data is addressed by AE using property-preserving deterministic encryption in its initial release, and going ahead, with a trusted execution environment in the form of an enclave running within the SQL Server process. Our design takes a first step towards supporting richer functionality on encrypted data while providing confidentiality guarantees against a strong adversary that can compromise SQL Server, in the process enabling complex administrative tasks to be undertaken without access to sensitive data. The AE system is the first of its kind in the industry.
We are still in early days of understanding and exploiting the full potential of the enclave-based AE architecture. While our initial performance improvements are promising, we are working on further performance improvements. In its current form, AE restricts query functionality and is not intended as a data protection mechanism for the entire database, but rather for a small subset of sensitive columns such as personally identifiable identifiers. The main avenue for future work is to make AE a general-purpose solution for all data without restricting query functionality.
ACKNOWLEDGMENTS
AE is the result of a large scale collaboration among SQL, Windows and Microsoft Research. We acknowledge the contributions of Manuel Costa, Elnata Degefa, Kedar Dubashi, Benjin Dubishar, Raul Garcia, Istvan Haller, Joachim Hammer, Ajay Manchepalli, Bala Neerumalla, Aditya Nigam and Nikhil Vithlani, who contributed to the design and implementation of AE.
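
As an illustration of the nonce check described in Section 4.2, the following is a minimal sketch, not the enclave's actual code, of tracking all historical nonces as compact inclusive ranges. The class name NonceTracker and its methods are hypothetical, and the sorted-list representation is an assumption made for brevity.

```python
import bisect


class NonceTracker:
    """Hypothetical per-session tracker: seen nonces as sorted, disjoint, inclusive ranges."""

    def __init__(self):
        self.starts = []  # sorted range starts
        self.ends = []    # ends[i] is the inclusive end of the range starting at starts[i]

    def accept(self, nonce: int) -> bool:
        """Record the nonce and return True if it is new; return False on a replay."""
        i = bisect.bisect_right(self.starts, nonce)  # ranges [0, i) start at or before nonce
        if i > 0 and nonce <= self.ends[i - 1]:
            return False  # already covered by an existing range
        joins_left = i > 0 and self.ends[i - 1] + 1 == nonce
        joins_right = i < len(self.starts) and self.starts[i] == nonce + 1
        if joins_left and joins_right:
            # The nonce fills the gap between two ranges, so merge them.
            self.ends[i - 1] = self.ends[i]
            del self.starts[i]
            del self.ends[i]
        elif joins_left:
            self.ends[i - 1] = nonce
        elif joins_right:
            self.starts[i] = nonce
        else:
            self.starts.insert(i, nonce)
            self.ends.insert(i, nonce)
        return True

    def ranges(self):
        return list(zip(self.starts, self.ends))
```

With nearly sequential arrivals the number of ranges stays small: a locally reordered sequence such as 0, 1, 3, 2 collapses back into the single range [0, 3] once the missing value arrives, which is the compactness argument made in the text.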
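
The encryption type deduction of Section 4.3 can be sketched along the following lines, assuming only the three enclave-free types and the order Plaintext ≤ Deterministic ≤ Randomized. The class TypeDeduction, its method names and the ConstraintViolation exception are hypothetical; the sketch mirrors the described Union-Find approach but is not the SQL Server implementation.

```python
# Lattice order taken from the text: Plaintext <= Deterministic <= Randomized.
RANK = {"Plaintext": 0, "Deterministic": 1, "Randomized": 2}


class ConstraintViolation(Exception):
    """Raised when a constraint cannot be satisfied (hypothetical name)."""


class TypeDeduction:
    def __init__(self):
        self.parent = {}  # union-find parent pointers
        self.bound = {}   # upper bound of each class, stored at its root
        self.known = {}   # fixed type of each class, or None while unknown

    def add(self, name, known=None):
        """Register an operand; columns pass their declared encryption type."""
        self.parent[name] = name
        self.bound[name] = known or "Randomized"
        self.known[name] = known

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def at_most(self, x, limit):
        """Inequality constraint: type(x) <= limit in the lattice."""
        r = self.find(x)
        if self.known[r] is not None and RANK[self.known[r]] > RANK[limit]:
            raise ConstraintViolation(f"{x} is {self.known[r]}, above {limit}")
        if RANK[limit] < RANK[self.bound[r]]:
            self.bound[r] = limit

    def equal(self, x, y):
        """Equality constraint: merge the two classes if nothing is violated."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        kx, ky = self.known[rx], self.known[ry]
        if kx is not None and ky is not None and kx != ky:
            raise ConstraintViolation(f"{x} is {kx} but {y} is {ky}")
        known = kx if kx is not None else ky
        bound = min(self.bound[rx], self.bound[ry], key=RANK.get)
        if known is not None and RANK[known] > RANK[bound]:
            raise ConstraintViolation(f"{known} exceeds the bound {bound}")
        self.parent[ry] = rx
        self.known[rx] = known
        self.bound[rx] = known or bound

    def resolve(self, x):
        """Return the deduced type, preferring Plaintext when several solutions remain."""
        return self.known[self.find(x)] or "Plaintext"
```

For Example 4.2, registering the column value as Deterministic, adding @v with the bounds Randomized and then Deterministic, and finally equating the two resolves @v to Deterministic. Had the column been Randomized, the equality would raise a violation, matching the rule that equality is not allowed on Randomized encryption without enclaves.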
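
The ingress and egress handling of Section 4.4.1 can be illustrated with a toy stack machine. This is a sketch under loud assumptions: the function eval_in_enclave, the DataType annotation and the tuple-based instruction encoding are hypothetical, and toy_crypt is a single-byte XOR that merely stands in for real encryption and offers no security.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataType:
    """Hypothetical type annotation carried by GetData and SetData."""
    cek_id: Optional[str] = None  # None means the value is plaintext


def toy_crypt(data: bytes, key: int) -> bytes:
    """Toy XOR transform standing in for real encryption; it is its own inverse."""
    return bytes(b ^ key for b in data)


def eval_in_enclave(program, inputs, outputs, ceks):
    """Run a stack program; only GetData and SetData ever touch ciphertext."""
    stack = []
    for op, *args in program:
        if op == "GetData":
            index, dtype = args
            value = inputs[index]
            if dtype.cek_id is not None:
                value = toy_crypt(value, ceks[dtype.cek_id])  # decrypt at ingress
            stack.append(value)
        elif op == "Comp":
            right, left = stack.pop(), stack.pop()
            stack.append(b"\x01" if left == right else b"\x00")  # plaintext comparison
        elif op == "SetData":
            index, dtype = args
            value = stack.pop()
            if dtype.cek_id is not None:
                value = toy_crypt(value, ceks[dtype.cek_id])  # encrypt at egress
            outputs[index] = value
        else:
            raise ValueError(f"unknown instruction {op}")
```

A program such as GetData, GetData, Comp, SetData evaluates an equality over two encrypted inputs while the Comp step itself only ever sees plaintext, which is the sense in which the stack program is oblivious to the encryption details.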
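
The 28% figure quoted in Section 5.3 follows from the two percentages given there. As a worked check (assuming the amsmath package for \text, and treating both inputs as the rounded values stated in the text):

```latex
\[
  \underbrace{0.47}_{\text{Payment and Order Status share}}
  \times
  \underbrace{0.60}_{\text{share using } \texttt{C\_LAST}}
  = 0.282 \approx 28\%
\]
```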