ACACES 2019 – Processor Architecture Security, Part 3
Jakub Szefer
Assistant Professor
Dept. of Electrical Engineering
Yale University
(These slides include some prior slides by Jakub Szefer and Shuwen Deng from HOST 2019 Tutorial)
ACACES 2019 – July 14th - 20th, 2019
Slides and information available at: https://caslab.csl.yale.edu/tutorials/acaces2019/
ACACES Course on Processor Architecture Security
© Jakub Szefer 2019
Logical Isolation and Memory Hierarchy
Most units in the memory hierarchy have been shown to be vulnerable to timing attacks:
• Caches
• Cache Replacement Logic
• Load, Store, and Other Buffers
• TLBs
• Directories
• Prefetchers
• Coherence Bus and Coherence State
• Memory Controller and Interconnect
Securing the Memory Hierarchy
• To prevent timing attacks, “secure” versions of different units in the memory hierarchy have been
proposed and evaluated
• Most defenses leverage ideas of partitioning and randomization as means
of defeating the attacks
• Of course, the different units can always be turned off to eliminate the attacks
  • E.g., disable caches to remove cache timing attacks
  • This can have a large impact on performance
• Some defenses use fuzzy time or add random delays
• Attacker can always get a good timing source, so fuzzy time does not work well
• Random delays simply create more noise, but don’t address root causes of the timing attacks
• Most researchers have focused on secure caches (18 different designs to date!)
• Less studied are TLBs, Buffers, Directories
• Most are related to caches, so secure cache ideas are applied to these
• Software defenses are possible (e.g., page coloring or “constant time” software)
  • But they require software writers to consider timing attacks, and to consider all possible
    attacks; if a new attack is demonstrated, previously written “secure” software may no
    longer be secure (a small constant-time comparison sketch follows below)
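To illustrate the “constant time” software idea, here is a minimal Python sketch contrasting a leaky early-exit comparison with a constant-time one; the function names are illustrative, and real code should rely on a vetted library routine such as hmac.compare_digest, shown below:

```python
import hmac

def leaky_equals(secret: bytes, guess: bytes) -> bool:
    # Early-exit comparison: the time depends on how many leading bytes match,
    # which an attacker can measure to recover the secret byte by byte.
    if len(secret) != len(guess):
        return False
    for s, g in zip(secret, guess):
        if s != g:
            return False
    return True

def constant_time_equals(secret: bytes, guess: bytes) -> bool:
    # hmac.compare_digest compares the inputs without an early exit on the
    # first mismatch, removing the data-dependent timing of the loop above.
    return hmac.compare_digest(secret, guess)
```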
• Numerous academic proposals have presented different secure cache architectures that aim to
defend against different cache-based side channels.
• To date there are 18 secure cache proposals
• They share many similar, key techniques
[Figure: state diagram of a cache access — hit vs. miss latency, memory reuse conditions, and transitions labeled {force evict}, {bypass}, {replace}, and {return data}]
• Interference happens within the victim’s process itself (internal interference)
• Hit-based vulnerabilities rely on observing fast (hit) accesses
Partitioning
• Goal: limit the victim and the attacker to only be able to access a limited set of cache blocks
  (a way-partitioning sketch follows after this list)
• Partition among security levels: High (higher security level) and Low (lower security level);
  even more partitions are possible
• Type: static partitioning vs. dynamic partitioning
• Partitioning based on:
  • Whether the memory access is the victim’s or the attacker’s
  • Where the access is to (e.g., to a sensitive or non-sensitive memory region)
  • Whether the access is due to speculation or an out-of-order load or store,
    or is a normal operation
• Partitioning granularity:
• Cache sets
• Cache lines
• Cache ways
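A minimal sketch of the partitioning idea, here static way-partitioning between a High and a Low domain; the class, geometry, and replacement choice are illustrative and not taken from any particular proposal:

```python
import random

class WayPartitionedCache:
    """Sketch of static way-partitioning: each security domain gets fixed ways,
    so one domain's misses can never evict the other domain's lines."""
    def __init__(self, num_sets=64, num_ways=8, high_ways=4, line_size=64):
        self.num_sets, self.line_size = num_sets, line_size
        # Ways [0, high_ways) belong to High; the remaining ways belong to Low.
        self.ways = {"High": list(range(0, high_ways)),
                     "Low":  list(range(high_ways, num_ways))}
        self.tags = [[None] * num_ways for _ in range(num_sets)]

    def access(self, domain, addr):
        set_idx = (addr // self.line_size) % self.num_sets
        tag = addr // (self.line_size * self.num_sets)
        if any(self.tags[set_idx][w] == (domain, tag) for w in self.ways[domain]):
            return "hit"
        # Miss: the victim line is chosen only among the domain's own ways.
        victim = random.choice(self.ways[domain])
        self.tags[set_idx][victim] = (domain, tag)
        return "miss"
```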
Randomization
• Randomization aims to inherently de-correlate the relationship between the address and the
  observed timing
[Figure: information about the address of the victim’s security-critical data can be inferred from the observed timing of cache hits or misses and from the observed timing of flush or cache-coherence operations; randomization breaks this correlation]
• Randomization approaches:
• Randomize the address to cache set mapping
• Random fill
• Random eviction
• Random delay
• Goal: reduce the mutual information obtainable from the observed timing to 0
• Some limitations: requires a fast and secure random number generator (the ability to predict the
  random behavior will defeat these techniques); may need OS support or an interface to specify the
  range of memory locations being randomized; … (a randomized-mapping sketch follows below)
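A minimal sketch of randomizing the address-to-set mapping with a keyed index function, in the spirit of (but not implementing) designs such as CEASER or SCATTER cache; the hash choice and geometry are illustrative:

```python
import hashlib

NUM_SETS = 1024      # illustrative cache geometry
LINE_SIZE = 64

def conventional_set_index(paddr: int) -> int:
    # Conventional mapping: low-order line-address bits select the set, so an
    # attacker can directly construct an eviction set for any target address.
    return (paddr // LINE_SIZE) % NUM_SETS

def randomized_set_index(paddr: int, key: bytes) -> int:
    # Keyed mapping: the set index depends on a secret key, de-correlating the
    # address from the set (and hence from the observed timing); periodic
    # re-keying limits how long a learned eviction set stays useful.
    line_addr = (paddr // LINE_SIZE).to_bytes(8, "little")
    digest = hashlib.blake2b(line_addr, key=key, digest_size=4).digest()
    return int.from_bytes(digest, "little") % NUM_SETS
```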
Differentiating Sensitive Data
• Allows the victim or management software to explicitly label a certain range of the victim’s
  data which they consider sensitive
• Can use new cache-specific instructions to protect the data and limit internal interference
  among the victim’s own data
• E.g., it is possible to disable victim’s own flushing of victim’s labeled data, and therefore
prevent vulnerabilities that leverage flushing
• Has advantage in preventing internal interference
• Allows the designer to have stronger control over security critical data
• How to identify sensitive data, and whether this identification process is reliable, are open
  research questions
• Independent of whether a cache uses partitioning or randomization
• Partitioning-based caches
• Static Partition cache, SecVerilog cache, SecDCP cache, Non-Monopolizable (NoMo) cache,
SHARP cache, Sanctum cache, MI6 cache, Invisispec cache, CATalyst cache, DAWG cache,
RIC cache, Partition Locked cache
• Randomization-based caches
• SHARP cache, Random Permutation cache, Newcache, Random Fill cache, CEASER cache,
SCATTER cache, Non-deterministic cache
[Figure: set-associative cache organization (sets × ways), with lines labeled L (Low) and H (High)]
SecVerilog Cache
Zhang, D., Askarov, A., and Myers, A. C., "Language-based control and mitigation of timing channels", 2012.
Non-Monopolizable (NoMo) Cache
• Dynamically partitioned
• Process-reserved ways and unreserved ways
• N: number of ways, M: number of SMT threads, Y: the number of ways each thread exclusively
  reserves, with Y ∈ [0, floor(N/M)]. E.g.:
  • NoMo-0: traditional set-associative cache
  • NoMo-floor(N/M): partitions the ways evenly among the threads, with no non-reserved ways
  • NoMo-1: each thread exclusively reserves one way (a small sketch follows below)
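A worked example of the NoMo-Y way reservation; the helper and geometry below are illustrative:

```python
def nomo_partition(N: int, M: int, Y: int):
    """NoMo-Y sketch: with N ways and M SMT threads, each thread exclusively
    reserves Y ways (0 <= Y <= floor(N/M)); the remaining ways stay shared."""
    assert 0 <= Y <= N // M
    reserved = {t: list(range(t * Y, (t + 1) * Y)) for t in range(M)}
    shared = list(range(M * Y, N))
    return reserved, shared

# Example: 8-way cache, 2 threads, NoMo-2 ->
# thread 0 reserves ways [0, 1], thread 1 reserves ways [2, 3], ways 4..7 stay shared.
print(nomo_partition(8, 2, 2))
```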
Protection against the four attack classes (✓ = protected, ~ = partially protected, X = not
protected; column labels, i.e. the compared cache designs, not preserved):
  external miss-based attacks:  ✓  ✓  ~  ✓
  internal miss-based attacks:  X  X  X  X
  external hit-based attacks:   X  ✓  ✓  X
  internal hit-based attacks:   X  X  X  X
SHARP Cache
• Replacement policy (a sketch follows below)
  • Cache hits are allowed among different processes
  • On a cache miss, the data to be evicted is chosen in the following order:
    1. Data not belonging to any current process
    2. Data belonging to the same process
    3. Random data in the cache set, plus an interrupt generated to the OS
  • Eviction between different processes becomes random
• Disallows flush (clflush) in the R or X model
• Invalidation using cache coherence is still possible
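A minimal sketch of the eviction-priority order above; the data layout and helper names are illustrative:

```python
import random

def choose_victim(cache_set, requesting_pid, live_pids):
    """Pick the line to evict from one cache set, following the priority order
    above. Each line is a dict such as {"pid": owner_pid, "tag": tag}."""
    # 1. Prefer data not belonging to any currently running process.
    dead = [ln for ln in cache_set if ln["pid"] not in live_pids]
    if dead:
        return random.choice(dead), None
    # 2. Otherwise prefer data belonging to the requesting process itself.
    own = [ln for ln in cache_set if ln["pid"] == requesting_pid]
    if own:
        return random.choice(own), None
    # 3. Otherwise evict a random line and signal an interrupt to the OS.
    return random.choice(cache_set), "interrupt_to_os"
```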
Sanctum Cache
Costan, V., Lebedev, I., and Devadas, S., "Sanctum: Minimal hardware extensions for strong software isolation", 2016.
• Sanctum
• Open-source minimal secure processor
• Provides strong, provable isolation of software modules running concurrently and
  sharing resources
  • Isolates enclaves (Trusted Software Module equivalent) from each other and from the OS
• Sanctum cache is a modified cache
• Their changes cover L1 cache, TLB, and last-level cache (LLC)
• L1 cache and TLB
• Security monitor (software) flushes core-private cache lines to achieve isolation
• LLC
  • Page-coloring-based cache partitioning ensures per-core isolation between the OS
    and enclaves
  • Assigns each enclave (and the OS) to different DRAM address regions (a page-coloring
    sketch follows below)
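A minimal sketch of page-coloring-based LLC partitioning: the page “color” is formed by the physical-page-number bits that overlap the LLC set-index bits, and a security monitor only hands out page frames whose color belongs to the requesting enclave (or the OS). The geometry below is illustrative:

```python
PAGE_SIZE = 4096     # illustrative parameters
LINE_SIZE = 64
LLC_SETS  = 8192     # 13 set-index bits

def page_color(frame_number: int) -> int:
    index_bits  = LLC_SETS.bit_length() - 1                   # 13
    offset_bits = (PAGE_SIZE // LINE_SIZE).bit_length() - 1   # 6 index bits fall inside a page
    color_bits  = index_bits - offset_bits                    # 7 bits -> 128 colors
    return frame_number & ((1 << color_bits) - 1)

def frame_allowed(frame_number: int, allowed_colors: set) -> bool:
    # A security monitor assigning disjoint color sets to the OS and each
    # enclave guarantees their data never map to the same LLC sets.
    return page_color(frame_number) in allowed_colors
```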
CATalyst Cache
• Targets the LLC
• Uses Cache Allocation Technology (CAT) from Intel to do coarse-grained partitioning
  • Available on some Intel processors
  • Allocates up to 4 different Classes of Service (CoS) for separate cache ways
  • Replacement of cache blocks is only allowed within a certain CoS (a way-mask sketch
    follows below)
• Partitions the cache into secure and non-secure parts
  • Uses software to do fine-grained partitioning
  • Secure pages are not shared by more than one VM
  • A pseudo-locking mechanism pins certain page frames (they are immediately brought back
    after eviction)
  • Malicious code cannot evict secure pages
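A minimal sketch of way-mask-based Classes of Service in the spirit of Intel CAT; the masks and geometry are illustrative:

```python
NUM_WAYS = 16        # illustrative 16-way LLC

cos_way_masks = {
    0: 0xFFF0,       # CoS 0 (e.g., non-secure workloads): may allocate into ways 4..15
    1: 0x000F,       # CoS 1 (e.g., the "secure" partition): may allocate into ways 0..3
}

def allowed_victim_ways(cos_id: int):
    # Replacement for an access in a given CoS may only pick ways set in its mask,
    # so it can never evict lines another CoS placed elsewhere (hits anywhere still work).
    mask = cos_way_masks[cos_id]
    return [w for w in range(NUM_WAYS) if (mask >> w) & 1]

print(allowed_victim_ways(1))   # -> [0, 1, 2, 3]
```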
Replacement conditions (fragment; D: line brought in, R: line to be replaced):
1. D.L = 0; R.L = 1
2. D.L = 1; R.L = 1; D.ID != R.ID
Protection against the four attack classes (column labels not preserved):
  external miss-based attacks:  ✓  ✓  ✓  ✓  ✓
  internal miss-based attacks:  X  X  ✓  ✓  X
  external hit-based attacks:   X  ✓  ✓  X  X
  internal hit-based attacks:   X  X  ✓  X  X
• Uses a number of assumptions, such as pre-loading
Random Permutation (RP) Cache
Wang, Z., and Lee, R.B., "New cache designs for thwarting software cache-based side channel attacks”, 2007.
• Uses randomization
• De-correlates memory addresses from the observed cache access timing
• Each cache line is extended with a process ID field and a protection bit (P)
  [Cache line format: P | ID | original cache line]
• Replacement policy (a sketch follows below)
  • Cache hits
    • Only when both the process ID and the address are the same
  • Cache misses (D: line brought in; R: line to be replaced)
    • D and R belong to the same process but have different protection bits:
      • An arbitrary line of a random cache set S’ is evicted
      • D is accessed without caching
    • D and R belong to different processes:
      • D is stored in an evicted cache block of S’
      • The mapping of S and S’ is swapped
    • Other cases:
      • The normal replacement policy is used
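A condensed sketch of the miss-handling cases above; the function and return values are illustrative, not the RP cache hardware interface:

```python
import random

def rp_miss_action(d_pid, d_protected, r_pid, r_protected, num_sets):
    """Decide how to handle a miss where the incoming line D (owner d_pid, protection
    bit d_protected) would normally replace candidate R (owner r_pid, r_protected)."""
    if d_pid == r_pid and d_protected != r_protected:
        # Same process, different protection bits: evict an arbitrary line of a
        # random set S' and service D without caching it.
        return ("evict_random_set", random.randrange(num_sets), "access_without_caching")
    if d_pid != r_pid:
        # Different processes: store D in an evicted block of a random set S'
        # and swap the set mapping of S and S' for D's process.
        return ("store_in_random_set", random.randrange(num_sets), "swap_set_mapping")
    # Same process, same protection bit: use the normal replacement policy.
    return ("normal_replacement", None, None)
```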
Non-Deterministic Cache
• Uses cache access decay to randomize the relation between accesses and observed timing
  (a sketch follows below)
  • Counters control the decay of a cache block
    • A local counter records the interval of its data’s activeness
    • Increased on each global counter clock tick
    • When it reaches a predefined value, the corresponding cache line is invalidated
  • The non-deterministic cache randomly sets each local counter’s initial value
    • Can lead to different cache hit and miss statistics
• May have larger performance degradation compared with other data-targeted secure caches
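A minimal sketch of the decay-counter idea with a random initial counter value; the limit and class below are illustrative:

```python
import random

class DecayLine:
    """One cache line with a decay counter, as described above."""
    DECAY_LIMIT = 16                  # illustrative predefined value

    def __init__(self, tag):
        self.tag = tag
        self.valid = True
        # Non-deterministic variant: the local counter starts at a random value,
        # so the line's remaining lifetime (and hit/miss statistics) vary per run.
        self.counter = random.randrange(self.DECAY_LIMIT)

    def touch(self):
        # An access marks the data as active again: restart its interval.
        self.counter = 0

    def global_tick(self):
        # The local counter is increased on each global counter clock tick;
        # reaching the predefined value invalidates the corresponding line.
        if self.valid:
            self.counter += 1
            if self.counter >= self.DECAY_LIMIT:
                self.valid = False
```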
Protection against the four attack classes (column labels not preserved; marks as on the original slide):
  external miss-based attacks:  ✓  ✓  X  ✓  ✓  O
  internal miss-based attacks:  X  ✓  X  ✓  ✓  O
  external hit-based attacks:   ✓  ✓  ✓  X  ~  O
  internal hit-based attacks:   X  X  ✓  X  X  O
• Speculation-related cache
• MI6
• Secure Enclaves in a Speculative Out-of-Order Processor
• Isolation of enclaves (Trusted Software Module equivalent) from each other and OS
• Combination of:
• Sanctum cache’s security feature
• Disabling speculation during the speculative execution of memory related operations
• InvisiSpec: speculation-related cache (a conceptual sketch follows below)
  • A speculative buffer (SB) stores unsafe speculative loads (USLs) before they modify
    the cache state
  • If the data in the SB mismatches the up-to-date value in the cache, the load is squashed
  • If the core receives a possible invalidation from the OS before the memory consistency
    model check, no comparison is needed
  • Targets Spectre-like attacks
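A minimal conceptual sketch of the speculative-buffer idea (not the actual InvisiSpec hardware): unsafe speculative loads fill a per-core buffer instead of the cache, and are validated or squashed once they become safe:

```python
class SpeculativeBuffer:
    """Conceptual sketch: speculative loads leave no footprint in the cache."""
    def __init__(self):
        self.entries = {}                 # addr -> value observed speculatively

    def speculative_load(self, addr, memory):
        # The value is kept in the SB; cache state is not modified, so the
        # load cannot be observed through cache timing if it is later squashed.
        self.entries[addr] = memory[addr]
        return self.entries[addr]

    def make_visible(self, addr, memory):
        # Called once the load is no longer speculative: compare the buffered
        # value with the up-to-date value; a mismatch forces a squash.
        if self.entries.pop(addr) != memory[addr]:
            return "squash_and_reexecute"
        return "commit_and_fill_cache"
```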
Protection against the four attack classes (column labels not preserved):
  external miss-based attacks:  ✓  ✓  X  ✓  ✓  ✓
  internal miss-based attacks:  X  ✓  X  ✓  X  X
  external hit-based attacks:   ✓  ✓  X  ✓  ✓  ✓
  internal hit-based attacks:   X  ✓  X  ✓  X  X
[Table: reported performance, power, and area overheads of the 18 secure cache designs (SP*, SecVerilog, SecDCP, NoMo, SHARP, Sanctum, MI6, InvisiSpec, CATalyst, DAWG, RIC, PL, RP, Newcache, Random Fill, CEASER, SCATTER, Non-Det.); where reported, performance impacts are mostly small, roughly from under 1% up to about 12.5% slowdown, with a few configurations showing improvements; only a few designs report power or area numbers; per-design values are not preserved in these notes]
Buffers
• Various buffers store data or memory translations based on the history of the code executed
  on the processor
• Hits and misses in the buffers can potentially be measured and result in timing attacks
• This is different from recent MDS attacks, which abuse the buffers in another way: MDS attacks
leverage the fact that data from the buffers is sometimes forwarded without proper address
checking during transient execution
• Towards secure buffers
• No specific academic proposal (yet)
• Partitioning – can partition the buffers, already some are per hardware thread
• Randomization – can randomly evict data from the buffers or randomly bring in data, though
  this may not always be possible
• Add new instructions to conditionally disable some of the buffers
TLBs
• Timing variations due to hits and misses exist in TLBs and can be leveraged to build
  practical timing-based attacks:
• TLB timing attacks are triggered by memory translation requests,
not by direct accesses to data
• TLBs have more complicated logic, compared to caches,
for supporting various memory page sizes
• Further, defending against cache attacks does not protect against TLB attacks
[Figure: random-fill mechanism details — a normal demand fill path vs. a random fill path, with an RNG for random-fill generation, a random-fill (TLB) buffer, sbase/ssize range registers, the DCache, and the Page Table Walker; panels (a) and (b)]
Directories
• Directories are used for cache coherence to keep track of the state of the data in the caches
• By forcing directory conflicts, an attacker can evict victim directory entries, which in turn
triggers the eviction of victim cache lines from private caches
• SecDir re-allocates directory structure to create per-core private directory areas, used in a
  victim-cache manner, called Victim Directories; the partitioned nature of Victim Directories
  prevents directory interference across cores, defeating directory side-channel attacks
Reported overheads:
  Secure TLBs [S. Deng, et al., 2019]: for the SR TLB, IPC 1.4%, MPKI 9% (SPEC2006)
  SecDir [M. Yan, et al., 2019]: a few % (some benchmarks faster, some slower) (SPEC2006)
• In response to timing attacks on caches, and other parts of the processor’s memory
hierarchy, many secure designs have been proposed
• Caches are the most researched; from them we learned about two main defense techniques:
• Partitioning
• Randomization
• The techniques can be applied to other parts of the processor: Buffers, TLBs, and Directories
• Other parts of memory hierarchy are still vulnerable: memory bus contention, for example
Related reading…
https://caslab.csl.yale.edu/books/