Understanding The Security of Discrete GPUs

Uploaded by

polycatnft

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

40 views11 pages

Understanding The Security of Discrete GPUs

Uploaded by

polycatnft

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 11

Understanding The Security of Discrete GPUs

Zhiting Zhu Sangman Kim Yuri Rozhanski

The University of Texas at Austin The University of Texas at Austin Technion-Israel Institute of Technology
[email protected] [email protected] [email protected]

Yige Hu Emmett Witchel Mark Silberstein

The University of Texas at Austin The University of Texas at Austin Technion-Israel Institute of Technology
[email protected] [email protected] [email protected]

CCS Concepts •Security and privacy → Operating systems se- that contains quirky features absent from CPUs, such as auxiliary
curity; embedded microprocessors, and by the GPU’s deep software stack
ACM Reference format: whose boundaries and interactions with GPU hardware are deliber-
Zhiting Zhu, Sangman Kim, Yuri Rozhanski, Yige Hu, Emmett Witchel, ately blurred by the vendor. Unfortunately, the complexity of this
and Mark Silberstein. 2016. Understanding The Security of Discrete GPUs. interplay can hide vulnerabilities that attackers can use to subvert a
In Proceedings of GPGPU-10, Austin, TX, USA, February 04-05 2017, 11 pages. GPU’s expected behavior and break critical security properties, as
DOI: http://dx.doi.org/10.1145/3038228.3038233 we show in this paper.
For discrete GPUs, their independent memory system and com-
Abstract putational resources are physically partitioned from the main CPU
GPUs have become an integral part of modern systems, but which makes it plausible that a GPU could function as a secure
their implications for system security are not yet clear. This pa- processor; it might be possible to protect computation on a dis-
per demonstrates both that discrete GPUs cannot be used as secure crete GPU from code executing on the CPU. While plausible in
co-processors and that GPUs provide a stealthy platform for mal- theory, we systematically analyze the shortcomings of one specific
ware. First, we examine a recent proposal to use discrete GPUs proposal to use GPU hardware registers as secure storage (called
as secure co-processors and show that the security guarantees of PixelVault [44]). We show that PixelVault’s security depends on
the proposed system do not hold on the GPUs we investigate. Sec- assumptions about GPU hardware features that do not hold, and in
ond, we demonstrate that (under certain circumstances) it is possi- practice fully depend on the vulnerable GPU software interface that
ble to bypass IOMMU protections and create stealthy, long-lived GPU vendors expose.
GPU-based malware. We demonstrate a novel attack that compro- The flaws we find in PixelVault’s GPU security model stem from
mises the in-kernel GPU driver and one that compromises GPU mi- the lack of a clear software/hardware boundary and shifting respon-
crocode to gain full access to CPU physical memory. In general, we sibilities of hardware and software across GPU generations. For ex-
find that the highly sophisticated, but poorly documented GPU hard- ample a non-bypassable hardware feature in one version of a GPU
ware architecture, hidden behind obscure close-source device dri- can migrate to a bypassable software feature in another version.
vers and vendor-specific APIs, not only make GPUs a poor choice The problem with such a fluctuating software/hardware boundary
for applications requiring strong security, but also make GPUs into is that it becomes hard, if not impossible, to reason about the actual
a security threat. security guarantees of a GPU system.
We also systematically analyze risks that originate with NVIDIA
1 Introduction GPUs, where the GPU serves as a host for stealthy, long-lived ma-
GPUs have enjoyed increasing popularity over the past decade, licious code. It is difficult to detect the execution of GPU-hosted
both as hardware accelerators for graphics applications and as malware and in certain cases, it is even difficult to detect its pres-
highly parallel general-purpose processors. With general purpose ence. We demonstrate attack code running on the NVIDIA GPU
computing on GPUs (GPGPUs) diffusing into the mainstream, re- that reads secrets from CPU memory and corrupts the memory state
searchers are looking at their security implications. In this paper we of CPU computations by leveraging GPU Direct Memory Access
analyze two related questions: can GPUs be used to enhance the se- (DMA) capabilities.
curity of a computing platform? and can GPUs be used to subvert We demonstrate two novel attacks: one against the proprietary
the security of a computing platform? in-kernel closed-source GPU driver, the other against the GPU mi-
Understanding the security of GPUs requires understanding the crocode running on an auxiliary microprocessor resident on the
interplay among the GPU hardware, its software stack, and the GPU card. For the driver attack, we binary patch the proprietary
busses and chipsets that coordinate a platform’s transfer of data. NVIDIA GPU driver while it is loaded and being used by the OS
The interplay of these features is complicated by GPU hardware kernel, and force it to map sensitive CPU memory into the address
space of an unprivileged GPU program. Our second attack lever-
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed ages auxiliary microprocessors [27, 32] which GPUs use for various
for profit or commercial advantage and that copies bear this notice and the full citation functions like power management and video display management.
on the first page. Copyrights for components of this work owned by others than the These microprocessors are not exposed as part of the standard GPU
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission programming model (e.g., CUDA or OpenCL). We implement the
and/or a fee. Request permissions from [email protected]. attack microcode running on such an auxiliary microprocessor that
GPGPU-10, Austin, TX, USA combines the functionality of the original microcode with malicious
© 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM.
978-1-4503-4915-4/17/02. . . $$15.00
DOI: http://dx.doi.org/10.1145/3038228.3038233
GPGPU-10, February 04-05 2017, Austin, TX, USA Zhu et. al.

2.1 GPU execution model

CPU GPU
GPUs are slave processors controlled entirely by a CPU execut-
Process Kernel ing a GPU driver running in privileged mode. Any CPU process can
initiate the execution of a GPU program by making API calls to the
kernel-resident GPU driver. The initiating CPU process is called the
GPU-controlling process. The unprivileged CPU process invokes a
Chipset GPU GPU kernel, which is specially written and compiled for execution
PCIe Bus on a GPU.
IOMMU Chipset
There are public APIs to allow the CPU-controlling process to
manipulate the address space of any GPU kernel that it launches.
Both NVIDIA CUDA and OpenCL contain APIs that allow man-
agement of the GPU kernel’s memory, including transfer of data
CPU RAM GPU RAM from/to the CPU, and mapping CPU memory into a kernel’s ad-
Figure 1. CPU-GPU architecture overview. IOMMU use is op- dress space, as we discuss next. Once the GPU-controlling process
tional and its behavior is configured by the operating system. terminates, all the GPU resources associated with it are released.
In particular, the driver reclaims the GPU memory and terminates
active GPU kernels associated with the process.
2.2 Memory Hierarchy
code that can read and write arbitrary CPU memory. To the best of
NVIDIA GPUs contain several streaming multiprocessors
our knowledge it is the first such attack on NVIDIA GPUs.
(SMs), which concurrently run thousands of sequential threads.
Our attacks rely on unlimited DMA access from GPU to CPU
Each thread may access its private registers, local on-die scratchpad
physical memory. The common way to foil such DMA attacks has
memory, and global GPU memory shared among all SMs. Global
been to restrict peripheral access to system memory by using the
memory is cached with two levels of hardware cache. The L1 data
IO memory management unit (IOMMU), intended to protect CPU
cache is local for each SM, while the L2 is shared across all SMs.
memory against malicious or malfunctioning peripherals. We, how-
Instruction cache. NVIDIA GPUs have multiple levels of instruc-
ever, show novel techniques to subvert IOMMU security and bypass
tion cache, though the exact architecture has not been officially dis-
its protection in the strict operation mode that has previously been
closed. Wong et al. [48] suggest that there are three levels. The in-
considered the most secure (§4).
struction caches in GPUs are used exclusively for instructions. Fur-
Similar DMA attacks on other peripherals are known [10, 11,
ther, the instruction cache is not kept coherent with the GPU global
34, 40] However, our GPU DMA attacks are particularly dangerous
memory where the instructions are stored. The GPU driver flushes
because they are relatively easy to program: the ability of modern
the instruction cache when a new GPU program starts. However, if
GPUs to execute general purpose code lowers the bar for imple-
the CPU writes to GPU instructions in memory while the GPU is
menting sophisticated malware on GPU.
running, the (stale) instructions are not invalidated from the cache.
Our attacks assume an adversary who transiently gains the ability
NVIDIA does not provide any public API for flushing the instruc-
to load a kernel module. The primary danger of the GPU-based
tion cache. Therefore, overwriting a GPU kernel’s program code in
attacks is their stealthiness. They would evade most known tools
GPU memory while a GPU is running may not change the actual
and research proposals that provide software system integrity.
running program.
Our work focuses on discrete GPUs from NVIDIA which come
mounted on an expansion card which is plugged into a computer’s 2.3 Accessing CPU memory from the GPU
IO bus. The only current proposal to use a GPU as a secure proces- Modern GPUs provide limited access to CPU memory
sor is for a discrete GPU. While integrated GPUs also use IOMMUs from programs running on a GPU. In particular, a stan-
for safety, we leave for future work the dangers and mitigations for dard API (cudaHostRegister for CUDA [33] and
integrated GPU security. clEnqueueMapBuffer for OpenCL [23]) allows the CPU
We begin by summarizing the portions of the GPU architecture to map a CPU memory region into a GPU kernel’s address space.
that are most relevant for security (§2), and then describe how we Once mapped, the GPU may directly access the mapped CPU
can defeat the security of PixelVault (§3). Then we discuss how memory without CPU involvement via direct memory access
to bypass IOMMU protections (§4) and attack CPU memory using (DMA). Similarly, GPUs may be configured to access memory-
a compromised GPU driver (§5) and compromised microcode run- mapped input/output (MMIO) regions of other peer peripheral
ning on an embedded GPU microprocessor (§6). We summarize devices connected to the PCIe bus. For example, the NVIDIA
related work (§7) and conclude. GPUDirectRDMA API enables peer-to-peer access to Infiniband
network cards, allowing GPU programs to communicate over the
2 Architecture network without CPU mediation [31]. The GPU internal page table
This section summarizes relevant aspects of GPU architecture, is generally accessible only through the GPU driver and is not
paying close attention to the memory subsystem and CPU-GPU visible to the CPU OS.
communications. We describe discrete GPUs that connect to the In contrast to CPU programs, in which memory protection is en-
host via a peripheral component interconnect express (PCIe) bus, forced for every load, store and instruction execution at runtime by
because this paper focuses on discrete GPUs, and in particular on hardware, GPU accesses to CPU memory do not pass through the
NVIDIA GPUs. A high-level view of a CPU-GPU system model CPU’s memory management unit (MMU) and therefore the CPU
used throughout this paper is shown in Figure 1. Process and kernel
are shaded, because they are software abstractions.
Understanding The Security of Discrete GPUs GPGPU-10, February 04-05 2017, Austin, TX, USA

performs no runtime checks. Rather, the GPU driver validates ac- PixelVault threat model. PixelVault assumes that the system boots
cess rights at the time of mapping in software. Additional hardware from a trusted configuration, and it can set up its execution environ-
protection can be provided by the IOMMU which we describe next. ment on the GPU. Once PixelVault is established, the attacker may
2.4 IOMMU have full control over the platform. Specifically, the attacker can
When a device performs direct memory access (DMA) to read execute code at any privilege level and it has access to all platform
or write CPU physical memory, it uses device addresses. When the hardware.
IOMMU is working, it maps device addresses to CPU physical ad- To achieve this goal PixelVault leverages several characteristics
dresses (just as the CPU’s MMU maps virtual to physical address). of the NVIDIA GPU architecture and execution model. While some
The IOTLB caches entries from the IO page table, just as the CPU’s of these characteristics are well known and have been officially con-
TLB caches entries from the process’ page table. IO page table en- firmed, some were only assumed to be correct and others were par-
tries contain protection information, and the IOMMU will check tially validated experimentally.
each access to system memory from a peripheral device, to make Below we list only those assumptions that we later experimen-
sure it has sufficient permissions. tally refute. Even if only one of these assumptions is not satisfied,
The IOTLB is not kept coherent with the IO page table by hard- PixelVault is no longer able to guarantee the secrecy of the master
ware, similarly to TLBs in most common CPUs. Software must encryption key under its threat model.
explicitly manage the IOTLB, flushing the cached mappings when
they are removed from the IO page table. We exploit this software- 1. It is impossible to replace the code of a running GPU kernel
managed IOTLB coherence mechanism to circumvent IOMMU pro- if the code is fully resident in the instruction cache. This
tection and enable unauthorized access to system memory from the feature is critical to ensuring that an adversary cannot re-
GPU, as we discuss in Section 4. place the PixelVault GPU code without stopping the kernel,
2.5 Microprocessors and MMIO registers in GPUs and therefore without losing the master key stored in GPU
GPUs expose a set of memory mapped input output (MMIO) registers. We show that NVIDIA GPUs have unpublished
registers used by the driver for GPU management [2, 27]. In addi- MMIO registers that flush the instruction cache, allowing
tion, they contain several special-purpose microprocessors used to replacement of code from running kernels that are as small
manage internal hardware resources. A GPU driver updates GPU as 32B (4 instructions).
microprocessor code every time a GPU is initialized. The docu- 2. The contents of GPU registers cannot be retrieved after ker-
mentation about the actual purpose of microprocessors and MMIO nel termination. This feature is essential to PixelVault’s
registers used in NVIDIA GPUs is fairly scarce; it usually comes ability to prevent a strong adversary from retrieving a mas-
from unofficial sources, such as open-source driver developers who ter key by stopping a running PixelVault kernel. We show
partially reverse-engineered the official driver. that under certain conditions it is possible to retrieve the
We found that the GPU MMIO registers can invalidate the GPU contents of registers after kernel termination, and the Pixel-
instruction caches. Flushing the instruction caches is key to dynam- Vault design satisfies these conditions.
ically updating the code of a running kernel, which breaks the se- 3. A running GPU kernel cannot be stopped and debugged if
curity guarantees of PixelVault (§ 3.2). Our microcode attack (§ 6) it is not compiled with explicit debug support. This fea-
leverages an important capability of NVIDIA microprocessors that ture is necessary to ensure that an adversary cannot retrieve
allows unrestricted access to GPU and CPU memory [17]. register contents by attaching a GPU debugger to the run-
ning PixelVault kernel. We show that newer versions of
3 Attacking PixelVault the NVIDIA CUDA runtime provide support for attaching
In this section we analyze the GPU model and security guaran- a debugger to any running kernel and it is unclear how to
tees PixelVault uses to claim a GPU as a secure co-processor. We disable this capability. This attack requires root privileges
then present several attacks that clearly violate PixelVault’s assump- to attach to any running process, yet this is permitted under
tions, and therefore its security properties. We conclude that sys- PixelVault threat model.
tems developed using PixelVault’s approach are insecure.
Experimental platform. The attacks described in Section 3.4 In the remainder of this section, we explain in more details how
and Section 3.3 are performed on NVIDIA Tesla C2050/C2075 we have invalidated the listed assumptions, thereby debunking Pix-
GPU (Fermi) and an NVIDIA GK110GL Tesla K20c (Kepler), us- elVault’s security.
ing NVIDIA driver versions 319.37 and 331.38 respectively, and 3.2 Replacing PixelVault as it runs
CUDA version 5.5. The attack in Section 3.2 is performed on an To run a kernel, a GPU needs the binary code to be resident in
NVIDIA Tesla C2050/C2075 (Fermi) with the open source nou- GPU memory. The binary is transferred from CPU memory to GPU
veau [29] and gdev [21] [22]) drivers. memory by the driver prior to kernel invocation. GPUs do not sup-
3.1 PixelVault summary and guarantees port code modification while the kernel is executing. Therefore,
PixelVault proposes a GPU-based design of a security co- PixelVault assumes that GPU hardware makes it impossible for an
processor for RSA and AES encryption which is resilient to even attacker to alter the execution of a running PixelVault kernel by re-
a strong adversary with full control of CPU and/or GPU software. placing the original PixelVault binary in GPU memory, as long as
PixelVault stores the secret keys encrypted in GPU memory, and the the binary is entirely resident in the instruction cache. PixelVault
master key in GPU registers. It implements a software infrastruc- explicitly validates that its kernel is small enough to fit in the in-
ture that strives to prevent any adversarial access to these registers struction cache. Assuming the kernel is simple, PixelVault also ex-
from CPU or GPU. plores all possible execution paths to make the kernel fully resident
in the cache soon after starting execution.
GPGPU-10, February 04-05 2017, Austin, TX, USA Zhu et. al.

We find that the lack of software interfaces for dynamic code We find that a cuda-gdb debugger can retrieve register values
update does not imply the lack of hardware support. Using the open even after kernel termination. However it requires that other GPU
source Envytools [13] reverse engineering toolkit, we can invalidate tasks are concurrently active with the execution of the victim ker-
the instruction caches and replace the instructions of the running nel. Specifically, NVIDIA CUDA enables multiple GPU operations
kernel with the attacker’s modified instructions from GPU memory. such as CPU-GPU memory transfers or GPU invocations to be in-
Technical details. We perform an experiment to show that we can voked concurrently by the same CPU process. Each operation is in-
dynamically update GPU kernel code using a matrix addition ker- voked in its own CUDA stream, and a GPU handles the operations
nel and an updater process. The updater process locates the GPU in different streams concurrently. We found that if GPU kernel B is
kernel code in GPU physical memory by searching for the kernel’s invoked in parallel with running kernel A, A’s register state can be
instructions. In the case of PixelVault, the binary is not secret, so retrieved using the debugger API even after A terminates, as long
it can be detected by an attacker. Once invoked, PixelVault kernel as B is still running.
runs indefinitely, giving the updater enough time to identify and up- The PixelVault implementation employs two CUDA streams,
date its code. If the code is erased from memory, the attacker can one for kernel execution and another for data transfers between a
speculate on its location, methodically working through the address CPU and a GPU. An attacker may take advantage of the data trans-
space. fer stream to invoke a long-running kernel, terminate PixelVault,
The updater replaces the addition instructions in our test kernel and retrieve its secrets.
with subtraction instructions. When the effective size of the code Technical details. We modified the cuda-gdb source code to
in the GPU kernel loop is larger than 32 KB, overwriting the in- read the registers of a terminated kernel. The modifications were
structions in memory causes the behavior of the kernel to change. necessary because by default cuda-gdb will refuse to read the
Such a large kernel (presumably larger than the size of the last level registers of a terminated kernel. If we launch two kernels on differ-
instruction cache) experiences instruction cache misses at runtime, ent CUDA streams, cuda-gdb can read the register values from
yielding a result that is not a simple matrix addition. the terminated kernel so long as the other kernel is running. As soon
However overwriting the kernel’s program has no effect if the as both kernels terminate, we cannot access either of their registers.
size of the kernel’s main loop is smaller than 32KB. In these cases, 3.4 Stopping PixelVault with a debugger
only if we flush the instruction cache via GPU MMIO registers do The PixelVault version discussed in their paper runs NVIDIA
we see the expected change in the kernel output. CUDA version 4.2. This version of CUDA provided no support for
Our prototype requires 3.1 seconds to scan and identify kernel attaching and setting a breakpoint in a running GPU kernel, unless
code in GPU memory. Therefore, we can only effectively flush the that kernel was explicitly compiled with debug information. This
cache for long running kernels. The PixelVault kernel is intended to property of the runtime environment disguised itself as a hardware
run continuously as it provides a runtime encryption service, there- feature. Therefore, PixelVault relied on it to ensure that an adver-
fore it is vulnerable to this cache flush attack. sary cannot attach a debugger to retrieve the values of GPU registers
MMIO registers for instruction cache flush. To invalidate the which store the secret keys, without stopping the running PixelVault
L1 instruction cache with the updated code memory, it is nec- GPU kernel and consequently without erasing the contents of those
essary to flush all the cache levels. The addresses in paren- registers.
theses are the offsets within the MMIO region, which is re- However, with the GPU system software evolving so rapidly,
ferred by the first set of the PCI base address registers (BAR0) many desirable features like attaching a debugger to a running ker-
of NVIDIA GPUs. The register that flushes the per-GPC nel are added in every new release. In particular, this feature was
caches is PGRAPH.GPC BROADCAST.CCACHE.CACHE CTRL added in the cuda-gdb GPU debugger starting from CUDA ver-
(0x419000), especially the first bit of the 32-bit register. sion 5.0 [18]. By using the CUDA debug API it is possible to stop
For the per-SM flush, we used the first and ninth bits a kernel, and inspect all GPU registers of the executing kernel from
of PGRAPH.GPC BROADCAST.TPC ALL.MP.CCACHE CTRL the CPU. This ability invalidates the privacy guarantees for Pixel-
(0x419ea4). To the best of our knowledge, the use of the ninth Vault’s master encryption key.
bit of CCACHE CTRL for flushing per-SM cache is not reported or We found no simple way of preventing software from being able
documented. to attach to a running kernel. When attaching, cuda-gdb needs
3.3 Capturing PixelVault secrets after termination access to certain predefined memory locations stored as symbols
PixelVault relies on GPU registers being initialized to zero when in libcuda.so [30], which is the main library providing CUDA
a new kernel is loaded onto a GPU and begins execution. Initial zero driver API support to GPU applications. In an older version of the
values for registers is necessary to prevent an adversary from termi- CUDA driver (we tested version 319.37), the attach information re-
nating the PixelVault kernel and running a new kernel that looks sides in the symtab section and it can be safely stripped. However,
for PixelVault secrets in its initial register values. Because initial more recent versions of the library (we tested version 331.38), no
zero values is not a feature officially documented by NVIDIA, 1 longer place the attach information in symtab, they place it in the
the PixelVault developers experimentally validate that the register dynsym section. The dynsym section cannot be stripped from
contents cannot be retrieved by another kernel invoked after Pix- the binary because it holds important data necessary for dynamic
elVault terminates. Yet, there is no guarantee that GPU hardware linking. If we remove the dynsym section from the binary, the
clears registers after termination of a GPU kernel. dynamic linker can no longer load libcuda.so.
It is possible to zero out the entries in the dynsym section used
by cuda-gdb, which causes cuda-gdb to crash the controlling
1 http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#state- CPU process when attaching to the running GPU kernel. PixelVault
spaces-types-and-variables could make its own copy of libcuda.so (so that other users can
Understanding The Security of Discrete GPUs GPGPU-10, February 04-05 2017, Austin, TX, USA

continue to debug their kernels) and just zero out or corrupt the [5, 8, 9]), or by exploiting kernel module loader weaknesses [12].
attach information in dynsym. Ultimately however, the ability to After loading a module, the attacker also has access to the GPU con-
stop the running kernel is a hardware feature that PixelVault cannot trol interface i.e., MMIO register regions, and as we explain in our
disable. Because the PixelVault threat model assumes the attacker microcode attack (§6), it can use these registers to load malicious
controls the host, the attacker can still attach to a running kernel microcode onto an embedded auxiliary GPU processor. Loading
and examine its register state even if the default support for how the microcode is done by reloading the GPU driver module into the
cuda-gdb attaches is removed from a version of libcuda.so. OS kernel. After the malware is installed, the attacker loses the
Technical details. We use cuda-gdb to attach to a running GPU module loading capability and is allowed only unprivileged access.
kernel, and retrieve all GPU registers via the CUDA debugger call If an attacker can load a module, why should he or she bother
CUDBGResult (*CUDBGAPI st::readRegister) even if with the attacks we describe? The primary reason is stealthiness:
the CUDA application is compiled without debug information (i.e., all of our attacks originate with the GPU reading and writing CPU
without the -G flag for the NVCC compiler). memory (e.g., sensitive operating system data structures), making
A simple experiment verifies this attack. A CUDA application them hard to detect. Detecting root-level compromise is the subject
launches a kernel with one thread that spins in an infinite loop of much published work (e.g., [20], [36], [35], [3]), open source
and continuously changes a value stored in a register. We attach tools (e.g., chkrootkit) and commercial tools (e.g., Malwarebytes
cuda-gdb to the GPU-controlling CPU process and then attach AntiRootkit, McAfee Rootkit Remover). To our knowledge, there
to the GPU kernel that the process spawned. All register values are no tools and precious few research studies to detect GPU-based
from the running kernel can be extracted whether or not the GPU malware. Our attacks require no changes to the page table of any
kernel was built with debug information. unprivileged process, and they bypass the CPU’s MMU and mem-
Using this technique, we can also attack a more realistic kernel. ory protection settings. The GPU page table that stores the map-
We implement the AES encryption algorithm in a GPU program, pings into system memory are not visible from the CPU (at least
emulating part of PixelVault’s operation. The AES key is stored in not via the public API). Therefore, once mapped into the GPU, a
GPU registers. While the kernel is running, we attach to it using malicious unprivileged GPU kernel may keep accessing any CPU
cuda-gdb, read the GPU registers, and expose the secret key. memory without raising suspicion.
3.5 Discussion Modern systems, however, usually contain an input/output mem-
Discrete GPUs appear to have potential as secure coproces- ory management unit (IOMMU) which monitors devices’ Direct
sors because they have physically distinct and complete process- Memory Accesses (DMAs) to system memory in order to protect it
ing resources: processor, caches, RAM, and access to I/O. They from unauthorized accesses. The IOMMU restricts the devices to
also have micro-architectural (though seemingly robust) guarantees access only the CPU memory pages specified in its I/O page table.
about non-preemption and an incoherent instruction cache. The Pix- Unlike the hidden GPU page tables, the I/O page table can be mon-
elVault system is an intelligent attempt at trying to build a secure itored by security tools (though we are not aware of any that do),
system from these components. undermining the stealthiness of the attack.
However, our investigation yields the clear conclusion that GPUs The malware, therefore, must circumvent IOMMU protection to
are not appropriate as secure coprocessors and cannot contribute to evade detection, and the next section details the techniques to ac-
the trusted computing base (TCB) of the system. GPUs are complex complish that.
devices that rely on sophisticated proprietary hardware and soft- 4.2 IOMMU
ware which is poorly (often purposefully so) publicly documented We exploit the subtleties of IOTLB management in the Linux
– the opposite of a firm basis for security. GPU manufacturers are kernel and our prototype is based on the Intel IOMMU. We first
not interested in exposing their architecture internals, and they can provide a brief overview of the IOMMU management policies.
easily change the architecture in ways that invalidate the security of IO device drivers strive to make all memory mappings as short-
systems based on a GPU, e.g., by adding preemption. lived as possible to increase security at the expense of higher man-
We have found a variety of documented and undocumented ways agement overheads [26]. The OS, therefore, offers several IOMMU
of violating the security of PixelVault. We learned of the existence configurations that influence the IOTLB management policy and
of many MMIO registers from the Envytools GPU reverse engineer- enable different tradeoffs between management cost and security.
ing project [13]. However, some registers that allowed us to invali- The configurations and their respective management policies are
date the GPU instruction cache are not documented as flushing the summarized in Table 1.
cache—yet another example of an obscure GPU architectural sub- IOMMU disabled. Though it is detrimental to security, many
tlety which undermines its use as a secure coprocessor. systems, especially those that include discrete GPUs, disable their
IOMMU by default. IOMMU support for Intel chipsets must be
4 Threat model and IOMMU
configured through the BIOS as part of setup for device virtualiza-
We explore how GPUs might host stealthy malware. Our mal-
tion technology (VT-d), and it also must be enabled in the Linux
ware attacks compromise the privacy and integrity of system mem-
kernel. Some server manufacturers ship their products with VT-d
ory by reading and writing it from the GPU. First we specify our
disabled in the BIOS by default [14].
threat model.
Several major Linux distributions (e.g., Ubuntu 15.04, CentOS
4.1 Threat model 7, RHEL 7, OpenSUSE 13.2) ship with Intel’s IOMMU disabled
An attacker may load and unload kernel modules. In Linux, this in the kernel. The primary reasons for disabling the IOMMU is re-
can be done by briefly gaining the CAP SYS MODULE capability duced I/O performance [49] and the IOMMU’s incompatibility with
(which is a user credential stored in the process control block) using certain devices and features. For example, the peer-to-peer DMA
kernel exploits (e.g., [4, 6, 7]), by bypassing capability checks (e.g.,
GPGPU-10, February 04-05 2017, Austin, TX, USA Zhu et. al.

Mode IOTLB Flush Workload Bit rate Stale period

Strategy Timing Idle ssh connection 10 bps 1 day
Disabled None NA Web radio 130 Kbps 1 hour
Pass Web video: Auto (480p) 2 Mbps 1 minute
None NA
through Table 2. Workload, average measured bit rate, and the time a stale
When deferred list is IOTLB entry stays resident.
Flush entire
full or t ms after the
Deferred
IOTLB. first entry 2 , whichever
comes first. Strict mode also respects IO protection domains. Each device,
Flush individual Immediately after e.g., NIC, GPU, is placed in its own protection domain and en-
Strict entry in given unmapping entry from tries in the IOTLB are tagged with the domain identifier. When
domain. IO page table. the IOMMU driver flushes an entry from one domain, it does not
Table 1. Different IOMMU modes, their properties and security affect other domains.
characteristics. This mode is considered safer than the deferred mode, because
it flushes the IOTLB immediately after the entry gets unmapped.
4.3 IOMMU attacks
capability of NVIDIA GPUs, essential for high I/O performance in
Our driver and microcode attacks work when the IOMMU is dis-
multi-GPU systems, requires the IOMMU to be disabled [31].
abled or in pass through mode. Therefore, we now describe how we
While we state the obvious, when the IOMMU is disabled, it
attack strict mode and how we can surreptitiously transition from
does not protect CPU memory from malicious accesses performed
deferred mode to strict mode.
from the GPU.
We describe how to keep a stale malicious mapping in the IOTLB
IOMMU pass through mode. In pass through mode, device ad-
without having it installed in the I/O page table. Keeping the entry
dresses are used directly as CPU physical addresses. In this mode
out of the I/O page table makes an attack more difficult to detect
the hardware IOMMU is turned off, so there is no permissions
(though we know of no security tool that even validates I/O page
checking for DMA requests. Devices enter pass through mode if
tables).
it is enabled by a kernel parameter, and if during device discovery,
Stale IOMMU entries in strict mode. Our attack writes a mali-
the kernel determines that a device can address all of physical mem-
cious IO page table entry (e.g., one that maps an unrelated process’
ory. Some devices can be in pass through mode without all devices
credential structure), launches a GPU kernel which accesses the de-
being in this mode.
vice address of the mapping, causing the entry to be cached in the
Because there is no permissions checking, our driver and mi-
IOTLB. Then the attack code overwrites the IO page table entry and
crocode attacks work in pass through mode. Pass through mode is
the GPU kernel terminates. The malicious entry remains cached in
intended to use a software TLB [50], but we verified that on our sys-
the IOTLB, with no way for software to detect its presence, and no
tem, the software TLB does not check permissions. In our system,
evidence of it in the backing IO page table.
even though GPU device addresses are 40 bits, it identifies as a 32-
The question remains, how long can a stale entry last in the
bit device during its initialization. Therefore, the kernel must boot
IOTLB? We do experiments on a machine running Ubuntu 14.04
with less than or equal to 4 GB of memory to enable pass through
with X and Xfce desktop, the Linux 3.16 kernel with an Intel
mode. We verified that regardless of how much physical memory
82579LM Gigabit Network Card. We run several experiments each
is in the machine, if the kernel boots with a mem=4G option, the
with progressively more network traffic to create different levels of
kernel defaults to pass through mode where our attacks work.
IOTLB contention, because the network card competes for IOTLB
IOMMU deferred mode. If the IOMMU is enabled, Linux config-
entries with the GPU. We also tried to create IOTLB pressure from
ures it to work in deferred mode by default. When system memory
the GPU by running glxgears, a web browser, and displaying
is unmapped from IO devices, the OS clears the IO page table entry
video, but none of these activities generated significant competition
and adds it to a flush list. The IOMMU driver flushes the entire
for IOTLB entries, and none of them caused the GPU or driver to
IOTLB when the list contains a certain number of entries (250 for
flush the IOTLB.
the kernel version 4.1) or at most 10ms after an entry is added to
Table 2 summarizes our experiments where we run workloads
the list, whichever comes first.
with increasing network load and measure the stale period—the pe-
Deferred mode is considered less secure for memory integrity
riod of time a stale entry stays in the IOTLB.
because the memory unmapped from the device by the device driver
The first workload keeps an open ssh connection (using the
remains accessible from the device for up to 10ms [26]. However,
TCPKeepAlive option) to an idle machine. We measure a stale
we show below that this mode foils the GPU malware attack.
period of more than 24 hours (after one day we discontinued the ex-
IOMMU strict mode. Strict mode does not defer flushing the
periment). Next we continuously stream a web radio station which
IOTLB when unmapping IO memory; each page is flushed by the
generates on average 130 Kbps of incoming network traffic. With-
driver immediately after its entry is unmapped from the IO page ta-
out refreshing the stale entry, the stale mapping can be read after
ble. The driver flushes only the region covered by the entry (which
T = 1 hour. By periodically running an unprivileged GPU kernel to
can be a single page, or a power-of-two contiguous and aligned se-
read the memory mapped by the stale IOTLB entry every hour, we
quence of pages) and it never flushes the entire IOTLB, as it does
can keep the stale IOTLB entry resident for 10 hours.
in deferred mode.
Finally, we increase the network load by streaming a video from
youtube using the Firefox web browser. The video is played at the
2 In Linux t = 10 for Intel IOMMUs “Auto (480p)” setting, generating an average of 2 Mbps as measured
Understanding The Security of Discrete GPUs GPGPU-10, February 04-05 2017, Austin, TX, USA

by a network monitor. This workload uses the NIC and the graph- CPU GPU
User Process 3
ics rendering capability of the GPU. The mapping can be reliably
User Space
read 1 minute after it was erased from the IO page table. Running Kernel 5
1 2
a GPU kernel every 1 minute is sufficient to keep the stale IOTLB Attack Patch
2 6
entry resident for one hour, after which we discontinued the exper- Module GPU driver
iment. Streaming higher bandwidth videos, like the “Auto (720p)” Kernel Space
setting, cause the attack to fail, even when we refresh the stale en-
try every minute. We use round numbers like 1 minute for a stale
PCIe Bus
Chipset GPU Chipset
period to validate that our attacks are practical. Future work might
4 DMA
determine more clever ways to keep an IOTLB entry resident, but
our experiments establish a large enough window of vulnerability CPU RAM GPU RAM
to be a security concern.
Stealthy transition from deferred to strict. Keeping a stale entry
in the IOTLB is possible only if the IOMMU is configured in strict Figure 2. The driver attack, where a patched GPU driver has its
mode, because it is the only mode that invalidates each IOTLB en- memory mapping access control bypassed to allow a GPU kernel to
try separately. However, if the IOMMU is enabled, Linux uses de- access all of the CPU’s memory.
ferred mode by default, which flushes the IOTLB as a whole. These
IOTLB flushes frustrate our attacks (and form a practical counter-
measure to both of our attacks). the keylogger case access is to the keyboard buffer in OS kernel
We find that the kernel transitions between deferred memory. The primary difference is that our attack requires no long-
and strict mode based on the state of a single variable running CPU process to proceed, while the keylogger does. That
(intel iommu strict). By setting this variable to 1, the is because for the keylogger, the malicious memory mapping to the
kernel will put all devices into strict mode. Because this is a small, keyboard buffer must be installed into an unprivileged process that
legal change to kernel state, it is quite stealthy. We experimentally is running while the root-level compromise is active. The attack is
verify that it is effective at engaging strict mode, where we can lost if that unprivileged process subsequently terminates. Attack-
cache a stale IOTLB entry and launch one of the attacks we now ing the driver directly allows us to map any page to any process as
describe. many times as necessary, and without modifying sensitive kernel
We leave for future work developing a Linux IOMMU manage- data structures.
ment policy that combines the best parts of strict and deferred mode. Driver patch. Identifying the specific locations in the NVIDIA
Strict mode minimizes the chances of memory corruption by a de- driver that control access to memory would seem difficult because
vice by quickly unmapping DMA memory. Deferred mode frus- most of the driver is proprietary and undisclosed. However, the par-
trates our attacks by periodically flushing the entire IOTLB. ticular functions that control memory mapping reside in the wrap-
5 GPU driver attack per of the driver that is shipped open source and compiled at the
In this section we show an attack on the stock NVIDIA closed- time of driver installation, making it easier to determine the exact
source GPU driver. The attack enables arbitrary CPU memory map- location that needs patching.
ping into an unprivileged GPU program, concealing malware code We choose to patch the NVIDIA driver in memory rather than
which monitors or changes CPU memory from a GPU kernel. modify its source code. Patching the driver enables the attack on a
The attack scenario. An attacker loads a malicious kernel mod- system where the GPU driver module has been already loaded and
ule which installs a backdoor by patching a GPU driver in CPU is in use by the kernel, allowing the attack to avoid unloading it first,
memory (Step 1 in Figure 2). The driver continues to operate nor- which can be easily noticed. The patch is installed by a malicious
mally. To trigger the backdoor, an unprivileged GPU-controlling module which finds the driver module in memory, overwrites some
process performs a sequence of standard GPU API calls (a trigger of its code, and then unloads itself.
sequence) 3 , and maps the requested CPU memory into the GPU The patch diverts the original control flow of the driver to bypass
4 . The driver patch bypasses the standard access control checks the memory permission checks when handling the cudaHost-
in the driver, and allows the attacker to map any user or kernel mem- Register() API call, which is normally used to lock CPU mem-
ory page. The CPU process invokes a malicious, unprivileged GPU ory pages (os_lock_user_pages()) and map them into the
kernel 5 which accesses the mapped page 6 . The attack module GPU address space (nv_dma_map_pages()).
may unload itself, leaving the modified driver in kernel memory to The driver acts normally as long as the trigger sequence has not
subsequently repeat the attack from another unprivileged process. been detected. The trigger sequence is a series of legal but erro-
If no more attacks are planned, a stealthier alternative is to recon- neous calls to cudaHostRegister(), e.g., passing pointers to
struct the original driver code to evade detection by kernel code unallocated memory. Once triggered, the modified driver expects
integrity scanners [35]. We implement two proof-of-concept GPU another call to cudaHostRegister() with the hidden buffer as
malware kernels – one that escalates the privileges of a given pro- a parameter. The hidden buffer contains the actual parameters of
cess by manipulating its cred structure, and another that diverts the malicious mapping request. For instance, the driver may map
the execution flow of a given process by updating its code, which the virtual address of a certain process, or some known kernel data
resides in read-only memory. structure like the task control block (task_struct in Linux).
Our attack is similar to the previously reported GPU keylog- The patch resolves the physical address of the requested memory
ger [24] attack. Both attacks exploit GPU DMA capabilities; in region, injects this address into the original control structures of the
driver, and resumes the original driver which updates the internal
GPGPU-10, February 04-05 2017, Austin, TX, USA Zhu et. al.

GPU page table with the new mapping. The GPU-controlling pro- CPU GPU
cess then may launch the malicious GPU kernel that accesses the Process Kernel
mapped region.
6 GPU microcode attack GPU Chipset
We demonstrate a novel attack that modifies GPU microcode to
GPU
spy on or corrupt CPU and GPU state. The attack uses an embed- Microprocessor
ded microprocessor in NVIDIA GPUs and can evade any detection 3
Microcode
1
mechanism that relies on evidence from CPU or GPU memory. Chipset
PCIe Bus
RAM
6.1 Background: NVIDIA GPU microprocessors
NVIDIA GPUs contain several on-board microprocessors for
power management, video display, decoding, decryption, and other
purposes. The existence of multiple Falcon microprocessors in
NVIDIA GPUs has been officially disclosed by NVIDIA [32], but 2
no official public API has been released. The reverse-engineering CPU RAM GPU RAM
community discovered that Falcon microprocessors are capable
of issuing data transfers from CPU or GPU physical memory to
microprocessor memory using dedicated memory transfer instruc-
Figure 3. The GPU microcode attack. ( 1 Launch) The attacker
tions [13, 17].
loads the attack code into microprocessor storage. ( 2 Monitor)
Falcon microprocessors expose a common set of MMIO regis-
The microcode transfers data from GPU memory to its own memory
ters that both a CPU and the microprocessor can access or update
in order to identify triggers or commands from the attacker. ( 3
(GPU kernels normally cannot access them). These registers enable
Execute) Once it detects the commands, it launches the attack by
communication between privileged CPU code and the microcode.
writing to critical data structures in CPU memory.
Certain MMIO registers update the code and data memory of the
microprocessor and restart its execution. Linux kernel source code
contains the assembly code for certain Falcon processors3 . CPU memory or in GPU memory. We check all of the GPU base
The platform’s GPU driver loads control code onto the Falcon address register (BAR) regions mapped by the CPUs (BAR0 for
microprocessors as part of its initialization sequence. The code is MMIO, BAR1 for VRAM aperture, BAR3 for kernel-accessible
invoked in response to certain events, for example when serving re- control memory) and none contain any sign of the microcode data
quests to switch GPU control to another CPU process (called GPU or code, both in big- and little- endian representations. The only
context switch [15]). It is this control code that we attack. We call forensic tool we could find for dumping GPU memory [37] did not
this a GPU microcode attack because the microprocessor code is reveal the microcode. The microcode resides only in the GPU mi-
one of several non-user visible code modules loaded into the GPU croprocessor memory. The Falcon processor has MMIO registers
at its initialization, and because it is terminology accepted by GPU for uploading the microcode from CPU memory to microprocessor
driver developers [16]. memory. It also has an MMIO register for transferring microproces-
6.2 The attack sor memory back to CPU memory, but we know of no tool that uses
The attack consists of three phases: launch, monitor, and execute, this interface, let alone one that tries to determine if the microcode
as illustrated in Figure 3. In the launch phase, the attacker with is malicious.
privileged access installs the attack microcode on one of the Fal- Once established, the microcode attack does not require support
con microprocessors, and executes the code. Now the attack enters from a kernel module or a CPU user process, unlike previously
the monitor phase, in which the microprocessor monitors regions known GPU malware attacks [24]. The attack does not affect the
of GPU memory to identify commands from the attacker. The com- integrity of kernel-level data structures.
mands can be located in GPU memory or MMIO registers, though 6.3 Technical details
our prototype monitors only specific GPU memory locations. Fi- The attack code is written in C, based on the microcode assem-
nally, once the commands are identified, they are executed by the bly code in the Linux kernel. It is then compiled by the publicly
microprocessor as part of the execution phase. available LLVM backend and envytools [17]. The Falcon’s xfer
In our proof-of-concept attack, we use a seemingly random bi- instruction can initiate DMA, and it can transfer data from both
nary string as the trigger for the attack. An unprivileged attacker GPU and CPU memory to its own memory, or vice versa.
executes a CUDA program that has a large data structure contain- Out of many Falcon microprocessors, we use a microproces-
ing repeated copies of the trigger string. Our modified microcode sor that manages context switching between multiple command
detects the trigger (probably) in one of its monitoring locations and streams, e.g., between the X server and a 3D application. There are
transitions into the attack execution phase. In our prototype, the several microprocessors involved in this process; microprocessors
attacking microcode escalates the privilege of a predefined running manage the context switch of a group of SMs (graphics process-
shell process, by writing into the process’ credential structure in ing clusters, or GPCs), and a HUB microprocessor that manages
CPU memory. these GPC microprocessors. We update the microcode of HUB mi-
Stealth and unobtrusiveness. The key benefit of this attack is croprocessor because it has larger code and data memory, and the
stealth. We observe no evidence that the attack is occurring in open-source version of the microcode is available in the Linux ker-
nel. An official version of the microcode can be also extracted [16],
3 Located in drivers/gpu/drm/nouveau/core/engine/graph/fuc. and used to build modified microcode.
Understanding The Security of Discrete GPUs GPGPU-10, February 04-05 2017, Austin, TX, USA

The compiled binary is loaded into the microprocessor of the Small code size. Different types of Falcon processors have differ-
NVIDIA C2075 GPU using a set of MMIO registers. The envy- ent limits on code and data memory. For example, the maximum
tools [13] project contains a set of tools the open-source commu- code size and data size for a microprocessor are 16 KB and 4 KB,
nity has developed and used to build the open-source nouveau respectively, whereas the limits for closely related microprocessors
Linux driver for NVIDIA GPUs. RNN is the community-built is only 8 KB and 2KB. If multiple microprocessors communicate
knowledge base for MMIO registers in NVIDIA GPUs. The ad- and launch a more complex attack than one microprocessor can han-
dresses in parentheses are the offsets within the MMIO region, dle, the attacker can distribute the work according to the memory
which is referred by the first set of the PCI base address registers limit of each processor.
(BAR0) of NVIDIA GPUs. Microcode validation. Starting from Maxwell GPUs, NVIDIA
Microcode is uploaded to Falcon using CODE VIRT ADDR significantly strengthened the security of Falcon microprocessors
(0x188) and CODE (0x180) registers. The user can issue 32 bit by requiring their code to be signed and preventing code modifica-
stores to the CODE VIRT ADDR register for the index of the 256B tions after code is initially loaded [32]. Unsigned microprocessor
chunk of code to be written, and CODE for the content of the code code may run in unsecure mode, but it cannot use certain hardware
at the address pointed by CODE VIRT ADDR. DATA (0x1c4) allows features (the precise set of constraints depends on the processor).
upload of microcode data. These new security mechanisms, therefore, are likely to complicate
Most of the attack microcode is loaded into regions that do not or even entirely prevent our microcode attack, because most (but
overlap with the existing code or data, to have minimal effect on the not all) Falcons on NVIDIA Pascal GPUs do not allow unsigned
normal system operation. Only small patches to the code region are code access to physical memory. We leave the vulnerability analy-
needed to redirect the control flow to the injected attack code. sis of Pascal GPUs for future work.
To remain unobtrusive, we split the work done by the attack mi-
7 Related work
crocode into small units and execute only for a limited time, chain-
GPU malware. Vasiliadis et al. [45] present two GPU malware
ing together subsequent executions using continuations. The mi-
techniques, code unpacking and runtime polymorphism, used to
crocode is originally designed to be interrupt-driven: most of the
evade malware detection. These techniques make use of the GPU
functions are interrupt handlers for different types of interrupts,
computing capacity to build more complex packing algorithms and
such as periodic timers or commands waiting to be handled. The
leverage GPU direct memory access (DMA) to modify host mem-
microprocessor assumes that interrupt handlers will be brief.
ory.
We test the unobtrusiveness of our attack code by running glx-
Ladakis et. al. implement a keylogger on the GPU [24], leverag-
gears from the Mesa GL utility library. This application reports
ing the DMA capability of GPUs to monitor the operating system
the frame rate of 3D graphics rendering, and we could observe
keyboard buffer from a GPU kernel. The GPU-based keylogger re-
lower frame rates or even GPU lockup if the execution time of
quires an unprivileged helper process to set up the attack. It relies
our inserted operations took too long. By fine-tuning the amount
on a kernel module to update a page table entry of the helper, so
of work done at each interrupt, our attack microcode supports the
that the process’ address space contains a window on the kernel-
same glxgears framerate as the unmodified microcode. We also
level keyboard buffer. The keyboard buffer address is then moved
could not subjectively observe an effect on typical desktop opera-
to the GPU page table, and erased from the page table in the CPU,
tions.
keeping the kernel memory mapping for a short time.
We implement our proof-of-concept microcode attack on top of
Both of these attacks require helper processes on the CPU, and
the open source nouveau driver. The attack code for the nouveau dri-
these processes violate certain address space integrity properties
ver includes all of the attack steps we describe in the attack scenario.
(though in most systems, these integrity properties are implicit).
We verify that we can unobtrusively inject simple code sequences
Hiding malware with unpacking and polymorphism on the GPU re-
into the NVIDIA microcode, but do not implement the entire attack
quires mapping a CPU memory region that is executable, writable,
for the NVIDIA microcode.
and IO-mapped. The GPU-based keylogger has a user-level page
The embedded NVIDIA microprocessor has a periodic timer in-
that maps kernel memory which no user process should ever map.
terrupt and a one-shot watchdog timer, but in our experience, the
These distinctive memory regions that clearly violate certain safety
use of the periodic timer affects the graphics output. Therefore, to
properties make the malware easy to detect by some rootkit detec-
remain undetectable, we use the watchdog timer and have the watch-
tors [19, 36].
dog event handler reschedule another watchdog event. Our attack
The GPU-based microcode attack described in this paper does
code is 4KB, which we add to the 3KB of Nouveau microcode fit-
not require any running process once it is installed in the micropro-
ting comfortably in the device’s 16KB capacity.
cessor. It leaves no trace in CPU or GPU memory and therefore
6.4 Discussion does not violate any memory integrity property. The GPU driver
Falcon microprocessors are relatively slow, the one we used runs attack does not need a CPU helper until the malicious behavior is
at 270MHz. We only read a small number of GPU memory loca- triggered. The attack is entirely encapsulated in the driver and does
tions to keep execution time short. Therefore, our trigger consists not change any kernel data structure, however the patched driver
of a GPU program that fills much of its memory with the target module might still be detected by kernel integrity checkers.
string which gives a high probability of it being read by the attack Villani et. al. analyze four GPU-assisted malware anti-memory-
microcode. Our proof-of-concept trigger fills 3GB of data and the forensics techniques without modification of GPU microcode: un-
microcode reads five memory locations at 1GB offsets. This com- limited code execution, process-less code execution, context-less
bination makes the microcode recognize the trigger for each of 10 code execution and inconsistent memory mapping and apply them
trials. to integrated Intel GPU cards [46].
GPGPU-10, February 04-05 2017, Austin, TX, USA Zhu et. al.

GPU as secure co-processor. PixelVault [44] proposes to use USENIX Association, pp. 337–352.
GPUs as secure co-processors for cryptographic operations. We [2] C ORP., I. Intel 965 Express Chipset Family and Intel G35 Express Chipset
Graphics Controller Programmers Reference Manual. Volume 1: Graphics
have shown in this paper how features from the official NVIDIA Core, 2008. https://01.org/sites/default/files/documentation/965 g35 vol 1
debugger to unofficial hardware interfaces violate the security as- graphics core 0.pdf.
[3] C UI, A., C OSTELLO , M., AND STOLFO , S. J. When firmware modifications
sumptions of PixelVault. attack: A case study of embedded exploitation. In Proceedings of the Network
Firmware attacks. There are several firmware-based attacks that and Distributed System Security Symposium (NDSS) (2013).
target diverse devices [1, 3, 10, 41, 47, 51]. Similar to the mi- [4] CVE. CVE-2007-1019. https://www.cvedetails.com/cve/CVE-2007-1881/.
Accessed: May 2016.
crocode attack in this paper, these attacks embed malicious code [5] CVE. CVE-2011-1019. https://www.cvedetails.com/cve/CVE-2011-1019/.
into the firmware to circumvent the platform’s security while evad- Accessed: May 2016.
ing detection. Triulzi [42] presents a sniffer that uses a combination [6] CVE. CVE-2014-5207. https://web.nvd.nist.gov/view/vuln/detail?
vulnId=CVE-2014-5207. Accessed: May 2016.
of NIC and GPU to access main memory. The GPU runs an ssh dae- [7] CVE. CVE request: ro bind mount bypass using user namespaces. http://www.
mon that accepts packets from the NIC through PCI-to-PCI transfer. openwall.com/lists/oss-security/2014/08/13/9. Accessed: May 2016.
[8] CVE. How to exploit the x32 recvmmsg() kernel vulnerability CVE 2014-
The firmware modification on the GPU is mainly due to lack of PCI- 0038. http://blog.includesecurity.com/2014/03/exploit-CVE-2014-0038-
to-PCI transfer support. With GPUDirectRDMA [31], this attack x32-recvmmsg-kernel-vulnerablity.html. Accessed: May 2016.
can be implemented without GPU firmware modification. To the [9] CVE. Local root exploit for CVE-2014. https://github.com/saelo/cve-2014-
0038. Accessed: May 2016.
best of our knowledge, our attack is the first GPU microcode-based [10] D UFLOT, L., PEREZ, Y.-A., AND M ORIN , B. What if you can’t trust your
attack that leverages GPU embedded microprocessors. network card? In Proceedings of the International Symposium on Recent
Newer NVIDIA GPUs are expected to disallow the use of un- Advances in Intrusion Detection (RAID) (Berlin, Heidelberg, 2011), Springer-
Verlag, pp. 378–397.
signed microcode, preventing the microcode attack. [11] D UFLOT, L., PEREZ, Y.-A., VALADON , G., AND L EVILLAIN , O. Can you still
Information leaks through GPU. Recent works notice that the trust your network card. CanSecWest/core10 (2010).
[12] E DGE, J. A crypto module loading vulnerability. https://lwn.net/Articles/
GPU driver does not erase the device memory after kernel termi- 630762/. Accessed: May 2016.
nation, leaking private information [25, 28]. This paper describes a [13] ENVYTOOLS. envytools - tools for people envious of nvidia’s blob driver. https://
different type of attacks that leverage the GPU to stealthily perform github.com/envytools/envytools. Accessed: May 2016.
[14] FISCHER , W. Activating the Intel VT-d virtualization feature. https://
unauthorized accesses to CPU memory. www.thomas-krenn.com/en/wiki/Activating the Intel VT-d Virtualization
Attacks using graphics software stack. The security aspects of Feature. Accessed: May 2016.
using GPUs in graphics applications have been the subject of much [15] FREEDESKTOP. ORG. nouveau context switching. http://nouveau.freedesktop.
org/wiki/ContextSwitching/. Accessed: May 2016.
work [38, 39, 43]. Our work is complementary in that we focus on [16] FREEDESKTOP. ORG. nouveau context switching firmware. http://nouveau.
the GPU microcode and the driver, and investigate the weaknesses freedesktop.org/wiki/NVC0 Firmware/. Accessed: May 2016.
[17] FUJII, Y., A ZUMI, T., N ISHIO , N., K ATO , S., AND E DAHIRO , M. Data transfer
of using GPUs as secure co-processors. matters for GPU computing. In Proceedings of the International Conference
Reverse-engineering GPU hardware. Detailed information about on Parallel and Distributed Systems (ICPADS) (Washington, DC, USA, 2013),
GPU hardware architecture is usually never disclosed by the ven- IEEE Computer Society, pp. 275–282.
[18] G ERFIN , G., AND V ENKATARAMAN , V. Debugging experience with CUDA-
dors. Wong et al. [48] reverse engineer GPU internals via carefully GDB and CUDA-MEMCHECK. In GPU Technology Conference (2012).
crafted microbenchmarks. We use similar techniques in this paper [19] H OFMANN , O. S., D UNN , A., K IM, S., ROY, I., AND W ITCHEL , E. Ensuring
to discover the size of the instruction cache. Fujii et al. [17] explain operating system kernel integrity with OSck. In Proceedings of the ACM Inter-
national Conference on Architectural Support for Programming Languages and
the internal organization of GPU microprocessors which we use to Operating Systems (ASPLOS) (March 2011).
implement the microcode attack. [20] H OFMANN , O. S., PORTER , D. E., ROSSBACH , C. J., R AMADAN , H. E., AND
W ITCHEL , E. Solving difficult HTM problems without difficult hardware. In
8 Conclusion Proceedings of the 2nd Workshop on Transactional Computing (TRANSACT)
(Portland, OR, August 2007).
GPUs are not an appropriate choice for a secure coprocessor, and [21] K ATO , S. Gdev: Open-source GPGPU runtime and driver software. https://
they pose a security threat to computing platforms, even those with github.com/shinpei0208/gdev.
an IOMMU. The problem with making hardware, especially hard- [22] K ATO , S., M C T HROW, M., AND M ALTZAHN , C ARLOSAND B RANDT, S.
Gdev: First-class GPU resource management in the operating system. In Pro-
ware as complex as a GPU, into something that enhances security is ceedings of the USENIX Annual Technical Conference (June 2012).
that the security guarantees rely on a large set of assumptions about [23] K HRONOS G ROUP. The OpenCL Specification, Version 2.0, 2014.
[24] KOROMILAS , L., VASILIADIS , G., IOANNIDIS , S., L ADAKIS , E., AND POLY-
architectural, micro-architectural, and software features. These as- CHRONAKIS , M. You can type, but you can’t hide: A stealthy GPU-based key-
sumptions are difficult to verify and they can change for different logger. In Proceedings of the Sixth European Workshop on System Security (EU-
versions of the product because the underlying motivation of the ROSEC) (2013), ACM.
[25] L EE, S., K IM, Y., K IM, J., AND K IM, J. Stealing webpages rendered on your
manufacturer is not security. browser by exploiting GPU vulnerabilities. In Proceedings of the IEEE Sympo-
As an attack platform, they combine powerful access to plat- sium on Security and Privacy (Oakland) (Washington, DC, USA, 2014), IEEE
form hardware with an opacity encouraged by its proprietary nature. Computer Society, pp. 19–33.
[26] M ALKA , M., A MIT, N., B EN -Y EHUDA , M., AND T SAFRIR , D. rIOMMU: Ef-
While forensic tools for GPUs will improve, they represent another ficient IOMMU for I/O Devices That Employ Ring Buffers. In Proceedings of
nettlesome resource for determined attackers. the Twentieth International Conference on Architectural Support for Program-
ming Languages and Operating Systems (New York, NY, USA, 2015), ASPLOS
9 Acknowledgements ’15, ACM.
[27] M ARECK , R. AMD x86 Firmware Analysis. Accessed: May 2016.
Mark Silberstein was supported by the Israel Science Foundation [28] M AURICE , C., N EUMANN , C., H EEN , O., AND FRANCILLON , A. Confiden-
(grant No. 1138/14) and the Israeli Ministry of Science. We also tiality issues on a GPU in a virtualized environment. In Financial Cryptography
gratefully acknowledge funding from NSF grants CNS-1017785 and Data Security. Springer, 2014, pp. 119–135.
[29] NOUVEAU. Nouveau: Accelerated open source driver for nvidia cards. http://
and CCF-1333594. nouveau.freedesktop.org/wiki/. Accessed: May 2016.
[30] NVIDIA. Debugger API. http://docs.nvidia.com/cuda/debugger- api/index.
References html.
[1] B ROCKER , M., AND C HECKOWAY, S. iSeeYou: disabling the MacBook web- [31] NVIDIA. GPUDirectRDMA technology. http://docs.nvidia.com/cuda/
cam indicator LED. In Proceedings of the USENIX Security Symposium (2014), gpudirect-rdma/index.html.
Understanding The Security of Discrete GPUs GPGPU-10, February 04-05 2017, Austin, TX, USA

[32] NVIDIA. NVIDIA Falcon security. ftp://download.nvidia.com/open-gpu-

doc/Falcon-Security/1/Falcon-Security.html.
[33] NVIDIA. CUDA Runtime API, 2014.
[34] O LEKSIUK , D. Breaking uefi security with software dma attacks. http://blog.
cr4.sh/2015/09/breaking-uefi-security-with-software.html. Accessed: May
2016.
[35] PENDERGRASS , J. A., AND N. M C G ILL, K. LKIM: The linux kernel integrity
measurer. In Johns Hopkins APL Technical Digest. September 2013.
[36] PETRONI , J R ., N. L., AND H ICKS , M. Automated detection of persistent kernel
control-flow attacks. In Proceedings of the 14th ACM Conference on Computer
and Communications Security (New York, NY, USA, 2007), CCS ’07, ACM,
pp. 103–115.
[37] R ICHARD III, G. GPU memory dump tool. 2015 DFRWS Forensics Challenge,
2015. http://cs.uno.edu/∼golden/Materials/gpumalware/nvidia-dump-fb-1.
0.tar.gz.
[38] SECURITY, C. I. WebGL-a new dimension for browser exploita-
tion. http://www.contextis.com/resources/blog/webgl-new-dimension-
browser- exploitation/. Accessed: 2015-02-15.
[39] SECURITY, C. I. WebGL-more WebGL security flaws. http://www.contextis.
com/resources/blog/webgl-more-webgl-security-flaws/. Accessed: May
2016.
[40] SEVINSKY, R. Funderbolt: Adventures in thunderbolt dma attacks. https://
media.blackhat.com/us-13/US-13-Sevinsky-Funderbolt-Adventures-in-
Thunderbolt-DMA-Attacks-Slides.pdf. Accessed: May 2016.
[41] STEWIN , P. Detecting peripheral-based attacks on the host memory. PhD thesis,
Technische Universität Berlin, 2015.
[42] T RIULZI , A. Project maux mk.ii. http://www.alchemistowl.org/arrigo/
Papers/Arrigo-Triulzi-PACSEC08-Project-Maux-II.pdf. Accessed: May
2016.
[43] US-CERT. Vulnerability summary for CVE-2014-8298. National Vulnerability
Database, Dec 2014. http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-
2014-8298.
[44] VASILIADIS , G., ATHANASOPOULOS , E., POLYCHRONAKIS , M., AND IOAN -
NIDIS , S. PixelVault: Using GPUs for securing cryptographic operations. In
Proceedings of the ACM Conference on Computer and Communications Secu-
rity (CCS) (2014), ACM.
[45] VASILIADIS , G., POLYCHRONAKIS , M., AND IOANNIDIS , S. GPU-assisted
malware. International Journal of Information Security (2010), 1–9.
[46] V ILLANI , A., B ALZAROTTI , D., AND DI PIETRO , R. The impact of GPU-
assisted malware on memory forensics: A case study. In DFRWS 2015, Annual
Digital Forensics Research Conference, Philadelphia, USA (2015).
[47] WOJTCZUK , R., AND RUTKOWSKA , J. Attacking Intel trusted execution tech-
nology. Black Hat DC (2009).
[48] WONG , H., PAPADOPOULOU , M.-M., SADOOGHI-A LVANDI , M., AND
M OSHOVOS , A. Demystifying GPU microarchitecture through microbench-
marking. In Performance Analysis of Systems Software (ISPASS), 2010 IEEE
International Symposium on (March 2010), pp. 235–246.
[49] YASSOUR , B.-A., B EN -Y EHUDA , M., AND WASSERMAN , O. On the DMA
mapping problem in direct device assignment. In Proceedings of the 3rd Annual
Haifa Experimental Systems Conference (New York, NY, USA, 2010), SYSTOR,
ACM, pp. 18:1–18:12.
[50] Y U , F. Intel IOMMU Pass Through Support. https://lwn.net/Articles/329174/.
Accessed: May 2016.
[51] Z ADDACH , J., K URMUS , A., BALZAROTTI , D., B LASS , E.-O., FRANCILLON ,
A., G OODSPEED , T., G UPTA , M., AND KOLTSIDAS , I. Implementation and
implications of a stealth hard-drive backdoor. In Proceedings of the Annual Com-
puter Security Applications Conference (ACSAC) (2013), ACM, pp. 279–288.

Wireless and Mobile Network Architecture
No ratings yet
Wireless and Mobile Network Architecture
592 pages
Exp#4 Preliminary Report
No ratings yet
Exp#4 Preliminary Report
40 pages
An5156 Introduction To Security For Stm32 Mcus Stmicroelectronics
No ratings yet
An5156 Introduction To Security For Stm32 Mcus Stmicroelectronics
59 pages
Advanced Switching Concepts
No ratings yet
Advanced Switching Concepts
41 pages
CA Plex 7 2 1 Source
No ratings yet
CA Plex 7 2 1 Source
302 pages
General 1830 PSS Questions
No ratings yet
General 1830 PSS Questions
5 pages
Reverse Engineering Power Managment Nvidia GPU
No ratings yet
Reverse Engineering Power Managment Nvidia GPU
10 pages
FPGA2019 Tutorial
No ratings yet
FPGA2019 Tutorial
11 pages
Practical Plantwide Process Control: PID Tuning: Sigurd Skogestad, NTNU
No ratings yet
Practical Plantwide Process Control: PID Tuning: Sigurd Skogestad, NTNU
66 pages
Fo81 PDS Lobby Control en
No ratings yet
Fo81 PDS Lobby Control en
4 pages
UNIT 4 GPU Computing - HPC
No ratings yet
UNIT 4 GPU Computing - HPC
13 pages
What Is Teradata
0% (1)
What Is Teradata
7 pages
CS 101 Chapter 2
No ratings yet
CS 101 Chapter 2
6 pages
TLV9001
No ratings yet
TLV9001
100 pages
Akuvox C313 Indoor Monitor Administrator Guide - V1.0 202201
No ratings yet
Akuvox C313 Indoor Monitor Administrator Guide - V1.0 202201
116 pages
Linux TCS
No ratings yet
Linux TCS
229 pages
New ATM Booth Setup and Configuration: Presented by
No ratings yet
New ATM Booth Setup and Configuration: Presented by
57 pages
Modelado de GPUs e Implementación de Características Dentro de Accel-Sim. TFG Juan José Castillo Otón
No ratings yet
Modelado de GPUs e Implementación de Características Dentro de Accel-Sim. TFG Juan José Castillo Otón
56 pages
Part1 22
No ratings yet
Part1 22
77 pages
CUDA Tutorial
No ratings yet
CUDA Tutorial
50 pages
Breaking RSA
No ratings yet
Breaking RSA
21 pages
Lecture - 10 - Abstract Classes
No ratings yet
Lecture - 10 - Abstract Classes
20 pages
Strongbox: A Gpu Tee On Arm Endpoints: Yunjie Deng Chenxu Wang Shunchang Yu Shiqing Liu
No ratings yet
Strongbox: A Gpu Tee On Arm Endpoints: Yunjie Deng Chenxu Wang Shunchang Yu Shiqing Liu
15 pages
Article 3226
No ratings yet
Article 3226
28 pages
Osdi18-Volos-Graviton-Trusted Execution Environments On GPUs
No ratings yet
Osdi18-Volos-Graviton-Trusted Execution Environments On GPUs
17 pages
9000 Ethernet Conn
No ratings yet
9000 Ethernet Conn
38 pages
Jang Asplos19
No ratings yet
Jang Asplos19
14 pages
Vortex Micro21 Final
No ratings yet
Vortex Micro21 Final
13 pages
ECE586 Lecture 1
No ratings yet
ECE586 Lecture 1
36 pages
Ucc2818 PDF
No ratings yet
Ucc2818 PDF
45 pages
Heterogeneous Isolated Execution For Commodity GPUs
No ratings yet
Heterogeneous Isolated Execution For Commodity GPUs
14 pages
TCAD Hardware Security Survey
No ratings yet
TCAD Hardware Security Survey
29 pages
Web GPUAttacks
No ratings yet
Web GPUAttacks
13 pages
Operating System Abstractions To Manage Gpus As Compute Devices
No ratings yet
Operating System Abstractions To Manage Gpus As Compute Devices
16 pages
Common Counters: Compressed Encryption Counters For Secure GPU Memory
No ratings yet
Common Counters: Compressed Encryption Counters For Secure GPU Memory
13 pages
Design Patterns For Low-Level Real-Time Rendering - Nicolas Guillemot - CppCon 2017
No ratings yet
Design Patterns For Low-Level Real-Time Rendering - Nicolas Guillemot - CppCon 2017
56 pages
Asplos19 Trustguard
No ratings yet
Asplos19 Trustguard
17 pages
Cryptography On Gpus: Erdem Sarılı
No ratings yet
Cryptography On Gpus: Erdem Sarılı
18 pages
Managing Security in FPGA-Based Embedded Systems
No ratings yet
Managing Security in FPGA-Based Embedded Systems
24 pages
CryptoGraphic-Secret Key Using Graphic Card
No ratings yet
CryptoGraphic-Secret Key Using Graphic Card
18 pages
Sec20 Delshadtehrani PHMon
No ratings yet
Sec20 Delshadtehrani PHMon
19 pages
Ec3401 Ns Unit 5
No ratings yet
Ec3401 Ns Unit 5
11 pages
Hardware-Software Co-Design For Side-Channel Protected Neural Network Inference
No ratings yet
Hardware-Software Co-Design For Side-Channel Protected Neural Network Inference
12 pages
Biomedical Engineering
No ratings yet
Biomedical Engineering
8 pages
Parel Kar
No ratings yet
Parel Kar
13 pages
Graphics Processing Unit GPU Programming Strategie
No ratings yet
Graphics Processing Unit GPU Programming Strategie
14 pages
Cs 8803 Ss Project
No ratings yet
Cs 8803 Ss Project
17 pages
6 - Side-Channel Security of Superscalar CPUs - Evaluating The Impact of Micro-Architectural Features
No ratings yet
6 - Side-Channel Security of Superscalar CPUs - Evaluating The Impact of Micro-Architectural Features
6 pages
A Novel GPU Overdrive Fault Attack: Majid Sabbagh, Yunsi Fei, David Kaeli
No ratings yet
A Novel GPU Overdrive Fault Attack: Majid Sabbagh, Yunsi Fei, David Kaeli
6 pages
Special Topic Submission Enabling Domain-Specific Architectures With An Open-Source Soft-Core GPGPU
No ratings yet
Special Topic Submission Enabling Domain-Specific Architectures With An Open-Source Soft-Core GPGPU
8 pages
An Introduction To Android Development: CS231M - Alejandro Troccoli
No ratings yet
An Introduction To Android Development: CS231M - Alejandro Troccoli
22 pages
Test 2. Operating Systems II - Attempt Review
No ratings yet
Test 2. Operating Systems II - Attempt Review
5 pages
ccs18 Gpu Side Channel
No ratings yet
ccs18 Gpu Side Channel
15 pages
Schneider Electric - Galaxy-300 - G3HT10KHB1S
No ratings yet
Schneider Electric - Galaxy-300 - G3HT10KHB1S
5 pages
Conf Graphics Hardware 2006
No ratings yet
Conf Graphics Hardware 2006
25 pages
CAO Report
No ratings yet
CAO Report
17 pages
CN 7
No ratings yet
CN 7
3 pages
GPU Khoruzhenko
No ratings yet
GPU Khoruzhenko
5 pages
GPU-Co Processing
No ratings yet
GPU-Co Processing
8 pages
Exploiting The Power of Gpus For Asymmetric Cryptography: Abstract. Modern Graphics Processing Units (Gpu) Have Reached A
No ratings yet
Exploiting The Power of Gpus For Asymmetric Cryptography: Abstract. Modern Graphics Processing Units (Gpu) Have Reached A
21 pages
G-GPU A Fully-Automated Generator of GPU-like ASIC Accelerators
No ratings yet
G-GPU A Fully-Automated Generator of GPU-like ASIC Accelerators
4 pages
PWPP Pathport DIN Mount 2 Port Manual
No ratings yet
PWPP Pathport DIN Mount 2 Port Manual
3 pages
Datasheet
No ratings yet
Datasheet
10 pages
Efficient Acceleration of Asymmetric Cryptography On Graphics Hardware
No ratings yet
Efficient Acceleration of Asymmetric Cryptography On Graphics Hardware
17 pages
Tortuga Logic Detect and Prevent Security Vulnerabilities in Your Hardware Root of Trust 1
No ratings yet
Tortuga Logic Detect and Prevent Security Vulnerabilities in Your Hardware Root of Trust 1
15 pages
Cache-Collision Attacks On GPU-based AES Implementation With Electro-Magnetic Leakages
No ratings yet
Cache-Collision Attacks On GPU-based AES Implementation With Electro-Magnetic Leakages
7 pages
csc207 Reflection1
No ratings yet
csc207 Reflection1
1 page
Data Remanence and Digital Forensic Investigation For CUDA Graphics Processing Units
No ratings yet
Data Remanence and Digital Forensic Investigation For CUDA Graphics Processing Units
6 pages
Secret Key Cryptography Using Graphics Cards
No ratings yet
Secret Key Cryptography Using Graphics Cards
14 pages
Examination Paper For TTT4120 Digital Signal Processing: Department of Electronic Systems
No ratings yet
Examination Paper For TTT4120 Digital Signal Processing: Department of Electronic Systems
7 pages
CASE STUDY-Types of GPU
No ratings yet
CASE STUDY-Types of GPU
10 pages
Analysis and Implementation of Parallel Aes Algorithm Based On T-Table Using Cuda On The Multicore Gpu
No ratings yet
Analysis and Implementation of Parallel Aes Algorithm Based On T-Table Using Cuda On The Multicore Gpu
8 pages
Data Sheet: TDA1516BQ
No ratings yet
Data Sheet: TDA1516BQ
12 pages
Storegpu: Exploiting Graphics Processing Units To Accelerate Distributed Storage Systems
No ratings yet
Storegpu: Exploiting Graphics Processing Units To Accelerate Distributed Storage Systems
10 pages
Features Description: SBOS014
No ratings yet
Features Description: SBOS014
14 pages
Brodtkorb Etal Meta10
No ratings yet
Brodtkorb Etal Meta10
15 pages
DS-D2046LU-Y 302504798 Datasheet 20231226
No ratings yet
DS-D2046LU-Y 302504798 Datasheet 20231226
4 pages
Oracle WebLogic Server 12c Performance Tuning Workshop
No ratings yet
Oracle WebLogic Server 12c Performance Tuning Workshop
3 pages
Cryptographics: Cryptography Using Graphics Processing Units
No ratings yet
Cryptographics: Cryptography Using Graphics Processing Units
6 pages
Distributed-Shared CUDA: Virtualization of Large-Scale GPU Systems For Programmability and Reliability
No ratings yet
Distributed-Shared CUDA: Virtualization of Large-Scale GPU Systems For Programmability and Reliability
6 pages
Side-Channel Power Analysis of A GPU AES Implementation: Chao Luo, Yunsi Fei, Pei Luo, Saoni Mukherjee, David Kaeli
No ratings yet
Side-Channel Power Analysis of A GPU AES Implementation: Chao Luo, Yunsi Fei, Pei Luo, Saoni Mukherjee, David Kaeli
8 pages
BACHELOR OF COMPUTER APPLICATIONS SCHEME OF EXAMINATION W - e - F - 2015-16
No ratings yet
BACHELOR OF COMPUTER APPLICATIONS SCHEME OF EXAMINATION W - e - F - 2015-16
7 pages
TD 6280 1-P 3911 PDF
No ratings yet
TD 6280 1-P 3911 PDF
9 pages
Dfrws2007 Gpu
No ratings yet
Dfrws2007 Gpu
10 pages
Huffmire - Managing Security in FPGA-Based Embedded Systems
No ratings yet
Huffmire - Managing Security in FPGA-Based Embedded Systems
11 pages
8 Things You Should Know About GPGPU Technology: Q&A With TACC Research Scientists
No ratings yet
8 Things You Should Know About GPGPU Technology: Q&A With TACC Research Scientists
2 pages
Engineering AI Excellence
From Everand
Engineering AI Excellence
Azhar ul Haque Sario
No ratings yet
Pop!_OS System Administration Guide: Definitive Reference for Developers and Engineers
From Everand
Pop!_OS System Administration Guide: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Rust for Embedded Systems
From Everand
Rust for Embedded Systems
James Oakton
No ratings yet
Jetson Platform Development Guide: Definitive Reference for Developers and Engineers
From Everand
Jetson Platform Development Guide: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet

Understanding The Security of Discrete GPUs

Uploaded by

Understanding The Security of Discrete GPUs

Uploaded by

Understanding The Security of Discrete GPUs

Zhiting Zhu Sangman Kim Yuri Rozhanski

Yige Hu Emmett Witchel Mark Silberstein

2.1 GPU execution model

Mode IOTLB Flush Workload Bit rate Stale period

[32] NVIDIA. NVIDIA Falcon security. ftp://download.nvidia.com/open-gpu-

You might also like