Decoding CUDA Binary - File Format

Uploaded by

Armaan Chowfin

Executable and Linkable Format [1], abbreviated to ELF, is a standard file format typically used on Unix and Unix-like systems. A variant of this format is also used by NVIDIA software to package low-level GPU code. An ELF file consists of four components: the ELF header; the section header table, which describes each of the ELF file's sections; the program header table, which defines the memory segments; and finally the sections themselves.

Here we describe the file format of CUDA applications, knowledge of which is vital when trying to modify existing code. We performed all of our experimentation on Linux (Ubuntu and openSUSE), so information concerning the CPU executable may not apply to other operating systems such as Windows. Information concerning the GPU ELF, however, applies regardless of the operating system.
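Everything that follows starts from locating sections by name. As a minimal sketch (our own illustration, not code from this work or from NVIDIA tooling), the Python below walks the section header table of a 64-bit little-endian ELF using the standard Elf64_Ehdr and Elf64_Shdr layouts and resolves names through the section-name string table. The names list_sections and Section are hypothetical.

```python
import struct
from typing import List, NamedTuple

EHDR_FMT = "<16sHHIQQQIHHHHHH"  # Elf64_Ehdr, little-endian
SHDR_FMT = "<IIQQQQIIQQ"        # Elf64_Shdr, little-endian


class Section(NamedTuple):
    name: str
    sh_type: int
    offset: int  # sh_offset: file offset of the section's bytes
    size: int    # sh_size
    info: int    # sh_info


def list_sections(elf: bytes) -> List[Section]:
    """Walk the section header table of a 64-bit little-endian ELF image."""
    hdr = struct.unpack_from(EHDR_FMT, elf, 0)
    e_shoff, e_shentsize, e_shnum, e_shstrndx = hdr[6], hdr[11], hdr[12], hdr[13]
    raw = [struct.unpack_from(SHDR_FMT, elf, e_shoff + i * e_shentsize)
           for i in range(e_shnum)]

    # Section names live in the string table selected by e_shstrndx.
    str_off, str_size = raw[e_shstrndx][4], raw[e_shstrndx][5]
    strtab = elf[str_off:str_off + str_size]

    def name_at(idx: int) -> str:
        end = strtab.index(b"\0", idx)
        return strtab[idx:end].decode("ascii", "replace")

    return [Section(name_at(sh[0]), sh[1], sh[4], sh[5], sh[7]) for sh in raw]
```

Applied to an extracted GPU ELF, this yields entries such as ".text.<func>" and ".nv.info.<func>" that the later sketches consume.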
GPU ELF

Every CUDA program has one or more executable ELF files embedded inside it, which contain the GPU code. Notably, this nested ELF can differ from typical ELF files, for instance by having an unusual version number, which may cause problems for existing programs and libraries that try to analyze it.

Every ELF header has a field named e_version, which is usually set to 1; other values are typically considered invalid. The embedded GPU ELF files created by some versions of nvcc match this, but with other versions this field is set to the compiler version. For example, with nvcc version 6.5, e_version is set to 65, and with nvcc version 8.0, e_version is set to 80. The ELF header is otherwise as expected, except that the e_machine field, which indicates the architecture, is set to 190 to indicate CUDA.
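A tool that rejects any e_version other than 1 will refuse some of these embedded files. The sketch below, assuming a 64-bit little-endian ELF, reads the header with Python's struct module and reports the two quirks described above instead of failing; inspect_elf_header and EM_CUDA are names we chose for illustration.

```python
import struct

EM_CUDA = 190  # e_machine value the text reports for CUDA GPU ELF files


def inspect_elf_header(data: bytes) -> dict:
    """Read a 64-bit little-endian ELF header and report CUDA-specific quirks."""
    if len(data) < 64 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    fields = struct.unpack_from("<16sHHIQQQIHHHHHH", data, 0)
    e_machine, e_version = fields[2], fields[3]
    return {
        "is_cuda": e_machine == EM_CUDA,
        "e_version": e_version,
        # Generic tools expect 1; some nvcc versions store the compiler
        # version here instead (65 for nvcc 6.5, 80 for nvcc 8.0).
        "nonstandard_version": e_version != 1,
        "e_shnum": fields[12],
    }
```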
The compiler will usually generate mangled names for each kernel function. For example, a kernel named "foo" with one integer parameter will likely have the mangled name "_Z3fooi". The ELF uses this name instead of the original, unmangled one.

For each kernel it describes, the ELF contains a section called ".text.func", where func is the mangled name of the kernel, holding the function's binary code. The highest eight bits of this section's INFO field (sh_info) control the number of registers allocated per thread, and the lowest eight bits hold this section's index in the symbol table.
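The bit layout above can be expressed as a pair of helpers. This is a sketch that follows the description literally (top byte for registers, bottom byte for the symbol index) and leaves the middle bits untouched; decode_text_info, with_register_count, and kernel_of are hypothetical names.

```python
from typing import Optional, Tuple


def decode_text_info(sh_info: int) -> Tuple[int, int]:
    """Split a .text.<func> section's 32-bit sh_info into (registers, symbol index)."""
    registers = (sh_info >> 24) & 0xFF  # highest eight bits
    symbol_index = sh_info & 0xFF       # lowest eight bits
    return registers, symbol_index


def with_register_count(sh_info: int, registers: int) -> int:
    """Return sh_info with only the register-count byte replaced."""
    if not 0 <= registers <= 0xFF:
        raise ValueError("register count must fit in eight bits")
    return (sh_info & 0x00FFFFFF) | (registers << 24)


def kernel_of(section_name: str) -> Optional[str]:
    """Return the mangled kernel name from '.text.<func>', or None."""
    prefix = ".text."
    return section_name[len(prefix):] if section_name.startswith(prefix) else None
```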
For a kernel function that uses shared memory, the ELF also contains a section named ".nv.shared.func" for each kernel "func"; the size of this section is the number of bytes of shared memory that the GPU will allocate per thread block for the associated kernel function. Similarly, it can contain sections named ".nv.constantX.func" for different values of X, allocating (and possibly initializing) constant memory for the kernel functions. Each .nv... section's INFO value is equal to the associated .text... section's index in the section header table.
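Because the kernel name is embedded in the section name, per-kernel memory usage can be collected from (section name, sh_size) pairs, for example those produced by a section-table walker. This sketch keys on the name suffix rather than on the INFO link to the .text... section; kernel_memory and its dictionary layout are our own choices.

```python
import re
from typing import Dict, List, Tuple

SHARED_PREFIX = ".nv.shared."
CONSTANT_RE = re.compile(r"^\.nv\.constant(\d+)\.(.+)$")  # .nv.constantX.<func>


def kernel_memory(sections: List[Tuple[str, int]]) -> Dict[str, dict]:
    """Group per-kernel shared/constant memory sizes from (section name, sh_size) pairs."""
    result: Dict[str, dict] = {}

    def entry(kernel: str) -> dict:
        return result.setdefault(kernel, {"shared_bytes": 0, "constant_banks": {}})

    for name, size in sections:
        if name.startswith(SHARED_PREFIX):
            # Section size = shared memory allocated per thread block.
            entry(name[len(SHARED_PREFIX):])["shared_bytes"] = size
        else:
            match = CONSTANT_RE.match(name)
            if match:
                bank, kernel = int(match.group(1)), match.group(2)
                entry(kernel)["constant_banks"][bank] = size
    return result
```

Sections without a kernel suffix (such as a plain ".nv.constant2") are ignored by this sketch.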
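Several of the sections have an entry in the symbol table of the GPU ELF. Additionally, the symbol table can have entries for subroutines within the kernel functions, with an st_info value of 34. These have an st_shndx value equal to the section number of the associated kernel function's .text... section, and an st_value equal to their offset inside the kernel function's code.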
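Subroutine symbols can be picked out of the symbol table by the st_info value given above. Under the standard ELF encoding (binding in the upper four bits, type in the lower four), 34 = 0x22 decodes to STB_WEAK and STT_FUNC. The sketch assumes 64-bit little-endian Elf64_Sym entries; iter_symbols and subroutines_of are hypothetical helpers.

```python
import struct
from typing import Iterator, List, NamedTuple, Tuple

SYM_FMT = "<IBBHQQ"                  # Elf64_Sym: name, info, other, shndx, value, size
SYM_SIZE = struct.calcsize(SYM_FMT)  # 24 bytes
SUBROUTINE_ST_INFO = 34              # st_info value the text reports for subroutines


class Symbol(NamedTuple):
    name_offset: int  # offset into the symbol string table
    info: int
    other: int
    shndx: int
    value: int
    size: int


def iter_symbols(symtab: bytes) -> Iterator[Symbol]:
    for off in range(0, len(symtab) - SYM_SIZE + 1, SYM_SIZE):
        yield Symbol(*struct.unpack_from(SYM_FMT, symtab, off))


def subroutines_of(symtab: bytes, text_shndx: int) -> List[Tuple[int, int]]:
    """(name_offset, offset in kernel code) for subroutines of one .text section."""
    return [(s.name_offset, s.value) for s in iter_symbols(symtab)
            if s.info == SUBROUTINE_ST_INFO and s.shndx == text_shndx]
```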
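.nv.info

The ELF also contains a section named ".nv.info.func" for each kernel "func", and also a ".nv.info" section, containing various pieces of metadata. For example, the amount of local memory allocated per thread is controlled by the MIN_STACK_SIZE, FRAME_SIZE, and MAX_STACK_SIZE attributes inside .nv.info.

The .nv.info sections contain one or more attributes. An attribute starts with a byte indicating the attribute format, followed by a byte with the attribute ID. If the attribute format is 1 (NVAL), there is no associated value. If the format is 2 (BVAL), the following byte contains the attribute's value. If the format is 3 (HVAL), the following two bytes contain the attribute's value. Finally, if the format is 4 (SVAL), the following two bytes contain a size, n, and the next n bytes contain the attribute's values.

For example, for FRAME_SIZE, the attribute format will be 4 (SVAL), the two bytes after the attribute ID will contain the size 8, the next four bytes contain the associated kernel function's index in the symbol table, and the remaining four bytes contain the actual frame size value.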
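The attribute encoding above translates into a small decoder. This is a sketch under two assumptions of ours: multi-byte fields are little-endian, and an unknown format byte is treated as an error rather than skipped. The helper frame_sizes (a hypothetical name, like iter_nv_info) applies the FRAME_SIZE layout from the example to map symbol-table indices to frame sizes.

```python
import struct
from typing import Dict, Iterator, Tuple, Union

NVAL, BVAL, HVAL, SVAL = 1, 2, 3, 4
EIATTR_FRAME_SIZE = 0x11  # from Table 1

Value = Union[None, int, bytes]


def iter_nv_info(data: bytes) -> Iterator[Tuple[int, int, Value]]:
    """Yield (format, attribute_id, value) for each attribute in an .nv.info section."""
    pos = 0
    while pos + 2 <= len(data):
        fmt, attr_id = data[pos], data[pos + 1]
        pos += 2
        if fmt == NVAL:
            value = None
        elif fmt == BVAL:
            value = data[pos]
            pos += 1
        elif fmt == HVAL:
            (value,) = struct.unpack_from("<H", data, pos)
            pos += 2
        elif fmt == SVAL:
            (size,) = struct.unpack_from("<H", data, pos)
            value = bytes(data[pos + 2:pos + 2 + size])
            pos += 2 + size
        else:
            raise ValueError(f"unknown attribute format {fmt} at offset {pos - 2}")
        yield fmt, attr_id, value


def frame_sizes(data: bytes) -> Dict[int, int]:
    """Map symbol-table index -> frame size using the FRAME_SIZE layout."""
    sizes = {}
    for fmt, attr_id, value in iter_nv_info(data):
        if fmt == SVAL and attr_id == EIATTR_FRAME_SIZE and len(value) == 8:
            sym_index, frame_size = struct.unpack("<II", value)
            sizes[sym_index] = frame_size
    return sizes
```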
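In Table 1, we list all attribute IDs we are aware of that can appear in the .nv.info sections, with their human-readable names and their numeric values in the actual binary. Some attributes are only supported by more recent versions of the CUDA SDK.

Table 1. Known .nv.info attributes.

Attribute (EIATTR)          ID
ERROR                       0x00
PAD                         0x01
IMAGE_SLOT                  0x02
JUMPTABLE_RELOCS            0x03
CTAIDZ_USED                 0x04
MAX_THREADS                 0x05
IMAGE_OFFSET                0x06
IMAGE_SIZE                  0x07
TEXTURE_NORMALIZED          0x08
SAMPLER_INIT                0x09
PARAM_CBANK                 0x0a
SMEM_PARAM_OFFSETS          0x0b
CBANK_PARAM_OFFSETS         0x0c
SYNC_STACK                  0x0d
TEXID_SAMPID_MAP            0x0e
EXTERNS                     0x0f
REQNTID                     0x10
FRAME_SIZE                  0x11
MIN_STACK_SIZE              0x12
SAMPLER_FORCE_UNNORMALIZED  0x13
BINDLESS_IMAGE_OFFSETS      0x14
BINDLESS_TEXTURE_BANK       0x15
BINDLESS_SURFACE_BANK       0x16
KPARAM_INFO                 0x17
SMEM_PARAM_SIZE             0x18
CBANK_PARAM_SIZE            0x19
QUERY_NUMATTRIB             0x1a
MAXREG_COUNT                0x1b
EXIT_INSTR_OFFSETS          0x1c
S2RCTAID_INSTR_OFFSETS      0x1d
CRS_STACK_SIZE              0x1e
NEED_CNP_WRAPPER            0x1f
NEED_CNP_PATCH              0x20
EXPLICIT_CACHING            0x21
ISTYPEP_USED                0x22
MAX_STACK_SIZE              0x23
SUQ_USED                    0x24
LD_CACHEMOD_INSTR_OFFSETS   0x25
LOAD_CACHE_REQUEST          0x26
ATOM_SYS_INSTR_OFFSETS      0x27
COOP_GROUP_INSTR_OFFSETS    0x28
COOP_GROUP_MASK_REGIDS      0x29
SW1850030_WAR               0x2a
WMMA_USED                   0x2b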
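CPU Executable

With older versions of nvcc, the GPU ELF described above was stored inside the .rodata section of the CPU ELF. Starting with version 5.0, released in 2012, the compiler creates an executable containing dedicated sections for GPU code.

The most important of these is the .nv_fatbin section. It is split into an arbitrary number of distinct regions, each of which contains one or more GPU ELF files, PTX code files, and/or cubin files. Each region begins with a 16-byte header: the first 8 bytes are the .nv_fatbin magic number, and the remaining 8 bytes contain the size of the rest of the region. The rest of the region alternates between detailed headers and the embedded files (ELF, PTX, or cubin) that those headers describe.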
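Walking the regions only needs the 16-byte header. The text does not give the magic value, so this sketch takes it from the first region and checks that every later region starts with the same eight bytes; it also assumes a little-endian size field and no padding between regions, neither of which we have verified. split_fatbin_regions is a hypothetical name.

```python
import struct
from typing import List, Tuple

REGION_HEADER_SIZE = 16  # 8-byte magic + 8-byte size of the rest of the region


def split_fatbin_regions(section: bytes) -> List[Tuple[int, int]]:
    """Return (offset, size_of_rest) for each region of an .nv_fatbin section."""
    if len(section) < REGION_HEADER_SIZE:
        return []
    magic = section[:8]  # assumed identical for every region
    regions = []
    pos = 0
    while pos + REGION_HEADER_SIZE <= len(section):
        if section[pos:pos + 8] != magic:
            raise ValueError(f"unexpected region magic at offset {pos:#x}")
        (rest,) = struct.unpack_from("<Q", section, pos + 8)
        regions.append((pos, rest))
        pos += REGION_HEADER_SIZE + rest  # skip header plus region body
    return regions
```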
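In the detailed header, the first 4-byte word contains the embedded file's type and ptxas flags; its lower two bytes have a value of 2 for GPU ELF files. The second word is the offset of the embedded file, relative to the start of this detailed header. The 8-byte value spanning the third and fourth words holds the size of the embedded file. The seventh word is the code version, which depends on the compiler. The eighth word contains the target architecture: a value of 20 for compute capability 2.0, 35 for compute capability 3.5, and so on. The rest of the detailed header contains less important metadata, such as the operating system or the source code's filename.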
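The word positions above map directly onto a struct layout. The sketch assumes little-endian 4-byte words and that the upper two bytes of the first word are the ptxas flags; fields the text calls less important are skipped. EntryHeader, parse_entry_header, embedded_file, and compute_capability are hypothetical names.

```python
import struct
from typing import NamedTuple

FATBIN_KIND_ELF = 2  # lower two bytes of word 0 for GPU ELF files (per the text)


class EntryHeader(NamedTuple):
    kind: int          # lower 16 bits of word 0
    flags: int         # upper 16 bits of word 0 (assumed: ptxas flags)
    file_offset: int   # word 1: offset of the embedded file from this header
    file_size: int     # words 2-3: 64-bit size of the embedded file
    code_version: int  # word 6
    arch: int          # word 7: e.g. 35 for compute capability 3.5


def parse_entry_header(buf: bytes, pos: int) -> EntryHeader:
    words = struct.unpack_from("<8I", buf, pos)  # first eight 4-byte words
    (file_size,) = struct.unpack_from("<Q", buf, pos + 8)
    return EntryHeader(words[0] & 0xFFFF, words[0] >> 16, words[1],
                       file_size, words[6], words[7])


def embedded_file(buf: bytes, pos: int) -> bytes:
    """Slice out the file that the detailed header at `pos` describes."""
    hdr = parse_entry_header(buf, pos)
    start = pos + hdr.file_offset
    return buf[start:start + hdr.file_size]


def compute_capability(arch: int) -> str:
    return f"{arch // 10}.{arch % 10}"  # 35 -> "3.5"
```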
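Another section of the CPU ELF that is unique to CUDA programs is called .nvFatBinSegment. It contains metadata about the .nv_fatbin section, such as the starting addresses of its regions. Its size is a multiple of six words (24 bytes), where the third word in each group of six is an address inside the .nv_fatbin section. If we modify the .nv_fatbin section, these addresses need to be changed to match it.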
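Patching .nvFatBinSegment then amounts to rewriting one word per 24-byte record. The sketch treats the third word as a 32-bit little-endian value exactly as described; if a given binary stores a wider pointer at that position, the format string would need to change. shift_fatbin_segment, boundary, and delta are names we introduce for illustration.

```python
import struct

RECORD_SIZE = 24     # six 4-byte words per record
ADDRESS_OFFSET = 8   # byte offset of the third word


def shift_fatbin_segment(segment: bytes, boundary: int, delta: int) -> bytes:
    """Add `delta` to every record address at or after `boundary`.

    `boundary` is the address of the first byte that moved inside .nv_fatbin;
    addresses before it are left untouched.
    """
    if len(segment) % RECORD_SIZE:
        raise ValueError("segment size is not a multiple of 24 bytes")
    out = bytearray(segment)
    for rec in range(0, len(out), RECORD_SIZE):
        (addr,) = struct.unpack_from("<I", out, rec + ADDRESS_OFFSET)
        if addr >= boundary:
            struct.pack_into("<I", out, rec + ADDRESS_OFFSET, addr + delta)
    return bytes(out)
```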
Side Effects of Modification

If we modify the size of the GPU kernel in a way that requires changing the size of the .nv_fatbin section, then adjusting the parts described above is not sufficient to keep the executable working. Other changes are needed to prevent the program from simply crashing.

Increasing the size of the GPU code shifts the addresses of various parts of the executable, so several changes must be made to the CPU executable. Several offsets in the program header table need to be fixed. We also scan the CPU assembly code for any addresses that point anywhere after the changed part of the .nv_fatbin section, and increment them appropriately in the binary. Similarly, we fix such addresses inside several sections, including any symbol tables (indicated by a type of SHT_SYMTAB or SHT_DYNSYM), relocation tables (with type SHT_RELA), dynamic tables (with type SHT_DYNAMIC), and global offset tables (with name ".got").
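The table fixes described above follow one pattern: parse fixed-size entries and add the size increase to any field that points past the change. The sketch below is our own; it assumes 64-bit little-endian Elf64_Phdr, Elf64_Sym, and Elf64_Rela entries and that file offsets and virtual addresses move by the same amount, and it does not attempt the harder step of rewriting addresses inside the CPU machine code. Growing the segment that contains the change is also our assumption, not something stated above. All function names are hypothetical.

```python
import struct

PHDR_FMT = "<IIQQQQQQ"  # Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz, memsz, align
SYM_FMT = "<IBBHQQ"     # Elf64_Sym: name, info, other, shndx, value, size
RELA_FMT = "<QQq"       # Elf64_Rela: r_offset, r_info, r_addend


def shift_program_headers(table: bytes, off_boundary: int, delta: int) -> bytes:
    """Move segments starting at/after off_boundary; grow the one containing it."""
    size = struct.calcsize(PHDR_FMT)
    out = bytearray(table)
    for pos in range(0, len(out) - size + 1, size):
        (p_type, p_flags, p_offset, p_vaddr, p_paddr,
         p_filesz, p_memsz, p_align) = struct.unpack_from(PHDR_FMT, out, pos)
        if p_offset >= off_boundary:
            p_offset += delta
            p_vaddr += delta
            p_paddr += delta
        elif p_offset + p_filesz > off_boundary:
            p_filesz += delta
            p_memsz += delta
        struct.pack_into(PHDR_FMT, out, pos, p_type, p_flags, p_offset,
                         p_vaddr, p_paddr, p_filesz, p_memsz, p_align)
    return bytes(out)


def shift_fields(table: bytes, fmt: str, fields: tuple, addr_boundary: int, delta: int) -> bytes:
    """Add delta to the chosen fields of each fixed-size entry that point past the change."""
    size = struct.calcsize(fmt)
    out = bytearray(table)
    for pos in range(0, len(out) - size + 1, size):
        entry = list(struct.unpack_from(fmt, out, pos))
        for i in fields:
            if entry[i] >= addr_boundary:
                entry[i] += delta
        struct.pack_into(fmt, out, pos, *entry)
    return bytes(out)


def shift_symbols(symtab: bytes, addr_boundary: int, delta: int) -> bytes:
    return shift_fields(symtab, SYM_FMT, (4,), addr_boundary, delta)  # st_value


def shift_relocations(rela: bytes, addr_boundary: int, delta: int) -> bytes:
    return shift_fields(rela, RELA_FMT, (0, 2), addr_boundary, delta)  # r_offset, r_addend
```

Shifting p_offset and p_vaddr by the same delta keeps them congruent modulo p_align, which loadable segments require. Any value that merely happens to exceed the boundary is shifted as well, which is exactly the false-positive risk of this approach.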
While we find that these fixes seem to work in practice, it is difficult to guarantee that they will succeed. For example, other data may be mistakenly treated as an address and incremented, introducing errors. Therefore, whenever possible, it is best to prepare the executable so that modifications will not require extra space for additional code.

References

[1] TIS Committee. Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification, Version 1.2. TIS Committee, 1995.
