TLS support in GPU programming

I tried to use a TLS (thread-local storage) variable with the ROCm compiler and with NVCC, but failed to get the expected behavior from either compiler. ROCm crashes when compiling the code, and NVCC seems to ignore the __thread keyword, treating the TLS variable as an ordinary global. Does anybody know why TLS is not well supported in GPU programming?

This is unimplemented, and kind of a hassle: you need to allocate a copy of the global for every work item in the dispatch, which is an unknown and potentially large number. I would hope you would get a clean error if you attempt to use __thread.

Could thread locals be implemented by allocating them at the start of the stack/scratch in kernels?

(I think for x86/glibc TLS is at the start of the stack as well.)

Maybe, but that’s using up precious stack space. I think the correct way to handle this is a new address space, plus buffers using the magic work-item-ID-indexed SRD configuration.

Being able to codegen thread_local on the GPU would be nice, but right now it’s just unimplemented (Compiler Explorer). Another issue would be initializers: even if we had some special buffer set up, we’d need backend support to write values into it on kernel start, which would then be a performance issue if we ever wanted that to work across multiple (non-LTO) link jobs.

Right now the closest thing you can get to TLS is burning most of your LDS budget on it, i.e.:

__shared__ int local[1024];

int &thread = local[threadIdx.x];
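Fleshed out, that workaround might look like the following sketch. The names, the use of dynamic shared memory, and the launch configuration are my assumptions, not something from the thread:

```cuda
// Sketch: per-lane "TLS" carved out of dynamic shared memory (LDS).
// Every `extern __shared__` declaration aliases the same per-block
// allocation, so a device "library" function can reach the calling
// lane's slot without the value being passed down the call chain.
__device__ void bump(void) {
  extern __shared__ int tls_slot[];
  tls_slot[threadIdx.x] += 1;  // this lane's private slot
}

__global__ void kernel(int *out) {
  extern __shared__ int tls_slot[];
  tls_slot[threadIdx.x] = (int)threadIdx.x;  // per-lane "initializer"
  bump();
  out[threadIdx.x] = tls_slot[threadIdx.x];
}

// Launch reserving one int of LDS per thread in the block:
//   kernel<<<grid, block, block.x * sizeof(int)>>>(out);
```

Note this spends blockDim.x * sizeof(int) of the per-block LDS budget and only gives block-scoped lifetime, which is why it amounts to "burning" LDS.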

Also, perhaps because I don’t understand the use case here, why do you want TLS instead of just declaring a variable in your entry point/kernel?

Clang reports an error indicating that the compiler doesn’t support dynamic initialization of thread-local variables. That looks good to us.
Compiler Explorer and Compiler Explorer
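For illustration, this is the shape of declaration that trips that diagnostic (a sketch; seed is a made-up function, and the exact wording of clang’s error may differ by target):

```cuda
__device__ int seed();        // runtime (non-constant) initializer

// clang rejects this: dynamic initialization of a thread-local
// variable is not supported here
thread_local int x = seed();
```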

We’d like to have a TLS variable that can be accessed freely from our library functions. If it is declared at the entry of the kernel, we need to pass a reference to the local variable down to every device function in the call trace.

Side note: If by “thread-local” you do mean “private to one lane of execution”, this ends up morally equivalent to an alloca() in the kernel entry point.

The trouble with allowing this sort of thing as a global declaration is that, if there’s more than one kernel in your source code, you have to plumb the value through to all of them, probably wasting registers.

That is, something like

thread_local int x;

__global__ void f(int a) {
  ...
  g(a);
  ...
}

__device__ void g(int b) {
  ...
}

would need to become

__global__ void f(int a) {
  int x;
  ...
  g(a, x); // *maybe* passed by value if you can get away with it
  ...
}

__device__ void g(int b, [const] int& x) {
  ...
}

which is a rather irritating transformation to do in general.

(Like, I’m sure no one would complain if this got implemented, but I don’t get the sense this is an extremely desirable feature. Patches welcome, I’d say.)

I think this works in theory, but I wouldn’t for a second recommend anyone actually do it, since it’ll blow up depending on how the stack gets allocated (Compiler Explorer). Mostly just a curiosity.

// Pseudo-"TLS": a pointer into address space 5 (private/scratch) at
// offset 0, so each lane sees its own copy of the pointee.
int [[clang::address_space(5)]] *tls = (int [[clang::address_space(5)]] *)(0);

[[clang::amdgpu_kernel, gnu::visibility("protected")]] void kernel() {
  *tls = __builtin_amdgcn_workitem_id_x();  // write this lane's work-item ID
}