Issues Compiling With Offloading Support

jfuchs · July 9, 2025, 2:01pm

Hi all,
I am currently trying to compile clang/flang with offloading support. So far, I have tried numerous combinations of CMake configurations and environment variables without success. I am using Ubuntu, the latest NVIDIA drivers with CUDA 12.9, and an RTX 5070 Ti (though I don't think the GPU matters for compiling LLVM). After consulting the Discourse here, I took a step back and tried the recommended approach to get an offloading-capable compiler:

cd llvm-project
mkdir build
cd build
cmake ../llvm -G Ninja \
    -C ../offload/cmake/caches/Offload.cmake \
    -DCMAKE_BUILD_TYPE="Release" \
    -DCMAKE_INSTALL_PREFIX="/opt/llvm/llvm-project/install" \
    -DCMAKE_C_COMPILER="clang-19" \
    -DCMAKE_CXX_COMPILER="clang++-19"
ninja -j 24 install

With this approach, I get errors while building various objects in libcxx, all following the same pattern. For example, for chrono.cpp:

[2562/2580] Building CXX object libcxx/src/CMakeFiles/cxx_static.dir/chrono.cpp.o
FAILED: libcxx/src/CMakeFiles/cxx_static.dir/chrono.cpp.o 
/home/jfuchs/tools/llvm-compilation/llvm-project/build/./bin/clang++ --target=nvptx64-nvidia-cuda -DLIBCXX_BUILDING_LIBCXXABI -DLIBC_NAMESPACE=__llvm_libc_common_utils -D_LIBCPP_BUILDING_LIBRARY -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_LINK_PTHREAD_LIB -D_LIBCPP_LINK_RT_LIB -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES -I/home/jfuchs/tools/llvm-compilation/llvm-project/libcxx/src -I/home/jfuchs/tools/llvm-compilation/llvm-project/build/include/nvptx64-nvidia-cuda/c++/v1 -I/home/jfuchs/tools/llvm-compilation/llvm-project/build/include/c++/v1 -I/home/jfuchs/tools/llvm-compilation/llvm-project/libcxxabi/include -I/home/jfuchs/tools/llvm-compilation/llvm-project/cmake/Modules/../../libc -O3 -DNDEBUG -std=c++23 -faligned-allocation -nostdinc++ -fvisibility-inlines-hidden -fvisibility=hidden -fsized-deallocation -nogpulib -flto -fconvergent-functions --cuda-feature=+ptx63 -Wall -Wextra -Wnewline-eof -Wshadow -Wwrite-strings -Wno-unused-parameter -Wno-long-long -Werror=return-type -Wextra-semi -Wundef -Wunused-template -Wformat-nonliteral -Wzero-length-array -Wdeprecated-redundant-constexpr-static-def -Wno-nullability-completeness -Wno-user-defined-literals -Wno-covered-switch-default -Wno-suggest-override -Wno-error -fno-exceptions -fno-rtti -fdebug-prefix-map=/home/jfuchs/tools/llvm-compilation/llvm-project/build/include/c++/v1=/home/jfuchs/tools/llvm-compilation/llvm-project/libcxx/include -MD -MT libcxx/src/CMakeFiles/cxx_static.dir/chrono.cpp.o -MF libcxx/src/CMakeFiles/cxx_static.dir/chrono.cpp.o.d -o libcxx/src/CMakeFiles/cxx_static.dir/chrono.cpp.o -c /home/jfuchs/tools/llvm-compilation/llvm-project/libcxx/src/chrono.cpp
In file included from /home/jfuchs/tools/llvm-compilation/llvm-project/libcxx/src/chrono.cpp:27:
/usr/include/unistd.h:27:1: error: unknown type name '__BEGIN_DECLS'
   27 | __BEGIN_DECLS
      | ^
/usr/include/unistd.h:220:19: error: typedef redefinition with different types ('__ssize_t' (aka 'int') vs 'long')
  220 | typedef __ssize_t ssize_t;
      |                   ^
/home/jfuchs/tools/llvm-compilation/llvm-project/build/include/nvptx64-nvidia-cuda/llvm-libc-types/ssize_t.h:12:26: note: previous definition is here
   12 | typedef __PTRDIFF_TYPE__ ssize_t;
      |                          ^
In file included from /home/jfuchs/tools/llvm-compilation/llvm-project/libcxx/src/chrono.cpp:27:
/usr/include/unistd.h:287:52: error: expected function body after function declarator
  287 | extern int access (const char *__name, int __type) __THROW __nonnull ((1));
      |                                                    ^
/usr/include/unistd.h:339:65: error: expected function body after function declarator
  339 | extern __off_t lseek (int __fd, __off_t __offset, int __whence) __THROW;
      |                                                                 ^
/usr/include/unistd.h:371:62: error: expected function body after function declarator
  371 | extern ssize_t read (int __fd, void *__buf, size_t __nbytes) __wur
      |                                                              ^
/usr/include/unistd.h:378:64: error: expected function body after function declarator
  378 | extern ssize_t write (int __fd, const void *__buf, size_t __n) __wur
      |                                                                ^
/usr/include/unistd.h:437:36: error: expected function body after function declarator
  437 | extern int pipe (int __pipedes[2]) __THROW __wur;
      |                                    ^
/usr/include/unistd.h:452:52: error: expected function body after function declarator
  452 | extern unsigned int alarm (unsigned int __seconds) __THROW;
      |                                                    ^
/usr/include/unistd.h:494:6: error: expected function body after function declarator
  494 |      __THROW __nonnull ((1)) __wur;
      |      ^
/usr/include/unistd.h:517:39: error: expected function body after function declarator
  517 | extern int chdir (const char *__path) __THROW __nonnull ((1)) __wur;
      |                                       ^
/usr/include/unistd.h:531:50: error: expected function body after function declarator
  531 | extern char *getcwd (char *__buf, size_t __size) __THROW __wur;
      |                                                  ^
/usr/include/unistd.h:552:27: error: expected function body after function declarator
  552 | extern int dup (int __fd) __THROW __wur;
      |                           ^
/usr/include/unistd.h:555:39: error: expected function body after function declarator
  555 | extern int dup2 (int __fd, int __fd2) __THROW;
      |                                       ^
/usr/include/unistd.h:573:28: error: expected function body after function declarator
  573 |                    char *const __envp[]) __THROW __nonnull ((1, 2));
      |                                          ^
/usr/include/unistd.h:585:6: error: expected function body after function declarator
  585 |      __THROW __nonnull ((1, 2));
      |      ^
/usr/include/unistd.h:590:6: error: expected function body after function declarator
  590 |      __THROW __nonnull ((1, 2));
      |      ^
/usr/include/unistd.h:595:6: error: expected function body after function declarator
  595 |      __THROW __nonnull ((1, 2));
      |      ^
/usr/include/unistd.h:600:6: error: expected function body after function declarator
  600 |      __THROW __nonnull ((1, 2));
      |      ^
/usr/include/unistd.h:606:6: error: expected function body after function declarator
  606 |      __THROW __nonnull ((1, 2));
      |      ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.

More confusingly, if I use a custom CMake command (which works for my colleague using a 4080; see below), I can successfully compile clang/flang. With this flang build I can even compile my code, which uses OpenMP offloading and target sections. However, this approach fails when I run my code. If I force OMP_TARGET_OFFLOAD=mandatory, I get the error:

omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options.
omptarget error: No images found compatible with the installed hardware. Segmentation fault (core dumped)

export CC="clang-19"
export CXX="clang++-19"
export CUDACXX="/usr/local/cuda-12.9/bin/nvcc"

cd $BUILD_DIR
cmake -GNinja \
    -DCMAKE_BUILD_TYPE="Release" \
    -DCMAKE_INSTALL_PREFIX=${INSTALL_DIR} \
    -DCMAKE_CUDA_COMPILER=${CUDACXX} \
    -DCMAKE_CUDA_HOST_COMPILER="clang++-19" \
    -DCMAKE_CUDA_ARCHITECTURES="89;120" \
    -DLLVM_ENABLE_PROJECTS="clang;lld;flang;lldb" \
    -DLLVM_ENABLE_RUNTIMES="compiler-rt;openmp;offload;flang-rt" \
    -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
    -DLLVM_ENABLE_RTTI="ON" \
    -DOPENMP_ENABLE_LIBOMPTARGET="ON" \
    -DFLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT="OpenMP" \
    -DFLANG_RT_DEVICE_ARCHITECTURES="sm_89;sm_120" \
    -DFLANG_EXPERIMENTAL_OMP_OFFLOAD_BUILD="host_device" \
    ${LLVM_DIR} 2>&1 | tee cmake.log
nice ninja -j 24 install 2>&1 | tee ninja.log

I suspect that there is some issue with my environment that I am not aware of. Additionally, some resources online describe doing some kind of bootstrapping. Is this something to consider here? I am grateful for any hints or guidance in the right direction.

jhuber6 · July 9, 2025, 6:15pm

This is weird: it’s failing on the C++ library compilation. (That library is not strictly necessary; if you need a working build now, you can open the Offload.cmake file and use it as a base with the libc++ stuff removed.) The issue is that, for some reason, it’s including unistd.h, which isn’t provided by the GPU libc, so the build fails. I’m not sure why your build is hitting that configuration.

I think what’s happening is that there’s a __has_include check on unistd.h, and for some reason that header is present in your include path, so the code tries to use it. However, that’s supposed to be handled by the config code that detects libc, which passes -nostdlibinc to disable all standard system includes. I don’t see that flag in the command line from your error. It’s possible that somehow CXX_SUPPORTS_NOSTDLIBINC_FLAG isn’t being detected. I’d need to see your CMake logs for that.
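
One way to look for that probe result without reading the full CMake logs is to search the cache files in the build tree. The following is a minimal sketch under stated assumptions, not an official procedure: the function name find_nostdlibinc_result is made up for illustration, and it assumes the result of the check is cached under the name CXX_SUPPORTS_NOSTDLIBINC_FLAG in a CMakeCache.txt belonging either to the top-level build or to one of its runtime sub-builds.

```shell
# Print every cached entry for the -nostdlibinc probe found under a build tree.
# Usage: find_nostdlibinc_result /path/to/llvm-project/build
# Assumption: each (sub-)build directory keeps its own CMakeCache.txt.
find_nostdlibinc_result() {
    build_dir=$1
    # /dev/null is passed as an extra file so grep always prefixes the
    # file name, even when only one CMakeCache.txt is found.
    find "$build_dir" -name CMakeCache.txt -exec \
        grep 'CXX_SUPPORTS_NOSTDLIBINC_FLAG' /dev/null {} +
}
```

Each output line has the form file:entry. An entry with an empty value for the GPU sub-build, or no output at all, would be consistent with the flag not being detected.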

The latter error just means that it couldn’t find a GPU to run the image on while you stated that offloading was mandatory. What does offload-arch give you? If it doesn’t report a GPU, then there’s an issue with your CUDA installation.

jhuber6 · July 10, 2025, 2:13pm

Your build failure is caused by llvm/llvm-project pull request #134893, "[libcxxabi][libunwind] Support for using LLVM libc" by petrhosek. I’m looking into it.

jhuber6 · July 10, 2025, 2:28pm

Fixed in llvm/llvm-project commit f56b6ec, "[LLVM] Fix GPU build of libcxx/compiler-rt libraries". Let me know if you have any further issues.

jfuchs · July 14, 2025, 7:24am

With this fix, I can now compile LLVM successfully with the recommended Offload.cmake file. However, I am still getting the error:

omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options.
omptarget error: No images found compatible with the installed hardware. Segmentation fault (core dumped)

when I try to run my code.
I also ran offload-arch and I got the output:

sm_120
gfx1036

which seems to line up with my RTX 5070 Ti and some AMD integrated graphics, since I am running an AMD Ryzen 9 9950X.
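
The two lines use different naming schemes: sm_* names denote NVIDIA architectures and gfx* names denote AMD ones. As a small illustration (the helper name triple_for_arch is hypothetical), the mapping from an offload-arch line to the offload target triple it belongs to can be sketched as:

```shell
# Map an architecture name, as printed by offload-arch, to its target triple.
# Usage: triple_for_arch sm_120
triple_for_arch() {
    case $1 in
        sm_*) echo nvptx64-nvidia-cuda ;;   # NVIDIA compute capability names
        gfx*) echo amdgcn-amd-amdhsa ;;     # AMD GPU ISA names
        *)    echo unknown; return 1 ;;
    esac
}
```

With the output above, only the sm_120 line is relevant to a program compiled with -fopenmp-targets=nvptx64-nvidia-cuda.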
I am no expert, but I checked the install directory and found /opt/llvm/llvm-project/install/lib/nvptx64-nvidia-cuda, which suggests that the build figured out that nvptx is the correct runtime?
Maybe I am missing something when setting up my environment. This is what I set before running my code:

# cuda
export PATH=${PATH}:"/usr/local/cuda-12.9/bin"
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:"/usr/local/cuda-12.9/lib64"

# llvm
export PATH="/opt/llvm/llvm-project/install/bin:$PATH"
export LD_LIBRARY_PATH="/opt/llvm/llvm-project/install/lib:/opt/llvm/llvm-project/install/lib/x86_64-unknown-linux-gnu:/opt/llvm/llvm-project/install/lib/nvptx64-nvidia-cuda:$LD_LIBRARY_PATH"

I then compile it with

clang -O3 -fopenmp -fopenmp-version=52 -fopenmp-targets=nvptx64-nvidia-cuda main.c

I also tried appending --offload-arch=sm_120, but it yields the same result.

Thank you very much for your help and effort!

jhuber6 · July 14, 2025, 12:53pm

Can you verify which architecture the image is built for? llvm-objdump --offloading a.out should tell you; ideally it matches sm_120. Realistically, this error only happens if the runtime fails to start or there’s a conflicting device, so it could be some weird bug. If you compile LLVM with -DLIBOMPTARGET_ENABLE_DEBUG=ON, I might be able to figure something out from a run with LIBOMPTARGET_DEBUG=1.

jfuchs · July 14, 2025, 1:17pm

Here is the output if I run llvm-objdump --offloading a.out:

a.out:  file format elf64-x86-64

OFFLOADING IMAGE [0]:
kind            elf
arch            sm_120
triple          nvptx64-nvidia-cuda
producer        openmp
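
The comparison between the arch field above and what offload-arch reports can also be scripted. This is only a sketch: the helper names image_archs and archs_match are hypothetical, and the parsing assumes the "arch <name>" line layout shown in the output above.

```shell
# Read `llvm-objdump --offloading` output on stdin and print each image's arch.
image_archs() {
    awk '$1 == "arch" { print $2 }'
}

# Succeed only if every image arch ($1, whitespace-separated) appears as a
# whole line in the list of detected archs ($2, one per line).
archs_match() {
    [ -n "$1" ] || return 1            # no embedded image at all
    for arch in $1; do
        printf '%s\n' "$2" | grep -qxF "$arch" || return 1
    done
}

# Example use, assuming both tools are on PATH:
#   archs_match "$(llvm-objdump --offloading a.out | image_archs)" \
#               "$(offload-arch)" && echo "image arch matches a detected GPU"
```

For the values in this thread the check succeeds (sm_120 is in both lists), so a mismatch between the image and the hardware is not what causes the error.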

And here is the output for OMP_TARGET_OFFLOAD=mandatory ./a.out after export LIBOMPTARGET_DEBUG=1 and compiling with -DLIBOMPTARGET_ENABLE_DEBUG=ON:

omptarget --> Init offload library!
OMPT --> Entering connectLibrary
OMPT --> OMPT: Trying to load library libomp.so
OMPT --> OMPT: Trying to get address of connection routine ompt_libomp_connect
OMPT --> OMPT: Library connection handle = 0x70ef79559e60
OMPT --> Exiting connectLibrary
omptarget --> Loading RTLs...
omptarget --> RTLs loaded!
PluginInterface --> Failure to check validity of image 0x5651b5deed20: Invalid CUDA addressing modePluginInterface --> Failure to check validity of image 0x5651b5deed20: Invalid CUDA addressing modePluginInterface --> Failure to check validity of image 0x5651b5deed20: Invalid CUDA addressing modeomptarget --> No RTL found for image 0x00005651b5dea120!
omptarget --> Done registering entries!
omptarget --> Entering target region for device -1 with entry point 0x00005651b5dea047
omptarget --> Use default device id 0
omptarget --> Call to omp_get_num_devices returning 0
omptarget --> omp_get_num_devices() == 0 but offload is manadatory
omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options.
omptarget error: No images found compatible with the installed hardware. Segmentation fault (core dumped)
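
In the log above, the PluginInterface messages run together on one line, which makes it easy to miss that the same image was rejected several times for the same reason. A minimal sketch for tallying the rejection reasons from a saved log follows; the function name image_rejections is hypothetical, and the patterns assume the message wording and the "-->" tags shown in this log.

```shell
# Read a LIBOMPTARGET_DEBUG log on stdin and print "<count> x <reason>" for
# every distinct reason an image failed the plugin validity check.
image_rejections() {
    awk '{
        line = $0
        # Several messages may share one line, so keep scanning the remainder.
        while (match(line, /Failure to check validity of image 0x[0-9a-fA-F]+: /)) {
            rest = substr(line, RSTART + RLENGTH)
            # The reason ends at the next log tag, or at the end of the line.
            stop = match(rest, /(PluginInterface|omptarget|OMPT) -->/)
            reason = (stop > 0) ? substr(rest, 1, stop - 1) : rest
            count[reason]++
            line = rest
        }
    }
    END { for (r in count) printf "%d x %s\n", count[r], r }'
}

# Example use:
#   OMP_TARGET_OFFLOAD=mandatory ./a.out 2>&1 | image_rejections
```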

Additionally I noticed some CMake warnings during compilation. I don’t know if they are critical but here they are:

CMake Warning at /home/jfuchs/tools/llvm-compilation/llvm-project/compiler-rt/cmake/Modules/CompilerRTUtils.cmake:335 (message):
  LLVMTestingSupport not found in LLVM_AVAILABLE_LIBS
Call Stack (most recent call first):
  CMakeLists.txt:29 (load_llvm_config)


CMake Warning at /home/jfuchs/tools/llvm-compilation/llvm-project/compiler-rt/cmake/Modules/CompilerRTUtils.cmake:335 (message):
  LLVMTestingSupport not found in LLVM_AVAILABLE_LIBS
Call Stack (most recent call first):
  CMakeLists.txt:29 (load_llvm_config)


CMake Warning (dev) at /usr/share/cmake-3.28/Modules/GNUInstallDirs.cmake:243 (message):
  Unable to determine default CMAKE_INSTALL_LIBDIR directory because no
  target architecture is known.  Please enable at least one language before
  including GNUInstallDirs.
Call Stack (most recent call first):
  /home/jfuchs/tools/llvm-compilation/llvm-project/compiler-rt/cmake/base-config-ix.cmake:9 (include)
  CMakeLists.txt:25 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning at /home/jfuchs/tools/llvm-compilation/llvm-project/compiler-rt/cmake/Modules/CompilerRTUtils.cmake:335 (message):
  LLVMTestingSupport not found in LLVM_AVAILABLE_LIBS
Call Stack (most recent call first):
  CMakeLists.txt:29 (load_llvm_config)


CMake Warning at /home/jfuchs/tools/llvm-compilation/llvm-project/libc/test/CMakeLists.txt:13 (message):
  Cannot build libc GPU tests, missing loader.


CMake Warning (dev) at /usr/share/cmake-3.28/Modules/GNUInstallDirs.cmake:243 (message):
  Unable to determine default CMAKE_INSTALL_LIBDIR directory because no
  target architecture is known.  Please enable at least one language before
  including GNUInstallDirs.
Call Stack (most recent call first):
  /home/jfuchs/tools/llvm-compilation/llvm-project/llvm/cmake/modules/AddLLVM.cmake:1 (include)
  CMakeLists.txt:174 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning at /home/jfuchs/tools/llvm-compilation/llvm-project/offload/unittests/CMakeLists.txt:104 (message):
  Cannot run conformance tests without the LLVM C library
Call Stack (most recent call first):
  /home/jfuchs/tools/llvm-compilation/llvm-project/offload/unittests/Conformance/CMakeLists.txt:8 (add_conformance_test)


CMake Warning:
  Manually-specified variables were not used by the project:

    LIBCXX_HAS_GCC_S_LIB

Thank you, let me know if you need any other logs!

jhuber6 · July 14, 2025, 6:02pm

This is interesting. CUDA must’ve changed their ELF ABI again, judging by "PluginInterface --> Failure to check validity of image 0x5651b5deed20: Invalid CUDA addressing mode". Can you open an issue for this? It’s related to a check in offload/plugins-nextgen/common/src/Utils/ELF.cpp (on main in llvm/llvm-project), which is supposed to imply that we only support 64-bit CUDA (since 32-bit has been deprecated). It’s possible that they just changed this convention for the newer SMs; I have no clue what they do there because they don’t publish it.
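
To make the kind of check being discussed more concrete, the sketch below reads two ELF header fields from a device image file that has already been extracted by other means. Everything specific in it is an assumption recalled from LLVM's ELF definitions rather than something stated in this thread: machine value 190 for EM_CUDA, 0x400 as the 64-bit addressing flag, a 64-bit ELF layout (e_machine at byte offset 18, e_flags at byte offset 48), and a little-endian host. The function name cuda_elf_flags is hypothetical, and the real plugin code may test more than this.

```shell
# Print e_machine and e_flags of a 64-bit ELF file and report whether the
# (assumed) 64-bit addressing bit 0x400 is set.
# Returns 0 if set, 1 if clear, 2 if the file is not an EM_CUDA (190) image.
cuda_elf_flags() {
    file=$1
    # od reads multi-byte values in host byte order (assumed little-endian).
    machine=$(od -An -tu2 -j18 -N2 "$file" | tr -d ' \n')
    flags=$(od -An -tx4 -j48 -N4 "$file" | tr -d ' \n')
    printf 'e_machine=%s e_flags=0x%s\n' "$machine" "$flags"
    if [ "$machine" != 190 ]; then
        echo "not an EM_CUDA image"
        return 2
    fi
    if [ $(( 0x$flags & 0x400 )) -eq 0 ]; then
        echo "64-bit addressing bit not set"
        return 1
    fi
    echo "64-bit addressing bit set"
}
```

If an sm_120 image reports the bit as clear here, that would be consistent with the flag layout having changed for newer architectures, which is the suspicion raised above; it would not by itself confirm it.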

jfuchs · July 14, 2025, 7:11pm

I created llvm/llvm-project issue #148703, "[Offload] Failure to check validity of image for sm_120 architecture", for further investigation. Thank you for your support!
