PBQP codegen acceleration?

I’ve heard that the ambiguously named neural processing units (NPUs) on recent x86-64 CPUs are, in fact, hardware coprocessors for linear algebra (i.e. matrix math). If that’s the case, would the PBQP register allocator be low-hanging fruit for early adoption?

My hope is that bytecode compilation from WebAssembly to native code could be made faster and more transparent. Of course, PBQP was the default allocation algorithm for LLVM last I was aware of, so everything else would benefit as well.
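
For readers unfamiliar with the formulation behind the question, here is a minimal sketch of what a PBQP (partitioned boolean quadratic programming) instance looks like when applied to register allocation. Everything in it is an illustrative assumption: the names `pbqp_cost` and `solve_brute_force`, the three virtual registers, the spill costs, and the two-register machine are made up for the example, and the exhaustive search is not how LLVM's PBQP allocator solves the problem. The costs are stored as vectors and matrices, but each node picks exactly one option, so the problem is a discrete selection rather than plain dense matrix arithmetic.

```python
from itertools import product

INF = float("inf")


def pbqp_cost(assignment, node_costs, edge_costs):
    """Cost of one assignment: chosen node-vector entries plus chosen edge-matrix entries."""
    total = sum(node_costs[n][assignment[n]] for n in node_costs)
    for (u, v), matrix in edge_costs.items():
        total += matrix[assignment[u]][assignment[v]]
    return total


def solve_brute_force(node_costs, edge_costs):
    """Try every combination of options (exponential; for illustration only)."""
    nodes = sorted(node_costs)
    best = None
    for choice in product(*(range(len(node_costs[n])) for n in nodes)):
        assignment = dict(zip(nodes, choice))
        cost = pbqp_cost(assignment, node_costs, edge_costs)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Hypothetical instance: option 0 = spill, option 1 = register r0, option 2 = register r1.
# Each cost vector holds [spill cost, cost of r0, cost of r1] for one virtual register.
node_costs = {
    "a": [5.0, 0.0, 0.0],
    "b": [3.0, 0.0, 0.0],
    "c": [4.0, 0.0, 0.0],
}

# Interference matrix: two live-at-once values may not share the same register.
interfere = [
    [0.0, 0.0, 0.0],
    [0.0, INF, 0.0],
    [0.0, 0.0, INF],
]

# All three virtual registers interfere with each other.
edge_costs = {
    ("a", "b"): interfere,
    ("a", "c"): interfere,
    ("b", "c"): interfere,
}

best_cost, best_assignment = solve_brute_force(node_costs, edge_costs)
print(best_cost, best_assignment)
```

With three mutually interfering values and only two registers, one value has to spill, and the search picks the one with the cheapest spill cost (`b` in this made-up instance).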

PBQP is not the default register allocator. At least for common architectures, Greedy is the default register allocator at most optimization levels, with the fast allocator used at -O0 (I’m reasonably sure; someone please correct me if I’m wrong).

I didn’t really ask about the lower optimization levels but thanks.

Greedy is used for optimized code on basically every target. I don’t think the PBQP allocator has seen much (meaningful) development over the years either.

PBQP was an experimental allocator added by @Arnaud-de-Grandmaiso a long time ago for embedded/DSP-type problems, and it has not made its way into the other targets.

@boomanaiden154-1 and @mshockwave are correct that it is not the default allocator anywhere. It’s not even known to work on other targets, because there are not enough tests.

If you mean Intel’s NPU, then PBQP won’t help you. That is not an x86-64 architecture. There is no NPU back-end in LLVM.

Thanks for the clarifications re: the greedy allocator.