Running execution engine from Python

I have so far been unable to run a hello world example involving the mlir.execution_engine Python bindings. Following the example in test/python/integration/dialects/linalg/opsrun.py, I have tried the following:

import ctypes

from mlir.dialects import func
from mlir.execution_engine import ExecutionEngine
from mlir.ir import *


with Context(), Location.unknown():
    module = Module.parse("""
    llvm.func @main() -> i32 attributes {llvm.emit_c_interface} {
      %0 = llvm.mlir.constant(0: i32) : i32
      llvm.return %0 : i32
    }
    """)

    print(module)

    ee = ExecutionEngine(module)
    c_int32_p = ctypes.c_int32 * 1
    res = c_int32_p(-1)
    ee.invoke("main", res)

I run this using Python 3.12.7 with PYTHONPATH set to python_packages in the llvm-project build directory. The output is:

Expected<T> must be checked before access or destruction.
Unchecked Expected<T> contained error:
Symbols not found: [ _mlir__mlir_ciface_main ]
Aborted (core dumped)

I am not really sure what the issue is; I would have thought this error would only occur if llvm.emit_c_interface were missing, but I did add it.
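As an aside, the one-element-array packing in the snippet reflects the calling convention of the `_mlir_ciface_` wrappers: results are written through an out-pointer, and `invoke` passes each ctypes argument by pointer. Here is a pure-ctypes sketch of that convention, with a Python callback standing in for the JIT-compiled wrapper (the callback is my own illustration, not anything the engine produces):

```python
import ctypes

# The _mlir_ciface_* wrappers take results as out-pointers rather than
# returning them by value. CFUNCTYPE lets us model that signature.
CIfaceMain = ctypes.CFUNCTYPE(None, ctypes.POINTER(ctypes.c_int32))

def fake_main(res_ptr):
    # Stand-in for the JIT-compiled wrapper: write the result through
    # the pointer, as _mlir_ciface_main would.
    res_ptr[0] = 0

ciface_main = CIfaceMain(fake_main)

res = (ctypes.c_int32 * 1)(-1)  # same packing as in the snippet above
ciface_main(res)                # the engine's invoke() does this lookup + call
print(res[0])                   # prints 0
```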

I have figured out a solution, but it does not feel quite right: I can switch llvm.func out for func.func and then run:

from mlir.passmanager import PassManager

pm = PassManager.parse("builtin.module(convert-func-to-llvm,func.func(llvm-request-c-wrappers))")
pm.run(module.operation)

But that would mean I can’t execute functions that are already fully lowered to LLVM, which is possible with mlir-cpu-runner, so I am probably still missing a puzzle piece. I believe part of the issue is that my to-llvm conversion pass, which produced something equivalent to the snippet above, uses bare-pointer calling conventions; when that is enabled, wrappers for external callers are never generated. I’m not sure what to do about this.

You just need to also emit the C wrapper somehow. What you’re seeing is that the simplest way to emit the C wrapper is to start with func.func, lower it to the LLVM dialect, and request that the wrapper be emitted for you (using the llvm.emit_c_interface attribute). But there’s nothing stopping you from emitting the same wrapper yourself (by hand or using one of your own passes).
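For illustration, a hand-written wrapper might look something like the following sketch. The `_mlir_ciface_<name>` symbol is what the ExecutionEngine looks up, and the result is returned through a pointer argument rather than by value; the exact pointer types and syntax may differ between MLIR versions, so treat this as a rough shape rather than verified IR:

```mlir
llvm.func @main() -> i32 {
  %0 = llvm.mlir.constant(0 : i32) : i32
  llvm.return %0 : i32
}

// Hand-written equivalent of what llvm.emit_c_interface would generate:
// call the real function, then store the result through the out-pointer.
llvm.func @_mlir_ciface_main(%arg0: !llvm.ptr) {
  %0 = llvm.call @main() : () -> i32
  llvm.store %0, %arg0 : i32, !llvm.ptr
  llvm.return
}
```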

In general though, ExecutionEngine isn’t an industrial-strength runtime, so it’s not meant to handle all cases for all people. It’s purposely and purely designed for running very small test cases in CI.

Alright, that makes sense. And depending on the function I intend to call, wrappers may not always be necessary anyway. Does your statement about ExecutionEngine extend to the JIT runner functionality in general?

If you’re asking about MLIRJitRunner then yes (if you’re asking about some LLVM util then I don’t know). All of the runner/runtime stuff in tree is just for CI - none of it is meant to be an “industrial grade” runtime. For that you need to head over to IREE (but of course that’s a whole new ball of complexity…).

Okay, that’s very good to know, I have not yet looked into IREE but had assumed that it only implements the frontend part of the whole compilation process. I will investigate further.

Testing and exploration 🙂 I’d even claim it isn’t really a runtime: it’s meant to just invoke the generated code one piece after the other, a little bit of glue. You could define extern "C" functions in your own C/C++ file, invoke them one after the other, link in your generated kernels, and get a similar effect. It’s just a convenience wrapper to avoid doing all of that wrapping by hand. Dead simple: invoke the things and glue them together. Which I think is ideal in many cases, but if you want automatically managed memory etc., this is not it.

That being said, if you see some enhancements to be made here (documentation, usability, perf), those would still be welcome.

That seems completely overblown for JIT/execution needs. Many “industrial grade” projects are assembling the MLIR/LLVM JIT pieces without the needs of a VM and complex runtime/scheduling system. If you don’t start from a ML graph I’m not sure what warrants the complexity really.

This is completely inaccurate! It’s an execution engine for MLIR, not a runtime! One typically links runtimes in with it to get the desired target execution (e.g. the OpenMP runtime for multicore CPUs, the CUDA runtime for GPUs, and whatever else you need), by providing a list of libraries to link in dynamically. And there isn’t a single thing about it that isn’t industrial strength for what it serves! Not just as it stands today: it already had everything needed to build production-level JITs 2.5 or more years ago! You can JIT-execute the MLIR equivalent of anywhere from a few KBs to even 100s of MBs of native binary executable code (in-memory) and it functions as expected (that corresponds to ~100s of thousands of MLIR ops at the mid-level). Its objective isn’t just to run small execution/integration test cases upstream - that’s done purely to keep test cases minimal enough for full coverage while keeping check-mlir times reasonable.
