[Rust] Add better support for "crate-per-schema" #8273

Open
adsnaider opened this issue Mar 31, 2024 · 14 comments · May be fixed by #8563

@adsnaider
Contributor

Currently, flatc allows using --rust-module-root-file and --gen-all to generate multiple schemas into a single crate with a top-level mod.rs. This works, but it is hard to use in many contexts, since the best (only?) option for inter-dependent schemas is to generate everything together into a single crate.

Ideally, we could generate each schema independently of its includes (each include being its own generated crate) and link them all at build time.

@adsnaider
Contributor Author

I don't particularly care about doing this through flatc directly. We use Bazel downstream and I have some patches to implement this, but we're somewhat behind master so it might take me a bit of time to send a PR. It is on my radar though.

The big thing that needs to change is that the `use` statements in each module need to be more specific. For instance:

// foo.fbs
namespace foo;
// ... some definitions

// foobar.fbs
include "foo.fbs";
namespace foo.bar;
// ... some definitions

The generated code should look like this:

// foobar_generated.rs

use foo_generated::*;

pub mod foo {
  use foo_generated::foo::*; // As opposed to the current `foo_generated::*;`

  pub mod bar {
    // No need to include foo_generated here since it doesn't live in `foo.bar`
  }
}

I think the correct approach is to include [some_generated_dep]::[some_module_tree]::* if and only if [some_module_tree] is a prefix of (or equal to) [some_generated_dep]'s namespace. I'm hoping someone will tell me if that doesn't work in every case :)
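A minimal sketch of that rule, reading it as "the module path is a prefix of the dependency's namespace". The helper name `glob_imports_for` and the `(crate name, namespace)` input shape are assumptions made for illustration; none of this is flatc's actual code generator.

```rust
/// Hypothetical helper (not part of flatc): returns the glob `use` lines for
/// the module at `module_path`, given each dependency's crate name and the
/// namespace declared in its schema.
fn glob_imports_for(module_path: &[&str], deps: &[(&str, &[&str])]) -> Vec<String> {
    deps.iter()
        // Import from a dependency only if this module is on the path to its namespace.
        .filter(|(_, namespace)| namespace.starts_with(module_path))
        .map(|(crate_name, _)| {
            let mut segments = vec![*crate_name];
            segments.extend_from_slice(module_path);
            format!("use {}::*;", segments.join("::"))
        })
        .collect()
}
```

With a single dependency `foo_generated` whose namespace is `foo`, this yields `use foo_generated::*;` for the root module, `use foo_generated::foo::*;` for `foo`, and nothing for `foo::bar`, which matches the example above.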

@adsnaider
Contributor Author

Unfortunately, this doesn't work when the imports are ambiguous. For instance:

foo.fbs -> namespace a1.b1.c1 - includes bar and baz
bar.fbs -> namespace a1.b2.c1
baz.fbs -> namespace a1.b2.c1

This will result in bar::a1::* and baz::a1::* being imported into the same a1 module, which results in an ambiguous import of b2.
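The clash can be detected mechanically. The following is only a sketch: `ambiguous_glob_children` is a hypothetical helper, not something flatc provides, and it assumes each dependency is described by a crate name plus its schema namespace.

```rust
use std::collections::BTreeMap;

/// Hypothetical check (not part of flatc): for the module at `module_path`,
/// lists each child module name that glob imports from more than one
/// dependency crate would bring into scope.
fn ambiguous_glob_children<'a>(
    module_path: &[&str],
    deps: &[(&'a str, &[&'a str])],
) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut providers: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for (crate_name, namespace) in deps {
        // A glob import of `dep::<module_path>::*` exposes the next namespace segment.
        if namespace.starts_with(module_path) {
            if let Some(child) = namespace.get(module_path.len()) {
                providers.entry(*child).or_default().push(*crate_name);
            }
        }
    }
    // Keep only the names that more than one crate provides.
    providers.retain(|_, crates| crates.len() > 1);
    providers
}
```

For the three schemas above, calling it with the module path `a1` and the dependencies `bar` and `baz` reports `b2` as provided by both crates.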

We will probably have to change the code generation to use absolute paths instead for our use case. Would love to upstream our work once it's ready. Would this be a reasonable contribution or would this change to absolute paths break other use cases?

Contributor

This issue is stale because it has been open 6 months with no activity. Please comment or label not-stale, or this will be closed in 14 days.

@github-actions github-actions bot added the stale label Oct 10, 2024
@adsnaider
Contributor Author

not-stale

@adsnaider
Contributor Author

Btw, would love to hear feedback from other users or maintainers on how they are dealing with this

@github-actions github-actions bot removed the stale label Oct 11, 2024
@csmulhern
Contributor

@adsnaider I'm currently dealing with the same issue. Would love to understand how you've approached this and what a PR for this might look like.

@adsnaider
Contributor Author

@csmulhern, I haven't solved the issue yet. There are a few options that I see, but I'm not sure what would be best for upstream. Part of the solution may involve the build system. I sent a message to the FlatBuffers Discord, but we didn't really agree on a clear option. This was my message there:

Context: I work in a large C++ codebase and we make significant use of flatbuffers. We use Bazel as our build system, which makes it really simple to have each flatbuffer be its own static library. This is incredibly useful for a few reasons, the main one being that two distinct C++ libraries that depend on the same flatbuffer can pass flatbuffers around (as C++ objects).
Problem: The issue that I've run into (and this is a similar problem with other code-generated schemas) is that in Rust the smallest unit of compilation is a crate, and each crate must have a unique name in the compilation graph. If I try to follow the same approach as with C++, with one flatbuffer per compilation unit, then (1) it becomes really hard to provide a name for each crate and (2) more importantly, the generated code for dependencies between flatbuffers becomes erroneous. This is because the generated code uses relative paths to refer to dependencies (e.g. super::super::Foo), but now Foo does not belong to the same crate.
Possible solutions: These are some options that I've considered, all with different trade-offs, but I'm curious whether there are other solutions people have come up with or implemented:

  1. Generate all of the flatbuffers and place them into a single crate. This solves a lot of these issues, but in a big enough codebase it may become a compilation bottleneck. Additionally, it is not suitable for a modular project where you may want to include third-party flatbuffers in your hierarchy.
  2. One flatbuffer crate per Rust compilation unit/crate. Essentially, for each Rust crate (rust_library/rust_binary), aggregate all of the flatbuffer dependencies it requires into a single crate. This fixes modularity, but it becomes impossible to pass a generated flatbuffer type to one of its dependencies, since Foo in my crate would not be the same type as Foo in my crate's dependency.
  3. Somehow change the generated flatbuffer code to figure out where to import a specific name from, so that it can use absolute paths, and also generate a single module that re-exports all of the types of each flatbuffer dependency into a single module hierarchy. I believe this option would solve every problem, but there are some technical issues that need solving to implement it, and it may require a "forever patched" flatbuffer code generator.

This hasn't been an urgent issue for us, so we haven't settled on a solution yet, but I suspect we will have to come back to this eventually.

Open to discussing more and helping with an implementation if you have some ideas.

@csmulhern
Contributor

csmulhern commented Mar 18, 2025

@adsnaider, thanks for that additional context.

The three options you've outlined are exactly the same ones that I am considering.

Option 1 seems untenable for the reason you've outlined; a real solution needs to allow the definition of a common schema that's used across projects.

Option 2 seems untenable too, as you don't have cross crate compatibility.

For Option 3, I'm not exactly clear on what you're suggesting in terms of the reexports.

As a reference, Swift is module-based in terms of compilation units, so it faces challenges similar to Rust's. The Bazel Swift rules solve this for protobuf by generating one module (crate) per proto_library and requiring that, for each proto dependency of that proto_library, the corresponding generated module be a dependency of the generated module.

For example:

# proto/BUILD.bazel

proto_library(
    name = "foo",
    srcs = ["foo.proto"],
)

proto_library(
    name = "bar",
    srcs = ["bar.proto"],
    deps = [":foo"],
)

swift_proto_library(
    name = "foo_swift",
    protos = [":foo"],
)

swift_proto_library(
    name = "bar_swift",
    protos = [":bar"],
    deps = [":foo_swift"],
)

# foo.proto

syntax = "proto3";

message Foo {
    string field = 1;
}

# bar.proto

syntax = "proto3";

import "proto/foo.proto";

message Bar {
    Foo field = 1;
}

In the generated code, they reference the fields from imports with a fully qualified name. For instance, here is the generated code for the Bar structure in the bar_swift module:

import proto_foo_swift

public struct Bar {
  // SwiftProtobuf.Message conformance is added in an extension below. See the
  // `Message` and `Message+*Additions` files in the SwiftProtobuf library for
  // methods supported on all messages.

  public var field: proto_foo_swift.Foo {
    get {return _field ?? proto_foo_swift.Foo()}
    set {_field = newValue}
  }

  ...

where proto_foo_swift is the Bazel target name where slashes and colons have been converted to underscores (//proto:foo_swift -> proto_foo_swift). This is the module (crate) name given to the module generated by //proto:foo_swift. The equivalent in Rust would be proto_foo_rust::Foo.
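As a rough illustration of that naming rule, here is a sketch of the label-to-module-name conversion. The function name `module_name_from_label` is made up for this example, and it only covers the transformation described above; the real rules may handle more cases (for example, labels in external repositories).

```rust
/// Converts a Bazel label such as `//proto:foo_swift` into a module name by
/// dropping the leading `//` and replacing `/` and `:` with `_`.
/// Illustrative only: not taken from rules_swift or flatc.
fn module_name_from_label(label: &str) -> String {
    label
        .strip_prefix("//")
        .unwrap_or(label)
        .chars()
        .map(|c| if c == '/' || c == ':' { '_' } else { c })
        .collect()
}
```

Under this rule a target such as `//proto:foo_rust` would produce the crate name `proto_foo_rust`, so generated code could refer to `proto_foo_rust::Foo`.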

This is what I think should be done for flatbuffer generated Rust code in Bazel projects. Ideally, flatc would support this style of generation, but I'm not sure of the best way to achieve that (e.g. module name mappings could be provided as a command line argument). protoc uses a plugin architecture to allow custom code generation defined outside of protoc to be leveraged by protoc. flatc doesn't do this, but a custom generator could be built on top of the flatbuffers library. I believe the Swift protobuf generation inside the Bazel rules uses a custom plugin to achieve the style of generation covered above. See: https://github.com/bazelbuild/rules_swift/blob/343f35ebef603b92eb458b929a94f4ef97338d78/proto/swift_proto_compiler.bzl#L31

I'm unclear on how Swift code generation works in Flatbuffers today, but I will take a look soon.

@csmulhern
Contributor

csmulhern commented Mar 18, 2025

Looks like the Flatbuffer Swift code generator has the same problem. The foo / bar example above generates the following code for Bar:

import FlatBuffers

public struct Bar: FlatBufferObject, Verifiable {

  ...

  public var field: Foo? { let o = _accessor.offset(VTOFFSET.field.v); return o == 0 ? nil : Foo(_accessor.bb, o: _accessor.indirect(o + _accessor.position)) }

  ...

I.e. it imports the runtime library only, and uses Foo with the assumption that it's defined in the same module.

@csmulhern
Contributor

csmulhern commented Mar 18, 2025

It looks like struct definitions (Definition.file) contain the path to the file they were defined in.

If we have a mapping of file path -> Bazel target, we should be able to map the Type names in a relatively straightforward way through Namer.
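To make the idea concrete, here is a sketch of such a lookup. It assumes a plain map from schema file path to crate name and a hypothetical function `qualified_type_path`; it does not use flatc's actual Namer API, and it ignores details such as converting namespace segments to snake case.

```rust
use std::collections::HashMap;

/// Hypothetical resolver (not flatc's Namer API): builds an absolute Rust
/// path for a type from the schema file that defines it.
fn qualified_type_path(
    file_to_crate: &HashMap<&str, &str>,
    defining_file: &str,
    namespace: &[&str],
    type_name: &str,
) -> Option<String> {
    // No mapping for the defining file means the owning crate is unknown.
    let crate_name = file_to_crate.get(defining_file)?;
    let mut segments = vec![*crate_name];
    segments.extend_from_slice(namespace);
    segments.push(type_name);
    // The leading `::` makes the path resolve from the extern crate.
    Some(format!("::{}", segments.join("::")))
}
```

Returning `None` for an unmapped file leaves it to the caller to decide whether to fall back to the current relative-path behavior or report an error.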

I'd be curious whether there's an abstraction for supplying this information to flatc that is generic and useful enough that there would be an appetite to add it to flatc.

Having to write a whole custom code generator to support this would be quite a nuisance. The individual code generators used by flatc are not public, so this cannot be easily achieved by, e.g., subclassing the current Rust code generator and adding a custom codegen binary built using the flatbuffers library.

cc @dbaileychess, @aardappel, @CasperN - any idea who would be best to weigh in here?

@csmulhern
Contributor

I've sketched out what I think a useful version of this may look like. The API should be generalizable to all module based languages, and initial support has been added in the Rust code generator. Have a look at #8563.

@csmulhern
Contributor

@adsnaider, it makes me happy to see your positive reactions. Would be good to understand if the approach taken in #8563 suits your use case / expected usage.

For reference, I have written custom Bazel rules that wrap flatc for Rust code generation from flatbuffer schema files using the --module-mapping flag, and I have managed to build modules whose generated code depends on schemas generated by other targets.

@adsnaider
Contributor Author

@csmulhern I'm currently traveling but your solution seems reasonable and we have folks at work that are interested in trying it out for an upcoming project. I can make sure they give their feedback after trying it out

@csmulhern
Contributor

Glad to hear it! Cheers.
