
Baked data is big, and compiles slowly, for finely sliced data markers #5230

@sffc


icu_datetime compile times have regressed a lot since I added neo datetime data, and #5221 appears to be choking in CI.

The finely sliced data markers (small data structs designed to work with many data marker attributes) give compelling data sizes and stack sizes in Postcard (#4818, #4779). In Baked, however, they significantly increase file size. The data-size gains are also less compelling there, because baked data contains many more pointers (for example, at least 24 bytes for a ZeroVec), and those pointers are duplicated in every instance of the data struct.
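To make the per-instance pointer overhead concrete, here is a std-only sketch. `VecLike` and `SlicedData` are hypothetical stand-ins, not ICU4X types: `VecLike` models a vector header that can be owned or borrowed (pointer, length, capacity), which is roughly why a ZeroVec costs at least 24 bytes on a 64-bit target. Every baked instance repeats such a header, whereas a Postcard blob stores only the payload.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical stand-in for an owned-or-borrowed vector header:
/// a data pointer, a length, and a capacity (three machine words).
#[allow(dead_code)]
struct VecLike {
    ptr: NonNull<u8>,
    len: usize,
    capacity: usize,
}

/// Hypothetical stand-in for a finely sliced data struct: a small inline
/// index plus a vector header that points at the actual pattern bytes.
#[allow(dead_code)]
struct SlicedData {
    index_info: u16,
    patterns: VecLike,
}

/// Bytes spent on struct headers (not payload) when `n` instances
/// are baked as separate statics.
fn header_overhead(n: usize) -> usize {
    n * size_of::<SlicedData>()
}

fn main() {
    let word = size_of::<usize>();
    // Three words for the vector header (24 bytes on a 64-bit target).
    assert_eq!(size_of::<VecLike>(), 3 * word);
    // A plain borrowed slice is a fat pointer: two words.
    assert_eq!(size_of::<&[u8]>(), 2 * word);
    println!("per instance: {} bytes", size_of::<SlicedData>());
    println!("1000 instances: {} bytes of headers", header_overhead(1000));
}
```

The header cost scales with the number of attribute-sliced instances, not with the amount of locale data, which is why fine slicing hurts Baked more than Postcard.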

Example data struct that is used in a finely sliced data marker:

```rust
pub struct PackedSkeletonDataV1<'data> {
    pub index_info: SkeletonDataIndex,
    #[cfg_attr(feature = "serde", serde(borrow))]
    pub patterns: VarZeroVec<'data, PatternULE>,
}
```

Some ideas:

  1. Instead of storing many static instances of PackedSkeletonDataV1<'static>, we could instead store many static instances of (SkeletonDataIndex, &[u8]), and build an instance of PackedSkeletonDataV1<'static> at runtime. This is "free", and it should significantly reduce file size, but it causes us to use a Yoke code path.
  2. Make the struct derive VarULE, store all of the data in one big VarZeroVec<PackedSkeletonDataV1ULE>, and build an instance of PackedSkeletonDataV1<'static> at runtime. This should give the smallest file size and data size, in line with Postcard sizes, but has somewhat more runtime cost because we need to do a VZV lookup. However, it is only one lookup, and only when the locale was found, so I don't think we should avoid this cost merely for its own sake.
  3. Construct static instances via pub fn PackedSkeletonDataV1::new_unchecked(SkeletonDataIndex, &[u8]), reducing file size and therefore probably compile times without changing any runtime characteristics. See DataBake: split serialized form from runtime form #2452.
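Ideas 1 and 3 can be contrasted in a small std-only sketch. All names here are hypothetical stand-ins: `SkeletonIndex` plays `SkeletonDataIndex`, and `PackedData` plays `PackedSkeletonDataV1`, with `Cow<[u8]>` standing in for `VarZeroVec` because both can be owned or borrowed and therefore carry a full header. The `new_unchecked` shown here only mirrors the shape proposed in idea 3; it is not an existing ICU4X API.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for `SkeletonDataIndex`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SkeletonIndex {
    pub has_long: bool,
    pub count: u8,
}

/// Hypothetical stand-in for `PackedSkeletonDataV1`; `Cow<[u8]>` plays
/// the role of `VarZeroVec` (owned or borrowed, so a full header).
#[derive(Debug, PartialEq)]
pub struct PackedData<'data> {
    pub index_info: SkeletonIndex,
    pub patterns: Cow<'data, [u8]>,
}

impl<'data> PackedData<'data> {
    /// Idea 3: a `const` constructor from the compact parts. A real version
    /// would skip validating the bytes, hence "unchecked".
    pub const fn new_unchecked(index_info: SkeletonIndex, patterns: &'data [u8]) -> Self {
        Self { index_info, patterns: Cow::Borrowed(patterns) }
    }
}

// Idea 3: each static is still a full struct (same runtime layout), but the
// generated source is one short constructor call per instance.
pub static BAKED_STRUCTS: [PackedData<'static>; 2] = [
    PackedData::new_unchecked(SkeletonIndex { has_long: false, count: 1 }, b"\x01a"),
    PackedData::new_unchecked(SkeletonIndex { has_long: true, count: 2 }, b"\x02bc"),
];

// Idea 1: statics hold only the compact parts (a bare slice, no full header).
pub static BAKED_PARTS: [(SkeletonIndex, &[u8]); 2] = [
    (SkeletonIndex { has_long: false, count: 1 }, b"\x01a"),
    (SkeletonIndex { has_long: true, count: 2 }, b"\x02bc"),
];

/// Idea 1: build the runtime struct on lookup. This borrows the static
/// bytes, so nothing is copied or allocated.
pub fn load_from_parts(i: usize) -> Option<PackedData<'static>> {
    let (index_info, patterns) = *BAKED_PARTS.get(i)?;
    Some(PackedData::new_unchecked(index_info, patterns))
}

fn main() {
    // Both approaches yield the same value at runtime.
    assert_eq!(load_from_parts(1).as_ref(), Some(&BAKED_STRUCTS[1]));
    // The compact tuple never needs more space than the full struct.
    assert!(
        std::mem::size_of::<(SkeletonIndex, &[u8])>() <= std::mem::size_of::<PackedData>()
    );
}
```

In this model, idea 3 shrinks only the generated source text, while idea 1 also drops one header word per instance at the cost of constructing the struct at lookup time.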
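Idea 2 can be approximated with one shared byte buffer plus an offsets table, where each element is decoded with a single lookup. This layout is a simplified, hypothetical stand-in for `VarZeroVec<PackedSkeletonDataV1ULE>`, not its real format: each element's first byte packs the index (high bit `has_long`, low 7 bits `count`), and the remaining bytes are the patterns.

```rust
/// Hypothetical stand-in for `SkeletonDataIndex`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SkeletonIndex {
    pub has_long: bool,
    pub count: u8,
}

impl SkeletonIndex {
    /// Decode the hypothetical 1-byte encoding used in `BYTES`.
    fn from_byte(b: u8) -> Self {
        Self { has_long: (b & 0x80) != 0, count: b & 0x7F }
    }
}

/// Hypothetical runtime struct that borrows from the shared buffer.
#[derive(Debug, PartialEq)]
pub struct PackedData<'data> {
    pub index_info: SkeletonIndex,
    pub patterns: &'data [u8],
}

// Element i occupies BYTES[OFFSETS[i]..OFFSETS[i + 1]]. Only these two
// statics exist, so there is no per-instance pointer header at all.
static OFFSETS: [usize; 3] = [0, 3, 7];
static BYTES: [u8; 7] = [0x01, 0x01, b'a', 0x82, 0x02, b'b', b'c'];

/// One lookup into the shared buffer, then a cheap borrowed construction.
pub fn load(i: usize) -> Option<PackedData<'static>> {
    let start = *OFFSETS.get(i)?;
    let end = *OFFSETS.get(i + 1)?;
    let (&first, patterns) = BYTES.get(start..end)?.split_first()?;
    Some(PackedData { index_info: SkeletonIndex::from_byte(first), patterns })
}

fn main() {
    let e1 = load(1).unwrap();
    assert_eq!(e1.index_info, SkeletonIndex { has_long: true, count: 2 });
    assert_eq!(e1.patterns, &b"\x02bc"[..]);
    assert!(load(2).is_none());
}
```

The trade-off matches the description in idea 2: storage is as dense as the serialized form, and the runtime cost is a single offset lookup and decode per successful load.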

@robertbastian @Manishearth @younies

Metadata

Assignees: No one assigned
Labels: C-data-infra (Component: provider, datagen, fallback, adapters); S-large (Size: A few weeks, larger feature, major refactoring)
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests
