Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns #10020
Conversation
- class AssignFreshIds extends TypeUtil.CustomOrderSchemaVisitor<Type> {
-   private final Schema visitingSchema;
+ class AssignFreshIds extends BaseAssignIds {
The change here is just a refactor to a base class, for logic reuse.
While we could use this class as-is, it can only reassign all ids wholesale; in our case it seems cleaner to reassign ids only when they collide.
Testing further, this does not work correctly if the position delete file actually has the 'row' value populated. The problem is that the position delete file is a Parquet file with the 'row' field ids in its metadata, so the ParquetReader then populates those columns with null if the ids have been reassigned.
Redid the approach; I now reassign the partition field ids instead. I fixed all the problems with that approach, though it requires a bit of finesse. The broad picture:
szehon-ho left a comment
Leaving some explanations.
 * @param partitionType original table's partition type
 * @return partition type with reassigned field ids
 */
public static Types.StructType partitionType(Schema tableSchema, Types.StructType partitionType) {
These and the following are just helper methods to reassign partition field ids to prevent collisions with the schema's field ids.
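A minimal sketch of the idea, not the PR's actual helper (the class name and exact logic are illustrative): walk the partition struct and move any field id that collides with a schema field id to the next unused id.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.iceberg.Schema;
import org.apache.iceberg.types.TypeUtil;
import org.apache.iceberg.types.Types;

class PartitionTypeReassigner {
  static Types.StructType reassignedPartitionType(Schema tableSchema, Types.StructType partitionType) {
    // every field id already used by the table schema (including nested fields)
    Set<Integer> usedIds = new HashSet<>(TypeUtil.indexById(tableSchema.asStruct()).keySet());
    List<Types.NestedField> fields = new ArrayList<>();
    int nextId = 1;
    for (Types.NestedField field : partitionType.fields()) {
      int id = field.fieldId();
      if (usedIds.contains(id)) {
        // colliding partition field id: move it to the next id the schema does not use
        while (usedIds.contains(nextId)) {
          nextId++;
        }
        id = nextId;
      }
      usedIds.add(id);
      fields.add(Types.NestedField.optional(id, field.name(), field.type()));
    }
    return Types.StructType.of(fields);
  }
}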
  // prepare transformed partition specs and caches
- Map<Integer, PartitionSpec> transformedSpecs = transformSpecs(tableSchema(), table().specs());
+ Map<Integer, PartitionSpec> transformedSpecs =
We use the transformed specs to evaluate the filters, so we need to bind against the new (reassigned) field ids; hence the field id map is passed here so the transformed specs carry them.
+ // Read manifests (use original table's partition ids to de-serialize partition values)
  CloseableIterable<ManifestEntry<DeleteFile>> deleteFileEntries =
-     ManifestFiles.readDeleteManifest(manifest, table().io(), transformedSpecs)
+     ManifestFiles.readDeleteManifest(manifest, table().io(), table().specs())
This part is a bit tricky. We need to read manifests using the original partition field ids (that is how they are stored in the manifest file), so we go back to using the original table's partition specs, which preserve the original field ids.
But then we can no longer use the schema of that spec to bind the user-provided partition filter, because the filter needs to bind against the reassigned ids of the metadata table schema. So we split the row filter out of ManifestReader and evaluate it separately outside, using transformedSpecs (which have the reassigned partition field ids).
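A hedged sketch of that split, assuming the code lives in the org.apache.iceberg package (ManifestEntry is package-private there); names like filterDeleteEntries, originalSpecs, and caseSensitive are illustrative, not the PR's actual code:

package org.apache.iceberg;

import java.io.IOException;
import java.util.Map;

import org.apache.iceberg.expressions.Evaluator;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.Projections;
import org.apache.iceberg.io.CloseableIterable;
import org.apache.iceberg.io.FileIO;

class DeleteManifestFilterSketch {
  static void filterDeleteEntries(
      ManifestFile manifest,
      FileIO io,
      Map<Integer, PartitionSpec> originalSpecs,
      Map<Integer, PartitionSpec> transformedSpecs,
      Expression rowFilter,
      boolean caseSensitive)
      throws IOException {
    // project the row filter onto the transformed spec (reassigned partition field ids)
    PartitionSpec transformedSpec = transformedSpecs.get(manifest.partitionSpecId());
    Expression partitionFilter =
        Projections.inclusive(transformedSpec, caseSensitive).project(rowFilter);
    Evaluator partitionEvaluator = new Evaluator(transformedSpec.partitionType(), partitionFilter);

    // read with the original specs: that is how partition values are stored in the manifest
    try (CloseableIterable<ManifestEntry<DeleteFile>> entries =
        ManifestFiles.readDeleteManifest(manifest, io, originalSpecs).entries()) {
      for (ManifestEntry<DeleteFile> entry : entries) {
        if (partitionEvaluator.eval(entry.file().partition())) {
          // keep this delete file for the rewrite
        }
      }
    }
  }
}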
  private StructLike coercePartition(PositionDeletesScanTask task, StructType partitionType) {
-   return PartitionUtil.coercePartition(partitionType, task.spec(), task.partition());
+   Types.StructType dedupType = PositionDeletesTable.partitionType(table.schema(), partitionType);
This was using the original partition type to coerce the partition (for grouping during the rewrite), but we actually need the final partition type (with reassigned field ids), because PositionDeletesScanTask now carries partition data that matches the reassigned field ids.
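A reconstruction of the adjusted helper based only on the diff above (the PR's exact code may differ); it coerces the task's partition tuple into the de-duplicated partition type so grouping keys line up with the task's partition data:

private StructLike coercePartition(PositionDeletesScanTask task, Types.StructType partitionType) {
  // use the metadata table's partition type (reassigned field ids), not the original one
  Types.StructType dedupType = PositionDeletesTable.partitionType(table.schema(), partitionType);
  return PartitionUtil.coercePartition(dedupType, task.spec(), task.partition());
}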
Because this problem may affect more than just rewrite_position_deletes (for example, any table scan that selects the _partition metadata column), I rewrote the patch to make the logic more generic and added support at the Iceberg API level, so it can later be used beyond PositionDeletesTable. Key changes:
This change also fixes tests and RewritePositionDeleteFiles to always get the schema/spec from the position_delete_files metadata table (which has the reassigned field ids) instead of the original table, which has the potentially colliding ones.
@rdblue @RussellSpitzer may be interested; can you take a look?
@szehon-ho but this implementation is rather a hack, a workaround of the original design. Why don't you think that bumping the table version and switching to a bigger constant for the partition spec id offset (or going to negative ids just to remove the clash) would be cleaner? That way we'd have a clear path forward, marking the old spec for deprecation in future releases and keeping a stable mainline.
Hi @bk-mz, we discussed this a bit in the last Iceberg community sync. The motivation of this PR is to fix the position_deletes metadata table and to have tools to fix any other place the collision comes up (which should be limited to metadata columns or metadata tables). IMO the other fix you mentioned is either
Thanks @RussellSpitzer, addressed initial comments.
| old: "method void org.apache.iceberg.PositionDeletesTable.PositionDeletesBatchScan::<init>(org.apache.iceberg.Table,\ | ||
| \ org.apache.iceberg.Schema, org.apache.iceberg.TableScanContext)" | ||
| justification: "Removing deprecated code" | ||
| "1.5.0": |
Looks like adding a new ctor as suggested changes the serialVersionUID; is that ok? @RussellSpitzer
+1. Yep, this would only be a concern if we were worried about folks using different Iceberg versions on the client and server, which shouldn't be the case.
So I've reviewed this PR, built it, and tested it on our QA table that has close to 30k columns, and it works perfectly :) I would really be interested in seeing this merged so I don't have to maintain a custom version of the Iceberg lib. One small note: would it be possible to remove old schema definitions from the metadata file? With frequent schema evolution and a large number of columns, the metadata file grows very quickly. I'm wondering whether, during rewrite_manifests, it would make sense to iterate over the used schemas and keep only those? Maybe in another PR?
AtomicInteger nextId = new AtomicInteger();

Type res =
    TypeUtil.assignIds(
I think this may have some ordering issues... I'm not sure if this is possible, but say I see transformId = 1000, and then after that I see columnId = 1000. Won't I still have a problem?
I'm not entirely sure I get your case. But we have two sets, conceptually: idsToReassign and usedIds.
We go through all the fields, and if we find an id to reassign, we just pick the next value that is not in usedIds.
Are you asking what happens if idToReassign = usedIds = (1000)? I think it's ok; it will just pick the next one, 1001 (even though it didn't have to).
// Calculate used ids (for de-conflict)
Set<Integer> currentlyUsedIds =
    Collections.unmodifiableSet(TypeUtil.indexById(Types.StructType.of(columns)).keySet());
Set<Integer> usedIds =
@RussellSpitzer I found another issue. In the case where a column had been removed, the deleted column's ids were not in 'usedIds'. This led to cases where an id was reassigned to a deleted column's id, and for old files containing that column I saw bad behavior during pruning. This is the fix.
For this to work, I changed the Schema API to take a callback that reassigns ids directly. I think it actually simplifies the API a bit, as it removes the concept of 'metadataIds', which was probably confusing anyway.
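A hedged sketch of the "reassign via callback" idea, purely illustrative (the class name and exact shape are assumptions, not the PR's API): the caller supplies a function that maps a colliding field id to the next id not used anywhere, including the ids of deleted columns.

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

class NextFreeIdAssigner implements IntUnaryOperator {
  private final Set<Integer> usedIds;
  private final AtomicInteger nextId = new AtomicInteger();

  NextFreeIdAssigner(Set<Integer> usedIds) {
    // usedIds should include current columns, partition fields, and deleted columns' ids,
    // so a reassigned id never reuses any of them
    this.usedIds = new HashSet<>(usedIds);
  }

  @Override
  public int applyAsInt(int collidingId) {
    // advance until we find an id nobody uses, then reserve it
    int candidate = nextId.incrementAndGet();
    while (usedIds.contains(candidate)) {
      candidate = nextId.incrementAndGet();
    }
    usedIds.add(candidate);
    return candidate;
  }
}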
Yep this also fixes the situation I was worried about.
RussellSpitzer left a comment
I think we are set here. A couple of reminders:
- We still need to resolve this for SELECT *, _partition queries (ok for a followup).
- I think we should have a straight-up test of the position delete metadata table, so we have a test that is invoked without a Spark change and that specifically exercises the re-id'ing. I know the current case does, but mostly by accident, since 0 is the default, right?
Thanks, added a test.
Thanks @RussellSpitzer for helping get this over the finish line! A PR to fix _partition metadata column collisions will come subsequently.
#10547 is attempting to fix the read of the _partition metadata column.
Fixes: #9923
The position_deletes metadata table (used by rewrite_position_deletes) has both a 'partition' field and a 'row' field (for the optional 'row' column of position deletes, https://iceberg.apache.org/spec/#position-delete-files, which is the table schema as a struct). Partition field ids start at 1000, while the 'row' struct reuses the table schema's field ids, so if the table has more than 1000 columns the field ids collide.
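A small standalone demo of the collision (not part of the PR), assuming the default partition field id assignment starting at 1000:

import java.util.ArrayList;
import java.util.List;

import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

public class FieldIdCollisionDemo {
  public static void main(String[] args) {
    // a table schema with 1500 columns: schema field ids 1..1500
    List<Types.NestedField> columns = new ArrayList<>();
    for (int i = 1; i <= 1500; i++) {
      columns.add(Types.NestedField.optional(i, "col_" + i, Types.StringType.get()));
    }
    Schema schema = new Schema(columns);

    // the first partition field is assigned field id 1000, which already belongs to col_1000
    PartitionSpec spec = PartitionSpec.builderFor(schema).identity("col_1").build();
    System.out.println(spec.fields().get(0).fieldId()); // prints 1000
  }
}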