@ajreid21

Currently, when you deserialize FileScanTask JSON with RESTFileScanTaskParser.fromJson, the deserializer checks whether the delete_file_references node exists, but not whether it is empty. When the node is present with an empty list, deserialization fails at the following check, because Collections.max throws NoSuchElementException on an empty collection:

Collections.max(indices) < allDeleteFiles.size(),

TableScanResponseParser.serializeScanTasks currently always writes a delete_file_references list, which is empty when the task has no delete file references:

RESTFileScanTaskParser.toJson(fileScanTask, deleteFileReferences, spec, gen);
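To illustrate why an always-present empty list trips the reader, here is a minimal standalone sketch using only the JDK. The class and method names (EmptyReferencesCheck, uncheckedBound, guardedBound) are hypothetical and not part of Iceberg; only the two boolean expressions are taken from this PR.

```java
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical demo class; not part of Iceberg.
public class EmptyReferencesCheck {

  // Same shape as the original precondition: Collections.max is evaluated unconditionally.
  static boolean uncheckedBound(List<Integer> indices, int deleteFileCount) {
    return Collections.max(indices) < deleteFileCount;
  }

  // Same shape as the fixed precondition: an empty list short-circuits before Collections.max.
  static boolean guardedBound(List<Integer> indices, int deleteFileCount) {
    return indices.isEmpty() || Collections.max(indices) < deleteFileCount;
  }

  public static void main(String[] args) {
    List<Integer> empty = List.of();
    try {
      uncheckedBound(empty, 0);
    } catch (NoSuchElementException e) {
      // Collections.max has no element to return for an empty collection.
      System.out.println("unguarded check threw NoSuchElementException");
    }
    System.out.println(guardedBound(empty, 0));          // empty list is accepted
    System.out.println(guardedBound(List.of(0, 2), 3));  // all indices in range
    System.out.println(guardedBound(List.of(3), 3));     // index 3 is out of range for 3 files
  }
}
```

The guard relies on short-circuit evaluation of `||`, so Collections.max is never reached for an empty list, while out-of-range indices are still rejected.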

@github-actions github-actions bot added the core label Nov 12, 2025

nastra commented Nov 12, 2025

@ajreid21 the fix LGTM, can you please add a test to TestPlanTableScanResponseParser?

  @Test
  public void roundTripSerdeWithoutDeleteFiles() {
    ResidualEvaluator residualEvaluator =
        ResidualEvaluator.of(SPEC, Expressions.equal("id", 1), true);
    FileScanTask fileScanTask =
        new BaseFileScanTask(
            FILE_A,
            new DeleteFile[] {},
            SchemaParser.toJson(SCHEMA),
            PartitionSpecParser.toJson(SPEC),
            residualEvaluator);
    PlanTableScanResponse response =
        PlanTableScanResponse.builder()
            .withPlanStatus(PlanStatus.COMPLETED)
            .withFileScanTasks(List.of(fileScanTask))
            .withSpecsById(PARTITION_SPECS_BY_ID)
            .build();

    String expectedJson =
        "{\"plan-status\":\"completed\","
            + "\"file-scan-tasks\":["
            + "{\"data-file\":{\"spec-id\":0,\"content\":\"DATA\",\"file-path\":\"/path/to/data-a.parquet\","
            + "\"file-format\":\"PARQUET\",\"partition\":{\"1000\":0},"
            + "\"file-size-in-bytes\":10,\"record-count\":1,\"sort-order-id\":0},"
            + "\"residual-filter\":{\"type\":\"eq\",\"term\":\"id\",\"value\":1}}]"
            + "}";

    String json = PlanTableScanResponseParser.toJson(response);
    assertThat(json).isEqualTo(expectedJson);

    PlanTableScanResponse fromResponse =
        PlanTableScanResponseParser.fromJson(json, PARTITION_SPECS_BY_ID, false);
    PlanTableScanResponse copyResponse =
        PlanTableScanResponse.builder()
            .withPlanStatus(fromResponse.planStatus())
            .withPlanId(fromResponse.planId())
            .withPlanTasks(fromResponse.planTasks())
            .withDeleteFiles(fromResponse.deleteFiles())
            .withFileScanTasks(fromResponse.fileScanTasks())
            .withSpecsById(PARTITION_SPECS_BY_ID)
            .build();

    assertThat(PlanTableScanResponseParser.toJson(copyResponse)).isEqualTo(expectedJson);
  }

@nastra nastra added this to the Iceberg 1.10.1 milestone Nov 12, 2025
@nastra nastra requested a review from singhpk234 November 12, 2025 07:38

@singhpk234 singhpk234 left a comment

The fix LGTM; I agree with @nastra on adding the unit test.

  List<Integer> indices = JsonUtil.getIntegerList(DELETE_FILE_REFERENCES, jsonNode);
  Preconditions.checkArgument(
-     Collections.max(indices) < allDeleteFiles.size(),
+     indices.isEmpty() || Collections.max(indices) < allDeleteFiles.size(),
The fix LGTM. I have had it sitting in the part 2 PR since Jun 26: https://github.com/apache/iceberg/pull/13400/files#diff-584f9d53626a76efc298a57ada578de6d07c6d0b00f767f85a05e40471426374R90 :)

I have one more fix in the part 2 PR; let me pull that one out of it as well.


@amogh-jahagirdar amogh-jahagirdar left a comment

Thanks @ajreid21 and @singhpk234 for this fix. I agree with it, but it would be good to add the test that @nastra mentioned!

  generator.writeFieldName(DATA_FILE);
  ContentFileParser.toJson(fileScanTask.file(), partitionSpec, generator);
- if (deleteFileReferences != null) {
+ if (deleteFileReferences != null && !deleteFileReferences.isEmpty()) {
Thanks for fixing this, we shouldn't be producing the field if there's nothing there. But we can be more accepting on the read side as done below.
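A minimal sketch of that asymmetry (strict on write, lenient on read), using a plain Map in place of a JSON node so it runs on the JDK alone. The class OptionalListField, its methods, and the key string are assumptions made for illustration; Iceberg's actual parser writes through a JSON generator as shown in the diff above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; a Map stands in for the serialized JSON object.
public class OptionalListField {

  // Assumed key name, for illustration only.
  static final String DELETE_FILE_REFERENCES = "delete-file-references";

  // Write side: emit the field only when there is at least one reference.
  static Map<String, Object> write(String dataFile, List<Integer> deleteFileReferences) {
    Map<String, Object> node = new LinkedHashMap<>();
    node.put("data-file", dataFile);
    if (deleteFileReferences != null && !deleteFileReferences.isEmpty()) {
      node.put(DELETE_FILE_REFERENCES, deleteFileReferences);
    }
    return node;
  }

  // Read side: treat a missing field and an explicitly empty list the same way.
  @SuppressWarnings("unchecked")
  static List<Integer> read(Map<String, Object> node) {
    Object value = node.get(DELETE_FILE_REFERENCES);
    return value == null ? List.of() : (List<Integer>) value;
  }

  public static void main(String[] args) {
    Map<String, Object> withoutDeletes = write("/path/to/data-a.parquet", List.of());
    System.out.println(withoutDeletes.containsKey(DELETE_FILE_REFERENCES)); // field omitted
    System.out.println(read(withoutDeletes).isEmpty());                      // reads back as empty

    Map<String, Object> withDeletes = write("/path/to/data-a.parquet", List.of(0, 1));
    System.out.println(read(withDeletes));                                   // references preserved
  }
}
```

Keeping the reader tolerant means payloads from older writers that still emit an empty list continue to parse, while newer writers produce the smaller form.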


@singhpk234 singhpk234 left a comment

LGTM too, thanks @ajreid21 !


nastra commented Nov 12, 2025

thanks for the reviews @singhpk234 @amogh-jahagirdar

@nastra nastra merged commit dc217b0 into apache:main Nov 12, 2025
44 checks passed
nastra pushed a commit to nastra/iceberg that referenced this pull request Nov 12, 2025
huaxingao pushed a commit that referenced this pull request Nov 12, 2025
…es list (#14568) (#14576)

Co-authored-by: ajreid21 <5721775+ajreid21@users.noreply.github.com>
thomaschow pushed a commit to thomaschow/iceberg that referenced this pull request Jan 19, 2026

4 participants