Skip to content

Conversation

@anuragmantri
Copy link
Contributor

@anuragmantri anuragmantri commented Jun 3, 2024

Since we are using the discriminator field, the type property is inherited from TableRequirements. This PR removed redundant type properties in child classes.

@anuragmantri
Copy link
Contributor Author

@flyrain could you please take a look?

@Fokko
Copy link
Contributor

Fokko commented Jun 3, 2024

Thanks for raising this @anuragmantri

Looking at the changes in the code, it will remove the inheritance:

diff --git a/open-api/rest-catalog-open-api.py b/open-api/rest-catalog-open-api.py
index e13aeee..46c7eca 100644
--- a/open-api/rest-catalog-open-api.py
+++ b/open-api/rest-catalog-open-api.py
@@ -361,10 +361,10 @@ class RemovePartitionStatisticsUpdate(BaseUpdate):
 
 
 class TableRequirement(BaseModel):
-    type: str
+    pass
 
 
-class AssertCreate(TableRequirement):
+class AssertCreate(BaseModel):
     """
     The table must not already exist; used for create transactions
     """
@@ -372,7 +372,7 @@ class AssertCreate(TableRequirement):
     type: Literal['assert-create']
 
 
-class AssertTableUUID(TableRequirement):
+class AssertTableUUID(BaseModel):
     """
     The table UUID must match the requirement's `uuid`
     """
@@ -381,7 +381,7 @@ class AssertTableUUID(TableRequirement):
     uuid: str
 
 
-class AssertRefSnapshotId(TableRequirement):
+class AssertRefSnapshotId(BaseModel):
     """
     The table branch or tag identified by the requirement's `ref` must reference the requirement's `snapshot-id`; if `snapshot-id` is `null` or missing, the ref must not already exist
     """
@@ -391,7 +391,7 @@ class AssertRefSnapshotId(TableRequirement):
     snapshot_id: int = Field(..., alias='snapshot-id')
 
 
-class AssertLastAssignedFieldId(TableRequirement):
+class AssertLastAssignedFieldId(BaseModel):
     """
     The table's last assigned column id must match the requirement's `last-assigned-field-id`
     """
@@ -400,7 +400,7 @@ class AssertLastAssignedFieldId(TableRequirement):
     last_assigned_field_id: int = Field(..., alias='last-assigned-field-id')
 
 
-class AssertCurrentSchemaId(TableRequirement):
+class AssertCurrentSchemaId(BaseModel):
     """
     The table's current schema id must match the requirement's `current-schema-id`
     """
@@ -409,7 +409,7 @@ class AssertCurrentSchemaId(TableRequirement):
     current_schema_id: int = Field(..., alias='current-schema-id')
 
 
-class AssertLastAssignedPartitionId(TableRequirement):
+class AssertLastAssignedPartitionId(BaseModel):
     """
     The table's last assigned partition id must match the requirement's `last-assigned-partition-id`
     """
@@ -4[18](https://github.com/apache/iceberg/actions/runs/9356980269/job/25755651734?pr=10434#step:7:19),7 +418,7 @@ class AssertLastAssignedPartitionId(TableRequirement):
     last_assigned_partition_id: int = Field(..., alias='last-assigned-partition-id')
 
 
-class AssertDefaultSpecId(TableRequirement):
+class AssertDefaultSpecId(BaseModel):
     """
     The table's default spec id must match the requirement's `default-spec-id`
     """
@@ -4[27](https://github.com/apache/iceberg/actions/runs/9356980269/job/25755651734?pr=10434#step:7:28),7 +427,7 @@ class AssertDefaultSpecId(TableRequirement):
     default_spec_id: int = Field(..., alias='default-spec-id')
 
 
-class AssertDefaultSortOrderId(TableRequirement):
+class AssertDefaultSortOrderId(BaseModel):
     """
     The table's default sort order id must match the requirement's `default-sort-order-id`
     """

@flyrain
Copy link
Contributor

flyrain commented Jun 3, 2024

We don't have to use inheritance here, but I'm OK with it.
If we choose the inheritance, we will also need to remove the type properties in each subclasses.

@anuragmantri
Copy link
Contributor Author

Thanks @Fokko and @flyrain - I think we should keep the inheritance and remove the duplicates in sub types. I updated this PR to do that.

@flyrain this pattern of discriminator field present also as child field is used in 3 other places

  • ContentFile: data-file, position-delete-file etc
  • BaseUpdate: update-uuid, upgrade-format-version etc
  • ViewRequirement: assert-view-uuid

Should we remove child fields and move the descriptor fields to parent all these places to be consistent?

Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Thanks @anuragmantri for working on it. We can remove the duplication for the other schema in different PRs.

@anuragmantri anuragmantri changed the title Open-API: TableRequirements subclasses need not inherit parent properties Open-API: TableRequirements subclasses should inherit 'type' property Jun 4, 2024
The table must not already exist; used for create transactions
"""

type: Literal['assert-create']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sad to see this go, but it looks like it is a limitation of the open-api generator, since it could also get this information from the mapping field. I would love to get to know what others think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we're using it in an odd way here. The example here also does not use inheritance but returns an union:

class Cat(BaseModel):
    pet_type: Literal['cat']
    meows: int


class Dog(BaseModel):
    pet_type: Literal['dog']
    barks: float


class Lizard(BaseModel):
    pet_type: Literal['reptile', 'lizard']
    scales: bool


class Model(BaseModel):
    pet: Union[Cat, Dog, Lizard] = Field(..., discriminator='pet_type')
    n: int

It is just that we try to add this field in there. I think the following is more correct:

diff --git a/open-api/rest-catalog-open-api.yaml b/open-api/rest-catalog-open-api.yaml
index f8f68d06df..b85207e57b 100644
--- a/open-api/rest-catalog-open-api.yaml
+++ b/open-api/rest-catalog-open-api.yaml
@@ -2589,18 +2589,6 @@ components:
       type: object
       required:
         - type
-      properties:
-        type:
-          type: "string"
-          enum:
-          - "assert-create"
-          - "assert-table-uuid"
-          - "assert-ref-snapshot-id"
-          - "assert-last-assigned-field-id"
-          - "assert-current-schema-id"
-          - "assert-last-assigned-partition-id"
-          - "assert-default-spec-id"
-          - "assert-default-sort-order-id"
       discriminator:
         propertyName: type
         mapping:
@@ -2612,6 +2600,15 @@ components:
           assert-last-assigned-partition-id: '#/components/schemas/AssertLastAssignedPartitionId'
           assert-default-spec-id: '#/components/schemas/AssertDefaultSpecId'
           assert-default-sort-order-id: '#/components/schemas/AssertDefaultSortOrderId'
+      oneOf:
+        - $ref: '#/components/schemas/AssertCreate'
+        - $ref: '#/components/schemas/AssertTableUUID'
+        - $ref: '#/components/schemas/AssertRefSnapshotId'
+        - $ref: '#/components/schemas/AssertLastAssignedFieldId'
+        - $ref: '#/components/schemas/AssertCurrentSchemaId'
+        - $ref: '#/components/schemas/AssertLastAssignedPartitionId'
+        - $ref: '#/components/schemas/AssertDefaultSpecId'
+        - $ref: '#/components/schemas/AssertDefaultSortOrderId'
 
     AssertCreate:
       allOf:
@@ -2636,6 +2633,7 @@ components:
         The table branch or tag identified by the requirement's `ref` must reference the requirement's `snapshot-id`; 
         if `snapshot-id` is `null` or missing, the ref must not already exist
       required:
+        - type
         - ref
         - snapshot-id
       properties:
@@ -2651,8 +2649,12 @@ components:
       description:
         The table's last assigned column id must match the requirement's `last-assigned-field-id`
       required:
+        - type
         - last-assigned-field-id
       properties:
+        type:
+          type: string
+          enum: [ "assert-last-assigned-field-id" ]
         last-assigned-field-id:
           type: integer
 
@@ -2662,8 +2664,12 @@ components:
       description:
         The table's current schema id must match the requirement's `current-schema-id`
       required:
+        - type
         - current-schema-id
       properties:
+        type:
+          type: string
+          enum: [ "assert-current-schema-id" ]
         current-schema-id:
           type: integer
 
@@ -2673,8 +2679,12 @@ components:
       description:
         The table's last assigned partition id must match the requirement's `last-assigned-partition-id`
       required:
+        - type
         - last-assigned-partition-id
       properties:
+        type:
+          type: string
+          enum: [ "assert-last-assigned-partition-id" ]
         last-assigned-partition-id:
           type: integer
 
@@ -2684,8 +2694,12 @@ components:
       description:
         The table's default spec id must match the requirement's `default-spec-id`
       required:
+        - type
         - default-spec-id
       properties:
+        type:
+          type: string
+          enum: [ "assert-default-spec-id" ]
         default-spec-id:
           type: integer
 
@@ -2695,8 +2709,12 @@ components:
       description:
         The table's default sort order id must match the requirement's `default-sort-order-id`
       required:
+        - type
         - default-sort-order-id
       properties:
+        type:
+          type: string
+          enum: [ "assert-default-sort-order-id" ]
         default-sort-order-id:
           type: integer

This yields (just one example for readability):

+class TableRequirement(BaseModel):
+    __root__: Union[
+        AssertCreate,
+        AssertTableUUID,
+        AssertRefSnapshotId,
+        AssertLastAssignedFieldId,
+        AssertCurrentSchemaId,
+        AssertLastAssignedPartitionId,
+        AssertDefaultSpecId,
+        AssertDefaultSortOrderId,
+    ] = Field(..., discriminator='type')
+
+

+class AssertCreate(BaseModel):
+    """
+    The table must not already exist; used for create transactions
+    """
+
+    type: Literal['assert-create']

The __root__ means that the class itself is one of these listed in the Union. This would be more in line with the official open-API documentation (example at the bottom of the page): https://swagger.io/docs/specification/data-models/inheritance-and-polymorphism/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said, we don't have to use inheritance here. I'm OK with union.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, why do we need a field type in python classes to indicate its type(class name)?

Copy link
Contributor

@Fokko Fokko Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense, for example:

class AssertTableUUID(TableRequirement):
    """
    The table UUID must match the requirement's `uuid`
    """

    type: Literal['assert-table-uuid']
    uuid: str

When this class is converted to JSON using Pydantic, it will result in:

{
    "type": "assert-create",
    "uuid": "550e8400-e29b-41d4-a716-446655440000"
}

I think I prefer the union as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestions, I made the changes. The code now uses a union of subclasses as described in the above example.

@anuragmantri anuragmantri changed the title Open-API: TableRequirements subclasses should inherit 'type' property Open-API: TableRequirements should use union of subclasses Jun 5, 2024
- $ref: '#/components/schemas/AssertDefaultSortOrderId'

AssertCreate:
allOf:
Copy link
Contributor

@flyrain flyrain Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need allOf in case of union? I think allOf is only needed if we are using inheritance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, with union, allOf is not required. I removed it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I will merge it soon. Hi @Fokko, do you want to take another look?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply, I was traveling. Thanks @anuragmantri and @flyrain for fixing this 👍

@flyrain flyrain merged commit e0dc57e into apache:main Jun 6, 2024
@flyrain
Copy link
Contributor

flyrain commented Jun 6, 2024

Thanks @anuragmantri for working on this. Thanks @Fokko for the reivew.

jasonf20 pushed a commit to jasonf20/iceberg that referenced this pull request Aug 4, 2024
sasankpagolu pushed a commit to sasankpagolu/iceberg that referenced this pull request Oct 27, 2024
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants