Allow table read/write options to be configured and/or enforced at catalog level using catalog properties #5343

@szehon-ho

Description

Background: #4011 allowed table properties to be set on newly-created tables via catalog properties.
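For reference, the mechanism from #4011 exposes catalog properties prefixed with `table-default.` that seed properties onto newly-created tables. A minimal sketch of what that looks like in a Spark session, assuming the `table-default.` prefix and an illustrative catalog named `my_catalog`:

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the existing #4011 behavior: "table-default." catalog properties
// become table properties on tables created through this catalog.
// Catalog name, warehouse path, and property values are illustrative.
val spark = SparkSession.builder()
  .appName("catalog-defaults-sketch")
  .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.my_catalog.type", "hadoop")
  .config("spark.sql.catalog.my_catalog.warehouse", "/tmp/warehouse")
  // Every table created through my_catalog starts with this property set:
  .config("spark.sql.catalog.my_catalog.table-default.write.distribution-mode", "none")
  .getOrCreate()

// New tables pick up the default; it does not affect reads/writes on existing tables.
spark.sql("CREATE TABLE my_catalog.db.t (id BIGINT) USING iceberg")
```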

Proposal: It would be nice to have these propagate to runtime table properties as well, so they can take effect at read/write time.

Problem Solved: There are many table properties that users need to override at runtime; see https://iceberg.apache.org/docs/latest/configuration/. For example, the delete write-distribution-mode now defaults to hash, which causes unexpected shuffles (see #5224) that are not always desirable. Users may also want to disable vectorization to work around compatibility bugs (#2740), or change the read/write split size dynamically.

However, in Spark, spark.sql() cannot take any options, so users are stuck setting table properties for this. That is not user-friendly: when concurrent jobs are running, setting a table property just for one job may break another job that picks it up.
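To illustrate the gap: the DataFrame API can already pass per-job options (`vectorization-enabled` and `split-size` are documented Iceberg Spark read options), but an equivalent spark.sql() query has no option hook, so the only workaround is a table property that every concurrent job sees. A rough sketch, with an illustrative table name:

```scala
// DataFrame API: per-job overrides are possible via read options.
val df = spark.read
  .option("vectorization-enabled", "false") // work around a vectorization bug
  .option("split-size", "268435456")        // 256 MB splits, for this job only
  .table("my_catalog.db.t")

// SQL API: spark.sql() takes only the query string; there is no options parameter.
// The workaround is a table property, which is visible to all concurrent jobs:
spark.sql("ALTER TABLE my_catalog.db.t SET TBLPROPERTIES ('read.split.target-size' = '268435456')")
spark.sql("SELECT count(*) FROM my_catalog.db.t")
```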

This proposal could be one way, without changing Spark, for Iceberg to override table properties at runtime in the cases where there is no other way to set them per job.
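As a rough sketch only, one possible shape: extend the catalog-property mechanism so that prefixed entries override table properties at read/write planning time rather than only at table creation. The `table-runtime-override.` prefix below is hypothetical, invented here for illustration:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch of the proposal; "table-runtime-override." is NOT an
// existing Iceberg property prefix. The idea: Iceberg would apply these
// catalog-level entries when computing effective read/write configuration,
// without writing anything into table metadata, so jobs using other
// sessions or catalogs are unaffected.
val session = SparkSession.builder()
  .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.my_catalog.type", "hadoop")
  .config("spark.sql.catalog.my_catalog.warehouse", "/tmp/warehouse")
  // Hypothetical: override the delete write distribution mode for this session only.
  .config("spark.sql.catalog.my_catalog.table-runtime-override.write.delete.distribution-mode", "none")
  .getOrCreate()
```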
