Support Last Updated Timestamp for Partition Metadata Table #7560

@dramaticlly

Feature Request / Improvement

It would be nice to provide a last-updated timestamp in the partition metadata table, so users can see whether any change has been applied to a given partition.

According to @szehon-ho, this can be done today with the following SQL:

SELECT
  e.data_file.partition,
  MAX(s.committed_at) AS last_modified_time
FROM db.table.snapshots s
JOIN db.table.entries e
  ON s.snapshot_id = e.snapshot_id
GROUP BY e.data_file.partition

This joins the entries and snapshots metadata tables to derive the latest snapshot commit timestamp for each partition. This feature request instead proposes native support in the partition metadata table, for example:
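To make the join-and-aggregate logic concrete, here is a minimal sketch that runs the same query shape against toy tables in SQLite. It is not Iceberg: the table layout, the flattened `part` column (standing in for `data_file.partition`), and the third snapshot row are assumptions made up for illustration.

```python
import sqlite3

# Toy stand-ins for the `snapshots` and `entries` metadata tables.
# Assumptions: `part` replaces the nested `data_file.partition` struct,
# and timestamps are ISO-8601 strings so MAX() orders them correctly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snapshots (snapshot_id INTEGER, committed_at TEXT);
    CREATE TABLE entries (snapshot_id INTEGER, part TEXT);
    INSERT INTO snapshots VALUES
        (1, '2023-05-08 15:36:23.275'),
        (2, '2023-05-08 15:36:23.657'),
        (3, '2023-05-09 09:00:00.000');  -- hypothetical later write to {1}
    INSERT INTO entries VALUES (1, '{1}'), (2, '{2}'), (3, '{1}');
""")

# Same shape as the query in the issue: join on snapshot_id,
# then take the newest commit time per partition.
rows = conn.execute("""
    SELECT e.part, MAX(s.committed_at) AS last_modified_time
    FROM snapshots s
    JOIN entries e ON s.snapshot_id = e.snapshot_id
    GROUP BY e.part
    ORDER BY e.part
""").fetchall()

for part, last_modified_time in rows:
    print(part, last_modified_time)
```

In this toy data, partition `{1}` picks up the commit time of the later snapshot, while `{2}` keeps the time of its only snapshot.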

partition  spec_id  last_updated_timestamp   last_updated_snapshot_id  record_count  file_count
{1}        0        2023-05-08 15:36:23.275  263430419465835934        1             1
{2}        0        2023-05-08 15:36:23.657  3983730822266656596       1             1

Limitation

This assumes that the snapshot commit time determines when a partition was last updated. When data compaction runs, the same data is rewritten by a new snapshot, which may change the partition's last-updated timestamp even though the underlying data has not necessarily changed.
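The limitation can be illustrated with a small simulation, again using toy SQLite tables rather than Iceberg itself. The schema, the `record_count` column on `entries`, the compaction timestamp, and the way compaction is modeled (deleting the old entry and adding one owned by a new snapshot) are all simplifying assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snapshots (snapshot_id INTEGER, committed_at TEXT);
    CREATE TABLE entries (snapshot_id INTEGER, part TEXT, record_count INTEGER);
    INSERT INTO snapshots VALUES (1, '2023-05-08 15:36:23.275');
    INSERT INTO entries VALUES (1, '{1}', 1);
""")

def partition_summary(conn):
    """Return (partition, last commit time, total records) per partition."""
    return conn.execute("""
        SELECT e.part, MAX(s.committed_at), SUM(e.record_count)
        FROM snapshots s
        JOIN entries e ON s.snapshot_id = e.snapshot_id
        GROUP BY e.part
    """).fetchall()

before = partition_summary(conn)

# Hypothetical compaction: the same single record is rewritten into a
# new file that belongs to a new snapshot; the old entry goes away.
conn.executescript("""
    INSERT INTO snapshots VALUES (2, '2023-05-10 02:00:00.000');
    DELETE FROM entries WHERE part = '{1}';
    INSERT INTO entries VALUES (2, '{1}', 1);
""")

after = partition_summary(conn)

print("before:", before)
print("after: ", after)
```

The record count is identical before and after, but the derived timestamp moves to the compaction snapshot, which is the behavior described above.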

Query engine

None
