Closed
Description
Feature Request / Improvement
It would be nice to expose a last-updated timestamp in the partitions metadata table, so that users can see whether any change has been applied to a given partition.
According to @szehon-ho, this can be done today with the following SQL:
SELECT
e.data_file.partition,
MAX(s.committed_at) AS last_modified_time
FROM db.table.snapshots s
JOIN db.table.entries e
ON s.snapshot_id = e.snapshot_id
GROUP BY e.data_file.partition

This joins the entries and snapshots tables to derive the latest snapshot commit timestamp for each partition. This feature request proposes native support for the same information in the partitions metadata table, for example:
| partition | spec_id | last_updated_timestamp | last_updated_snapshot_id | record_count | file_count |
|---|---|---|---|---|---|
| {1} | 0 | 2023-05-08 15:36:23.275 | 263430419465835934 | 1 | 1 |
| {2} | 0 | 2023-05-08 15:36:23.657 | 3983730822266656596 | 1 | 1 |
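For readers who want to see what the join-based workaround computes, here is a minimal sketch that runs the same aggregation on toy data with Python's built-in `sqlite3`. The flattened `partition_key` column, the small snapshot IDs, and the sample rows are illustrative assumptions, not the real Iceberg metadata schema.

```python
import sqlite3

# Toy stand-ins for the snapshots and entries metadata tables.
# Assumption: partition values are flattened into a text column named
# partition_key; real Iceberg entries expose data_file.partition as a struct.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snapshots (snapshot_id INTEGER, committed_at TEXT);
    CREATE TABLE entries (snapshot_id INTEGER, partition_key TEXT);
    INSERT INTO snapshots VALUES
        (101, '2023-05-08 15:36:23.275'),
        (102, '2023-05-08 15:36:23.657'),
        (103, '2023-05-09 09:00:00.000');
    INSERT INTO entries VALUES
        (101, '{1}'),
        (102, '{2}'),
        (103, '{1}');
""")

# Same shape as the workaround query: latest commit time per partition.
rows = conn.execute("""
    SELECT e.partition_key, MAX(s.committed_at) AS last_modified_time
    FROM snapshots s
    JOIN entries e ON s.snapshot_id = e.snapshot_id
    GROUP BY e.partition_key
    ORDER BY e.partition_key
""").fetchall()

for partition_key, last_modified_time in rows:
    print(partition_key, last_modified_time)
```

Timestamps are stored as fixed-width text here, so `MAX` compares them lexicographically, which matches chronological order for this format.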
Limitation
This assumes the snapshot commit time determines when a partition was last updated. When data compaction runs, the same data is rewritten by a new snapshot, which can change the partition's last-updated timestamp even though the underlying data has not necessarily changed.
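The effect of compaction on the derived timestamp can be illustrated with a small plain-Python sketch. Everything here is a hypothetical toy model: the `last_updated` helper, the sample snapshots, and the assumption that compaction commits are recorded with the operation `replace` while ordinary writes use `append`.

```python
# Hypothetical toy model: snapshot_id -> (committed_at, operation).
# Assumption: compaction commits carry the operation "replace".
snapshots = {
    101: ("2023-05-08 15:36:23.275", "append"),
    102: ("2023-05-09 02:00:00.000", "replace"),  # compaction of partition {1}
}

# (partition, snapshot_id) pairs. In this toy model there is one row per
# snapshot that wrote files for the partition, including rewritten files.
entries = [("{1}", 101), ("{1}", 102)]


def last_updated(entries, snapshots, skip_operations=()):
    """Return the latest commit time per partition, optionally ignoring
    snapshots whose operation is listed in skip_operations."""
    latest = {}
    for partition, snapshot_id in entries:
        committed_at, operation = snapshots[snapshot_id]
        if operation in skip_operations:
            continue
        if partition not in latest or committed_at > latest[partition]:
            latest[partition] = committed_at
    return latest


# The compaction snapshot moves the timestamp forward although no data changed.
naive = last_updated(entries, snapshots)
# Skipping rewrite snapshots recovers the time of the last real data change.
ignoring_rewrites = last_updated(entries, snapshots, skip_operations=("replace",))

print(naive)
print(ignoring_rewrites)
```

This only shows why the two notions of "last updated" diverge. In a real table the live entries after compaction generally reference the compaction snapshot alone, so skipping rewrite snapshots would also need the earlier history to still be available.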
Query engine
None