

@dramaticlly
Contributor

@dramaticlly dramaticlly commented May 5, 2023

Closes #7364.

This enables querying readable metrics on the entries metadata table, similar to #5376 by @szehon-ho.

Given a table schema like the one below:

table {
  1: id: optional int
  2: data: optional string
}

we can query its per-column lower and upper bounds for each manifest entry:

// "db.table" is a placeholder for the actual table identifier
spark
  .sql("SELECT snapshot_id, readable_metrics.id.lower_bound, readable_metrics.data.upper_bound FROM db.table.entries")
  .show(false)

+-------------------+-----------+-----------+
|snapshot_id        |lower_bound|upper_bound|
+-------------------+-----------+-----------+
|8728349059432389037|1          |1          |
+-------------------+-----------+-----------+

@dramaticlly dramaticlly changed the title Implement ReadableMetrics for Entries table [WIP] Implement ReadableMetrics for Entries table May 5, 2023
@dramaticlly dramaticlly changed the title [WIP] Implement ReadableMetrics for Entries table Implement ReadableMetrics for Entries table May 6, 2023
Member

@szehon-ho szehon-ho left a comment


Great job, just some comments about code reuse.

@szehon-ho szehon-ho merged commit ce01bec into apache:master May 15, 2023
@szehon-ho
Member

Merged, thanks @dramaticlly for the change!

@szehon-ho
Member

Hi @dramaticlly, as I didn't want to block the change, I made some follow-ups in #7613, including what we discussed in #7539 (comment).

I think it makes the code in BaseFilesTable and BaseEntriesTable even more similar, with more reuse, if you want to take a look.

@dramaticlly dramaticlly deleted the entries branch May 18, 2023 21:00
@bwliu62

bwliu62 commented Jun 30, 2023

Hello, I want to get the upper/lower bounds from the manifest files, like this:

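import org.apache.iceberg.expressions.Expressions

val icebergTable = catalog.loadTable(tableIdentifier)

// Only plan files that may contain rows with id > 10
val filter = Expressions.greaterThan("id", 10)
val scan = icebergTable.newScan().filter(filter)

// includeColumnStats() keeps per-column stats (including bounds) on the planned data files
scan.includeColumnStats().planFiles().forEach { task =>
  // Both maps are keyed by field ID, with ByteBuffer values
  val upperBounds = task.file().upperBounds()
  val lowerBounds = task.file().lowerBounds()
  log.info(s"upperBounds: $upperBounds")
  log.info(s"lowerBounds: $lowerBounds")
}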

The logged bounds are ByteBuffers. I saw @szehon-ho's PR for readable metrics, and I am wondering whether there is a Java API we can use, or do I have to use DDL?
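The bounds returned by `lowerBounds()`/`upperBounds()` map field IDs to values encoded in Iceberg's single-value binary serialization. In the Java API, `org.apache.iceberg.types.Conversions.fromByteBuffer(type, buffer)` decodes such a value, with the type looked up from the table schema (for example via `table.schema().findType(fieldId)`). As a minimal sketch, assuming only int, long and string columns, the Scala below reimplements that decoding with plain `java.nio` so it runs without Iceberg on the classpath; `PrimitiveKind`, `BoundsDecoder` and the schema map are hypothetical stand-ins, not Iceberg APIs.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets

// Hypothetical type tags standing in for org.apache.iceberg.types.Type
sealed trait PrimitiveKind extends Product with Serializable
case object IntKind extends PrimitiveKind
case object LongKind extends PrimitiveKind
case object StringKind extends PrimitiveKind

object BoundsDecoder {
  // Decode one bound per Iceberg's single-value serialization:
  // int and long are little-endian; strings are UTF-8 bytes with no length prefix.
  def decode(kind: PrimitiveKind, buf: ByteBuffer): Any = {
    // duplicate() leaves the caller's position/limit untouched;
    // its byte order defaults to big-endian, so set little-endian explicitly.
    val b = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN)
    kind match {
      case IntKind  => b.getInt(b.position())
      case LongKind => b.getLong(b.position())
      case StringKind =>
        val bytes = new Array[Byte](b.remaining())
        b.get(bytes)
        new String(bytes, StandardCharsets.UTF_8)
    }
  }

  // Convert a field-ID-keyed bounds map into a column-name-keyed map.
  // `schema` is a hypothetical field ID -> (name, kind) lookup; IDs it lacks are skipped.
  def readable(
      bounds: Map[Int, ByteBuffer],
      schema: Map[Int, (String, PrimitiveKind)]): Map[String, Any] =
    bounds.toSeq.flatMap { case (id, buf) =>
      schema.get(id).map { case (name, kind) => name -> decode(kind, buf) }
    }.toMap
}
```

With Iceberg, the Java map from `task.file().lowerBounds()` would first need converting to a Scala map (e.g. with `asScala`) with `Integer` keys unboxed; for real code, `Conversions.fromByteBuffer` is the better choice because it handles every primitive type rather than the three sketched here.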


Linked issue: Add readable_metrics to entries metadata table
