@edgarRd
Contributor

@edgarRd edgarRd commented May 6, 2022

While the WAP (Write-Audit-Publish) workflow lets users write data via SQL using a wap_id, the Publish step is much less usable: to apply the written changes to the table, the user has to find the snapshot-id programmatically before cherry-picking. Ideally, all steps of the WAP workflow should be available via SQL, since Iceberg already keeps the wap-id => snapshot-id mapping in its own metadata.

This PR proposes a SQL procedure to cherry-pick the changes created with a wap-id. Functionally, it works the same as the cherry-pick procedure, but receives a wap-id as argument instead of a snapshot-id. This would make the Write and Publish parts of WAP available via SQL. The procedure name proposed is publish_changes but I'm open to suggestions if another name would fit better.
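To illustrate the idea, here is a minimal sketch of the wap-id => snapshot-id lookup the procedure would need, plus the SQL call a user would issue. This is not the PR's implementation: the snapshot list is a hypothetical plain-dict stand-in for table metadata, the helper names `find_snapshot_id_by_wap_id` and `publish_changes_sql` are made up for illustration, and the `CALL <catalog>.system.publish_changes(...)` form assumes the procedure is registered under the name proposed here with `(table, wap_id)` arguments.

```python
from typing import Optional

# Hypothetical stand-in for table metadata: each snapshot carries an id and a
# summary map. Assumption: a WAP write records its id under the "wap.id" key.
SNAPSHOTS = [
    {"snapshot_id": 101, "summary": {"operation": "append"}},
    {"snapshot_id": 102, "summary": {"operation": "append", "wap.id": "audit_run_1"}},
]


def find_snapshot_id_by_wap_id(snapshots: list, wap_id: str) -> Optional[int]:
    """Return the id of the first snapshot whose summary has the given wap id."""
    for snap in snapshots:
        if snap["summary"].get("wap.id") == wap_id:
            return snap["snapshot_id"]
    return None  # no snapshot was written with this wap id


def publish_changes_sql(catalog: str, table: str, wap_id: str) -> str:
    """Build the SQL call (assumed name and argument order: table, wap_id)."""
    return f"CALL {catalog}.system.publish_changes('{table}', '{wap_id}')"


if __name__ == "__main__":
    print(find_snapshot_id_by_wap_id(SNAPSHOTS, "audit_run_1"))
    print(publish_changes_sql("my_catalog", "db.sample", "audit_run_1"))
```

With the procedure in place, the user only passes the wap-id; the lookup in the first function is what they would otherwise have to do by hand before calling the cherry-pick procedure.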

Thanks.

BTW - I considered extending the current cherry-pick procedure, but I figured the implementation wouldn't look much different, nor would we end up with less code, since a few things would need to be overridden in an already simple procedure; mostly it would have coupled the two implementations. Open to suggestions.

@github-actions github-actions bot added the spark label May 6, 2022
@edgarRd
Contributor Author

edgarRd commented May 16, 2022

PTAL @rdblue when you have a chance. Thanks!

@rdblue
Contributor

rdblue commented Jun 29, 2022

@edgarRd, sorry I missed this. Can you rebase and I'll review?

@edgarRd edgarRd force-pushed the spark-proc-apply-wap-changes branch from a57181b to 1273c85 Compare June 29, 2022 22:56
@edgarRd
Contributor Author

edgarRd commented Jun 29, 2022

Thanks @rdblue - I've rebased the branch.

@rdblue
Contributor

rdblue commented Jul 6, 2022

@RussellSpitzer, can you also take a look at this?

@rdblue rdblue merged commit 56c1993 into apache:master Jul 6, 2022
@rdblue
Contributor

rdblue commented Jul 6, 2022

Thanks, @edgarRd! Could you also port this to Spark 3.3?

@edgarRd
Contributor Author

edgarRd commented Jul 6, 2022

Thank you, @rdblue - I'll send a PR for porting to Spark 3.3 in a bit.

@singhpk234
Contributor

Thanks @edgarRd !!

Should we also add this procedure to the Spark Procedures doc?

@edgarRd
Contributor Author

edgarRd commented Jul 11, 2022

> Thanks @edgarRd !!
>
> should we also add this procedure in Spark Procedures Doc

Yeah, good catch! I can add the docs in the follow-up PR I have for adding it to Spark 3.3: #5223
