[DOCS] Document how to add per-catalog hadoop conf values with Spark #2922
Conversation
Similar to configuring Hadoop properties by using `spark.hadoop.*`, it's possible to set per-catalog Hadoop configuration values when using Spark by adding the property for the catalog with the prefix `spark.sql.catalog.(catalog-name).hadoop.*`. These properties will take precedence over values configured globally using `spark.hadoop.*` and will only affect Iceberg tables.

```plain
spark.sql.catalog.hadoop_prod.hadoop.fs.s3a.endpoint = http://aws-local:9000
```
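For context, a minimal sketch of the same idea when building a session programmatically; the catalog name `hadoop_prod` and the endpoint come from the snippet above, while the warehouse path is a made-up placeholder:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("iceberg-per-catalog-hadoop-conf")
  // Register an Iceberg catalog named hadoop_prod backed by a Hadoop warehouse
  .config("spark.sql.catalog.hadoop_prod", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.hadoop_prod.type", "hadoop")
  .config("spark.sql.catalog.hadoop_prod.warehouse", "s3a://warehouse/path") // placeholder path
  // Per-catalog Hadoop conf: overrides any global spark.hadoop.fs.s3a.endpoint,
  // but only for Iceberg tables in this catalog
  .config("spark.sql.catalog.hadoop_prod.hadoop.fs.s3a.endpoint", "http://aws-local:9000")
  .getOrCreate()
```

As the documented paragraph describes, the per-catalog value takes precedence over the global `spark.hadoop.*` setting and only affects Iceberg tables in that catalog.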
Maybe add an example for `hadoop.hive.metastore.uris`, which is one of the most common use cases here.
Sure. I will update to that instead.
Sorry for the late response, I thought I had hit comment but I hadn't.
Wouldn't Hive metastore URIs be set via the catalog's existing exposed `uri` parameter? E.g. `spark.sql.catalog.(catalog-name).uri`: https://github.com/apache/iceberg/blame/master/site/docs/spark-configuration.md#L60
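To illustrate that point, a hedged sketch of configuring the metastore through the catalog's own `uri` property; the catalog name `hive_prod` and the thrift host are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  // Register an Iceberg catalog named hive_prod backed by a Hive metastore
  .config("spark.sql.catalog.hive_prod", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.hive_prod.type", "hive")
  // The catalog's own uri property points at the metastore directly,
  // so no hadoop.* prefix is needed for this particular setting
  .config("spark.sql.catalog.hive_prod.uri", "thrift://metastore-host:9083") // placeholder host
  .getOrCreate()
```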
Maybe I'll put `hadoop.hive.metastore.kerberos.principal=hadoop/_HOST@REALM` instead?
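If that suggestion lands, a per-catalog override might look roughly like the sketch below; the catalog name `hive_prod` and the realm are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.sql.catalog.hive_prod", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.hive_prod.type", "hive")
  // Per-catalog Kerberos principal for the metastore: the hadoop. prefix routes
  // this into the catalog's Hadoop configuration, taking precedence over any
  // global spark.hadoop.hive.metastore.kerberos.principal
  .config(
    "spark.sql.catalog.hive_prod.hadoop.hive.metastore.kerberos.principal",
    "hadoop/_HOST@REALM") // placeholder principal from the comment above
  .getOrCreate()
```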
Yes, I don't think we want to point to the metastore URI, because that's what our `uri` property overrides.
Thanks for fixing this, @kbendick!
Adds a section to the Spark documentation on the website about how to override Hadoop configuration values per catalog.
This is a very simple explanation, and I'm open to discussion on what should be added.
This closes issue #2907.
cc @rdblue