
Conversation

kbendick (Contributor) commented Aug 2, 2021

Adds a section to the Spark documentation on the website about how to override Hadoop configuration values per catalog.

This is a very simple explanation and I'm open to discussion on what should be added.

This closes issue #2907

cc @rdblue

github-actions bot added the docs label Aug 2, 2021
kbendick changed the title from "[SITE][DOCS] Document how to add per-catalog hadoop conf values with Spark" to "[DOCS] Document how to add per-catalog hadoop conf values with Spark" Aug 2, 2021
kbendick (Contributor) commented Aug 3, 2021

cc @RussellSpitzer @flyrain @raptond

Similar to configuring Hadoop properties by using `spark.hadoop.*`, it's possible to set per-catalog Hadoop configuration values when using Spark by adding the property for the catalog with the prefix `spark.sql.catalog.(catalog-name).hadoop.*`. These properties will take precedence over values configured globally using `spark.hadoop.*` and will only affect Iceberg tables.

```plain
spark.sql.catalog.hadoop_prod.hadoop.fs.s3a.endpoint = http://aws-local:9000
```
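As a minimal sketch, assuming a Hadoop-type catalog named `hadoop_prod` (the app name and warehouse path below are hypothetical placeholders), the same override could also be supplied programmatically when building the session:

```scala
import org.apache.spark.sql.SparkSession

// The per-catalog Hadoop override applies only to the hadoop_prod Iceberg catalog
// and takes precedence over a global spark.hadoop.fs.s3a.endpoint value.
val spark = SparkSession.builder()
  .appName("iceberg-per-catalog-hadoop-conf")
  .config("spark.sql.catalog.hadoop_prod", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.hadoop_prod.type", "hadoop")
  .config("spark.sql.catalog.hadoop_prod.warehouse", "s3a://my-bucket/warehouse") // hypothetical path
  .config("spark.sql.catalog.hadoop_prod.hadoop.fs.s3a.endpoint", "http://aws-local:9000")
  .getOrCreate()
```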
Contributor:

Maybe add an example for `hadoop.hive.metastore.uris`, which is one of the most common use cases here.

kbendick (Author):

Sure. I will update it to use that instead.

kbendick (Author):

Sorry for the late response; I thought I had hit comment but I hadn't.

Wouldn't Hive metastore URIs be set via the catalog's existing exposed `uri` parameter? E.g. `spark.sql.catalog.(catalog-name).uri`: https://github.com/apache/iceberg/blame/master/site/docs/spark-configuration.md#L60
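A minimal sketch of that existing parameter, for comparison (the Thrift host and port are hypothetical placeholders; 9083 is the conventional Hive metastore port):

```plain
spark.sql.catalog.(catalog-name).uri = thrift://metastore-host:9083
```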

kbendick (Author):

I could possibly put `hadoop.hive.metastore.kerberos.principal=hadoop/_HOST@REALM` instead?
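A sketch of how that suggestion would look with the per-catalog prefix (the catalog name `hadoop_prod` is carried over from the earlier example, and `REALM` is a placeholder for the actual Kerberos realm):

```plain
spark.sql.catalog.hadoop_prod.hadoop.hive.metastore.kerberos.principal = hadoop/_HOST@REALM
```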

Contributor:

Yes, I don't think that we want to point to the metastore URI because that's what our `uri` property overrides.

rdblue merged commit e315d65 into apache:master Aug 6, 2021
rdblue (Contributor) commented Aug 6, 2021

Thanks for fixing this, @kbendick!

kbendick deleted the document-spark-catalog-hadoop-configuration branch August 10, 2021