-
Notifications
You must be signed in to change notification settings - Fork 3k
Support for Namespace properties in JDBC Catalog #3275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
CC: @rdblue @jackye1995 |
core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalogNamespace.java
Outdated
Show resolved
Hide resolved
|
@jackye1995 . thanks for the review. Made the fixes based on the comments. |
core/src/main/java/org/apache/iceberg/jdbc/JdbcTableOperations.java
Outdated
Show resolved
Hide resolved
| NAMESPACE_NAME + " VARCHAR(255) NOT NULL," + | ||
| NAMESPACE_LOCATION + " VARCHAR(5500)," + | ||
| NAMESPACE_METADATA + " VARCHAR(65535)," + | ||
| NAMESPACE_PROPERTIES + " VARCHAR(65535)," + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference of metadata and properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The createNamespace method had Map<String, String> metadata as a parameter. I kept this assuming there was separation of metadata from properties. Do we wish to have one or the other?
|
Thanks @jackye1995. Addressed the comments except the metadata properties comment. Let me know if we need to have two separate given how the method signature has metadata. |
|
@jackye1995 @rdblue , please have a look when you get a chance. Let me know if I can add/augment anything in the PR. |
core/src/main/java/org/apache/iceberg/jdbc/JdbcTableOperations.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| LOG.debug("Creating table {} to store iceberg catalog namespaces", | ||
| JdbcUtil.CATALOG_NAMESPACE_TABLE_NAME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: This one is also off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
Previously, a namespace existed if there was a table in that namespace. This changes the definition so that a namespace exists if there is a row in the namespaces table. That is an incompatible definition. I think we need to decide what the behavior should be for the JDBC catalog, and then make sure that the implementation is consistent. If you should be able to implicitly create a namespace by creating a table (as is still allowed) then this needs to return that a namespace exists if a table is in the namespace. On the other hand, if that isn't allowed then Next, it looks like this creates a generic table with catalog name, namespace name, and string blobs. Is that the right way to go? Why not use a |
|
Thanks for the comments @rdblue . re: namespace and table re: table schema |
|
Also, I couldn't get clarity on whether we need both namespace |
I don't think we should introduce a breaking change like this. We could, in theory, fix it up when starting up the catalog by populating the table... but that's a bit odd. I think it's better to keep behavior and allow an option to be more strict.
I didn't follow this. Can you explain it a bit more for me?
The trade-off here is that removing the last namespace property would basically drop the namespace. In that case, the logic to check whether tables or properties exist is actually a good thing. I like the idea of the two table being loosely coupled. |
The More explanation:
Maybe there's a cleaner way to do this that I am missing. |
Agreed.
Agreed. I'll make the changes in upcoming commits. |
|
@rdblue added the changes we discussed.
Question:
|
| DatabaseMetaData dbMeta = conn.getMetaData(); | ||
| ResultSet tableExists = dbMeta.getTables(null, null, JdbcUtil.CATALOG_TABLE_NAME, null); | ||
| ResultSet tableExists = dbMeta.getTables(null, null, | ||
| JdbcUtil.CATALOG_TABLE_NAME, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this need to change? Looks like it is a whitespace-only change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed that. Will fix.
|
@nssalian, looks like there are two tables, |
| protected static final String CATALOG_NAMESPACE_TABLE_NAME = "iceberg_namespaces"; | ||
| protected static final String NAMESPACE_NAME = "namespace_name"; | ||
| protected static final String NAMESPACE_METADATA = "metadata"; | ||
| protected static final String NAMESPACE_PROPERTIES_TABLE_NAME = "namespace_properties"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be iceberg_namespace_properties to fit with the other table names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
| NAMESPACE_NAME + " VARCHAR(255) NOT NULL," + | ||
| NAMESPACE_PROPERTY_KEY + " VARCHAR(5500)," + | ||
| NAMESPACE_PROPERTY_VALUE + " VARCHAR(5500)," + | ||
| "PRIMARY KEY (" + CATALOG_NAME + ", " + NAMESPACE_NAME + ")" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the primary key should be catalog, namespace, and key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
| "(" + | ||
| CATALOG_NAME + " VARCHAR(255) NOT NULL," + | ||
| NAMESPACE_NAME + " VARCHAR(255) NOT NULL," + | ||
| NAMESPACE_METADATA + " VARCHAR(65535)," + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think namespace metadata is needed. This should be the same thing as properties, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
createNamespace(Namespace namespace, Map<String, String> metadata)
is the reason I added that. What does metadata refer to here? Based on our discussion, I think createNamespace needs to change. In the master branch it is Unsupported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metadata here and the properties that can be set in the other method are the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update the PR with setProperties as well in a follow up commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, why are they metadata and properties named differently if they mean the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes, an argument is used with a different name because it shadows a field in the class, which would break checkstyle.
So if the class has a field properties, then an argument calld properties would make references to properties ambiguous. So our checkstyle doesn't allow it (outside of constructors where it's not ambiguous and this. has to be used to reference the class's value).
This is often why they're different. Also, in some cases it's just whomever wrote the function signature and they were different over time because of authors or just word choice of the person when they were working on that code🤷 . But in my experience, it's usually that the arguments name would shadow a field on the class.
I think we can remove |
|
Trying to catch up with the conversation...
Yes that's actually the very original idea that we thought about in the initial PR. The schema in the current PR was from @miR172 's POC, and I overlooked the fact that it would break existing behavior, sorry for that. So we should switch to use 1 table for namespace properties and 1 table for just table information as you suggested, and we don't need another namespace table because everything can be modeled as key value pairs. |
|
Thanks for the comment @jackye1995 . |
|
Sorry for the long delay on this @nssalian! Now that we have the docker based JDBC catalog notebook env, I'm going to install this in it and run some actual tests (as it's been a while since I've reviewed this). If things work, then I'm ok with this as I'd like to see this supported and we've already addressed a lot of feedback. |
| ResultSet tableExists = dbMeta.getTables(null /* catalog name */, null /* schemaPattern */, | ||
| JdbcUtil.NAMESPACE_PROPERTIES_TABLE_NAME /* tableNamePattern */, null /* types */); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably using the comments to define the values that are null etc one time is fine.
Maybe making this a private function that takes in conn and then has the comments would enhance readability.
Like
/**
* Check if a table exists in the database via a pattern to match against.
* Searches all schemas & catalogs.
* @return true if a table matching the pattern exists in the database
*/
private static boolean doesTableExist(Connection conn, String tableNamePatternToCheck) {
DatabaseMetaData dbMeta = conn.getMetaData();
ResultSet tablesMatchingPattern = dbMeta.getTables(
null /* catalog name */, null /* schemaPattern */, tableNamePatternToCheck, null /* types */);
return tablesMatchingPattern.hasNext();
}That way, the null fields can be annotated once and then we could just call
connections.run(conn -> {
if (!doesTableExist(conn, JdbcUtil.NAMESPACE_PROPERTIES_TABLE_NAME)) {
// create table
}
return true;
});I'm also not sure if there's a reason to use .next() instead of .hasNext() on the ResultSet. I'm ok with either way, but hasNext seems like the more intuitive API if there's no semantic difference. I don't know enough about JDBC to be sure if there is a difference or not though so it might be safer to just leave it as is.
| sql.setString(rowIndex + 1, key); | ||
| rowIndex += 1; | ||
| } | ||
| LOG.info("Final log string {}", sql); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This log line can be removed - or at least downgraded to debug and changed to something like LOG.debug("Running query to update namespace properties: {}", sql)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some feedback. I'm going to try running with this and if it works, I'd say we merge it as it's been quite some time.
| AssertHelpers.assertThrows("Cannot create a namespace with null or empty metadata", IllegalArgumentException.class, | ||
| () -> catalog.createNamespace(testNamespace, null)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make a separate test, testCreateNamespaceFailsWithoutNamespaceProperties()?
That way, there would be something in the code that self-documents that we are intentional about needing namespace properties.
You could then even do something like the following
AssertHelpers.assertThrows(
"JDBC catalog cannot create a namespace without adding namespace properties as the properties are used as an existence check for RDBMs that don't support an atomic compare and swap",
...The long / detailed comment probably isn't needed, but having it as a separate test (with both null and an empty map being tried in the test) would be nice.
| throw new UncheckedInterruptedException(e, "Interrupted in call to insertProperties(namespace, properties) " + | ||
| "Namespace: %s", namespace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update this error message to not refer to code, but human-readable ones?
If somebody said it was fine for here, then leave it. But generally speaking, we prefer to use plain English (vs English with code in it) as end users are unlikely to know the code.
I'd go with something like Interrupted while inserting properties to namespace %s in catalog %s. Insertion likely needs to be retried.
You can drop the insertion likely needs to be retried part if there's not a chance of a partial success (I don't think there is as you're using prepared statements).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Description:
Adding the table to hold namespace names and attributes like location, metadata, properties.
Splitting up the implementation in: #3274 into smaller single change PRs.
Changes: