Update grammar to rely on indexPattern instead of identifier in join target #120494
Hi @idegtiarenko, I've created a changelog YAML for you.
Pinging @elastic/es-analytical-engine (Team:Analytics)
x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/esql/190_lookup_join.yml
# Conflicts: # x-pack/plugin/esql/qa/security/src/javaRestTest/java/org/elasticsearch/xpack/esql/EsqlSecurityIT.java
LGTM! I would wait for Costin or Alex to also check it, in case I missed something around indexPattern, as I'm relatively new in this part of ESQL 👀
The issue description doesn't specify why the change was made.
I presume it's to allow proper syntax ("index") instead of (index); however, there are side effects: changing to indexPattern (instead of indexString) allows remote clusters and index patterns (index*) to be specified, including date math.
My expectation is that patterns should be rejected (this doesn't seem to be the case). Date math could be allowed if the syntax points to just an alias or index, but the unit test doesn't seem to cover this case.
Thanks!
Add randomization to have the index specified with and without double quotes.
I would like to keep this test simple. All possible index name variations are covered in a dedicated parser test.
@@ -2939,4 +2941,69 @@ public void testNamedFunctionArgumentWithUnsupportedNamedParameterTypes() {
            );
        }
    }

    public void testValidJoinPattern() {
Using the index pattern makes sense for parser (vs grammar) validation; however, I would have expected patterns to be rejected from the start instead of waiting to resolve them first.
That is, LOOKUP JOIN foo works during parsing (but might fail at runtime if it's an alias pointing to multiple indices), whereas LOOKUP JOIN foo* fails at parsing since it's a pattern (regardless of whether it gets resolved to 1 or more indices).
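The expectation described above can be sketched as a small parse-time check. This is only an illustration, not the code in LogicalPlanBuilder: the class name, method names, and error message are hypothetical, and the only rule it encodes is "a target containing * is a pattern and is rejected before any index resolution happens".

```java
// Hypothetical sketch of parse-time rejection of wildcard join targets.
// None of these names come from the Elasticsearch code base.
final class JoinTargetCheck {

    /** Returns true if the join target contains a wildcard and is therefore a pattern. */
    static boolean isPattern(String target) {
        return target.indexOf('*') >= 0;
    }

    /**
     * Rejects patterns immediately, without resolving them against any cluster state.
     * A plain name such as "foo" passes here, even though it might still fail later
     * if it turns out to be an alias pointing to multiple indices.
     */
    static String requireConcreteTarget(String target) {
        if (isPattern(target)) {
            throw new IllegalArgumentException(
                "invalid join target [" + target + "]: wildcards are not allowed in LOOKUP JOIN"
            );
        }
        return target;
    }
}
```

With this shape, `requireConcreteTarget("foo")` returns the name unchanged, while `requireConcreteTarget("foo*")` throws regardless of how many indices the pattern would match.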
Added in 2b642b9
I have added several examples highlighting the difference in the syntax.
Remote clusters are handled in #120277. I would like to merge it afterwards to ensure that it works for both the left and right patterns of the query string.
Added in 2b642b9
# Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec
I'm nearly done reviewing. So far, this looks very good. I have a couple notes on test cases, but the general grammar change LGTM.
x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec
Since we're also enabling quoting, could we please add test cases demonstrating the usage of " and """ quotes in LOOKUP JOIN?
I understand the quotes shouldn't be necessary right now, but someone could start using them and therefore we should have tests to guard against regression.
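As a rough illustration of the cases being asked for, the helper below builds the same LOOKUP JOIN query with the target unquoted, wrapped in " quotes, and wrapped in """ quotes. It is a sketch only: the class name, the source index `employees`, and the join field `id` are made up for the example, and it does not claim anything about how the real tests are written.

```java
import java.util.List;

// Hypothetical helper that produces the quoting variants a regression test could cover.
final class QuotedJoinQueries {

    /** Returns the query with an unquoted, double-quoted, and triple-quoted join target. */
    static List<String> variants(String index) {
        String prefix = "FROM employees | LOOKUP JOIN ";
        String suffix = " ON id";
        return List.of(
            prefix + index + suffix,                          // no quotes
            prefix + "\"" + index + "\"" + suffix,            // " quotes
            prefix + "\"\"\"" + index + "\"\"\"" + suffix     // """ quotes
        );
    }
}
```

Each returned string could then be fed to the parser or to an integration test, with the expectation that all three resolve to the same join target.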
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LogicalPlanBuilder.java
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java
@costin the csv tests show that we currently require a lot of backtick quoting that's not required in (and is inconsistent with)
This is true, and we'll address this with proper validation. CCS will be addressed once #120277 is updated for the changes here, date math will get its own follow-up PR with validation, and wildcards are already addressed. Also note that even in the absence of parse-time validation, all of these cases (wildcards, date math, and probably also CCS) should be rejected due to Luigi's PR which disables aliases, specifically here where we ensure that the concrete index name is equal to the index pattern supplied by the user.
LGTM
Nice, thanks @idegtiarenko!
I have mostly minor remarks and think this could be merged as-is. I'd add an integration test for " and """ usage though, as discussed.
We already identified a follow-up to validate against date math. Feel free to postpone additional enhancements for the next PR, and let's get this wrapped up and merged.
var type = randomFrom("", "LOOKUP ");
The empty string looks wrong: FROM idx | JOIN other_idx ON field is not supported, but by the test below, we parse this as a lookup join o.O
Consider removing the plain join, but it's also fine as-is due to this only testing the parsing itself.
    if (ESTestCase.randomBoolean()) {
        index.append('*');
    } else {
        index.insert(ESTestCase.randomIntBetween(0, index.length() - 1), '*');
nit: there could be multiple wildcards in the middle as well.
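One possible way to cover that nit is to insert several wildcards at random offsets rather than a single one. The sketch below uses `java.util.Random` directly instead of the `ESTestCase` helpers so that it runs on its own; the class and method names are hypothetical and this is not the code that ended up in the PR.

```java
import java.util.Random;

// Hypothetical sketch: insert `count` wildcards at random positions in an index name.
final class WildcardInserter {

    static String insertWildcards(String index, int count, Random random) {
        StringBuilder sb = new StringBuilder(index);
        for (int i = 0; i < count; i++) {
            // nextInt's bound is exclusive, so length() + 1 also allows appending at the end.
            sb.insert(random.nextInt(sb.length() + 1), '*');
        }
        return sb.toString();
    }
}
```

Because each wildcard is placed independently, the result can contain leading, trailing, middle, and adjacent wildcards, which is a superset of the two cases handled in the snippet above.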
    } else {
        index.insert(ESTestCase.randomIntBetween(0, index.length() - 1), '*');
    }
} else if (canAdd(Features.DATE_MATH, features)) {
For date math, specifically the case with a pipe | character is interesting in ESQL.
But we can consider this in the follow-up PR for date math invalidation.
It is a bit tricky as using | requires mandatory quoting. I would like to follow up on it separately.
 * Identifier could be an index or alias. It might be hidden or remote or use a pattern.
 * See @link <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html#indices-create-api-path-params">valid index patterns</a>
 */
public static String randomIndexPattern(Feature... features) {
This thing is really cool. Thanks for researching all the different ways that patterns can look like!
# Conflicts: # x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/esql/191_lookup_join_text.yml
💔 Backport failed
You can use sqren/backport to manually backport by running
… join target (#120784)
* Update grammar to rely on indexPattern instead of identifier in join target (#120494)
This replaces identifier with indexPattern in joinTarget grammar.
This change is needed to make index selection consistent between FROM and [LOOKUP] JOIN commands:
* Both should use the same quotes " (currently join relies on `)
* Both should allow specifying indices with - without having to quote them (not possible with join at the moment)
* Both should conform to allowed index names (there are a number of differences today, for example it is possible to specify test? or +test in join even though it is not a valid index name.)
(cherry picked from commit 9ffe3c8)
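This replaces identifier with indexPattern in joinTarget grammar.
This change is needed to make index selection consistent between FROM and [LOOKUP] JOIN commands:
* Both should use the same quotes " (currently join relies on `)
* Both should allow specifying indices with - without having to quote them (not possible with join at the moment)
* Both should conform to allowed index names (there are a number of differences today, for example it is possible to specify test? or +test in join even though it is not a valid index name.)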