Update grammar to rely on indexPattern instead of identifier in join target #120494

Merged: 27 commits merged into elastic:main on Jan 24, 2025

Contributor

@idegtiarenko idegtiarenko commented Jan 21, 2025

This replaces identifier with indexPattern in joinTarget grammar.

This change is needed to make index selection consistent between FROM and [LOOKUP] JOIN commands:

  • Both should use the same quotes " (currently join relies on `)
  • Both should allow specifying indices with - without having to quote them (not possible with join at the moment)
  • Both should conform to allowed index names (there are a number of differences today; for example, it is possible to specify test? or +test in join even though neither is a valid index name).

@idegtiarenko idegtiarenko added >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v9.0.0 v8.18.0 labels Jan 21, 2025
@elasticsearchmachine
Collaborator

Hi @idegtiarenko, I've created a changelog YAML for you.

@idegtiarenko idegtiarenko marked this pull request as ready for review January 21, 2025 09:49
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@idegtiarenko idegtiarenko requested a review from ivancea January 21, 2025 13:18
# Conflicts:
#	x-pack/plugin/esql/qa/security/src/javaRestTest/java/org/elasticsearch/xpack/esql/EsqlSecurityIT.java
Contributor

@ivancea ivancea left a comment

LGTM! I would wait for Costin or Alex to also check it, in case I missed something around indexPattern, as I'm relatively new to this part of ESQL 👀

Member

@costin costin left a comment

The issue description doesn't specify why the change was done.
I presume it's to allow proper syntax ("index") instead of (`index`); however, there are side effects: changing to indexPattern (instead of indexString) allows remote clusters and index patterns (index*) to be specified, including date math.

My expectation is that patterns should be rejected (this doesn't seem to be the case); date math could be allowed if the syntax points to just an alias or index. The unit test doesn't seem to cover this case.

Thanks!

Member

Add randomization to have the index specified with and without double quotes.

Contributor Author

I would like to keep this test simple. All possible index name variations are covered in a dedicated parser test.

@@ -2939,4 +2941,69 @@ public void testNamedFunctionArgumentWithUnsupportedNamedParameterTypes() {
);
}
}

public void testValidJoinPattern() {
Member

Using the index pattern makes sense for parser (vs. grammar) validation; however, I would have expected patterns to be rejected from the start instead of waiting to resolve them first.
That is, LOOKUP JOIN foo works during parsing (but might fail at runtime if it's an alias pointing to multiple indices), whereas LOOKUP JOIN foo* fails at parsing since it's a pattern (regardless of whether it gets resolved to one or more indices).

Contributor Author

Added in 2b642b9
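
The parse-time rule discussed in this thread could look roughly like the sketch below. This is not the code from the PR: the class name `JoinTargetCheck`, its methods, and the error message are hypothetical, and the sketch only covers the `*` wildcard case (date math and remote clusters are left to the follow-ups mentioned in this conversation).

```java
/** Hypothetical helper: rejects wildcard join targets before any index resolution. */
public final class JoinTargetCheck {
    private JoinTargetCheck() {}

    /** Returns true if the target contains a wildcard and so may match several indices. */
    public static boolean isPattern(String target) {
        return target.indexOf('*') >= 0;
    }

    /**
     * Returns the target unchanged when it names a single index or alias.
     * Throws when it is a wildcard pattern, regardless of how many indices
     * the pattern would resolve to at runtime.
     */
    public static String validate(String target) {
        if (isPattern(target)) {
            // Illustrative message only; the real parser error text may differ.
            throw new IllegalArgumentException(
                "invalid index pattern [" + target + "], * is not allowed in LOOKUP JOIN"
            );
        }
        return target;
    }
}
```

With this shape, `validate("foo")` passes while `validate("foo*")` fails immediately, which matches the expectation that a pattern is rejected at parsing even if it would resolve to exactly one index.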

@idegtiarenko
Contributor Author

idegtiarenko commented Jan 22, 2025

> The issue description doesn't specify why the change was done.

I have added several examples highlighting the difference in the syntax

> I presume it's to allow proper syntax ("index") instead of (`index`) however there are side effects - changing to indexPattern (instead of indexString) allows for remote clusters to be specified and for index patterns (index*) to be specified, including date math.

Remote clusters are handled in #120277. I would like to merge that one after this PR to ensure it works for both the left and right patterns of the query string.

> My expectation is that patterns should be rejected (this doesn't seem to be the case), date math could be allowed if the syntax points to just an alias or index - the unit test doesn't seem to cover this case.

Added in 2b642b9

@idegtiarenko idegtiarenko requested a review from costin January 22, 2025 10:17
Contributor

@alex-spies alex-spies left a comment

I'm nearly done reviewing. So far, this looks very good. I have a couple notes on test cases, but the general grammar change LGTM.

Contributor

Since we're also enabling quoting, could we please add test cases demonstrating the usage of " and """ quotes in LOOKUP JOIN?

I understand the quotes shouldn't be necessary right now, but someone could start using them and therefore we should have tests to guard against regressions.

@alex-spies alex-spies self-requested a review January 23, 2025 13:52
@alex-spies
Contributor

alex-spies commented Jan 23, 2025

> The issue description doesn't specify why the change was done.

@costin the csv tests show that we currently require a lot of backtick quoting that's not required in (and is inconsistent with) FROM, mostly due to minuses in index names. I do believe this PR's change is very important to address this.

> however there are side effects - changing to indexPattern (instead of indexString) allows for remote clusters to be specified and for index patterns (index*) to be specified, including date math.

This is true, and we'll address this with proper validation. CCS will be addressed once #120277 is updated for the changes here, date math will get its own follow-up PR with validation, and wildcards are already addressed.

Also note that even in the absence of parse-time validation, all of these cases (wildcards, date math, and probably also ccs) should also be rejected due to Luigi's PR which disables aliases, specifically here where we ensure that the concrete index name is equal to the index pattern supplied by the user.
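
As a rough illustration of the safety net described above (not the actual code from the referenced PR), the check can be thought of as comparing what the user typed against what resolution returned. The class `LookupIndexCheck` and the method `isExactMatch` are hypothetical names, and the set of resolved names stands in for whatever the real index resolution produces.

```java
import java.util.Set;

/** Hypothetical sketch of the "concrete index name equals the supplied pattern" check. */
public final class LookupIndexCheck {
    private LookupIndexCheck() {}

    /**
     * Accepts a join target only if it resolved to exactly one concrete index
     * whose name is identical to the text the user supplied. Wildcards, date
     * math expressions and aliases all fail this test because the resolved
     * name differs from the supplied text, or because more than one index matched.
     */
    public static boolean isExactMatch(String userSupplied, Set<String> resolvedConcreteIndices) {
        return resolvedConcreteIndices.size() == 1 && resolvedConcreteIndices.contains(userSupplied);
    }
}
```

Under this assumption, a target such as `lang*` that resolves to a single index still fails, because the resolved name is not equal to the pattern text.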

Member

@costin costin left a comment

LGTM

Contributor

@alex-spies alex-spies left a comment

Nice, thanks @idegtiarenko !

I have mostly minor remarks and think this could be merged as-is. I'd add an integration test for " and """ usage though, as discussed.

We already identified a follow-up to validate against date math. Feel free to postpone additional enhancements for the next PR, and let's get this wrapped up and merged.

Comment on lines +2954 to +2955
var type = randomFrom("", "LOOKUP ");

Contributor

The empty string looks wrong - FROM idx | JOIN other_idx ON field is not supported, but by the test below, we parse this as a lookup join o.O

Consider removing the plain join, but it's also fine as-is due to this only testing the parsing itself.

    if (ESTestCase.randomBoolean()) {
        index.append('*');
    } else {
        index.insert(ESTestCase.randomIntBetween(0, index.length() - 1), '*');
Contributor

nit: there could be multiple wildcards in the middle as well.
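
One way to cover the multiple-wildcard case from this nit is sketched below. It is an assumption-laden illustration, not the test code in the PR: `WildcardInserter` and `withWildcards` are hypothetical names, and `java.util.Random` replaces the `ESTestCase` randomization helpers so the snippet is self-contained. Unlike the excerpt above, the insert position here may also be the very end of the name.

```java
import java.util.Random;

/** Hypothetical helper that scatters several wildcards through an index name. */
public final class WildcardInserter {
    private WildcardInserter() {}

    /**
     * Inserts {@code count} '*' characters at random positions of {@code index}.
     * Positions are drawn from 0..length inclusive, so a wildcard can land at
     * the start, in the middle, or at the end, and wildcards may be adjacent.
     */
    public static String withWildcards(String index, int count, Random random) {
        StringBuilder sb = new StringBuilder(index);
        for (int i = 0; i < count; i++) {
            sb.insert(random.nextInt(sb.length() + 1), '*');
        }
        return sb.toString();
    }
}
```

For example, `WildcardInserter.withWildcards("logs-2025", 3, new Random())` returns the original name with three `*` characters mixed in; removing them yields the original name again.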

    } else {
        index.insert(ESTestCase.randomIntBetween(0, index.length() - 1), '*');
    }
} else if (canAdd(Features.DATE_MATH, features)) {
Contributor

@alex-spies alex-spies Jan 23, 2025

For date math, the case with a pipe | character is especially interesting in ESQL.

But we can consider this in the follow-up PR for date math invalidation.

Contributor Author

It is a bit tricky, as using | requires mandatory quoting. I would like to follow up on it separately.

* Identifier could be an index or alias. It might be hidden or remote or use a pattern.
* See @link <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html#indices-create-api-path-params">valid index patterns</a>
*/
public static String randomIndexPattern(Feature... features) {
Contributor

This thing is really cool. Thanks for researching all the different forms that patterns can take!

# Conflicts:
#	x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/esql/191_lookup_join_text.yml
@idegtiarenko idegtiarenko merged commit 9ffe3c8 into elastic:main Jan 24, 2025
16 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Branch 8.x: commit could not be cherry-picked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 120494

@idegtiarenko idegtiarenko deleted the es-10559 branch January 24, 2025 12:33
idegtiarenko added a commit that referenced this pull request Jan 24, 2025
… join target (#120784)

* Update grammar to rely on indexPattern instead of identifier in join target (#120494)

This replaces identifier with indexPattern in joinTarget grammar.
This change is needed to make index selection consistent between FROM and [LOOKUP] JOIN commands:

* Both should use the same quotes " (currently join relies on `)
* Both should allow specifying indices with - without having to quote them (not possible with join at the moment)
* Both should conform to allowed index names (there are a number of differences today; for example, it is possible to specify test? or +test in join even though neither is a valid index name).

(cherry picked from commit 9ffe3c8)
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.0 v9.0.0
5 participants