Skip to content

Apply default k for knn query eagerly #118774

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

benwtrent
Copy link
Member

When originally added, the knn query didn't apply top-k restrictions to the query. Instead it would allow the resulting num_candidate to be combined with sibling queries without restricting to top-size results ahead of time.

This honestly is confusing behavior and leads to some bugs in understand how it all works.

This commit addresses this by eagerly gathering only size results when k==null before combining with other queries.

To achieve the previous behavior, this can be done directly by setting k==num_candidates in the query.

@elasticsearchmachine elasticsearchmachine added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.0.0 labels Dec 16, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Copy link
Collaborator

Hi @benwtrent, I've created a changelog YAML for you.

Copy link
Contributor

@john-wagster john-wagster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

I think we need to change the docs as well, to something like:

k
(Optional, integer) The number of nearest neighbors to return from each shard. Elasticsearch collects k results from each shard, then merges them to find the global top results. This value must be less than or equal to num_candidates. Defaults to the search request size.

@benwtrent benwtrent added the auto-backport Automatically create backport pull requests when merged label Dec 17, 2024
@benwtrent benwtrent added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Dec 17, 2024
@benwtrent benwtrent removed the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Dec 18, 2024
@benwtrent benwtrent added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jan 6, 2025
@elasticsearchmachine elasticsearchmachine merged commit c18b48d into elastic:main Jan 7, 2025
16 checks passed
@benwtrent benwtrent deleted the bugfix/apply-default-k-knn-query branch January 7, 2025 20:40
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 118774

@benwtrent
Copy link
Member Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jan 7, 2025
When originally added, the knn query didn't apply `top-k` restrictions
to the query. Instead it would allow the resulting `num_candidate` to be
combined with sibling queries without restricting to `top-size` results
ahead of time.

This honestly is confusing behavior and leads to some bugs in understand
how it all works.

This commit addresses this by eagerly gathering only `size` results when
`k==null` before combining with other queries.

To achieve the previous behavior, this can be done directly by setting
`k==num_candidates` in the query.

(cherry picked from commit c18b48d)
elasticsearchmachine pushed a commit that referenced this pull request Jan 7, 2025
When originally added, the knn query didn't apply `top-k` restrictions
to the query. Instead it would allow the resulting `num_candidate` to be
combined with sibling queries without restricting to `top-size` results
ahead of time.

This honestly is confusing behavior and leads to some bugs in understand
how it all works.

This commit addresses this by eagerly gathering only `size` results when
`k==null` before combining with other queries.

To achieve the previous behavior, this can be done directly by setting
`k==num_candidates` in the query.

(cherry picked from commit c18b48d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport pending >bug :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants