Add cluster level reduction #117731

Merged 3 commits on Dec 3, 2024

dnhatn
Member

@dnhatn dnhatn commented Nov 28, 2024

This change introduces cluster-level reduction. Unlike data-node-level reduction, it is not gated behind a pragma: network latency and throughput across clusters differ significantly from those within a cluster, so reducing results on the remote cluster before sending them back should bring benefits that outweigh the risks.
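To make the idea concrete, here is a minimal sketch of why reducing on the remote cluster helps, using a top-N query as the example. It is plain Java and not the Elasticsearch implementation; the class `ClusterLevelTopN` and its methods are hypothetical names chosen for illustration only. It compares how many rows would cross the cluster boundary with and without a reduction step on the remote side.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration only; not the Elasticsearch implementation. */
public class ClusterLevelTopN {

    /** Keeps the n largest values, as a TopN-style reduction would. */
    public static List<Long> topN(List<Long> values, int n) {
        List<Long> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    /** Concatenates per-node results without reducing them. */
    public static List<Long> concat(List<List<Long>> perNode) {
        List<Long> all = new ArrayList<>();
        perNode.forEach(all::addAll);
        return all;
    }

    public static void main(String[] args) {
        int n = 2;
        // Each inner list is the local top-n produced by one data node of a remote cluster.
        List<List<Long>> remoteNodes = List.of(
            List.of(90L, 70L),
            List.of(85L, 60L),
            List.of(95L, 10L)
        );

        // Without cluster-level reduction: every node's rows cross the cluster boundary.
        List<Long> unreduced = concat(remoteNodes);
        // With cluster-level reduction: the remote cluster reduces first, then ships at most n rows.
        List<Long> reduced = topN(unreduced, n);

        System.out.println("rows sent without reduction: " + unreduced.size());
        System.out.println("rows sent with reduction:    " + reduced.size());
        // The coordinating cluster arrives at the same final answer either way.
        System.out.println("same result: " + topN(unreduced, n).equals(topN(reduced, n)));
    }
}
```

In this sketch the saving grows with the number of data nodes in the remote cluster, which is where the slower cross-cluster link would matter most.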

@dnhatn dnhatn force-pushed the cluster-level-reduction branch from 20f02a9 to 50112f1 on November 29, 2024 01:40
@dnhatn dnhatn changed the title from "Reduction" to "Add cluster level reduction" Nov 29, 2024
@dnhatn dnhatn requested review from nik9000 and astefan November 29, 2024 08:42
@dnhatn dnhatn marked this pull request as ready for review November 29, 2024 08:42
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team (ESQL/Aggs/Geo)) Nov 29, 2024
@elasticsearchmachine
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

}
return null;
return EstimatesRowSize.estimateRowSize(fragment.estimatedRowSize(), reducePlan);
}
Member

I'm not sure about the difference between the instanceof tree and what you've got, but I'm going to assume you worked it out and it's the same enough.

Contributor

I have a question about this, actually. You seem to have only kept AggregateExec here; I have vague memories that there were some tests that tested at least two such scenarios with reducing plans (limit, topn, orderby...). Wondering why no tests failed and why you only kept AggregateExec here.

Member Author

Yes, they are the same. The change was made to remove the switch statement.

> Wondering why no tests failed and why you only kept AggregateExec here.

We need to change the mode of AggregateExec to emit intermediate outputs.
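For readers less familiar with why an aggregation on a reducing node has to emit intermediate outputs rather than final values, here is a small sketch using an average. It is plain Java with hypothetical names (`IntermediateAvg`, `SumCount`) and does not use the actual ES|QL classes; it only illustrates the general point that a partial state such as (sum, count) can be merged again downstream, while a finished average cannot.

```java
import java.util.List;

/** Hypothetical illustration only; not the ES|QL aggregator classes. */
public class IntermediateAvg {

    /** Intermediate state for AVG: mergeable, unlike a finished average. */
    public static final class SumCount {
        public final double sum;
        public final long count;

        public SumCount(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        public SumCount merge(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        /** Final step, meant to run only once, on the coordinating side. */
        public double finish() {
            return count == 0 ? Double.NaN : sum / count;
        }
    }

    /** What a data node would emit: a partial state, not a final value. */
    public static SumCount partial(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return new SumCount(sum, values.size());
    }

    /** What a reducing node would do: merge states and stay in intermediate form. */
    public static SumCount reduce(List<SumCount> states) {
        SumCount acc = new SumCount(0, 0);
        for (SumCount s : states) {
            acc = acc.merge(s);
        }
        return acc;
    }

    public static void main(String[] args) {
        SumCount nodeA = partial(List.of(1.0, 2.0, 3.0)); // (6.0, 3)
        SumCount nodeB = partial(List.of(10.0));          // (10.0, 1)

        // Correct: merge intermediate states, finish once at the end.
        double correct = reduce(List.of(nodeA, nodeB)).finish();
        // Incorrect: finish on each node, then average the averages.
        double averageOfAverages = (nodeA.finish() + nodeB.finish()) / 2;

        System.out.println("merged intermediate states: " + correct);
        System.out.println("average of averages:        " + averageOfAverages);
    }
}
```

With these inputs the merged state gives 4.0, while averaging the two per-node averages gives 6.0, which is why the reducing step has to keep its output in a mergeable form.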

@dnhatn
Member Author

dnhatn commented Dec 3, 2024

@nik9000 @astefan Thanks for reviewing.

@dnhatn dnhatn added the auto-backport label (Automatically create backport pull requests when merged) Dec 3, 2024
@dnhatn dnhatn merged commit af7d3f9 into elastic:main Dec 3, 2024
16 checks passed
@dnhatn dnhatn deleted the cluster-level-reduction branch December 3, 2024 01:57
@elasticsearchmachine
Collaborator

💚 Backport successful

Branch: 8.x

dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Dec 3, 2024
elasticsearchmachine pushed a commit that referenced this pull request Dec 3, 2024
* Add cluster level reduction (#117731)

* compile
Labels: :Analytics/ES|QL (AKA ESQL), auto-backport, >enhancement, Team:Analytics, v8.18.0, v9.0.0
4 participants