Avoid reading unnecessary dimension values when downsampling #124451

Merged

martijnvg
Member

Read dimension values once per tsid/bucket docid range instead of once for each document being processed.
The dimension values within a bucket-interval docid range are always the same, so this avoids unnecessary reads.

Latency of downsampling the tsdb track index into a 1-hour interval downsample index drops by ~16% (measured on my local machine).
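
For readers unfamiliar with the idea, here is a minimal, language-neutral sketch of the read pattern described above. It is not the Elasticsearch implementation: `CountingDimensionReader`, `downsample_per_doc`, `downsample_per_range`, `DOCS`, and `VALUES` are hypothetical names invented for illustration. It assumes documents arrive in docid order and that each tsid/bucket pair forms one contiguous docid range.

```python
from itertools import groupby


class CountingDimensionReader:
    """Hypothetical stand-in for a per-document value reader; counts reads."""

    def __init__(self, values_by_doc):
        self._values = values_by_doc
        self.reads = 0

    def read(self, doc_id):
        self.reads += 1
        return self._values[doc_id]


def downsample_per_doc(docs, reader):
    """Baseline: read the dimension value for every document."""
    buckets = {}
    for doc_id, tsid, bucket in docs:
        dimension = reader.read(doc_id)  # one read per document
        entry = buckets.setdefault(
            (tsid, bucket), {"dimension": dimension, "doc_count": 0}
        )
        entry["doc_count"] += 1
    return buckets


def downsample_per_range(docs, reader):
    """Read the dimension value once per contiguous (tsid, bucket) range.

    Assumes docs are ordered so that each (tsid, bucket) pair is contiguous.
    """
    buckets = {}
    for key, group in groupby(docs, key=lambda d: (d[1], d[2])):
        group = list(group)
        first_doc_id = group[0][0]
        dimension = reader.read(first_doc_id)  # one read per range
        buckets[key] = {"dimension": dimension, "doc_count": len(group)}
    return buckets


# Illustrative data: (doc_id, tsid, bucket) tuples in docid order.
DOCS = [
    (0, "host-a", 0),
    (1, "host-a", 0),
    (2, "host-a", 1),
    (3, "host-b", 0),
    (4, "host-b", 0),
    (5, "host-b", 0),
]
# The dimension value is constant within each tsid in this toy example.
VALUES = {0: "host-a", 1: "host-a", 2: "host-a", 3: "host-b", 4: "host-b", 5: "host-b"}
```

In this toy data both functions build the same buckets, but the per-range version issues one read per tsid/bucket range rather than one per document. The saving in a real index depends on how many documents fall into each bucket interval.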

@martijnvg martijnvg added >bug auto-backport Automatically create backport pull requests when merged :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data v8.18.1 v8.19.0 v9.0.1 v9.1.0 labels Mar 9, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Collaborator

Hi @martijnvg, I've created a changelog YAML for you.

Contributor

@kkrik-es kkrik-es left a comment

Neat

@martijnvg martijnvg changed the title from "Improve downsample dimension processing." to "Avoid reading unnecessary dimension values when downsampling" Mar 10, 2025
@martijnvg martijnvg enabled auto-merge (squash) March 10, 2025 10:40
@martijnvg martijnvg merged commit 6afd3ec into elastic:main Mar 10, 2025
16 of 17 checks passed
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Mar 10, 2025
…#124451)

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Mar 10, 2025
…#124451)

@elasticsearchmachine
Collaborator

💚 Backport successful

Branches: 8.18, 8.x, 9.0

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Mar 10, 2025
…#124451)

elasticsearchmachine pushed a commit that referenced this pull request Mar 10, 2025
#124470)

elasticsearchmachine pushed a commit that referenced this pull request Mar 10, 2025
#124468)

elasticsearchmachine pushed a commit that referenced this pull request Mar 10, 2025
#124469)

georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Mar 11, 2025
…#124451)

jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Mar 13, 2025
…#124451)

Labels
auto-backport Automatically create backport pull requests when merged >bug :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data Team:StorageEngine v8.18.1 v8.19.0 v9.0.1 v9.1.0
3 participants