
fix(spans): Avoid back pressure in the segments consumer #88419

Merged · jan-auer merged 2 commits into master from fix/spans-segment-producer-batch on Apr 1, 2025

Conversation

@jan-auer jan-auer (Member) commented Apr 1, 2025

In multiprocessing mode, the segments consumer continuously triggered
backpressure. This is because we batch segments in the multiprocessing
step: when a batch of segments is flushed by a process, the pipeline
unfolds the spans, which immediately exhausts the produce buffer.

For example, at a batch size of 100 with 4 processes and assuming an
average of 40 spans per segment, we write up to 16k spans if all
processes finish around the same time. This already exceeds the default
buffer size of 10k.
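
Spelled out, the worst case in that example looks like this (illustrative numbers only, not actual consumer code):

```python
# Every process flushes a full batch at roughly the same time.
processes = 4
batch_size = 100            # segments per batch
avg_spans_per_segment = 40

worst_case_spans = processes * batch_size * avg_spans_per_segment
print(worst_case_spans)     # 16000 -- exceeds the default produce buffer of 10000
```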

To avoid manual tuning, we've run long-term tests and determined that
the p95 of spans per segment is 350. We use this, along with the number
of processes and the batch size, to compute a safer upper bound for the
produce buffer. The consumer is primarily CPU- and I/O-bound, so we can
afford the additional memory usage.
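
A minimal sketch of how such an upper bound could be computed; the function and constant names below are hypothetical, not the actual Sentry configuration:

```python
# Hypothetical sizing helper: make the produce buffer large enough to absorb
# a simultaneous flush of one full batch from every process at the p95 span count.
P95_SPANS_PER_SEGMENT = 350          # measured p95 from long-term tests
DEFAULT_PRODUCE_BUFFER_SIZE = 10_000

def compute_produce_buffer_size(num_processes: int, batch_size: int) -> int:
    worst_case = num_processes * batch_size * P95_SPANS_PER_SEGMENT
    return max(DEFAULT_PRODUCE_BUFFER_SIZE, worst_case)

# With 4 processes and a batch size of 100 this yields 140,000 slots,
# well above the 16k worst case observed at ~40 spans per segment.
print(compute_produce_buffer_size(num_processes=4, batch_size=100))  # 140000
```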

@jan-auer jan-auer requested a review from a team as a code owner April 1, 2025 10:54
@jan-auer jan-auer requested a review from untitaker April 1, 2025 10:54
@jan-auer jan-auer self-assigned this Apr 1, 2025
@github-actions github-actions bot added the Scope: Backend label Apr 1, 2025
@jan-auer jan-auer enabled auto-merge (squash) April 1, 2025 11:04
@jan-auer jan-auer merged commit 95afb88 into master Apr 1, 2025
47 checks passed
@jan-auer jan-auer deleted the fix/spans-segment-producer-batch branch April 1, 2025 11:35
andrewshie-sentry pushed a commit that referenced this pull request Apr 8, 2025