
chore(grouping): Include grouphash age when logging `parent_hash_missing_group` events #88635


lobsterkatie
Member

@lobsterkatie lobsterkatie commented Apr 2, 2025

Sometimes Seer identifies as a match a grouphash which doesn't have a group. In theory this shouldn't happen (all of the data in Seer should come from events with groups, and when a group is deleted, we trigger a delete in Seer, too), but we consistently see a few dozen such errors an hour.

Our best guess is a race condition: event A gets sent to Seer, doesn't find a match, and gets stored in the Seer database; then, before event A has been able to create a group for itself, event B gets sent to Seer and matches with event A. One of the (multiple) initial motivations for storing grouphash metadata was to test this theory. Since I've recently come back to our Seer ingest code, I finally got around to doing so, by adding the age in seconds of the group-less grouphash to the log we collect when we run into this. This should let us distinguish instances caused by the race condition (where the age should be under a second) from ones caused by something else (where the age could be anything).

Once we have a better sense of what's going on, we can decide whether we want to exclude the race condition cases from the record deletion that hitting this situation triggers in Seer.
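
To make the idea concrete, here is a minimal sketch of the kind of age calculation and log payload described above. It is not the code from this PR: `GroupHashRecord`, `grouphash_age_seconds`, `build_missing_group_log_extra`, the `extra` keys, and the one-second threshold constant are all hypothetical names chosen for illustration. The only things taken from the description are the event name and the "under a second" heuristic.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

logger = logging.getLogger(__name__)

# Assumption: "under a second" from the description, expressed as a constant.
RACE_CONDITION_MAX_AGE_SECONDS = 1.0


@dataclass
class GroupHashRecord:
    """Hypothetical stand-in for a grouphash and its metadata."""

    hash_value: str
    date_added: Optional[datetime]  # when the grouphash metadata was created
    group_id: Optional[int] = None  # None means the grouphash has no group


def grouphash_age_seconds(
    record: GroupHashRecord, now: Optional[datetime] = None
) -> Optional[float]:
    """Return the grouphash's age in seconds, or None if its creation time is unknown."""
    if record.date_added is None:
        return None
    now = now or datetime.now(timezone.utc)
    return (now - record.date_added).total_seconds()


def build_missing_group_log_extra(
    record: GroupHashRecord, now: Optional[datetime] = None
) -> dict:
    """Build the structured payload logged when a matched grouphash has no group."""
    age = grouphash_age_seconds(record, now)
    return {
        "parent_hash": record.hash_value,
        "parent_gh_age_in_sec": age,
        # A very young grouphash suggests the race described above.
        "likely_race_condition": age is not None
        and age < RACE_CONDITION_MAX_AGE_SECONDS,
    }


def log_missing_group(record: GroupHashRecord, now: Optional[datetime] = None) -> dict:
    """Log the event with the age attached and return the payload for inspection."""
    extra = build_missing_group_log_extra(record, now)
    logger.warning("parent_hash_missing_group", extra=extra)
    return extra


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    current = datetime.now(timezone.utc)
    fresh = GroupHashRecord("abc123", current - timedelta(milliseconds=300))
    print(log_missing_group(fresh, now=current))
```

Passing `now` explicitly keeps the age calculation deterministic in tests. Returning `None` for a missing creation time, rather than a sentinel number, keeps records without metadata from being counted as race-condition cases.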

@github-actions github-actions bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) Apr 2, 2025

codecov bot commented Apr 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #88635   +/-   ##
=======================================
  Coverage   87.72%   87.72%           
=======================================
  Files       10064    10064           
  Lines      569157   569168   +11     
  Branches    22351    22351           
=======================================
+ Hits       499269   499293   +24     
+ Misses      69508    69495   -13     
  Partials      380      380           

@lobsterkatie lobsterkatie force-pushed the kmclb-log-grouphash-age-when-seer-recommends-hash-with-no-group branch from 0499a9e to a32f1a0 Compare April 3, 2025 03:45
@lobsterkatie lobsterkatie marked this pull request as ready for review April 3, 2025 16:15
@lobsterkatie lobsterkatie requested review from a team as code owners April 3, 2025 16:15
@lobsterkatie lobsterkatie force-pushed the kmclb-log-grouphash-age-when-seer-recommends-hash-with-no-group branch from a32f1a0 to 5d55917 Compare April 4, 2025 01:42
@lobsterkatie lobsterkatie merged commit 73434e6 into master Apr 4, 2025
49 checks passed
@lobsterkatie lobsterkatie deleted the kmclb-log-grouphash-age-when-seer-recommends-hash-with-no-group branch April 4, 2025 15:28
andrewshie-sentry pushed a commit that referenced this pull request Apr 8, 2025
…ing_group` events (#88635)