ref(seer grouping): Prepare ingest for multiple Seer results #88619
Merged
lobsterkatie merged 3 commits into master from kmclb-handle-multiple-seer-results-in-ingest on Apr 3, 2025
+120
−68
Codecov Report
Attention: Patch coverage is
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #88619      +/-   ##
==========================================
- Coverage   87.73%   87.73%   -0.01%
==========================================
  Files       10043    10043
  Lines      568198   568215      +17
  Branches    22303    22303
==========================================
+ Hits       498535   498545      +10
- Misses      69259    69266       +7
  Partials      404      404
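The totals in the coverage table are internally consistent. On both sides, Lines equals Hits + Misses + Partials, and the +17 new lines split into +10 hits and +7 misses. The quick check below assumes Codecov's ratio of hits to all tracked lines, with partially covered lines counted as uncovered. It is a sketch for reading the table, not Codecov's own code.

```python
# Figures copied from the Codecov diff above, as (master, #88619) pairs.
HITS = (498535, 498545)
MISSES = (69259, 69266)
PARTIALS = (404, 404)
LINES = (568198, 568215)


def coverage(hits: int, misses: int, partials: int) -> float:
    # Assumed definition: partial lines count as not covered.
    return hits / (hits + misses + partials)


for i, label in enumerate(("master", "#88619")):
    # Every tracked line is exactly one of hit, miss, or partial.
    assert HITS[i] + MISSES[i] + PARTIALS[i] == LINES[i]
    print(label, f"{coverage(HITS[i], MISSES[i], PARTIALS[i]):.4%}")
```

Both ratios fall just under 87.74%, and the PR's ratio is slightly lower because 7 of its 17 new lines are uncovered.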
armenzg approved these changes Apr 3, 2025
JoshFerge approved these changes Apr 3, 2025
Base automatically changed from kmclb-prework-for-handling-multiple-seer-results to master April 3, 2025 16:05
lobsterkatie added a commit that referenced this pull request Apr 3, 2025

This adds new metrics to `get_seer_similar_issues`, so we'll be able to see the effects of requesting multiple results from Seer during ingest once we start doing that. Metrics added:

- `grouping.similarity.seer_results_returned`: Just because we ask Seer for the 100 closest matches (say), it doesn't mean Seer will necessarily find 100 that exceed the `should_group` threshold. It's therefore useful to know how many matches Seer actually finds, so we can get a sense of when increasing the number requested stops making a difference.
- `grouping.similarity.hybrid_fingerprint_results_checked`: This will similarly help us evaluate the number of results we're requesting. If we almost always find a match within the first 50 results, for example, it doesn't make sense to request many more than that, even if they exist. Both this and the metric above are tagged with platform, so we can determine whether it would make sense to vary the number of matches requested by platform.
- `grouping.similarity.get_seer_similar_issues`: This replaces the `grouping.similarity.hybrid_fingerprint_seer_result` metric, which was removed in #88619, and tracks the overall result of the `get_seer_similar_issues` call. It differs from the old metric in two ways: 1) When the Seer match(es) is/are rejected, it's less specific than the old metric about the reason, since if Seer returns multiple results, the rejection might stem from a combination of reasons. 2) It also includes non-hybrid cases. (It has an `is_hybrid` tag to tell the two apart.)

This PR also adds the above data to our logs.
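The three metrics described above could be recorded roughly as follows. This is a minimal sketch, not Sentry's code: `MetricsRecorder` and `record_seer_metrics` are hypothetical, the `"match_found"`/`"no_match"` result values are invented for illustration, and a real implementation would emit through Sentry's metrics backend instead of appending to an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MetricsRecorder:
    """Hypothetical in-memory stand-in for a real metrics backend."""
    records: list[tuple[str, Any, dict[str, str]]] = field(default_factory=list)

    def record(self, name: str, value: Any, tags: dict[str, str]) -> None:
        self.records.append((name, value, tags))


def record_seer_metrics(
    recorder: MetricsRecorder,
    platform: str,
    event_is_hybrid: bool,
    seer_results: list[str],  # hypothetical: parent hashes returned by Seer
    results_checked: int,
    match_found: bool,
) -> None:
    # How many matches Seer actually returned (may be fewer than requested).
    recorder.record(
        "grouping.similarity.seer_results_returned",
        len(seer_results),
        {"platform": platform},
    )
    # How far down the result list we looked before stopping.
    recorder.record(
        "grouping.similarity.hybrid_fingerprint_results_checked",
        results_checked,
        {"platform": platform},
    )
    # Overall outcome of the whole call, for hybrid and non-hybrid events alike.
    recorder.record(
        "grouping.similarity.get_seer_similar_issues",
        1,
        {
            "is_hybrid": str(event_is_hybrid).lower(),
            "result": "match_found" if match_found else "no_match",
        },
    )
```

For example, `record_seer_metrics(MetricsRecorder(), "python", False, ["abc"], 1, True)` records one value per metric. Tagging the first two with `platform` is what would allow comparing request sizes across platforms, as the commit message suggests.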
andrewshie-sentry pushed a commit that referenced this pull request Apr 8, 2025

This refactors `get_seer_similar_issues`, which is used during ingestion, to allow it to handle multiple Seer matches. (Note that it does not actually change the number of results requested, which is still only 1, but makes it so that when we do increase that number, we'll be able to handle what comes back.) Key changes:

- Pull the logic for checking whether a match can be used into a helper, `_should_use_seer_match_for_grouping`, to be called on each result.
- Run the results returned from Seer through that function regardless of the hybrid fingerprint status of the incoming event, because the closest Seer match(es) might be hybrid and therefore need the same check.
- Change the `grouping.similarity.hybrid_fingerprint_seer_result` metric to a `grouping.similarity.hybrid_fingerprint_match_check` metric, since there will now be multiple instances for a single incoming event, possibly with different values. A new metric in the spirit of the original (one encompassing the entire process for a given event) will be added back in a follow-up PR.

We can see that these changes don't affect the eventual outcome of the process because the only test changes required by this refactor are ones having to do with the metric. (To keep things simple, tests covering the handling of multiple results will be added in a follow-up PR[1]. The point of this PR is simply to do the refactor and show that the new code is equivalent to the old code.)

[1] #88621
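The per-result flow described in this message can be sketched as below. This is a minimal illustration under stated assumptions, not the Sentry implementation: `SeerMatch`, `IncomingEvent`, `should_use_seer_match`, and `find_usable_seer_match` are hypothetical names, a `None` fingerprint stands for "no hybrid fingerprint", and the acceptance rule (usable if neither side is hybrid, or if both are hybrid with identical fingerprints) is an assumed simplification of what `_should_use_seer_match_for_grouping` checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeerMatch:
    """Hypothetical stand-in for one result returned by Seer."""
    parent_hash: str
    fingerprint: Optional[tuple[str, ...]]  # None: parent has no hybrid fingerprint


@dataclass
class IncomingEvent:
    """Hypothetical stand-in for the event being grouped."""
    fingerprint: Optional[tuple[str, ...]]  # None: event has no hybrid fingerprint


def should_use_seer_match(event: IncomingEvent, match: SeerMatch) -> tuple[bool, str]:
    """Assumed rule. Returns (usable, reason) for a single Seer result."""
    event_hybrid = event.fingerprint is not None
    match_hybrid = match.fingerprint is not None
    if not event_hybrid and not match_hybrid:
        return True, "no_hybrid_fingerprint"
    if event_hybrid != match_hybrid:
        return False, ("only_event_hybrid" if event_hybrid else "only_parent_hybrid")
    if event.fingerprint == match.fingerprint:
        return True, "fingerprint_match"
    return False, "fingerprint_mismatch"


def find_usable_seer_match(
    event: IncomingEvent, results: list[SeerMatch], check_log: list[str]
) -> Optional[SeerMatch]:
    # Every result goes through the same check, whatever the event's own
    # hybrid status, because the match itself may be hybrid.
    for match in results:
        usable, reason = should_use_seer_match(event, match)
        # One entry per check, mirroring a per-match metric that may be
        # emitted several times (with different values) for one event.
        check_log.append(reason)
        if usable:
            return match
    return None
```

Returning the first usable result leaves single-result behavior unchanged. The per-check log shows why one event can now produce several `hybrid_fingerprint_match_check` values.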
This refactors `get_seer_similar_issues`, which is used during ingestion, to allow it to handle multiple Seer matches. (Note that it does not actually change the number of results requested, which is still only 1, but makes it so that when we do increase that number, we'll be able to handle what comes back.)

Key changes:

- Pull the logic for checking whether a match can be used into a helper, `_should_use_seer_match_for_grouping`, to be called on each result.
- Run the results returned from Seer through that function regardless of the hybrid fingerprint status of the incoming event, because the closest Seer match(es) might be hybrid and therefore need the same check.
- Change the `grouping.similarity.hybrid_fingerprint_seer_result` metric to a `grouping.similarity.hybrid_fingerprint_match_check` metric, since there will now be multiple instances for a single incoming event, possibly with different values. A new metric in the spirit of the original (one encompassing the entire process for a given event) will be added back in a follow-up PR.

We can see that these changes don't affect the eventual outcome of the process because the only test changes required by this refactor are ones having to do with the metric. (To keep things simple, tests covering the handling of multiple results will be added in a follow-up PR, #88621. The point of this PR is simply to do the refactor and show that the new code is equivalent to the old code.)
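The equivalence argument rests on a simple property. If Seer returns at most one result, checking each result in a loop and stopping at the first usable one gives the same answer as checking that single result directly. The sketch below checks this exhaustively over a tiny domain. It is illustrative only: `old_single_result`, `new_multi_result`, and `same_fingerprint_rule` are hypothetical, fingerprints are plain strings with `None` meaning "not hybrid", and the acceptance rule is an assumed simplification.

```python
from itertools import product
from typing import Callable, Optional, Sequence

# Hypothetical model: a fingerprint is a string, or None for "no hybrid fingerprint".
Fingerprint = Optional[str]
Rule = Callable[[Fingerprint, Fingerprint], bool]


def same_fingerprint_rule(event: Fingerprint, candidate: Fingerprint) -> bool:
    # Assumed rule: usable when neither side is hybrid, or both are hybrid
    # with identical fingerprints. Plain equality covers both cases.
    return event == candidate


def old_single_result(event: Fingerprint, results: Sequence[Fingerprint], rule: Rule) -> Optional[int]:
    # Old shape: only the first (and only) result is ever considered.
    if not results:
        return None
    return 0 if rule(event, results[0]) else None


def new_multi_result(event: Fingerprint, results: Sequence[Fingerprint], rule: Rule) -> Optional[int]:
    # New shape: check every result in order; return the index of the first usable one.
    for index, candidate in enumerate(results):
        if rule(event, candidate):
            return index
    return None


values: list[Fingerprint] = [None, "a", "b"]

# With zero or one result, the two shapes always agree.
for event in values:
    assert new_multi_result(event, [], same_fingerprint_rule) == old_single_result(event, [], same_fingerprint_rule)
for event, candidate in product(values, repeat=2):
    assert new_multi_result(event, [candidate], same_fingerprint_rule) == old_single_result(
        event, [candidate], same_fingerprint_rule
    )

# With two results they can differ, which is why multi-result tests are a separate follow-up.
print(old_single_result(None, ["a", None], same_fingerprint_rule))  # None
print(new_multi_result(None, ["a", None], same_fingerprint_rule))  # 1
```

Returning an index rather than the fingerprint keeps "no match" (`None`) distinct from "matched a non-hybrid result".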