
feat(alerts): Add issue summary to slack issue alerts #88033

Merged
8 commits merged into master from issue-summary/alerts on Mar 31, 2025

Conversation

@roaga (Member) commented Mar 26, 2025

This PR should:

  • for any issue alert in Slack, only for error issues, and only when both the gen-ai-features FF and the new project-level summary-on-alerts FF are enabled...
  • fetch the summary for the issue (hitting the cache if one already exists, or generating a new one otherwise)
  • time out after 5 seconds
  • replace the title and body of the alert with the summary content only if a summary was fetched successfully (a rough sketch of this flow follows the list)
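A minimal sketch of that flow, assuming the get_issue_summary(group) helper referenced in the review comments below (returning a (summary, status_code) pair); the wrapper name, the import path, and the fallback handling here are assumptions rather than the exact code in this PR:

import concurrent.futures
import logging

# from sentry... import get_issue_summary  # assumed import; exact module not shown here

logger = logging.getLogger(__name__)

SUMMARY_TIMEOUT = 5  # seconds; the PR initially hard-codes this value


def fetch_issue_summary_for_alert(group):
    """Return the issue summary, or None if it cannot be fetched within the timeout."""
    try:
        with concurrent.futures.ThreadPoolExecutor() as executor:
            # get_issue_summary hits the cache if a summary already exists,
            # otherwise it generates a new one
            future = executor.submit(get_issue_summary, group)
            summary_result, status_code = future.result(timeout=SUMMARY_TIMEOUT)
        if status_code == 200:
            return summary_result
        return None
    except Exception:
        # On timeout or any other failure, the alert keeps its normal title and body
        logger.exception("Error generating issue summary")
        return None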

@roaga roaga requested a review from jennmueng March 26, 2025 21:17
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Mar 26, 2025
@jennmueng (Member) left a comment

lgtm, defer to alerts team to review

Comment on lines 524 to 525
# Enables automatically triggering issue summary on alerts
manager.add("projects:trigger-issue-summary-on-alerts", ProjectFeature, FeatureHandlerStrategy.FLAGPOLE, api_expose=True)
Member Author

using a project-level flag so I can just try it out on Seer before releasing to the internal org

Member

do we need this to be exposed in the API to the frontend?

Member Author

nope, set it to false
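For reference, a project-scoped flag registered this way is typically gated at the call site roughly as follows; this is a sketch only, and the exact flag handle for the gen-ai-features FF and the surrounding variables (group, fetch_issue_summary_for_alert) are assumptions based on the discussion above:

from sentry import features

# Only attempt a summary when both flags are enabled for this issue's org/project
if features.has("organizations:gen-ai-features", group.organization) and features.has(
    "projects:trigger-issue-summary-on-alerts", group.project
):
    summary = fetch_issue_summary_for_alert(group)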

@roaga roaga marked this pull request as ready for review March 26, 2025 21:38
@roaga roaga requested review from a team as code owners March 26, 2025 21:38
@iamrajjoshi (Member) left a comment

can we also please get a screenshot of how the notification looks?
https://develop.sentry.dev/integrations/slack/ - docs on how to set it up

try:
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future = executor.submit(get_issue_summary, self.group)
        summary_result, status_code = future.result(timeout=5)
Member

can we refactor the timeout magic number into a constant and perhaps add metrics/span here so we can monitor how long this is taking?
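Something along those lines could look like the sketch below; the constant name and metric key are made up, and metrics.timer is used here only because it comes up later in this thread:

import concurrent.futures

from sentry.utils import metrics

ISSUE_SUMMARY_TIMEOUT = 5  # seconds; hypothetical constant replacing the magic number

with metrics.timer("alerts.issue_summary.fetch"):  # made-up metric key
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future = executor.submit(get_issue_summary, self.group)
        summary_result, status_code = future.result(timeout=ISSUE_SUMMARY_TIMEOUT)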

Member

Could we instead define a new Sentry option that allows us to modify this timeout on the fly if we need to adjust it? It'll be faster than waiting for a full rollout if something goes wrong.
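That would look roughly like the following; the option name is made up, and the registration details are assumptions about how Sentry options are usually declared:

# from sentry.options import FLAG_AUTOMATOR_MODIFIABLE, register  # assumed import

# Registration, e.g. alongside the other option defaults:
register(
    "alerts.issue-summary-timeout",
    default=5,
    flags=FLAG_AUTOMATOR_MODIFIABLE,
)

# Call site reads the option instead of a hard-coded value:
from sentry import options

timeout = options.get("alerts.issue-summary-timeout")
summary_result, status_code = future.result(timeout=timeout)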


            return summary_result
        return None
except (concurrent.futures.TimeoutError, Exception) as e:
    logger.exception("Error generating issue summary: %s", e)
Member

Do we want metrics to track these failures?

Member Author

Is a span and the Sentry issue generated from this sufficient? I'm not too knowledgeable about other ways to track metrics here

@GabeVillalobos (Member) commented Mar 26, 2025

That should be fine for now. If we want more granular tracking of things like duration over time or success/failure counts, we tend to wrap calls like these in metrics helpers like the one below, so we can graph them in Datadog, assign them to SLOs, etc.:

with metrics.timer("sentry.tasks.process_suspect_commits.process_loop"):

This is totally optional though.
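Building on that, a hedged sketch of what success/failure counters around this call could look like; the metric key and tags are made up:

import concurrent.futures

from sentry.utils import metrics

try:
    summary_result, status_code = future.result(timeout=5)
    metrics.incr("alerts.issue_summary.fetch", tags={"outcome": "success"})
except concurrent.futures.TimeoutError:
    metrics.incr("alerts.issue_summary.fetch", tags={"outcome": "timeout"})
    raise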

    return None

try:
    with concurrent.futures.ThreadPoolExecutor() as executor:
Member

is this pattern something seer uses?

Member Author

I'm not sure what you mean. We use threading a lot in Seer, but it doesn't seem relevant here. I just thought this would be a simple way to enforce the timeout we said we wanted.

Member

What's the benefit of starting an async thread here and synchronously waiting on it vs passing the timeout as a parameter to the get_issue_summary util function instead? It seems like we're just using a request under the hood, which can already handle this concern:

response = requests.post(

Member Author

My concern with the timeout is not just the Seer call, but also the queries we're running to get trace-connected issues on the Sentry backend. In my experience those are often the main culprit for slow summary generation. Is there a better way to wrap both of those in a timeout?

Member

Ahh that's fair. I'm not super familiar with eventstore, but this approach makes sense if we're trying to cover those under the same 5s timeout.
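For comparison, the alternative raised above, passing the timeout into the HTTP call itself, would only bound the Seer request and not the trace-connected-issue queries; a rough sketch, with the URL and payload as placeholders:

import requests

# Bounds only the Seer HTTP round-trip; the eventstore queries for
# trace-connected issues would still run without any timeout
response = requests.post(seer_url, json=payload, timeout=5)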


codecov bot commented Mar 26, 2025

Codecov Report

Attention: Patch coverage is 93.20388% with 7 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines | Patch % | Lines
...try/integrations/utils/issue_summary_for_alerts.py | 82.14% | 5 Missing ⚠️
...entry/integrations/slack/message_builder/issues.py | 96.87% | 1 Missing ⚠️
src/sentry/integrations/slack/utils/escape.py | 75.00% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #88033      +/-   ##
==========================================
- Coverage   87.71%   87.71%   -0.01%     
==========================================
  Files        9977     9978       +1     
  Lines      564630   564726      +96     
  Branches    22232    22232              
==========================================
+ Hits       495292   495375      +83     
- Misses      68922    68935      +13     
  Partials      416      416              


roaga commented Mar 27, 2025

Screenshot of current version:
[Screenshot 2025-03-27 at 2 22 46 PM]

Once again, this is under a project feature flag, and the plan is to start with just the internal Seer project

@roaga roaga merged commit 07dd6e4 into master Mar 31, 2025
48 checks passed
@roaga roaga deleted the issue-summary/alerts branch March 31, 2025 16:57
andrewshie-sentry pushed a commit that referenced this pull request Mar 31, 2025

sentry-io bot commented Apr 1, 2025

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ TimeoutError in sentry.tasks.post_process.post_process_group
