feat(aci): setup workflow preview abstraction #87553
Conversation
# check fast conditions first
fast_conditions, slow_conditions = split_conditions_by_speed(conditions)

# TODO: early return if they are invalid condition pairs? see src/sentry/rules/history/preview.py VALID_CONDITION_PAIRS
sentry/src/sentry/rules/history/preview.py
Lines 53 to 59 in 6a4af04
# Most of the ISSUE_STATE_CONDITIONS are mutually exclusive, except for the following pairs.
VALID_CONDITION_PAIRS = {
    "sentry.rules.conditions.first_seen_event.FirstSeenEventCondition": "sentry.rules.conditions.high_priority_issue.NewHighPriorityIssueCondition",
    "sentry.rules.conditions.high_priority_issue.NewHighPriorityIssueCondition": "sentry.rules.conditions.first_seen_event.FirstSeenEventCondition",
    "sentry.rules.conditions.reappeared_event.ReappearedEventCondition": "sentry.rules.conditions.high_priority_issue.ExistingHighPriorityIssueCondition",
    "sentry.rules.conditions.high_priority_issue.ExistingHighPriorityIssueCondition": "sentry.rules.conditions.reappeared_event.ReappearedEventCondition",
}
a lot of other early return conditions in the existing implementation
sentry/src/sentry/rules/history/preview.py
Lines 74 to 95 in 6a4af04
issue_state_conditions, frequency_conditions = categorize_conditions(conditions)
# must have at least one condition to filter activity
if not issue_state_conditions and not frequency_conditions:
    return None
elif len(issue_state_conditions) > 1 and condition_match == "all":
    # Of the supported conditions, any more than two would be mutually exclusive
    if len(issue_state_conditions) > 2:
        return {}
    condition_ids = {condition["id"] for condition in issue_state_conditions}
    # all the issue state conditions are mutually exclusive
    if not any(condition in VALID_CONDITION_PAIRS for condition in condition_ids):
        return {}
    # if there are multiple issue state conditions, they must be one of the valid pairs
    for condition in condition_ids:
        if (
            condition in VALID_CONDITION_PAIRS
            and VALID_CONDITION_PAIRS[condition] not in condition_ids
        ):
            return {}
These conditions are generally quite complicated - might be a good time to see if we can simplify it a bit or remove some restrictions
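One way the early-return logic above could be simplified (a sketch only, using hypothetical helper names, assuming the set of compatible conditions keeps its current shape) is to model the allowed combinations as frozensets instead of a symmetric id-to-id mapping, so the whole check collapses to a set-membership test:

```python
# Hypothetical simplification: the compatible pairs from VALID_CONDITION_PAIRS,
# expressed as unordered combinations rather than a symmetric dict.
FIRST_SEEN = "sentry.rules.conditions.first_seen_event.FirstSeenEventCondition"
NEW_HIGH_PRIORITY = "sentry.rules.conditions.high_priority_issue.NewHighPriorityIssueCondition"
REAPPEARED = "sentry.rules.conditions.reappeared_event.ReappearedEventCondition"
EXISTING_HIGH_PRIORITY = "sentry.rules.conditions.high_priority_issue.ExistingHighPriorityIssueCondition"

VALID_COMBINATIONS = {
    frozenset({FIRST_SEEN, NEW_HIGH_PRIORITY}),
    frozenset({REAPPEARED, EXISTING_HIGH_PRIORITY}),
}


def issue_state_conditions_are_satisfiable(condition_ids: set[str]) -> bool:
    """With condition_match == "all", more than one issue state condition is
    only satisfiable when the ids form one of the known compatible pairs."""
    if len(condition_ids) <= 1:
        return True
    return frozenset(condition_ids) in VALID_COMBINATIONS
```

This folds the "more than two", "none in a pair", and "pair incomplete" branches into one lookup, at the cost of enumerating the compatible combinations up front.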
maybe we could not call the endpoint from the frontend at all if these exist 🤔
if not groups_to_check:
    return groups_meeting_fast_conditions

_, groups_meeting_conditions = preview_conditions(logic_type, slow_conditions, groups_to_check)
the current implementation estimates slow conditions for the groups with the top 10 most events in the past two weeks to approximate, since we don't want to blow up Snuba with queries
sentry/src/sentry/rules/history/preview.py
Lines 248 to 278 in 6a4af04
"""
Filters the activity to contain only groups that have the most events (out of the given groups) in the past 2 weeks.
If no groups are provided because there are no issue state change conditions, returns the top groups overall.
Since frequency conditions require snuba query(s), we need to limit the number of groups we process.
"""
if has_issue_state_condition:
    datasets = {dataset_map.get(group) for group in condition_activity.keys()}
else:
    # condition_activity will be empty because there are no issue state conditions.
    # So, we look to find top groups over all datasets
    datasets = set(DATASET_TO_COLUMN_NAME.keys())
group_ids = list(condition_activity.keys())
# queries each dataset for top x groups and then gets top x overall
query_params = []
for dataset in datasets:
    kwargs = get_update_kwargs_for_groups(
        dataset,
        group_ids,
        {
            "dataset": dataset,
            "start": start,
            "end": end,
            "filter_keys": {"project_id": [project.id]},
            "aggregations": [("count", "group_id", "groupCount")],
            "groupby": ["group_id"],
            "orderby": "-groupCount",
            "selected_columns": ["group_id", "groupCount"],
            "limit": FREQUENCY_CONDITION_GROUP_LIMIT,
        },
FREQUENCY_CONDITION_GROUP_LIMIT = 10
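The "top x per dataset, then top x overall" merge step described in the snippet above can be sketched in pure Python (the function name and the dict-based inputs are illustrative stand-ins; the real code gets per-dataset counts back from Snuba):

```python
import heapq

FREQUENCY_CONDITION_GROUP_LIMIT = 10


def top_groups_overall(
    per_dataset_counts: list[dict[int, int]],
    limit: int = FREQUENCY_CONDITION_GROUP_LIMIT,
) -> list[int]:
    """Merge per-dataset {group_id: event_count} results and keep the `limit`
    groups with the most events overall. Illustrative sketch of the
    "queries each dataset for top x groups and then gets top x overall" step."""
    totals: dict[int, int] = {}
    for counts in per_dataset_counts:
        for group_id, count in counts.items():
            totals[group_id] = totals.get(group_id, 0) + count
    # heapq.nlargest avoids sorting every group when only `limit` are needed
    return [g for g, _ in heapq.nlargest(limit, totals.items(), key=lambda kv: kv[1])]
```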
Can't this all be done with one query though, similar to how we do batching? It has to go through all of the data here anyway. We could limit it to 10k results, the max snuba will return. Maybe a good limit for earlier group query too?
yeah i need to figure out how slow conditions will be batched properly. similarly for tagged event filter and event attribute filter that need to query snuba
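One plausible shape for that batching (purely illustrative, not the actual implementation): chunk the group ids so each Snuba query stays under a fixed result cap, such as the 10k row limit mentioned above.

```python
def batched(group_ids: list[int], size: int) -> list[list[int]]:
    """Split group ids into chunks of at most `size`, so each chunk can be
    sent as one query without exceeding a result cap. Sketch only."""
    return [group_ids[i : i + size] for i in range(0, len(group_ids), size)]
```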
@@ -169,6 +169,9 @@ def evaluate_value(self, value: T) -> DataConditionResult:
        result = handler.evaluate_value(value, self.comparison)
        return self.get_condition_result() if result else None

    def get_preview_groups(self, group_ids: set[int]) -> set[int]:
placeholder, should implement for every DataCondition type, via the condition_handler_registry?
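A registry-based dispatch for the preview logic could look like the sketch below. The registry name, decorator, and toy handler are all hypothetical; sentry's condition_handler_registry has its own API that is not shown in this thread.

```python
from typing import Callable

# Hypothetical registry mapping a condition type to its preview function.
PreviewHandler = Callable[[set[int]], set[int]]
preview_handler_registry: dict[str, PreviewHandler] = {}


def register_preview_handler(condition_type: str):
    """Decorator that registers a preview function for a condition type."""
    def decorator(fn: PreviewHandler) -> PreviewHandler:
        preview_handler_registry[condition_type] = fn
        return fn
    return decorator


def get_preview_groups(condition_type: str, group_ids: set[int]) -> set[int]:
    """Dispatch to the registered handler; unknown types preview to the empty
    set rather than raising, so an unimplemented placeholder stays safe."""
    handler = preview_handler_registry.get(condition_type)
    if handler is None:
        return set()
    return handler(group_ids)


@register_preview_handler("first_seen_event")
def _first_seen(group_ids: set[int]) -> set[int]:
    # Toy handler: the real one would query Activity/Group for these ids.
    return {g for g in group_ids if g % 2 == 0}
```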
Codecov Report

All modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found.

Additional details and impacted files:

@@            Coverage Diff            @@
##           master   #87553     +/-   ##
=========================================
  Coverage   87.74%   87.75%
=========================================
  Files       10014    10016       +2
  Lines      567496   567640     +144
  Branches    22297    22297
=========================================
+ Hits       497977   498108     +131
- Misses      69102    69115      +13
  Partials      417      417
force-pushed from 6a4af04 to 1776798
Group.objects.filter(
    project__in=projects,
    last_seen__gte=timezone.now() - timedelta(days=14),
    type__in=group_types,
).values_list("id", flat=True)
We didn't have this in the old code, but might be a good idea to set an upper bound for number of ids we return here. Something high is fine, 10k, 100k, not sure... If there were 100k groups I think it'd likely time out
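With the Django ORM the bound can be pushed into SQL by slicing the queryset, e.g. `qs.values_list("id", flat=True)[:GROUP_ID_LIMIT]`. The same idea, sketched without a database (the constant name and value are assumptions, echoing the 10k-100k range suggested above):

```python
from itertools import islice

GROUP_ID_LIMIT = 10_000  # hypothetical cap; the review floats 10k or 100k


def capped_group_ids(id_stream, limit: int = GROUP_ID_LIMIT) -> list[int]:
    """Take at most `limit` ids from a (possibly huge) iterator of ids,
    mirroring what a LIMIT clause would do on the values_list query."""
    return list(islice(id_stream, limit))
```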
for condition in conditions:
    conditions_to_triggered_groups[condition.id] = condition.get_preview_groups(group_ids)
How will we get the preview conditions here? Is each condition going to need to do some querying?
yes they will do their own querying and return the group_ids that meet the condition. this could be implemented on each DataConditionHandler
This might end up being a bit more inefficient compared to the old implementation, since it uses activity and I think most conditions are able to infer what happened from that.
I guess in most cases, we won't have a lot of conditions, and we can make batch queries within the conditions as needed
the handlers can use their choice of whatever filtering they need to do with the group id to filter correctly -- some use Activity and others filter on Group
i was going to filter on Activity or Group just like the existing implementation, which also does a query per condition and pulls the Activity or Group result into memory via values_list
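The per-condition Activity filtering described here can be sketched in plain Python over rows already pulled into memory. The `ActivityRow` dataclass and function name are toy stand-ins for the values_list result and the handler logic:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ActivityRow:
    """Stand-in for one row of Activity.objects.values_list(...)."""
    group_id: int
    type: str
    datetime: datetime


def groups_with_activity(
    rows: list[ActivityRow],
    group_ids: set[int],
    activity_type: str,
    start: datetime,
) -> set[int]:
    """Return the subset of group_ids with a matching activity row in the
    window, mirroring a per-condition query on the Activity model."""
    return {
        row.group_id
        for row in rows
        if row.group_id in group_ids
        and row.type == activity_type
        and row.datetime >= start
    }
```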
force-pushed from df4ae4b to 38850f2
this requires a whole tech spec of its own. punting until after V0
Get group_ids from the past 14 days that would have been triggered by a Workflow, which is represented by its trigger condition group + triggers and a list of filter condition groups + their filters. We also manually pass in the detectors that we want to think of as "attached" to the workflow.

We need to pass in all these arguments because we want the workflow preview to update when the user updates the form, so the Workflow may not be saved in the db yet.

preview_workflow logic

A. How to find groups that would have fired a data condition group:
- evaluate its conditions according to its logic_type (see B + "continue checking")

B. How to find groups that met a list of conditions:
- call get_preview_groups for each condition with the group_ids; each condition will have this implemented in its own way. This will return the group_ids that meet the condition
- logic_type=DataConditionGroup.Type.ALL: groups that met all conditions, continue checking groups that met all conditions
- logic_type=DataConditionGroup.Type.ANY / logic_type=DataConditionGroup.Type.ANY_SHORT_CIRCUIT: groups that met any condition, continue checking groups that did not
- logic_type=DataConditionGroup.Type.NONE: groups that met no conditions, continue checking groups that met no conditions