[XL] Add Python Crank Scheduling tool #2106
base: main
…he new scheduling tool.
…generating the yml pipelines.
…added machine_groups for the base azure configuration.
Pull Request Overview
This PR introduces a comprehensive Python-based crank scheduling tool to automate CI pipeline generation and optimize machine/scenario allocation across multiple performance testing machines.
- Adds a complete Python crank scheduler with sophisticated machine allocation algorithms and multi-YAML generation capabilities
- Updates existing CI configurations to use the new machine group system and multi-capability machine definitions
- Replaces manual YAML matrix files with JSON-based configuration and automated pipeline generation
Reviewed Changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| scripts/crank-scheduler/*.py | Core scheduler implementation with machine allocation, runtime estimation, and template generation |
| scripts/crank-scheduler/requirements.txt | Python dependencies for the scheduler |
| scripts/crank-scheduler/*.md | Documentation and configuration guides |
| build/benchmarks_ci*.json | Updated machine configurations with new capability-based structure and machine groups |
| build/benchmarks*.yml | Updated pipeline files generated by the new scheduler |
| build/benchmarks.template.liquid | Updated template comments to reflect new generation process |
…e unused requirements and code, and added some new entries to .gitignore.
c79428e to a9d28b0
…. Also improved the scheduler to better handle role-priority based profile selection.
    epilog="""
    Examples:
      # Generate schedule from JSON files
      python main.py --config config.json --format table
`--config` is used in the examples, but we don't seem to have a `--config` argument anywhere.
    machines_by_type = {}
    for machine in machines:
        # Get primary machine type (lowest priority capability)
nit: I assume by lowest here we mean lowest number, which would actually be higher priority.
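To make the distinction in this nit concrete, here is a minimal sketch of picking a machine's primary type as the capability with the smallest `priority` number, which is the most preferred one under the README's "1=preferred, 2=secondary, 3=fallback" convention. The dictionary layout and the helper name `primary_machine_type` are assumptions for illustration, not the scheduler's actual model.

```python
from collections import defaultdict

# Hypothetical machine records: capabilities keyed by machine type,
# each carrying a numeric priority (1 = preferred, 3 = fallback).
machines = [
    {"name": "machine-1", "capabilities": {"sut": {"priority": 1}, "load": {"priority": 2}}},
    {"name": "machine-2", "capabilities": {"load": {"priority": 1}, "db": {"priority": 3}}},
]


def primary_machine_type(machine: dict) -> str:
    """Return the capability with the smallest priority number (highest priority)."""
    caps = machine["capabilities"]
    return min(caps, key=lambda machine_type: caps[machine_type]["priority"])


# Group machine names by their primary (most preferred) type.
machines_by_type = defaultdict(list)
for machine in machines:
    machines_by_type[primary_machine_type(machine)].append(machine["name"])
```

A comment such as "smallest priority number, i.e. highest priority" in the real code would remove the ambiguity the reviewer points out.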
Pull request overview
Copilot reviewed 23 out of 24 changed files in this pull request and generated 7 comments.
```json
{
  "name": "performance-test-scenario",
  "scenario_type": 2,
  "estimated_runtime": 45.0,
  "target_machines": ["machine-1", "machine-2"]
}
```

#### Scenario Properties

- **name**: Scenario identifier
- **scenario_type**: Number of machines required (1=SUT only, 2=SUT+Load, 3=SUT+Load+DB)
Copilot AI, Jan 22, 2026
This example and the scenario property list document a `scenario_type` field, but `DataLoader.load_combined_configuration` currently reads the scenario type from the `type` key, and `example_complete_features.json` also uses `"type"`. As written, a config that follows this README and uses `scenario_type` will cause a `ScenarioType` lookup failure. Please either adjust the loader to accept `scenario_type` or update the docs/examples to use the actual key (`type`) so JSON configs are valid.
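One way the loader-side option could look is sketched below: accept either key and fail with a clear message when neither is present. The `ScenarioType` members and the helper `read_scenario_type` are hypothetical stand-ins; the real enum and `DataLoader` in the PR may be shaped differently.

```python
from enum import IntEnum


class ScenarioType(IntEnum):
    """Hypothetical stand-in for the scheduler's ScenarioType enum."""
    SINGLE = 1  # SUT only
    DUAL = 2    # SUT + Load
    TRIPLE = 3  # SUT + Load + DB


def read_scenario_type(raw: dict) -> ScenarioType:
    """Read the scenario type from "type", falling back to "scenario_type"."""
    for key in ("type", "scenario_type"):
        if key in raw:
            # ScenarioType(value) raises ValueError for unknown numbers.
            return ScenarioType(raw[key])
    name = raw.get("name", "<unnamed>")
    raise KeyError(f"scenario {name!r} has neither 'type' nor 'scenario_type'")
```

Accepting both keys keeps existing JSON files valid while the docs and examples converge on one spelling.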
```json
{
  "name": "Simple Single Machine Test",
  "template": "simple-single.yml",
  "scenario_type": 1,
  "target_machines": ["single-type-machine", "multi-type-machine"],
  "estimated_runtime": 10.0,
  "description": "Basic single machine scenario with default profiles"
}
```

**Result:** Uses default profiles for all machines

### 2. Custom Profile Selection

```json
{
  "name": "Triple Machine Test with Custom Profiles",
  "template": "triple-custom.yml",
  "scenario_type": 3,
  "target_machines": ["multi-type-machine"],
  "estimated_runtime": 45.0,
  "profile_overrides": {
    "multi-type-machine": {
      "sut": "multi-sut-high-cpu",
      "load": "multi-load-high-throughput",
      "db": "multi-db-memory-optimized"
    }
  }
}
```

**Result:** Uses specific custom profiles for each machine type

### 3. Mixed Profile Usage

```json
{
  "name": "Mixed Profile Scenario",
  "template": "mixed-profiles.yml",
  "scenario_type": 2,
  "target_machines": ["single-type-machine", "multi-type-machine"],
  "profile_overrides": {
    "multi-type-machine": {
      "sut": "multi-sut-low-memory"
    }
  }
}
```

**Result:**

- `single-type-machine`: Uses default profile
- `multi-type-machine` SUT: Uses custom profile
- `multi-type-machine` LOAD: Uses default profile

## Configuration Properties Explained

### Machine Properties

| Property             | Required | Description                                    |
| -------------------- | -------- | ---------------------------------------------- |
| `name`               | ✅       | Unique machine identifier                      |
| `capabilities`       | ✅       | Dict of machine types this machine can fulfill |
| `preferred_partners` | ❌       | List of preferred machines for other roles     |

### Capability Properties

| Property          | Required | Description                                                         |
| ----------------- | -------- | ------------------------------------------------------------------- |
| `machine_type`    | ✅       | Key: "sut", "load", or "db"                                         |
| `priority`        | ✅       | 1=preferred, 2=secondary, 3=fallback                                |
| `profiles`        | ✅       | List of available profile names                                     |
| `default_profile` | ❌       | Which profile to use by default (defaults to first profile in list) |

### Scenario Properties

| Property            | Required | Description                        |
| ------------------- | -------- | ---------------------------------- |
| `name`              | ✅       | Scenario identifier                |
| `template`          | ✅       | YAML template file                 |
| `scenario_type`     | ✅       | 1=single, 2=dual, 3=triple machine |
| `target_machines`   | ✅       | List of machines to run on         |
| `estimated_runtime` | ❌       | Runtime in minutes                 |
| `description`       | ❌       | Human-readable description         |
| `profile_overrides` | ❌       | Custom profile overrides           |
Copilot AI, Jan 22, 2026
These scenario examples and the "Scenario Properties" table document a `scenario_type` field, but the scheduler code reads the type from a `type` key in the JSON (and `example_complete_features.json` uses `"type"`). Using `scenario_type` as shown here will break loading. Please align the docs with the implementation (or update the loader to accept `scenario_type`) so configuration authors can rely on the documented shape.
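For readers following the "Mixed Profile Usage" result in the README excerpt above, the documented fallback order (explicit override, then `default_profile`, then the first entry in `profiles`) can be sketched as follows. The helper `resolve_profile` and all profile names except `multi-sut-low-memory` are hypothetical; this is an illustration of the documented rules, not the scheduler's code.

```python
# Hypothetical capability data shaped like the README's capability table.
capabilities = {
    "single-type-machine": {
        "sut": {"profiles": ["single-sut-default"]},
    },
    "multi-type-machine": {
        "sut": {"profiles": ["multi-sut-default", "multi-sut-low-memory"]},
        "load": {
            "profiles": ["multi-load-a", "multi-load-default"],
            "default_profile": "multi-load-default",
        },
    },
}

# Mirrors the profile_overrides block of the "Mixed Profile Scenario" example.
profile_overrides = {"multi-type-machine": {"sut": "multi-sut-low-memory"}}


def resolve_profile(machine: str, role: str) -> str:
    """Pick the override if present, else default_profile, else the first profile."""
    override = profile_overrides.get(machine, {}).get(role)
    if override is not None:
        return override
    capability = capabilities[machine][role]
    return capability.get("default_profile", capability["profiles"][0])
```

With this data, only the `multi-type-machine` SUT role resolves to the custom profile; every other role falls back to its default.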
| Property             | Required | Description                                    |
| -------------------- | -------- | ---------------------------------------------- |
| `name`               | ✅       | Unique machine identifier                      |
| `capabilities`       | ✅       | Dict of machine types this machine can fulfill |
| `preferred_partners` | ❌       | List of preferred machines for other roles     |
Copilot AI, Jan 22, 2026
The machine configuration docs list `name`, `capabilities`, and `preferred_partners`, but do not mention the new `machine_group` field that is now used by the scheduler for group-based compatibility (see `Machine.machine_group` in `models.py` and the updated `build/benchmarks_ci*.json` files). To make the new grouping behavior discoverable and configurable, please extend this table (and the surrounding text) to describe the `machine_group` field and how it interacts with `enforce_machine_groups` in metadata.
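As a starting point for that documentation, the sketch below shows one plausible reading of the interaction: when `enforce_machine_groups` is on, a SUT machine is only paired with load/db candidates from the same `machine_group`. The `Machine` dataclass, the helper `compatible_partners`, and the rule that an ungrouped machine is unrestricted are all assumptions; the actual behavior in `models.py` and the scheduler may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Machine:
    """Hypothetical, trimmed-down stand-in for the scheduler's Machine model."""
    name: str
    machine_group: Optional[str] = None


def compatible_partners(
    sut: Machine, candidates: List[Machine], enforce_machine_groups: bool
) -> List[Machine]:
    """Return candidate partner machines the SUT may be paired with."""
    others = [m for m in candidates if m.name != sut.name]
    # Assumption: without enforcement, or for an ungrouped SUT, any partner is allowed.
    if not enforce_machine_groups or sut.machine_group is None:
        return others
    return [m for m in others if m.machine_group == sut.machine_group]
```

Whatever the real rule is, stating it next to the machine properties table would let config authors predict which pairings the scheduler can produce.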
    # - Update this file with the result of the template generation
    # - The file benchmarks*.json defines how each pipeline set of jobs is run in parallel
    # - Update the associated benchmarks*.json file with machine and scenario updates
    # - Install python and install the requirements for the crank-scheduler in benchmarks/scripts/crank-scheduler/requirements.txt
Copilot AI, Jan 22, 2026
The instructions here reference `benchmarks/scripts/crank-scheduler/requirements.txt`, but in this repo the requirements file lives at `scripts/crank-scheduler/requirements.txt` (and the example command below already uses `./scripts/crank-scheduler/main.py`). To prevent confusion when following these steps, consider updating this path (and any similarly generated headers in the CI YAML files) to match the actual directory layout.
Suggested change:

    - # - Install python and install the requirements for the crank-scheduler in benchmarks/scripts/crank-scheduler/requirements.txt
    + # - Install python and install the requirements for the crank-scheduler in scripts/crank-scheduler/requirements.txt
    def process_yaml_generation(args, partial_schedules: List[PartialSchedule], config: CombinedConfiguration) -> list:
        """
        Unified flow for YAML generation (single or multi)

        Returns:
            bool: True if YAML files were generated, False otherwise
Copilot AI, Jan 22, 2026
The docstring for `process_yaml_generation` states that the function returns a `bool`, but the implementation actually returns a list of dictionaries describing the generated YAML files. Please update the docstring (and/or add a return type annotation) to reflect the real return type so callers know what to expect.
Suggested change:

    - def process_yaml_generation(args, partial_schedules: List[PartialSchedule], config: CombinedConfiguration) -> list:
    -     """
    -     Unified flow for YAML generation (single or multi)
    -     Returns:
    -         bool: True if YAML files were generated, False otherwise
    + def process_yaml_generation(args, partial_schedules: List[PartialSchedule], config: CombinedConfiguration) -> List[dict]:
    +     """
    +     Unified flow for YAML generation (single or multi)
    +     Returns:
    +         List[dict]: List of metadata dictionaries for each generated YAML file
    schedule_times = ScheduleOperations.generate_schedule_times(
        config, len(partial_schedules))
Copilot AI, Jan 22, 2026
The CLI overrides for `--target-yamls` and `--schedule-offset` are applied here after `partial_schedules` have already been computed in `main.py`, and `generate_schedule_times` uses the overridden `target_yaml_count` instead of the actual `len(partial_schedules)`. This can lead to mismatches (e.g., some partial schedules never get a YAML file when `target_yamls` is reduced, or extra offset times are generated and then dropped when `target_yamls` is increased). To avoid silently skipping work, apply the overrides before splitting the schedule (or re-split after updating `yaml_generation`) so that both `partial_schedules` and `schedule_times` are derived from the same effective `target_yaml_count`.
Suggested change:

    - schedule_times = ScheduleOperations.generate_schedule_times(
    -     config, len(partial_schedules))
    + # Ensure the YAML generation config's target count matches the actual
    + # number of partial schedules so that we don't silently drop or omit work.
    + effective_count = len(partial_schedules)
    + if config.metadata.yaml_generation is not None:
    +     config.metadata.yaml_generation.target_yaml_count = effective_count
    + schedule_times = ScheduleOperations.generate_schedule_times(
    +     config, effective_count)
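The alternative the comment describes, applying the CLI override before the schedule is split, might look roughly like the sketch below. `split_schedule`, `generate_schedule_times`, and `plan` are hypothetical simplifications (round-robin splitting, evenly spaced start hours) and do not reflect the real `ScheduleOperations` API; the point is only that both lists derive from one effective count.

```python
from typing import List, Optional, Tuple


def split_schedule(stages: List[str], target_yaml_count: int) -> List[List[str]]:
    """Hypothetical splitter: deal stages round-robin into N partial schedules."""
    return [stages[i::target_yaml_count] for i in range(target_yaml_count)]


def generate_schedule_times(count: int, offset_hours: int) -> List[int]:
    """Hypothetical: one start hour per YAML file, spaced by offset_hours."""
    return [(i * offset_hours) % 24 for i in range(count)]


def plan(
    stages: List[str],
    configured_count: int,
    cli_target_yamls: Optional[int] = None,
    offset_hours: int = 6,
) -> List[Tuple[int, List[str]]]:
    # Resolve the override first so splitting and timing agree on the count.
    effective_count = cli_target_yamls if cli_target_yamls is not None else configured_count
    partial_schedules = split_schedule(stages, effective_count)
    schedule_times = generate_schedule_times(len(partial_schedules), offset_hours)
    if len(partial_schedules) != len(schedule_times):
        raise RuntimeError("partial schedules and schedule times are out of sync")
    return list(zip(schedule_times, partial_schedules))
```

Compared with the suggested change, this ordering honors the user's `--target-yamls` value instead of overwriting it with the count that was computed before the override.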
    try:
        partner_index = preferred_partners.index(machine.name)
        score += 0.01 * (partner_index + 1)  # 0.01, 0.02, 0.03, ...
    except ValueError:
Copilot AI, Jan 22, 2026
The `except` clause does nothing but `pass`, and there is no explanatory comment.
Suggested change:

    - except ValueError:
    + except ValueError:
    +     # Machine not found in preferred_partners; skip partner bias adjustment.
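Another way to make the intent explicit is to isolate the calculation in a small helper that returns a zero adjustment when the machine is not a preferred partner. The function name `partner_bias` is hypothetical, and how the surrounding scheduler interprets `score` is not shown in this hunk, so treat this as a sketch of the arithmetic only.

```python
from typing import List


def partner_bias(machine_name: str, preferred_partners: List[str]) -> float:
    """Return 0.01 per list position (0.01, 0.02, ...) or 0.0 if not listed."""
    try:
        partner_index = preferred_partners.index(machine_name)
    except ValueError:
        # Machine not found in preferred_partners; no bias adjustment applies.
        return 0.0
    return 0.01 * (partner_index + 1)
```

The call site would then read `score += partner_bias(machine.name, preferred_partners)`, with the not-found case documented in one place.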
To simplify the work needed when updating the scenarios we run and to minimize the chance of error, this PR adds a Python script that generates a CI schedule from a single configuration file. Most of the recently added and updated pipeline flows already use this flow, but this update also adds a `machine_group` option so that machines are only paired with load and db machines at similar perf levels.
Changes include adding the crank-scheduler, running the configurations through the scheduler once more with the updated `benchmarks.template.liquid`, updating `benchmarks.template.liquid` to include the new steps to run, and adding the `machine_group` configuration option where applicable.