|
| 1 | +# WG Serving Charter |
| 2 | + |
| 3 | +This charter adheres to the conventions described in the [Kubernetes Charter README] and uses |
| 4 | +the Roles and Organization Management outlined in [wg-governance]. |
| 5 | + |
| 6 | +[Kubernetes Charter README]: /committee-steering/governance/README.md |
| 7 | + |
| 8 | +## Scope |
| 9 | + |
| 10 | +Discuss and enhance serving workloads on Kubernetes, specifically focusing on |
| 11 | +hardware-accelerated AI/ML inference. The working group will focus on the novel |
| 12 | +challenges of compute-intensive online inference. Use cases involving |
| 13 | +non-fungible accelerators will be prioritized over those targeting generic |
| 14 | +CPUs. However, all improvements should, where possible, benefit other |
| 15 | +serving workloads like web services or stateful databases, be usable as |
| 16 | +primitives by multiple ecosystem projects, and compose well into the workflows |
| 17 | +of those deploying models to production. WG Batch has a similar scope. As a |
| 18 | +simplified definition of the difference, WG Serving will generally concentrate |
| 19 | +on workloads whose Pods run with restartPolicy=Always, while WG Batch will |
| 20 | +generally look at Pods with restartPolicy=OnFailure. There are edge cases to |
| 21 | +this definition, but it provides a simple enough framework to differentiate |
| 22 | +the scope of the two Working Groups. |
| 23 | + |
| 24 | +### In scope |
| 25 | + |
| 26 | +- Gather requirements for serving workloads (inference primarily, but benefiting |
| 27 | + other non-batch use cases where possible) that have broad community alignment |
| 28 | + from practitioners, distros, and vendors. Provide concrete input to other SIGs |
| 29 | + and WGs around needs for identified requirements. Do this in partnership |
| 30 | + with existing ecosystem projects like KServe, Seldon, Kaito, and |
| 31 | + others to identify, extract, or implement solutions to common shared problems |
| 32 | + (as Kueue abstracted deferred scheduling for multiple batch frameworks). |
| 33 | +- Specific areas of improvement include: |
| 34 | + - Directly improve key Kubernetes workload controllers when used with |
| 35 | + accelerators and the most common inference serving frameworks and model |
| 36 | + servers. |
| 37 | + - Explore new projects that improve orchestration, scaling, and load balancing |
| 38 | + of inference workloads and compose well with other workloads on Kubernetes. |
| 39 | + - Enable serving workloads to run safely while giving up |
| 40 | + available slack capacity to batch frameworks. |
| 41 | + |
| 42 | +### Out of scope |
| 43 | + |
| 44 | +- Training and batch inference, which are covered by WG Batch. |
| 45 | +- Describing the workflows for serving workloads. Kubernetes will offer |
| 46 | + building blocks that MLOps platforms can use to build those workflows. |
| 47 | + |
| 48 | +## Stakeholders |
| 49 | + |
| 50 | +Stakeholders in this working group span multiple SIGs that own parts of the |
| 51 | +code in core Kubernetes components and addons. |
| 52 | + |
| 53 | +- SIG Apps as a primary SIG |
| 54 | +- SIG Architecture |
| 55 | +- SIG Node |
| 56 | +- SIG Scheduling |
| 57 | +- SIG Autoscaling |
| 58 | +- SIG Network |
| 59 | +- SIG Instrumentation |
| 60 | +- SIG Storage |
| 61 | + |
| 62 | +## Deliverables |
| 63 | + |
| 64 | +The list of deliverables includes the following high-level features: |
| 65 | + |
| 66 | +- To SIG Apps: |
| 67 | + - Ability to express model serving workloads with easy-to-understand logical |
| 68 | + objects with the ability to scale to multi-host |
| 69 | +- To SIG Scheduling and SIG Autoscaling: |
| 70 | + - Faster scaling up and down |
| 71 | + - Ability to preempt workloads |
| 72 | +- To SIG Node: |
| 73 | + - Runtime support for Pod preemption |
| 74 | + - Runtime support for device partitioning |
| 75 | + |
| 76 | +## Roles and Organization Management |
| 77 | + |
| 78 | +This WG adheres to the Roles and Organization Management outlined in [wg-governance] |
| 79 | +and opts in to updates and modifications to [wg-governance]. |
| 80 | + |
| 81 | +[wg-governance]: /committee-steering/governance/wg-governance.md |
| 82 | + |
| 83 | +Additionally, the WG commits to: |
| 84 | + |
| 85 | +- maintain a solid communication line between the Kubernetes groups and the wider CNCF community; |
| 86 | +- submit a proposal to the KubeCon/CloudNativeCon maintainers track. |
| 87 | + |
| 88 | +## Timelines and Disbanding |
| 89 | + |
| 90 | +As a first mandate, the WG will define a roadmap in the first quarter of operation. |
| 91 | +We believe there will be a set of features the Working Group can identify and deliver |
| 92 | +that will enable the majority of frameworks to operate natively on Kubernetes. |
| 93 | + |
| 94 | +Achieving the aforementioned deliverables, also mentioned in the `In scope` |
| 95 | +section, will allow us to decide when to disband this WG. There is no |
| 96 | +expectation that the Working Group will be converted into a SIG in the long |
| 97 | +term. However, there is a chance that a separate project or a sizeable |
| 98 | +sub-component of SIG Apps will be created as a result of the Working Group. |