2024 Annual Report: SIG Node

Current initiatives and Project Health

  1. What work did the SIG do this year that should be highlighted?

SIG-Node remains a foundational part of the Kubernetes community, and the breadth of the work done in 2024 shows it. As the community rallies behind AI use cases and identifies gaps in Kubernetes as a platform for LLM training and serving, SIG-Node made strides in several AI-related areas. Dynamic Resource Allocation (DRA) with structured parameters reached beta, making more flexible scheduling and allocation of device resources possible. In 2025 there will be a lot of continued work on DRA, including enabling drivers to report device health so that Kubernetes components can react to it, extending DRA to support advanced networking use cases, device taints and tolerations, and lots more! Outside of DRA, OCI image volume mounts were added as alpha in 2024, allowing users to mount AI models into containers from a separate image (and, one day, an artifact) instead of using a modelcar or embedding the model in the container image. Work such as in-place pod resize and pod-level resource limits will also unlock use cases for power AI users by allowing more flexibility in how pod resource limits are calculated, both at initialization and at runtime.

Plenty of work has been done outside of AI as well! SIG-Node remains the SIG with the most KEPs progressing, moving forward on 13, 16, and 17 KEPs in v1.30, v1.31, and v1.32, respectively. The CPU manager saw a lot of progress, including support for split uncore cache, a policy option for restricting reservedSystemCPUs, and a new static policy for optimizing CPU alignment. We have also worked on long-awaited Linux technologies such as user namespaces, swap, AppArmor, ephemeral storage quotas, recursive read-only mounts, and better support for supplemental groups, and we announced a feature freeze on cgroup v1.

These features don't even begin to cover the amount of CI stabilization, bug fixing, and other work the SIG is doing. We remain a productive (albeit occasionally overbooked) SIG. To help keep up with all of this work, we inducted one new approver, Sergey Kanzhelev; reinducted a formerly emeritus approver, Tim Allclair; welcomed a new SIG chair, Peter Hunt; and began crafting a role, currently called KEP wranglers, to help KEP authors follow the KEP process.

  1. Are there any areas and/or subprojects that your group needs help with (e.g. fewer than 2 active OWNERS)?

Because SIG-Node is so busy, top-level approvers are a persistent bottleneck. Every area of the kubelet could use more contributors with the expertise and confidence to review changes. Please refer to our contributor ladder to see ways to grow in the SIG!

  1. Did you have community-wide updates in 2024 (e.g. KubeCon talks)?
  1. KEP work in 2024 (v1.30, v1.31, v1.32):

Subprojects

New in 2024:

  • cri-client

Continuing:

  • ci-testing
  • cri-api
  • cri-tools
  • kernel-module-management
  • kubelet
  • node-api
  • node-feature-discovery
  • node-problem-detector
  • resource-management
  • security-profiles-operator

Working groups

New in 2024:

  • Device Management
  • Serving

Continuing:

  • Batch
  • Policy
  • Structured Logging

Operational

Operational tasks in sig-governance.md:

  • README.md reviewed for accuracy and updated if needed
  • CONTRIBUTING.md reviewed for accuracy and updated if needed
  • Other contributing docs (e.g. in devel dir or contributor guide) reviewed for accuracy and updated if needed
  • Subprojects list and linked OWNERS files in sigs.yaml reviewed for accuracy and updated if needed
  • SIG leaders (chairs, tech leads, and subproject leads) in sigs.yaml are accurate and active, and updated if needed
  • Meeting notes and recordings for 2024 are linked from README.md and updated/uploaded if needed