KubeConEU24-Composable Systems in Kubernetes
Kubernetes
Michele Gazzetti
Software Engineer, IBM Research Europe
Agenda
3. Sunfish
5. Demo
Traditional Infrastructure
Kubernetes
1 1 1 ✅
1 1 3 ⏳ 1 1 3 ✅
Traditional Infrastructure vs. Composable Disaggregated Infrastructure (CDI)
[Figure: Kubernetes worker nodes on traditional infrastructure, next to Kubernetes worker nodes on CDI backed by a Resource Pool in the infrastructure]
Benefits and Challenges of CDI
Benefits of CDI:
• Compose high value resources to fit workload requirements
Challenges of CDI:
• Management complexity
[Figure: Kubernetes worker nodes using composable hardware from a Resource Pool in the infrastructure]
CDI In Kubernetes
• Request resources via CRDs
[Figure: Kubernetes worker nodes using composable hardware from a Resource Pool in the infrastructure]
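A request for resources via a CRD could look roughly like the sketch below. It builds a custom resource manifest as JSON, which kubectl accepts alongside YAML. The API group `example.com/v1alpha1`, the `spec` field names (`targetNode`, `resource`, `type`, `size`) and the helper `build_composability_request` are illustrative assumptions, not the schema of any real operator.

```python
import json


def build_composability_request(name: str, node: str, resource_type: str, size: int) -> dict:
    """Build a hypothetical custom resource asking for `size` devices on `node`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return {
        # Hypothetical API group/version; a real operator defines its own.
        "apiVersion": "example.com/v1alpha1",
        "kind": "ComposabilityRequest",
        "metadata": {"name": name},
        # Hypothetical spec fields, for illustration only.
        "spec": {
            "targetNode": node,
            "resource": {"type": resource_type, "size": size},
        },
    }


manifest = build_composability_request("gpu-request", "composable04", "gpu", 4)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The point of the CRD approach is that the workload owner states what is needed (a resource type, a quantity, a target node) and leaves the hardware-level composition to a controller.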
The Complexity Of Managing CDI Resources
[Figure: Kubernetes facing several composable systems, each with its own CDI Management]
Sunfish
• Sunfish Composability Layer: resource abstraction and selection
[Figure: Kubernetes and the Sunfish Composability Layer]
Composable Resource Operator
https://github.com/IBM/composable-resource-operator
[Figure: ComposabilityRequests in Kubernetes, handled by the Composable Resource Operator, which talks to Sunfish]
Demo
kubectl apply -f nvidia-smi-pod.yaml
Nvidia-smi-pod: Pending
composable04
allocatable resources:
"cpu": …,
"memory": …,
"nvidia.com/gpu": "0",
Demo
kubectl apply -f composabilityrequest.yaml
Nvidia-smi-pod: Completed
composable04
allocatable resources:
"cpu": …,
"memory": …,
"nvidia.com/gpu": "4",
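The demo hinges on the node's allocatable GPU count: the pod stays Pending while composable04 reports "nvidia.com/gpu": "0" and completes once it reports "4". The sketch below illustrates that comparison in a simplified form. It assumes the pod requests one GPU (the slides do not state the amount), handles only integer-count extended resources, and ignores resources already taken by other pods; `fits` is a hypothetical helper, not the scheduler's actual logic.

```python
def fits(pod_requests: dict, node_allocatable: dict) -> bool:
    """Return True if every requested integer-count resource fits the node."""
    return all(
        int(node_allocatable.get(resource, "0")) >= int(amount)
        for resource, amount in pod_requests.items()
    )


# Assumption: the pod asks for a single GPU.
pod_requests = {"nvidia.com/gpu": "1"}

before_composition = {"nvidia.com/gpu": "0"}  # value shown while the pod is Pending
after_composition = {"nvidia.com/gpu": "4"}   # value shown once the pod has Completed

print(fits(pod_requests, before_composition))  # False
print(fits(pod_requests, after_composition))   # True
```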
Composability Request Creation
• Sunfish: translate the request into vendor-specific APIs
• CDI Management: compose resources
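Translating one generic request into vendor-specific APIs is essentially an adapter pattern. The sketch below shows the idea with two made-up vendors. The `ComposeRequest` type, the vendor names `vendor-a` and `vendor-b`, and both payload shapes are hypothetical and do not reflect Sunfish's actual interfaces or any real CDI manager.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ComposeRequest:
    """Vendor-neutral request: attach `count` devices of one type to a node."""
    node: str
    resource_type: str
    count: int


def to_vendor_a(req: ComposeRequest) -> dict:
    # Hypothetical vendor A expects one entry per device.
    return {
        "op": "attach",
        "host": req.node,
        "devices": [{"kind": req.resource_type} for _ in range(req.count)],
    }


def to_vendor_b(req: ComposeRequest) -> dict:
    # Hypothetical vendor B expects a single entry with a quantity.
    return {
        "action": "compose",
        "target": req.node,
        "resource": req.resource_type,
        "quantity": req.count,
    }


TRANSLATORS: Dict[str, Callable[[ComposeRequest], dict]] = {
    "vendor-a": to_vendor_a,
    "vendor-b": to_vendor_b,
}


def translate(req: ComposeRequest, vendor: str) -> dict:
    """Pick the translator registered for `vendor` and apply it."""
    try:
        translator = TRANSLATORS[vendor]
    except KeyError:
        raise ValueError(f"no translator registered for {vendor!r}") from None
    return translator(req)


request = ComposeRequest(node="composable04", resource_type="gpu", count=4)
print(translate(request, "vendor-a"))
print(translate(request, "vendor-b"))
```

With this split, the caller only ever builds the vendor-neutral request, and supporting another CDI manager means registering one more translator.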
Hard Lesson Learned On Resource Detachment
• Sunfish: translate the request into vendor-specific APIs
• CDI Management: detach resources
❌ Error: GPU has fallen off the bus 🚌 ⤵
Composability Request Deletion
[Figure: Delete CR reaches the Composable Resource Operator, which acts on K8s Node X (GPU, device driver, NFD worker) and calls Sunfish through the Sunfish Client; Sunfish drives CDI Management]
K8s Node X before deletion: pci-<device>.present: true, allocatable: { gpu: 4 }
• Delete CR
• K8s Node X set unschedulable
• 2. Evict workloads
• 3. Force driver removal
• 4. Undo Composition Request
• CDI Management: detach resources
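The deletion steps above (unschedulable, evict workloads, force driver removal, undo composition request, detach resources) matter mostly because of their order. The sketch below models that order on a toy node and adds a guard that refuses to detach while the node could still use the GPU, which is the situation behind the "GPU has fallen off the bus" error. `NodeState`, its fields and both functions are hypothetical and stand in for the real operator, driver and Sunfish calls.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeState:
    """Toy model of a node; the fields are illustrative, not a real API."""
    schedulable: bool = True
    workloads: List[str] = field(default_factory=lambda: ["nvidia-smi-pod"])
    driver_loaded: bool = True
    gpus_attached: int = 4


def detach(node: NodeState) -> None:
    """Detach GPUs, refusing while anything on the node may still use them."""
    if node.schedulable or node.workloads or node.driver_loaded:
        raise RuntimeError("unsafe detach: node is schedulable, busy, or driver is loaded")
    node.gpus_attached = 0


def delete_composability_request(node: NodeState) -> List[str]:
    """Run the teardown in the order shown on the slides and log each step."""
    steps = []
    node.schedulable = False      # keep new GPU workloads away from the node
    steps.append("unschedulable")
    node.workloads.clear()        # 2. evict workloads
    steps.append("evict workloads")
    node.driver_loaded = False    # 3. force driver removal
    steps.append("force driver removal")
    steps.append("undo composition request")  # 4. hand the request back
    detach(node)                  # only now is detaching safe
    steps.append("detach resources")
    return steps


node = NodeState()
print(delete_composability_request(node))
print(node.gpus_attached)  # 0
```

Calling `detach(NodeState())` directly raises `RuntimeError`, which mirrors the lesson from the talk: detaching hardware from under a loaded driver has to be prevented, not recovered from.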
Summary And What’s Next
Links:
• Sunfish: https://www.openfabrics.org/openfabrics-management-framework/