Node Resource Interface

The document proposes a Node Resource Interface (NRI) specification to provide a standardized way for container runtimes like containerd to manage node, CPU, memory and other hardware resource allocation and topology. It outlines some issues with the current approaches and suggests NRI as a composable, extensible plugin-based solution similar to CNI for networking. An example konfine plugin is described that could provide dynamic topology management and quality of service capabilities as an initial NRI implementation.

Uploaded by

Davanum Srinivas

Node Resource Interface

Extensible resource interface for containers

@crosbymichael - Apple
The Problem…
Resource Management
Cgroups and Topology

• Workload Performance Requirements

• Batch

• Latency Sensitive

• Customer Requirements

• SLA/SLO

• My workload is P1!
Resource Management
Cgroups and Topology

• CPU

• Schedule across cores

• Hyperthreads

• NUMA

• Allocation of an entire node

• L3 Cache

• Hugepages
Resource Management
Cgroups and Topology

• DPDK

• VM Isolation

• Proximity

• GPU

• Network
Large Matrix
Current Solutions
Kubelet

• CPU Manager

• Few KEPs proposing improvements and extensions

• Weird UX

• Requests == limits OK!

• Off by default

• Topology Manager

• Hint providers
Current Solutions
Intel CPU Manager for Kube

• CMK cli

• Manages topology and pinning of resources

• CRI Resource Manager

• Implements CRI to create an interface for Kube

• Intel specific labels


QoS is hard
Everyone has different requirements
Focus on APIs not implementations!
Networking
Container Network Interface

• Simple

• Elegant

• Extensible

• Composable

• No controversy in the design that I know of


Let's make “CNI” for Resources
NRI
Because CRI was already taken :(

• Kubelet is not the right abstraction for this

• The lines between kubelet and CRI are getting too blurry

• Hook into the lifecycle of containers at the CRI level

• CRI implementations like containerd are robust and know how to interface with the underlying host systems

• CRIs already support CNI for networking


NRI
Config

{
  "version": "0.1",
  "plugins": {
    "konfine": {
      "systemReserved": [0,1]
    }
  }
}
NRI
Skeleton

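package main

import (
	"context"
	"fmt"
	"os"

	"github.com/containerd/containerd/pkg/nri/skel"
	// logrus is not used in this excerpt; presumably the plugin
	// implementation (not shown) logs through it.
	"github.com/sirupsen/logrus"
)

func main() {
	ctx := context.Background()

	// konfine is the plugin type; its definition is not shown on this slide.
	if err := skel.Run(ctx, &konfine{}); err != nil {
		fmt.Fprintf(os.Stderr, "%s", err)
		os.Exit(1)
	}
}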
NRI
Integration in a CRI

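	// Run the NRI plugins for the "create" state before the task starts.
	if _, err := nri.Invoke(ctx, task, "create"); err != nil {
		// Roll back the task and container if a plugin fails.
		task.Delete(ctx, containerd.WithProcessKill)
		container.Delete(ctx, containerd.WithSnapshotCleanup)
		return errors.Wrap(err, "nri invoke")
	}

	// Let plugins release their resources when this function returns.
	defer func() {
		if _, err := nri.Invoke(ctx, task, "delete"); err != nil {
			fmt.Println(err)
		}
	}()

	if err := task.Start(ctx); err != nil {
		return err
	}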
konfine
Dynamic Topology and QoS Management Plugin
konfine

• Builds a dynamic node topology

• Dynamic placement of workloads based on QoS class

• NUMA Support

• Supports default (batch) and latency sensitive
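The placement idea above can be sketched as a small CPU pool. This is an illustrative assumption about how QoS-based placement might work, not konfine's actual algorithm: latency-sensitive workloads take CPUs out of the pool for exclusive use, and batch workloads share whatever remains. System-reserved CPUs (as in the "systemReserved" config field) are never handed out. All names are hypothetical, and NUMA locality is ignored for brevity.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// QoS classes named on the slide; the string values are assumptions.
type QoS string

const (
	Batch            QoS = "batch"
	LatencySensitive QoS = "latency-sensitive"
)

// Pool tracks the CPUs still shared by batch workloads.
type Pool struct {
	shared []int
}

// NewPool builds a pool of CPUs 0..numCPU-1 minus the reserved ones.
func NewPool(numCPU int, systemReserved []int) *Pool {
	reserved := map[int]bool{}
	for _, c := range systemReserved {
		reserved[c] = true
	}
	p := &Pool{}
	for c := 0; c < numCPU; c++ {
		if !reserved[c] {
			p.shared = append(p.shared, c)
		}
	}
	return p
}

// Place returns the CPUs a workload of the given class may run on.
// Latency-sensitive workloads get n dedicated CPUs removed from the
// shared pool; batch workloads get a copy of the shared pool (n is ignored).
func (p *Pool) Place(class QoS, n int) ([]int, error) {
	if class != LatencySensitive {
		if len(p.shared) == 0 {
			return nil, fmt.Errorf("no shared CPUs left")
		}
		return append([]int(nil), p.shared...), nil
	}
	// Always leave at least one CPU for batch work.
	if n <= 0 || n >= len(p.shared) {
		return nil, fmt.Errorf("cannot dedicate %d of %d CPUs", n, len(p.shared))
	}
	cut := len(p.shared) - n
	exclusive := append([]int(nil), p.shared[cut:]...)
	p.shared = p.shared[:cut]
	return exclusive, nil
}

// cpuset formats CPU ids as a comma-separated list, a form accepted
// by the cgroup cpuset.cpus file.
func cpuset(cpus []int) string {
	parts := make([]string, len(cpus))
	for i, c := range cpus {
		parts[i] = strconv.Itoa(c)
	}
	return strings.Join(parts, ",")
}

func main() {
	pool := NewPool(8, []int{0, 1})
	ls, _ := pool.Place(LatencySensitive, 2)
	batch, _ := pool.Place(Batch, 0)
	fmt.Println("latency-sensitive:", cpuset(ls))
	fmt.Println("batch:", cpuset(batch))
}
```

A real plugin would also have to shrink the cpusets of batch containers that are already running when a latency-sensitive workload arrives; the sketch leaves that out.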


konfine

• No need to wait for kube release cycle

• Can be updated independently

• Chain multiple plugins together

• Keeps this small, doing one thing and doing it well

• If it does not work for you

• Fork it, change it, make it your own

• Build more plugins for your needs


Next Steps

• Formal Spec Proposal

• Demo plugins

• Containerd implementation for ctr and CRI


Thanks!
