
SIG Node Meeting Notes

Dec 27th [Canceled for holidays]

Dec 20th [Canceled for holidays]

Dec 13th, 2022

Recording: https://www.youtube.com/watch?v=iw_xZZPuXDI

Total: 196 (-22, yay!)

| Incoming | Completed |
| --- | --- |
| Created: 32 | Closed: 37 |
| Updated: 161 | Merged: 22 |

  • [mrunal/sergey/ruiwen] 1.26 retro/1.27 planning
    • 1.26 retro, with tracked KEPs and finished KEPs
    • 1.27 planning with initial KEP candidates:
  • [everpeace] KEP-3169: Fine-grained SupplementalGroups control
    • NOTE: I’m sorry that I can’t attend the regular community meeting due to the timezone gap (3 a.m. in my timezone, Tokyo). I’m adding this agenda item to help with 1.27 planning.
    • This KEP resolves the surprising behavior of the SupplementalGroups field described in k/k#112879: group memberships defined in the container image are kept (see the sketch at the end of this section). I believe many (probably most) cluster admins don’t know about this behavior. Moreover, when a cluster uses hostPath volumes, it can raise security concerns even when cluster admins enforce policy engines in the cluster.
  • [swsehgal] Topology Manager GA graduation: happy to volunteer to drive this work in 1.27 (if we want to move ahead with it?)
    • Lack of multi-NUMA systems in CI is the key blocker
      • [fromani] this is also relevant for memory manager
    • Currently, e2e tests that require multi-NUMA hardware are skipped
    • kubernetes/test-infra#28211
      • Inputs/suggestions on how this can be handled are welcome on the issue
      • Slack discussion with the test-infra group: here (potential use of Equinix nodes is being discussed there)
  • [SergeyKanzhelev] kubernetes/kubernetes#114394 CRI API version skew policies. See slides from contributors summit for extra details
  • [vinaykul] InPlace Pod Vertical Scaling PR - status update
    • vinaykul not joining this meeting (on vacation in India)
    • Please review and merge KEP milestone update PR
    • PR 102884 approved by Derek.
      • @bobbypage fixed containerd/main E2E pull test job, we now have full E2E coverage
      • I recommend that we merge the API changes PR 111946 at the earliest possible point in 1.27 and watch to make sure nothing bad happens.
      • Then merge PR 102884 shortly after (< 1 week) and re-add the periodic CI test jobs.
      • Does the first week of Jan 2023 look realistic for the proposed merge plan above, assuming the plan sounds good?
  • [SergeyKanzhelev] Reconcile SIG Node teams and OWNERs files: kubernetes/org#3893
  • [mweston & atanas] Quick update on issue kubernetes/enhancements#3675
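
For the KEP-3169 item above, a minimal sketch of the field in question using k8s.io/api types (the values are illustrative, not from the KEP): today, even with an explicit supplementalGroups list, group memberships defined for the runAsUser in the container image's /etc/group are merged in as well, which is the surprise described in k/k#112879.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func ptr[T any](v T) *T { return &v }

func main() {
	// An admin might expect the process's groups to be exactly
	// runAsGroup + supplementalGroups; in practice, groups defined for
	// uid 1000 in the image's /etc/group are also applied, which is the
	// behavior KEP-3169 proposes to make controllable.
	sc := &corev1.PodSecurityContext{
		RunAsUser:          ptr(int64(1000)),
		RunAsGroup:         ptr(int64(3000)),
		SupplementalGroups: []int64{4000},
	}
	fmt.Printf("%+v\n", *sc)
}
```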

Dec 6, 2022

Recording: https://www.youtube.com/watch?v=t3PcHj62f0c

Total: 218

| Incoming | Completed |
| --- | --- |
| Created: 11 | Closed: 7 |
| Updated: 63 | Merged: 2 |

  • [mweston & atanas] Ready Dec 6th: filed issue on kubelet plugin model here: kubernetes/enhancements#3675
    • starting KEP and looking for interested parties, discussion partially based around dynamic resource allocation and thoughts on how to incorporate it.
  • [pacoxu] small improvement to the memoryThrottlingFactor proposal (I listed 3 problems in the link), but a behavior change: memory.high = memory.request + (memory.limit - memory.request) * memoryThrottlingFactor. Also, defaulting to 0.8 may cause performance issues for pods that routinely use more than 80% of their memory limit, such as Java applications. We probably need a pod-level setting for it, e.g. softRequest/throttlingLimit, besides limits and requests. (See the worked example at the end of this section.)
  • [msau] (joining at 10:30) Looking for maintainers from multiple sigs to participate in a discussion/roundtable with Data on Kubernetes (stateful end users). If you’re interested, add your name to the list. First roundtable is going to be sometime in January.
  • [claudiubelu] Proposed changes to how kubelet detects updates for registered plugins, because the current implementation doesn’t work on Windows due to timestamp granularity issues kubernetes/kubernetes#114136
  • [SergeyKanzhelev] Sidecar WG: we are reaching a conclusion. Will send a summary soon. Find information here: https://docs.google.com/document/d/1E1guvFJ5KBQIGcjCrQqFywU9_cBQHRtHvjuqcVbCXvU/edit#
  • [SergeyKanzhelev] No perma betas:
    • AppArmor beta since 1.4 (owner: @tallclair)
    • QOSReserved alpha since 1.11 (owner: @sjenning)
      • Mrunal or Ryan will take a look
    • RotateKubeletServerCertificate beta since 1.12 (owner: @mikedanese)
      • Sergey to ping Mike
    • CustomCPUCFSQuotaPeriod alpha since 1.12 (owner: @szuecs)
      • Mrunal to take a look
    • KubeletPodResources beta since 1.15 (owner: @dashpole)
      • [@fromanirh] I volunteer to help graduate this to GA in 1.27 - I'll add it to the 1.27 planning document when we start it
    • TopologyManager beta since 1.18 (owner: @lmdaly)
      • The past decision was to graduate before out-of-process plugins
      • [Dawn] Let’s put together a one-pager explaining the roadmap, short and longer term.
      • [Swati] Device and CPU manager have graduated. Maybe let’s be consistent
    • DownwardAPIHugePages beta since 1.21 (owner: @derekwaynecarr)
    • ProbeTerminationGracePeriod beta since 1.22 (owner: )
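
A small worked example of the proposed memory.high formula from the memoryThrottlingFactor item above (the helper name is ours, not kubelet's):

```go
package main

import "fmt"

// memoryHigh computes the proposed cgroupv2 memory.high value:
// memory.high = request + (limit - request) * memoryThrottlingFactor.
func memoryHigh(request, limit int64, factor float64) int64 {
	return request + int64(float64(limit-request)*factor)
}

func main() {
	const gi = int64(1) << 30
	// With request=1Gi, limit=2Gi and the default factor of 0.8,
	// memory.high lands at ~1.8Gi: a workload that steadily uses
	// more than 80% of its limit (e.g. a JVM) would be throttled.
	fmt.Println(memoryHigh(1*gi, 2*gi, 0.8)) // 1932735283 (~1.8Gi)
}
```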

Nov 29, 2022

  • [aditi] Expose pod cgroup path: kubernetes/kubernetes#113342

    [sig-node] Concerns about exposing cgroup information at the pod API status level. There could be races depending on how the path is used. Better to narrow the issue down to the interaction between the runtime and the CNI plugin at pod bring-up time. Peter (CRI-O), David Porter (bobbypage) / mikebrow (containerd), and Mike Zappa (CNI, @MikeZappa87) to figure out the details of the approach across runtimes.

  • [everpeace] KEP-3169: Fine-grained SupplementalGroups control

    • NOTE: I’m sorry that I can’t attend the regular community meeting due to the timezone gap (3 a.m. in my timezone, Tokyo). I’m adding this agenda item to gain more visibility for my KEP in the sig-node community.
    • This KEP resolves the surprising behavior of the SupplementalGroups field described in k/k#112879. I believe many (probably most) cluster admins don’t know about this behavior. Moreover, when a cluster uses hostPath volumes, it can raise security concerns even when cluster admins enforce PSPs (or other policy engines) in the cluster. So I would like to implement the KEP, hopefully in v1.27.
    • I would very much appreciate it if somebody could help review my KEP.
    • This KEP includes a modification of the CRI, so we probably need to update CRI implementations first, at least the most popular ones (are containerd and CRI-O enough?). I’m not familiar with how to do this. I recognize that we can’t apply a feature gate to the CRI and its implementations. I would also appreciate advice from contributors on this.
  • [klueska] Update to KEP to reflect actual implementation that was merged

  • [swsehgal] Need Derek’s architecture approval on kubernetes/kubernetes#110252. API updates are proposed in a separate PR (ready for review as well).

  • [bobbypage] Update/thoughts on CRI healthz: kubernetes/kubernetes#109653

  • [vinaykul] InPlace Pod Vertical Scaling PR - status update

    • vinaykul may not join due to conflicting appointment
    • PR 102884 approved, missed 1.26, targeting for 1.27
    • Please review and merge KEP milestone update PR
    • Please review test-infra inplace resize test pull job PR

Nov 22, 2022

No meeting due to the Thanksgiving holiday in USA.

Nov 15, 2022

  • [rata, giuseppe] Userns support
    • For stateful pods, shall we create a new KEP or change the scope of the existing one?
    • Will join sig-storage to start the conversation with them about stateful pods too
    • [Derek] - A separate KEP and feature gate are recommended
    • [Sergey] - Should we GA the existing support?
    • [Derek/Mrunal] Yes, we should move it to beta.
    • [Rodrigo] Concerns around validation if we introduce another feature flag.
    • [Rodrigo] ID-mapped mounts could solve issues around correct permissions for files such as SSH keys.
    • [Derek] Can we key off the kernel version to figure out if we have ID-mapped mounts? Any way to implement a fallback?
  • [klueska] Dynamic Resource Allocation (DRA) update
    • Merged on Friday (after an extension request) as an alpha feature for 1.26
    • New staging repo created for k8s.io/dynamic-resource-allocation with helper libraries to build resource drivers against the DRA API
    • Outstanding request to create dra-example-driver repo
    • Request to “associate” DRA with an official sig-node subproject
      • Should we reuse an existing subproject or create a new one?
      • My vote is for a new one (but what to call it?)
  • [Sergey] sidecar WG: https://docs.google.com/document/d/1E1guvFJ5KBQIGcjCrQqFywU9_cBQHRtHvjuqcVbCXvU/edit#

Nov 8, 2022

Recording: https://www.youtube.com/watch?v=mnZWYAuOJ90

  • [bobbypage/eric lin] kubernetes/kubernetes#109653
  • [pacoxu] kubelet: make registry qps/burst limit the parallel pulling count #112242
    • After rethinking, the current qps/burst of image pull makes no sense to users, and the current PR tries to make it limit the number of images being pulled at the same time; the flag and its meaning would then no longer match. So I suggest we just deprecate and then remove the current registryPullQPS and registryBurst flags. Meanwhile, if this is a concern, we should provide a new flag like parallel-image-pull-limit as a new feature. (#112044 is the issue.) At the very least we should add more explanation for the flag. (registryPullQPS: limit registry pull QPS to this value; QPS is requests per second.) See the sketch at the end of this section.
    • [ruiwen-zhao] +1 on adding a node-level limit on parallel pulls. I can help with this effort.
    • [paco] containerd/containerd#7313 I am working on a pull request in containerd to add some image-pull-related metrics. One of them is the count of in-progress image pulls.
    • [mikebrow] needs more declarative hints in the pod/container spec, and more resource information the image manager will not know about from other activities… declarative info: qos / cache policy / confidential meta / lazy snapshots vs. pull-all / does the container runtime optimize for common layers / … As mrunalp says, it’s not just about the image; it’s the connection cost/manifests/layers and, soon, artifacts.
  • [vinaykul] InPlace Pod Vertical Scaling PR - status update
    • Fixed nits and updated code to catch up after rebase.
    • Updated E2E test to run full-spectrum for containerd>=1.6.9. Tested in a local cluster.
    • Investigating failures with the newly added cgroupv1/cgroupv2 in-place resize CI jobs with containerd-main.
    • Requested 4 day exception to investigate/fix issues from rebase and CI job failure.
    • IMHO, it may be safer to merge this early 1.27 rather than late 1.26
  • [iancoolidge] cpuset to kubernetes/utils
    • [time permitting]
    • kubernetes/kubernetes#113744
    • minor controversies: NoSort/Sort, Int64 vs int
    • plan: merge all changes here, then copy into k/utils, then revendor in k/k (usage sketch at the end of this section)
  • [klueska] Need approval from sig-node-leads for feature gate addition in following PR
    • kubernetes/kubernetes#112914
    • I’ve already LGTM’d and APPROVED the kubelet changes, it just needs the feature gate approval now (@liggitt already confirmed to do the API approval)
    • Assigning ~~~~ since he did the KEP approval
  • [klueska] Need sig-node-leads approval for creation of dynamic-resource-allocation staging repo
  • [MaRosset] - Windows hostnetwork alpha #112961
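
On the registry qps/burst item above: a QPS/burst limiter only paces when pulls start, while the proposal is to bound how many pulls are in flight. A minimal sketch of the difference, using golang.org/x/time/rate for the limiter; the cap of 3 stands in for the hypothetical parallel-image-pull-limit flag:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	starts := rate.NewLimiter(5, 10)   // registryPullQPS=5, registryBurst=10: paces starts only
	inFlight := make(chan struct{}, 3) // hypothetical parallel-image-pull-limit=3: bounds concurrency

	for i := 0; i < 6; i++ {
		_ = starts.Wait(context.Background()) // QPS/burst decides when a pull may *start*
		inFlight <- struct{}{}                // semaphore blocks while 3 pulls are already running
		go func(i int) {
			defer func() { <-inFlight }()
			time.Sleep(500 * time.Millisecond) // stand-in for a slow registry pull
			fmt.Println("pulled image", i)
		}(i)
	}
	time.Sleep(2 * time.Second)
}
```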
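
And for the cpuset-to-kubernetes/utils item, a small usage sketch, assuming the post-migration import path k8s.io/utils/cpuset:

```go
package main

import (
	"fmt"

	"k8s.io/utils/cpuset"
)

func main() {
	a, _ := cpuset.Parse("0-3,8") // canonical range notation
	b := cpuset.New(2, 3, 9)
	fmt.Println(a.Union(b))        // 0-3,8-9
	fmt.Println(a.Intersection(b)) // 2-3
	fmt.Println(a.Size())          // 5
}
```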

Nov 1, 2022

Oct 25, 2022 [Cancelled for KubeCon]

Oct 18, 2022

Total active pull requests: 213 (+8 from last week)

| Incoming | Completed |
| --- | --- |
| Created: 24 | Closed: 7 |
| Updated: 66 | Merged: 9 |

Oct 11, 2022

Total active pull requests: 205 (-3 from last week)

| Incoming | Completed |
| --- | --- |
| Created: 17 | Closed: 11 |
| Updated: 69 | Merged: 8 |

  • [matthyx] request a WG creation to work on sidecar containers
    • A summary from Sergey:
    • [Dawn] define exit criteria and a way to report status back to this SIG. Also define the WG’s term.
    • [Sergey] wait for the Doodle to decide scheduling.
  • [vinaykul] InPlace Pod Vertical Scaling PR - status update
    • Please review KubeCon slides 11-16, if possible
    • Cgroupv2 support changes are in review, issues fixed. Mrunal PTAL.
    • Awaiting containerd release in order to enable full-E2E tests.
    • Mothership PR 102884 can merge once we have the next containerd release (1.7 per Ruiwen - sorry, I accidentally deleted Ruiwen’s comment), the CI picks it up, E2E tests are fully enabled (validating PodStatus for resize), and cgroupv2 review issues have been addressed.
    • API changes PR 111946 also on hold for containerd.
  • [mimowo]: Heads up for "Standardization of the OOM kill communication between container runtime and kubelet" (kubernetes/kubernetes#112910)
    • first: standardization of what we have
    • second: add more information - whether it was due to exceeding the limits or memory pressure on the node. This is more involved.
    • [Dawn] the user-space OOM killer in cgroupv2 will also introduce more standardization in this space
    • [Dawn] thought it was already aligned; how much has it diverged?
    • [Sergey] is it required for the KEP?
      • [Michael] no, but may break in future
    • [Sergey] How easy will it be to troubleshoot that it was indeed an OOM kill once people start relying on job retries based on OOM kills?
      • [Michael] Feature: customer can define policies for jobs depending on pod end state. Today pod conditions are used to understand the pod end state. Pod condition will be “resource exhausted”.
  • [Dawn] there will be cases when kubelet just cannot tell that something was oom killed. But it is still good to have everything unified.
  • [David Porter] how, practically, will it be standardized? It is just a string (see the sketch at the end of this section). Are there any conformance tests or something?
  • [Lantao] the logging format is standardized the same way
  • [David] For a container running multiple processes, when a subprocess is OOM-killed the container runtime may not detect it.
  • [Lantao] cgroupv1 behavior is different, isn’t it? IIRC, when there is a cgroup OOM, a random process in the cgroup is killed. In that case, OOMKilled is still set, even if pid 1 is running happily. ([David] Let’s confirm)
  • [Dawn] this was one of the first issues that was fixed.
  • [SergeyKanzhelev] containerd 1.6 is going LTS containerd/containerd#7454
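
To illustrate David's point above that the OOM-kill signal is today just a string: a consumer (e.g. a controller implementing retries on OOM) ends up matching the free-form terminated Reason, as in this sketch using k8s.io/api types:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// oomKilled reports whether any container in the pod status terminated
// with the (currently unstandardized) "OOMKilled" reason string -- the
// fragile string matching the discussion above is about.
func oomKilled(status corev1.PodStatus) bool {
	for _, cs := range status.ContainerStatuses {
		if t := cs.State.Terminated; t != nil && t.Reason == "OOMKilled" {
			return true
		}
	}
	return false
}

func main() {
	s := corev1.PodStatus{ContainerStatuses: []corev1.ContainerStatus{{
		State: corev1.ContainerState{Terminated: &corev1.ContainerStateTerminated{
			ExitCode: 137, Reason: "OOMKilled",
		}},
	}}}
	fmt.Println(oomKilled(s)) // true
}
```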

Oct 4, 2022

Total active pull requests: 208 (-22 since last week)

| Incoming | Completed |
| --- | --- |
| Created: 11 | Closed: 21 |
| Updated: 89 | Merged: 12 |

Sep 27, 2022

Total active pull requests: 230 (+3 since last week)

| Incoming | Completed |
| --- | --- |
| Created: 22 | Closed: 12 |
| Updated: 76 | Merged: 9 |

  • [ruiwen-zhao] Just a quick reminder on 1.26 planning: Please update the status on 1.26 planning, or add to it if you are planning to work on something not tracked there. KEP freeze is 18:00 PDT Thursday 6th October 2022

  • [alexey fomenko]: CRI Image Pulling Progress and Notifications: https://hackmd.io/nyLLTtAkTgOuYwxmnu0sIQ

    • [paco]: recently, I have seen several image-pull-related issues. 1. Image pull time includes waiting time, due to the default serialized image pulling behavior. 2. There are no image-pull-related metrics in kubelet (pr) or containerd (pr). 3. kubelet registry qps/burst does not work as expected: the current registry QPS just starts n image pulls per second, while users want the number of parallel image pulls to be bounded. BTW, enabling parallel image pulling would solve some problems caused by image pulls getting stuck. We may discuss this as a whole.
    • [lantaol]:
      • We hit this issue as well with serial image pull. A bad container image can block all other pods from coming up forever.
      • However, with parallel image pull, there is no good way to control the concurrency. The QPS is not the best way to solve this problem, because each image pull request can take a long time, just controlling the query per second is not sufficient.
      • This is worse with containerd, which doesn’t have an overall image pull timeout, or a progress based timeout like dockershim.
    • [Derek] Do we want image pull status on Pod as well?
      • [Alexey] yes, we want but not looked into details yet
      • [Derek] qps of progress reports may be a concern for a lot of updates
      • [Alexey] maybe just key points like 25%, 50%
      • [Derek] still a lot of information on happy path. Must be careful with it, only needed for debugging
    • [Derek] is serial pull policy still being used? It only exists for very old runtimes
      • [lantaol] parallel pull may have qps issues
      • [Derek] we have up to n images in parallel. Exactly for this reason. There is no reason to not switch to parallel
    • [Wenjun] many customers want an image pulling status.
      • [Derek] maybe metrics instead, or something that will help minimize the traffic
    • [Lantaol] What about image pull timeouts? An overall timeout is impossible to set, so dockershim had a progress-based pull timeout (see the sketch at the end of this section). Does it exist in containerd?
      • [Ruiwen] Containerd pull timeout will be in 1.7: containerd/containerd#6150
      • [Sergey] Do we need it configured from kubelet or runtime?
    • [Alexey] another option is to issue an ETA as an update instead of the progress.
    • [Derek] How it will work with “lazy image pulling” like GKE image streaming?
      • [Mrunal] yes, it likely needs to be accounted for.
      • [Dawn] should work well with it.
    • [lantaol] Checked with mrunal offline: for CRI-O, the “concurrent pulled image layers” configuration lives in CRI-O itself.
      • Containerd only has MaxConcurrentDownloads which limits the concurrent downloads for each image.
      • To do this at the containerd level, we will need the CRI plugin to implement a cap at the daemon level.
      • Or we can consider implementing this at kubelet level to extend serial-image-pull to max-concurrent-image-pull.
    • [mikebrow] nod to lantao’s comments.. we have a need to parallelize pulls, a need to identify resource contention when too many pulls run in parallel (layers/manifest checks/… soon artifacts), and a need to handle slow/no progress due to registry contention/access issues… Because there is a large cost to detecting resource contention and responding “from the back seat”, we will probably be better off passing prioritization information/policies from the kubelet side down to the code (in the container runtimes) that performs the resolve and the pulling of layers..
    • Summary: let’s proceed with CRI part of it
  • [fangyuchen86]: Kubelet Support Custom Prober Protocol

    • [Derek] Where is the probe run?
    • [fangyuchen86] On the node
    • [Derek] who is charged with the execution of these probes? They are not running inside the pod’s cgroup; who is reserving compute and memory for them?
    • [fangyuchen86] A controller will allocate this - it will be a custom pod in a VPC network, on the same VPC as the user workload. kubelet cannot access the Pod’s network. It can start the pod and attach storage, but not reach the network.
    • [Dawn] I understand the requirement. But maybe that third party controller can take even more responsibility and actually do the pod management on cluster level. It may be easier. Introducing a custom prober to the node has some security concerns too.
    • [Wenjun] What prevents creating a proxy?
      • [fangyuchen86] security does not allow this
      • [Derek] this is the most interesting question here. Naively, we believe that kubelet is an admin for all workloads. And why is the networking taken away from the containers? Maybe we can have a session about these requirements?
      • [Dawn] In some environments, the workload cannot access the kubelet network. This is where gVisor helps, for example.
      • [Derek] nobody has pushed back on probes before.
    • [fangyuchen86] we also have the problem of other protocols that need to be covered.
      • [Derek] that is a separate requirement that might be solved differently.
    • Summary: Let’s create a doc that explains the requirements and scenarios
  • [klueska] Small, self-contained TopologyManager update planned for this release:

    • kubernetes/enhancements#3545
    • Already added to
    • Please add the /label lead-opted-in to the issue so it can be tracked
    • [Derek] this is done
  • sig-node meeting recordings: uploading recent ones.

    • [Derek] will do
    • [Dawn] might help with it
  • [Sergey] per container restartPolicy override: https://docs.google.com/document/d/1gX_SOZXCNMcIe9d8CkekiT0GU9htH_my8OB7HSm3fM8/edit

    • [Derek] An alternative is “BindsTo” semantics that can be used for termination of sidecar containers. Maybe kubelet can delegate it to the OS as well, e.g. systemd.

    • [Mrunal] BindsTo needs to be experimented with. But delegating to the OS is also appealing

    • [MikeB] The question is how much we can delegate.

      summary: read the doc and compare with BindsTo.

  • [Sergey] more sidecars: kubernetes/kubernetes#111356 (comment)

    • [Dawn] restartPolicy and QoS (including OOM score) were per-container initially. But after debate it was changed to per-pod and the community was convinced. There were even conversations about scheduling containers into an existing Pod. This is why sidecars feel so unnatural in k8s. Once we open this can of worms, we are starting to do Pod v2.
    • [Mrunal] this OOM adj calculation may be challenging.
    • [Derek] I’d rather move to OOMd than change the present state.

    summary: likely not (closed the issue).

  • [marquiz] update on QoS-class resources KEP

  • [pacoxu] If ResourceQuota for cpu/memory is set, no best effort pod can be created. For other resources like ephemeral-storage, best effort pod can be created.
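
On the image-pull timeout thread earlier in this section (an overall timeout is impossible to set; dockershim had a progress-based one): a minimal sketch of a progress-based timeout, where the deadline resets whenever bytes arrive, so only a stalled pull is cancelled. All names here are illustrative, not from any runtime.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
	"time"
)

// progressReader pushes the stall deadline out every time bytes arrive,
// so only a pull that stops making progress gets cancelled.
type progressReader struct {
	r     io.Reader
	timer *time.Timer
	stall time.Duration
}

func (p *progressReader) Read(b []byte) (int, error) {
	n, err := p.r.Read(b)
	if n > 0 {
		p.timer.Reset(p.stall) // progress made: reset the stall clock
	}
	return n, err
}

// withProgressTimeout wraps a layer-download stream so that a transfer
// with no data for `stall` cancels ctx, while a slow-but-moving pull of
// a huge image is left alone (unlike a single overall timeout).
func withProgressTimeout(ctx context.Context, r io.Reader, stall time.Duration) (io.Reader, context.Context) {
	ctx, cancel := context.WithCancel(ctx)
	t := time.AfterFunc(stall, cancel) // fires only if never reset in time
	return &progressReader{r: r, timer: t, stall: stall}, ctx
}

func main() {
	r, ctx := withProgressTimeout(context.Background(), strings.NewReader("layer data"), 10*time.Second)
	_, _ = io.ReadAll(r)
	fmt.Println("stalled:", ctx.Err() != nil) // false: data kept flowing
}
```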

Sep 20, 2022

  • [ruiwen/mrunal] 1.26 planning
  • [marquiz] QoS-class resources KEP (renamed)
  • [vinaykul] InPlace Pod Vertical Scaling PR - status update
    • Fabian has a PR adding Windows support for in-place resize. Thanks Fabian!
    • JaffWan fixed my missing unit tests and typo for cgroupv2 :) Thanks Jaixin!
    • Jaixin also found the root-cause for issue 112264 and will work on a fix. This should significantly speed up E2E tests.
    • I tried out in-place resize with CRI-O (in local cluster) and it works!
    • Tested resize E2E tests using Ruiwen’s containerd support
      • It works but PodStatus.Resources update takes ~60s. Issue 112264
      • UpdateContainerResources (containerd) applies the resize in < 50 ms
    • API changes PR 111946 ready for review & preferably early-merge.
    • Cgroupv2 support changes are in review.
    • Mothership PR 102884 can merge once we have the next containerd release (1.6.9?), the CI picks it up, E2E tests are fully enabled (validates PodStatus for resize), and cgroupv2 review issues have been addressed.
  • [mimowo] Promote KEP-3329 "Retriable and non-retriable Pod failures for Jobs" for Beta
  • [Sergey] SIG ongoing things update:

Total active pull requests: 227 (+35 since June)

| Incoming | Completed |
| --- | --- |
| Created: 32 | Closed: 10 |
| Updated: 87 | Merged: 15 |

Bugs untriaged: 13 https://github.com/orgs/kubernetes/projects/59

PRs untriaged: 99 https://github.com/orgs/kubernetes/projects/49

CI group meetings are back and we will be triaging issues and getting the tests back on track.

Sep 13, 2022

Sep 6, 2022

  • [ruiwen] 1.25 retro
  • [danielye] CRI Stats Performance Update
  • [qiutongs] Issue awareness: unexpected initial delay of probes
    • kubernetes/kubernetes#96614 (comment)
    • `initialDelaySeconds` doesn’t work as the API spec says (see the sketch at the end of this section).
      • first probe time = container start time + initialDelaySeconds
      • kubelet restart: wait a reasonable amount of time; still respect initialDelaySeconds differences for probes in the same container?
    • Jitter is needed in the case of kubelet restart, to avoid thundering-herd problems.
      • The jitters given to the probes in the same container are different.
      • Since 1.21, the jitter is only added when kubelet recently started/restarted.
        • If the periodSeconds are the same for all probes in a container, the probes will be invoked at the same time; initialDelaySeconds makes no difference.
  • [adrianreber] Checkpoint/Restore next steps
    • main focus: how to secure checkpoints
    • a checkpoint contains all memory pages (maybe secrets, random numbers)
    • possible suggestions for how to secure checkpoint archives
  • [vinaykul] InPlace Pod Vertical Scaling PR status update
    • Attempted “full scope” E2E tests with Ruiwen’s containerd support
      • Tested this by switching containerd binaries on a GKE worker to ones I built from latest master.
      • All tests pass and verify that Ruiwen’s code works correctly.
      • The tests took considerably longer (2988s vs. 869s to run all 34 E2E tests on a GKE 1-master/1-worker cluster)
        • The issue is NOT in containerd.
        • A ContainerStatus() CRI call made within 50 ms of UpdateContainerResources() shows the updated cgroup values.
        • The long delay is in updating apiPodStatus. This needs further investigation.
    • cgroup v2 support for in-place resize - review is in progress.
    • Is there any interest in merging API changes ( PR 111946 ) early?
      • If yes, please add an ok-to-test label.
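
A small sketch of the first-probe timing from the probes item above: the API-spec expectation is container start time + initialDelaySeconds, with (per the notes) a jitter added only when kubelet recently restarted. wait.Jitter is from k8s.io/apimachinery; the base duration and factor below are illustrative, not kubelet's actual values.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// nextProbe sketches the expectation discussed above:
// first probe time = container start time + initialDelaySeconds,
// plus a jitter only when kubelet itself restarted recently.
func nextProbe(containerStart time.Time, initialDelay time.Duration, kubeletRecentlyStarted bool) time.Time {
	next := containerStart.Add(initialDelay)
	if kubeletRecentlyStarted {
		// Spread restarted probes out instead of firing them all at
		// once (the thundering-herd concern in the notes). Base
		// duration and factor are illustrative.
		next = next.Add(wait.Jitter(1*time.Second, 1.0))
	}
	return next
}

func main() {
	start := time.Now()
	fmt.Println(nextProbe(start, 10*time.Second, false))
	fmt.Println(nextProbe(start, 10*time.Second, true))
}
```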

Aug 30, 2022

  • [klueska, pohly] Update on Dynamic Resource Allocation
    • KEP accepted for 1.25
    • Delayed implementation to 1.26
      (Mostly functional Draft PR)
    • Demo with NVIDIA GPUs
  • [dgl] Status of ProcMountType feature gate
    • This has been alpha since 1.12, I’m interested in potentially progressing it
  • [pehunt] inheritable capabilities regression follow up
  • [mgroot]
  • [marquiz] QoS-class resources KEP (renamed)
    • request for comments
    • also for post in k8s developers’ blog (PR)
  • [vinaykul] InPlace Pod Vertical Scaling PR status update
    • CRI (containerd) support has been merged. Thanks Ruiwen!
      • Next step: containerd release, then K8s picks up the new containerd version.
    • PR 111946 (API changes for in-place resize from 102884) needs ok-to-test
    • cgroup v2 support for in-place resize awaiting review.
    • Marion Lobur from the GKE team (Warsaw) is joining this effort.
      • His use case can help get significant test coverage in alpha.
  • [qbarrand]
    • Code push complete - development ongoing
    • KMM admin PR - needs attention from one of the chairs
    • Sponsorship for an additional KMM contributor to become member of the kubernetes org

Aug 23, 2022

Aug 16, 2022

  • [bobbypage/pehunt/Daniel] CRI stats Prometheus endpoint and adding cAdvisor metrics to CRI
    • The KEP as it stands proposes having the CRI implementation emit the Prometheus metrics. The new proposal is to enhance the CRI API so the CRI implementation gives the kubelet the metrics, and the kubelet emits them.
      • Same concerns about performance, but possibly it won’t be as bad as we fear
    • Plan is to have Daniel/David/Peter set up a proof of concept with a kubelet/containerd fork to see if it regresses performance.
    • Aim to have a POC for 1.26 KEP time to be able to decide how to go forward in 1.26 cycle.
  • [ndixita] Looking for a reviewer for External Credential Provider GA PR: kubernetes/kubernetes#111495
  • [vinaykul] InPlace Pod Vertical Scaling PR status update
    • vinaykul not in Node meeting today due to conflict
    • Please review KEP PR that updates milestones for this feature
    • I will spawn a separate PR for API changes later this week.
  • [pehunt] How best to request reviewers to look at a PR not attached to a release cycle

Aug 09, 2022

  • [vinaykul] InPlace Pod Vertical Scaling PR status update
    • Extracted and merged CRI changes in PR 111645
      • Many thanks to Mrunal, Peter, Mike, Ruiwen & Mark for quick reviews!!
      • Huge thanks to wangchen615 for pushing hard on the scheduler code!
      • This now unblocks runtime implementing support for in-place update.
    • If there are no objections, can I squash all the discrete kubelet commits till now into a single commit (easier to rebase), and then add cgroupv2 support?
    • Can we target an early 1.26 merge of API code? (saves me rebase headache)
  • [qbarrand] kubernetes-sigs membership for the KMMO contributors PR
    • feel free to ping @endocrimes
  • [pehunt] repercussions of dropping inheritable capabilities - moby/moby#43420

Aug 02, 2022

Reminder of the coming code freeze on 08/02.

  • [vinaykul] InPlace Pod Vertical Scaling PR status update

    • The pod resize E2E test failed after rebase because GKE switched to cos-97 last week, which defaults to cgroupv2
      • we don’t support it yet (it was planned for beta)
      • disabling the cgroup values check for resize verification unblocks us, but a manual test on pre-cos-97 (cgroupv1) is needed
    • wangchen615 found that the scheduler takes 5 minutes to reevaluate pending pods after resizing down a bound pod.
      • Late-stage fix in commits 563b254 and c6581a8
      • SIG-scheduling feels it is low risk and has signed off unofficially on Slack
    • thockin has LGTM’d the API changes.
    • Open issues tracked here.
    • My sentiment has changed - I am nervous about late-stage changes and missing cgroupv2 support when CI is on cgroupv2
  • [harche] evented PLEG - kubernetes/kubernetes#111384

    • kubernetes/kubernetes#111642
  • [rata]: Asked for exception for userns PR, as talked with Mrunal on slack

    • Mrunal mentioned there are some concerns with phase II to discuss
    • rata to update on PR, KEP and exception:
      • Capture the discussion, we agreed to reduce the scope to stateless pods.
      • On one hand, this buys us more time to figure out the details that some reviewers want about persistent volume support. On the other, it is very valuable to have support for stateless pods, and that is an end in itself.
      • This should also eliminate all concerns on how this can graduate to beta and GA.
      • Should we change the feature gate name?
    • any reason why “it was a mistake that the user must make an explicit request/limit when ResourceQuota has a CPU/memory setting”?
  • [bobbypage] cgroup v2 GA update

    • CI has been running cgroupv2 images (COS/Ubuntu) on node e2e and cluster e2e
    • cgroupv1 specific tests added
    • More feedback has been obtained on cgroupv2 from customers
      • Tencent has been running on cgroups v2 for a while
    • Planned doc updates and blog post

July 26, 2022

July 19, 2022

  • [rata]: Can we have a review for userns k/k PR? Code freeze is coming soon :)
    • Mrunal is reviewing it
    • Ruiwen would review it for Containerd related changes
  • [jstur]: Windows CRI-only pod sandbox stats: kubernetes/kubernetes#110754. Do we keep the structure generic or make it specific to Windows?
  • [Brett]: SRO to kmmo repo rename
    • Brett to open an issue to get the rename going
  • [vinaykul] InPlace Pod Vertical Scaling PR status update
    • Rebased code to resolve latest conflict
    • Added concise guidance for UpdateContainerResources CRI
    • SIG-Scheduling (huang-wei) signaled LGTM for current code.
      • E2E test and optimization can come in follow-up PRs.
        • wangchen615 is working on addressing Danielle’s feedback & E2E test targeting scheduler changes
    • thockin has LGTM’d the API changes.
    • Open issues tracked here.
    • What do we need for Node & CRI LGTM for alpha?
  • [Jing] Local Storage Capacity Isolation Feature
    • Problem:
    • Proposal: Add a kubelet option enableLocalStorageCapacityIsolation (default=true) to the kubelet configuration
      • The default value is true.
      • For systems that cannot support detecting root disk usage, set enableLocalStorageCapacityIsolation=false in the kubelet configuration. In this case, kubelet can continue to start without rootfs disk usage information, so ephemeral-storage allocatable is not set either. If a pod has an ephemeral-storage request/limit set in this case, the pod will fail to be created because allocatable storage is not available.
      • [feedback] Can we detect this automatically, to avoid complicating kubelet?
    • Feedback from sig-storage: seems fine
  • [Peter] CRI stats check-in
  • [Dawn] FYI: PR for SIG Node Contributor ladder was merged: #6725 Thanks Derek!
    • Please send your PR against that ladder if you think you are ready. Thanks!

July 12, 2022

  • [danielfoehrn] KEP proposal: Dynamic Resource Reservations
  • [vinaykul] InPlace Pod Vertical Scaling PR status update
    • Rebased code to resolve latest conflict
    • Guidance for runtime is taking shape - thanks mrunalp & kolyshkin
    • SIG-Scheduling (huang-wei) signaled LGTM for current code.
      • E2E test and optimization can come in follow-up PRs.
        • wangchen615 is working on addressing Danielle’s feedback & E2E test targeting scheduler changes
    • thockin has LGTM’d the API changes.
    • Open issues tracked here.
    • What do we need for Node & CRI LGTM for alpha?
  • [adrianreber] Forensic Container Checkpointing
    • code PR kubernetes/kubernetes#104907
    • LGTM by Ryan, Danielle, Mike(per existing kep) and Mrunal
    • I think only Derek's /approve is now missing

July 5, 2022

June 28th, 2022

  • [vinaykul] InPlace Pod Vertical Scaling PR status update (vinaykul OOO next week)
    • KEP template changes merged. KEP is now tracked for 1.25
    • Fixed CRI & test issues found by Mike and Derek respectively.
    • [derek on 6/14] Reviewed with feedback, need to clarify core behavior expectation
      • Kubelet -> CRI interaction pattern for observing state
        • This assumes the runtime reports values as read from host cgroup
      • [vinaykul] Please review my response on ResizeStatus generation.

      • [vinaykul] Please review my comment on suggested runtime behavior.

  • [bthurber & mrunal] - special-resource-operator repo rename
  • [adrianreber] Forensic Container Checkpointing
    • code PR kubernetes/kubernetes#104907
    • LGTM by Ryan, Danielle, Mike (per existing KEP).
    • Ready to be merged? Derek expressed possible discussion/evaluation needed for new exposed checkpoint service, mrunalp requested to look it over in that context.
  • [swsehgal] Populating Node Resource Topology-api repository

June 21st, 2022

June 14th, 2022

Total active pull requests: 192

| Incoming | Completed |
| --- | --- |
| Created: 17 | Closed: 6 |
| Updated: 57 | Merged: 13 |

Fish out: kubernetes/kubernetes#104140

  • [dawnchen] https://bit.ly/k8s125-enhancements is updated for SIG Node.

    • Total: 19 enhancements
  • [paco] I have worked on kubernetes/enhancements#1029 on and off since 1.22; I fixed a bug and tried to add metrics/logging for this feature, and the promotion PR was updated: kubernetes/enhancements#2697. Can we add this to v1.25 if it meets the beta-promotion bar?

  • [vinaykul] InPlace Pod Vertical Scaling PR status (vinaykul unavailable next two weeks)

    • [derek] Reviewed with feedback, need to clarify core behavior expectation
      • Kubelet -> CRI interaction pattern for observing state
        • This assumes the runtime reports values as read from host cgroup
    • KEP needs a template catch-up update. Please review this PR.
      • Partial/placeholder current unit-test cov info added. A more detailed breakdown will have to wait until after the OSS NA conference.
    • API (Tim Hockin) LGTM. Scheduling (need to add E2E test, then LGTM likely)
    • Issues to fix are being tracked here. Volunteers welcome :)
  • [adrianreber] Forensic Container Checkpointing (not able to join (again))

    • code PR kubernetes/kubernetes#104907

      • reviews done by Mrunal and Danielle (thanks)
      • probably almost finished, waiting for additional reviews/approval
      • Waiting for feedback from Mike about CRI changes
        • Initially we targeted checkpoints archive on the local file system
        • Right now we have successfully implemented checkpoint OCI images (not standardized (yet)) in the local registry in containerd and CRI-O
        • Storing checkpoints as OCI images was a request during early discussions (1.5 years ago)
        • At this point we could completely drop checkpoints written to the local file system and only store checkpoint images in the local containerd/CRI-O registry
        • Please let me know in the PR if it would be preferred to drop the local file system checkpoint archives (I am in favor of it)
    • Concerning the PR discussion this would mean we need to keep the parameter about the checkpoint destination (something like localhost/checkpoint-image:tag) and not remove the destination parameter as suggested by Mike

      [Derek] Fundamental question - the PR allows changing CPU and memory requests and limits. But when will the change manifest? How will kubelet know if this errored? Should we read the value back to see whether the change was applied?

      [MikeB] the error code from the API call will notify us if lowering the limit failed. Swallowed on increase.

      [Mrunal] confirmed this ^^^. As long as we increment properly it should be fine.

      [Derek] Need to make sure kubelet always knows the latest applied values. Also in the case of emptyDir <missed this>

      Also, the PR makes an assumption that a PLEG event is handled properly.

      [Derek] Need to check the behavior on cgroupv2 as well.

  • [ddebroy] SandboxReady pod condition KEP

  • [ruiwen-zhao] Adding GA criteria for KEP-2133 kubelet credential provider

    • kubernetes/enhancements#3379
    • Reviewed/Approved by SergeyKanzhelev and deads2k
    • Looking for a review from Derek (or other sig node approvers)
  • [mikebrow] exec with uid/gid (maybe user) option vs. current root-only.. any interest? (original discussion: 1224)

    • discussion centered on use cases, comparing with ephemeral container support, login with ssh plugin extension…
    • Sergey had a good idea about a flag for disabling root defaulting
      • perhaps we could use container default here..?
  • [ed] Dynamic resource allocation KEP: request for review: kubernetes/enhancements#3064

    • Being reviewed by Tim Hockin
    • Looking for a 2nd review round from Derek
  • [marquiz] Class resources KEP, re-triage, reviewers/approver were missing last week

  • [mckdev] Always set alpha.kubernetes.io/provided-node-ip kubernetes/kubernetes#109794


June 7th, 2022

Total active pull requests: 192

(for the past two weeks):

| Incoming | Completed |
| --- | --- |
| Created: 29 | Closed: 17 |
| Updated: 94 | Merged: 17 |

May 31, 2022

  • [klueska]: Please add the following enhancement to the tracking sheet:
  • [matthyx] (cannot be present): Please remove the Keystone containers KEP from tracking:
    • adisky is on maternity leave
    • the code will likely impact PLEG; we’d prefer to have more tests added by the sig-node reliability project (which I want to participate in) before refactoring
  • [rata]: userns KEP PR open since early April. Anything missing?
    • The sig-node freeze is in a few days
    • AFAIK there is nothing missing. Got LGTM a few hours ago, missing /approve
  • [vinaykul] InPlace Pod Vertical Scaling - status update
    • Merged KEPs 2273 with 1287. Awaiting review.
    • API (Tim Hockin) LGTM. Scheduling (need to add E2E test, then LGTM likely)
    • Awaiting Derek’s review completion.
    • Issues to fix are being tracked here. Volunteers welcome :)
  • [adrianreber] Forensic Container Checkpointing (cannot make it to today's meeting)
    • KEP kubernetes/enhancements#3264
      • Reviewed, now waiting for approval
      • There is a review comment from Mike which is not totally clear to me; I am looking for clarification on what exactly Mike was asking for
    • code PR kubernetes/kubernetes#104907
      • multiple review rounds (thanks Mrunal!)
      • probably almost finished, waiting for additional reviews/approval
  • [bobbypage/ruiwen] issue with terminating pods reporting ready=true
  • [ddebroy] SandboxReady pod condition KEP

May 24, 2022

May 17, 2022

  • Canceled due to Kubecon

May 10, 2022

  • [Derek/Dawn] Update on sig-node reliability kickoff last week
    • Will upload the recording
    • Spent time reaching consensus on what reliability means for node - clarify kubelet vs. runtime vs. operating system
    • Increase test coverage, and then next steps
    • Calling on the community to help
    • matthyx - discuss at KubeCon
    
  • [matthyx] discuss about Keystone containers KEP

    • matthyx will update the document and come back in a few weeks to present the milestone and design
  • [knight42] review KEP: Split stdout and stderr log stream
    • Mrunal will make a pass
  • [rphillips] Evented PLEG initial work [doc] (Points of Contacts: Ryan Phillips, Mrunal Patel, Harshal Patil)

    • Derek - Data on perf? Ryan - We don’t have that yet.
    • Derek - Clarification that we won’t get rid of the list entirely but be able to make the lists less frequent.
    • Mike will review
  • [mikebrow/Paul] KEP: Sub-Second Probes

  • [marquiz] follow-up discussion on Class resources KEP.

    • Action to Markus:
      • blog post for k8s.io blog, description on what is possible now with runtimes and existing annotations
      • Come back with demo/description of how Block I/O will be utilized by user
  • [vinaykul] InPlace Pod Vertical Scaling PR status

    • Merged KEPs 2273 with 1287 per last week's discussion. Please review.
    • API (Tim Hockin) LGTM.
    • Awaiting Derek to complete the review. Can we please prioritize it to avoid another release slip? My time is going to be limited as June rolls around - multiple CFPs were accepted for the LF OSS conference, and I have the additional work of content & demo prep.
    • Issues to fix are tracked here. Volunteers welcome :)
  • [mrunal] kubecon next week - do we keep the meeting?

  • Cancelling next week.

May 3, 2022

April 26, 2022

Done (6):

| Issue | Name | Stage | Status | Assignee |
| --- | --- | --- | --- | --- |
| 281 | DynamicKubeletConfig | Removal | | SergeyKanzhelev |
| 688 | PodOverhead | Graduating | Stable | SergeyKanzhelev |
| 2133 | Kubelet Credential Provider | Graduating | Beta | adisky |
| 2221 | Dockershim removal | Major Change | Stable | SergeyKanzhelev |
| 2712 | PriorityClassValueBasedGracefulShutdown | Graduating | Beta | mrunalp |
| 2727 | gRPC probes | Graduating | Beta | SergeyKanzhelev |

Removed from Milestone (17):

| Issue | Name | Stage | Status | Assignee |
| --- | --- | --- | --- | --- |
| 127 | User Namespaces | Graduating+ | Alpha | rata |
| 1287 | In-place Pod Vertical Scaling | Graduating+ | Alpha | vinaykul |
| 1972 | ExecProbeTimeout | Graduating+ | Stable | jackfrancis |
| 2008 | Container Checkpointing (CRIU) | Graduating+ | Alpha | adrianreber |
| 2043 | List/watch for concrete resource assignments via PodResource API | Graduating+ | Stable | swatisehgal |
| 2254 | Cgroupsv2 | Graduating+ | Stable | giuseppe |
| 2371 | cAdvisor-less, CRI-full stats | Graduating+ | Beta | haircommander |
| 2400 | Swap | Graduating+ | Beta | ehashman |
| 2413 | SeccompByDefault | Graduating | Beta | saschagrunert |
| 2535 | Ensure Secret Pulled Images | Graduating+ | Alpha | mikebrow |
| 2823 | Node-level pod admission handlers | Graduating+ | Alpha | SaranBalaji90 |
| 2837 | Pod level resource limits | Graduating+ | Alpha | n4j |
| 2872 | Keystone Containers | Graduating+ | Alpha | adisky |
| 2902 | New CPU Manager Policy: distribute-across-numa | Graduating+ | Beta | klueska |
| 3063 | Dynamic resource allocation | Graduating+ | Alpha | pohly |
| 3085 | Pod conditions around starting and completion of pod sandbox creation | Graduating+ | Alpha | ddebroy |
| 3162 | Add Deallocate and PostStopContainer to device plugin API | Graduating | Alpha | zvonkok |
  • 1.22 release: 24 KEPs tracked and **13 merged**
  • 1.23 release: 14 tracked and **8 merged**
  • 1.24 release: 23 tracked and **6 merged**
  • 1.23 release retro summary:
    • Good:
      • Planning and tracking is useful
      • Soft freeze helps
      • Early merges are great
    • Can be better:
      • Lack of reviewers and early reviews
      • Lack of approvers’ bandwidth
    • Things that went well - notes:
      • A reviewer found missing tests during the review process
      • We are making progress, even though things move slowly sometimes.
      • For in-place pod vertical scaling, containerd-side changes are done in parallel and ready to go.
      • Collaboration with the runc community is good.
    • Things that didn’t go well - notes:
      • In-place vertical scaling is taking long, but we are practicing caution in the review process.
      • Original author moved forward last minute
      • Keystone Containers design came late
      • In-place scaling scope increased over the review process
      • Unit tests live in a different location than the code
      • Syncing changes between Kubernetes and the container runtime. (One side needs to cut a release first.)
      • (Compared to runc) the containerd community could be more proactive when cutting releases
    • AIs:
      • Investment in testability and reliability next cycle. Leadership to scope and direct this work is needed.
      • Don’t accept changes without test coverage, or changes that lack testability. Reviewers need to hold the bar.
      • Build component tests - volunteers needed
      • Clearer instructions on which folder to add tests to, based on the code. Automated tool?
  • [SergeyKanzhelev] KEP 1.24 retro and KEPs 1.25 planning kick-off

  • [rata]: userns KEP: CRI changes PR open for 19 days now

    • Can we ask for a review, please? :)
    • Do we need review from Windows/VM runtimes maintainers for Alpha phase or for beta?
      • Mark R (what is the GitHub handle?) will take a look. But it doesn’t seem like a blocker for alpha
      • rata: Also, this lives inside the Linux section of the CRI
    • Can you help us reach out to the relevant runtime maintainers to also take a look?
    • Can we aim for alpha in 1.25?
      • It is currently not listed in that section of the doc Sergey shared
      • rata: Added, thanks!
  • [adrianreber] Forensic Container Checkpointing

  • [vinaykul] InPlace Pod Vertical Scaling PR status

    • API (Tim Hockin) LGTM. Derek’s review is in progress.
    • Issues to fix are being tracked here.
    • Can we make an early commit for v1.25?

April 19, 2022

Cancelled due to lack of availability of leads.

April 12, 2022

Total active pull requests: 172 (+10 from last week)

| Incoming | Completed |
| --- | --- |
| Created: 16 | Closed: 5 |
| Updated: 44 | Merged: 2 |

April 5, 2022

Total active pull requests: 162 (-17 from two weeks ago)

| Incoming | Completed |
| --- | --- |
| Created: 55 | Closed: 15 |
| Updated: 122 | Merged: 62 |

March 29, 2022

No agenda, canceling to focus on code freeze.

For any urgent items for code freeze, please ping sig-node slack channel.

March 22, 2022

Total active pull requests: 179 (+5 from last week)

| Incoming | Completed |
| --- | --- |
| Created: 25 | Closed: 5 |
| Updated: 81 | Merged: 15 |

March 15, 2022

Total active pull requests: 174 (+3 from last week)

| Incoming | Completed |
| --- | --- |
| Created: 18 | Closed: 5 |
| Updated: 66 | Merged: 15 |

  • Reminder: Mar. 29 is code freeze
  • [vinaykul] InPlace Pod Vertical Scaling status
    • WIP. Responding to Derek’s comments and addressing identified issues
  • [danielle] as part of the sig annual report (https://docs.google.com/document/d/1JAvi8ptbovvjSqh88378YB9irJUqWcT-PQ5YsYmdD3c/edit#) we have various pieces of documentation to update across the sig that need discussion.
    • We need to document the progression ladder from new contributor -> reviewer -> approver, asking for dawn/derek to finalize the doc they were working on to move into the community repo.
    • CONTRIBUTING.md updates
      • We need to refine our on-ramp for new contributors a little here. What do folks think is important?
        • Today that page is a pile of links, with some helpful intro to building k8s docs from dims at the bottom
  • [ddebroy] Updates to kubernetes/enhancements#3087
    • Addressed comments/concerns from Elana and Derek so far
    • Single SandboxReady condition
  • [dawnchen] Status update on OutOfCpu issue?
    • regression since 1.22.
    • Clayton is working on a fix
    • It’s close, David is testing it. Should merge soon

March 8, 2022

Total active pull requests: 171 (+11 from last week)

| Incoming | Completed |
| --- | --- |
| Created: 23 | Closed: 5 |
| Updated: 59 | Merged: 10 |

  • [SergeyKanzhelev] Review KEPs which are in soft freeze

    The following KEPs are expected to be fully merged (Beta/Deprecations):

  • [Done] 281: DynamicKubeletConfig kubernetes/enhancements#281

  • [soft cut] 2133: Kubelet Credential Provider kubernetes/enhancements#2133

  • [Done] 2221: Dockershim removal kubernetes/enhancements#2221

  • [keep in alpha] 2371: cAdvisor-less, CRI-full stats kubernetes/enhancements#2371
    • david to update the KEP to reflect keeping it in alpha

  • [cut from release] 2400: Swap kubernetes/enhancements#2400

  • [in-progress] 2712: PriorityClassValueBasedGracefulShutdown kubernetes/enhancements#2712

  • [in-progress, let’s keep in release] 2727: gRPC probes kubernetes/enhancements#2727

    The following KEPs should have WIPs up that are ready for review (Alpha/GA):

  • [keep it] 688: PodOverhead kubernetes/enhancements#688

  • [keep] 1287: In-place Pod Vertical Scaling kubernetes/enhancements#1287

  • [review in progress] 2008: Container Checkpointing (CRIU) kubernetes/enhancements#2008

  • [pr in progress, design changes may be needed] 2535: Ensure Secret Pulled Images kubernetes/enhancements#2535
    • qq: is phase 1 valuable by itself?
    • [Derek] there may not be enough benefit in phase 1 alone, especially if, long term, the logic moves to the runtime; some benefit if users wipe imagefs on host reboot.
    • mrunal and mike to decide whether to keep it in the milestone by confirming the value of phase 1.

    29th March 2022: Week 12 — Code Freeze

  • [vinaykul] InPlace Pod Vertical Scaling status

    • My company work has taken priority, but I’ll start addressing Derek’s comments after this coming Friday.
    • Won’t be in today’s meeting due to conflict.
  • [bobbypage] Update on OutOfCPU issue

    • Fix is in progress (kubernetes/kubernetes#108366), but it is a tricky fix and we need to be careful to avoid introducing new regressions
    • Uncovered another related issue with pod lifecycle refactor relating to eviction / graceful node shutdown [Will create a GH issue to track]

Mar 1, 2022

Total active pull requests: 160

| Incoming | Completed |
| --- | --- |
| Created: 37 | Closed: 10 |
| Updated: 76 | Merged: 24 |

  • Announcements
    • Reminder: Soft freeze: Mar. 4
    • Anything that won’t make milestone, feel free to remove it now
  • [derekwaynecarr] sig annual report
    • Danielle offers to help out
  • [derekwaynecarr] got through most of the in-place resizing PR, it’s big!
    • kubernetes/kubernetes#102884
    • some updates are requested of vinay, for when he has time in the interim
    • deferring all things cgroupv2 until this merges, but need to reconcile with memory QoS by beta.
  • [marquiz] Class resources KEP
    • Has been in hibernation but want to take it out of draft and proceed
  • [wenwu449] kubernetes/kubernetes#106884
  • Were autogenerated live captions helpful? We tried that out today for the first time.
    • Sentiment in chat is that they were quite helpful, and could be turned off if they were not

February 22, 2022 [cancelled]

[dawnchen] The meeting is cancelled due to no agenda proposed. Thanks!

February 15, 2022

Total active pull requests: 154

| Incoming | Completed |
| --- | --- |
| Created: 16 | Closed: 21 |
| Updated: 101 | Merged: 17 |

February 8, 2022

Total active pull requests: 175 (+3 from last week)

| Incoming | Completed |
| --- | --- |
| Created: 21 | Closed: 9 |
| Updated: 52 | Merged: 10 |

February 1, 2022

Total active pull requests: 172 (+6 from last week)

| Incoming | Completed |
| --- | --- |
| Created: 22 | Closed: 3 |
| Updated: 49 | Merged: 11 |

Announcements:

January 25, 2022

Total active pull requests: 166 (+10 since last week)

| Incoming | Completed |
| --- | --- |
| Created: 21 | Closed: 3 |
| Updated: 52 | Merged: 11 |

January 18, 2022

Total active pull requests: 156 (-34 since the last meeting)

| Incoming | Completed |
| --- | --- |
| Created: 14 | Closed: 35 |
| Updated: 116 | Merged: 17 |

January 11, 2022

Total active pull requests: 190 (-21 since the last meeting)

| Incoming | Completed |
| --- | --- |
| Created: 13 | Closed: 13 |
| Updated: 176 | Merged: 26 |

  • Announcements

    • 1.24 schedule finalized
    • [ehashman] Proposed date for soft node freeze: Fri. Mar. 4, 2022
      • Applies to beta/deprecations
      • Alpha/GA features must have WIPs up
      • Action: ehashman to send email with announcement
    • kubernetes/kubernetes#104143 Welcome wzshiming@ (Shiming Zhang) as SIG Node reviewer!
  • [derek] conclusion on special resource operator proposal

  • [ehashman] 1.24 KEP prioritization

    • Action: Elana to send email requesting review/feedback and explaining the prioritization goals
  • [swsehgal/fromani][heads up] PodResource API watch support to be postponed to 1.25 release due to capacity constraints. We aim to narrow down the design in the 1.24 timeframe and target the implementation in the 1.25 timeframe.

  • [pacoxu] Quotas for Ephemeral Storage #1029: fixed a bug and am trying to add a metric for this feature. Not sure if it can be promoted to beta in 1.24 or 1.25. (I will update the KEP if it will most likely be promoted to beta in 1.24; if not, I may update it for 1.25 or later.)

    • [Derek] There were some feature gaps that may need to be addressed before moving to beta.
  • [vaibhav2107] Rotated container log file size not counted towards ephemeral-storage’s limit (kubernetes/kubernetes#107447)
    • Discussion on the issue
    • [ehashman] Hasn’t been triaged yet. Will be looked at tomorrow during the Node CI/Triage subproject meeting
  • [jackfrancis] What to do with ExecProbeTimeout during the 1.24 release cycle? (kubernetes/kubernetes#99854)
  • [vinaykul] In-Place Pod Vertical Scaling - plan for early 1.24 merge

    • PR kubernetes/kubernetes#102884
    • Pod resize E2E tests have been “weakened” for alpha - now passing CI
    • Alpha-blocker issues:
      • Review claims code in convertToAPIContainerStatus breaks non-mutating guarantees. - My Nov 10 response needs Elana’s followup
        • It is unclear to me what part of the code updates or mutates any state. Need a response/clarification.
      • Container hash excludes Resources with in-place-resize feature-gate enabled, toggling fg can restart containers - Fixed & Reviewed
        • This fix seems acceptable to both Elana & Lantao but Hash annotation naming needs to be more specific. Working on it.
    • NodeSwap issue: Not an alpha blocker (No CI test failures seen). Asked @ichbinblau or @cathyhongzhang to file tracking issues.
    • Other non-alpha-blocker issues:
      • I’m fixing various issues found in reviews of API, scheduler and kubelet.
      • I’ll file tracking GitHub issues for the remaining (7-10) issues/TODOs and assign them to people who have offered to help. They can be fixed after this PR is merged, most likely within the 1.24 timeframe.
  • [mweston & swsehgal] reminder to review the continued inline conversation on CPU management cases: https://docs.google.com/document/d/1U4jjRR7kw18Rllh-xpAaNTBcPsK5jl48ZAVo7KRqkJk/edit

January 4, 2022

Total active pull requests: 211 (+4 since the last meeting)

| Incoming | Completed |
| --- | --- |
| Created: 33 | Closed: 26 |
| Updated: 108 | Merged: 7 |

  • Announcements
    • Release dates for 1.24
      • Cycle start next week (Jan 10)
      • Tentative release date (Apr 19)
    • Saying hi to James Laverack, Release Lead
  • [mrunal] 1.24 planning
  • [ddebroy] KEP for pod sandbox creation conditions in pod status [kubernetes/enhancements#3087]
  • [vinaykul] In-Place Pod Vertical Scaling - plan for early 1.24 merge
    • PR kubernetes/kubernetes#102884
    • Pod resize E2E tests have been “weakened” for alpha.
      • Resize success verified at cgroup instead of pod status.
      • All 31 tests are passing now.
    • Alpha-blocker issues:
      • Container hash excludes Resources with in-place-resize feature-gate enabled, toggling fg can restart containers - Fixed
        • Please review this incremental change which addresses it.
        • [Lantao] Some customers may rely on the existing label implementation, even though it wasn’t intended for that use. Want to get feedback on this.
        • Alternative: use the same, current hash field but use it to store both hashes.
        • It might be clearer to write down both hashes separately.
        • [Elana] Some concerns about version skew of labels; if one kubelet is on one version and another is on a different one, they need to know how to use the labels correctly and not accidentally break each other.
        • [Derek] Will focus in and review. People should not assume guarantees for kubelet labels.
        • [Dawn] Adding an additional label reduces complexity because we don’t have to worry about internal versioning of the label scheme.
      • Reviewer claims code in convertToAPIContainerStatus breaks non-mutating guarantees. - My Nov 10 response needs Elana’s followup
        • It is unclear what part of the code updates or mutates any state. Need a response/clarification.
      • Multiple reviewers have felt that the NodeSwap issue is a blocking issue. But in the Dec 07 meeting, we felt this may not be an alpha blocker (no CI test failures seen after I weakened the resize E2E tests; all-alpha tests passed). However, we want to be sure. - Need Elana’s input.
        • Can we identify exact reasons why this would (or would not) be alpha blocker?
    • I plan to create issues to track other non-alpha-blocking review items and assign them to folks to fix after PR is merged. A few people have offered to contribute. With help, we should be able to nail most, if not all, of them in the upcoming release.
  • [mweston & swsehgal] Request for reviewers of CPU doc here:
    • [swsehgal] How do we make this more pluggable in the long run? Support more bespoke use cases