
SIG Node Meeting Notes

Dec 31, 2024

  • Cancelled

Dec 24, 2024

  • Cancelled

Dec 17, 2024

  • Cancelled

Dec 10, 2024

Dec 3, 2024

  • [minna] Asking for PR feedback on kubernetes/kubernetes#125918
    • [Peter] We should add a feature gate (beta, on by default)
    • [Francesco] +1, and we should extend it
      • Maybe wait for critical pods to be ready, not just started, before we try to start non-critical pods
    • [Sergey] Similarly, we could extend the logic for admission
    • [Sergey] This PR may turn a start failure into an admission failure (if a critical pod starts and then fails, the pods that rely on it will fail differently)
  • [Sergey] Add agenda items ASAP, as we will cancel meetings aggressively in December
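
The distinction Francesco raised (gate non-critical pods on critical pods being ready, not merely started) can be illustrated with a minimal sketch. The `podState` type and `canStartNonCritical` function are hypothetical simplifications for illustration only; they are not the code in kubernetes/kubernetes#125918 or any kubelet API.

```go
package main

import "fmt"

// podState is a hypothetical, simplified view of a pod's lifecycle on the node.
type podState struct {
	Name     string
	Critical bool
	Started  bool
	Ready    bool
}

// canStartNonCritical reports whether non-critical pods may start.
// With waitForReady set, every critical pod must be Ready; otherwise
// it is enough that every critical pod has Started.
func canStartNonCritical(pods []podState, waitForReady bool) bool {
	for _, p := range pods {
		if !p.Critical {
			continue // only critical pods gate the others
		}
		if waitForReady && !p.Ready {
			return false
		}
		if !waitForReady && !p.Started {
			return false
		}
	}
	return true
}

func main() {
	pods := []podState{
		{Name: "critical-agent", Critical: true, Started: true, Ready: false},
		{Name: "app"},
	}
	fmt.Println(canStartNonCritical(pods, false)) // true: the critical pod has started
	fmt.Println(canStartNonCritical(pods, true))  // false: the critical pod is not ready yet
}
```

Gating on readiness is stricter: a critical pod that starts and then fails its readiness check keeps dependent pods waiting instead of letting them start and fail later.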

Nov 26, 2024 [Canceled due to US holiday]

Nov 19, 2024 [Canceled due to lack of agenda]

Nov 12, 2024 [Canceled for KubeCon]

Nov 5, 2024

  • [danwinship/surya] Redesigning Kubelet Probes

    • Antonio had opened an issue proposing that the runtime do the checks
      • When the kubelet asks the runtime to run a probe:
        • launching new pods and containers would be heavy
        • can we reuse the container-monitor process here instead of adding new ones?
          • TCP/HTTP/gRPC types of probes
            • would containerd/CRI-O be able to do those probes?
            • [mrunal] containerd would have to learn the split of daemon
      • [dawn] the pod sounds better than what we have today?
        • the cost to the user at the application level is unpredictable, though; this is not worse than what we have today, but it adds complexity for the user (in the per-container case)
        • the probing pod is part of system overhead
    • Will this be a new type of probe, or a replacement for the existing probes?
      • if it's a pod probe, then some features, like ensuring the port is open, might be lost?
      • so maybe we should keep both types of probes and users can
    • Performance should not regress
    • Checking a file in the filesystem and letting users put what they want there?
  • [tallclair] In-Place Pod Resize: status update

Oct 29, 2024 (Canceled)

Canceled due to lack of agenda.

Oct 22, 2024

Oct 15, 2024

Recording: https://www.youtube.com/watch?v=MyOhDhHRRKk

Oct 8, 2024

Recording: https://www.youtube.com/watch?v=_Zexxr4pxr8

Oct 1, 2024

Recording: https://www.youtube.com/watch?v=8YWCql6rLLk

  • KEP planning part 2
  • [ndixita] Pod Level Resources [Critical Scenarios]
  • [Lakshmi] When is container garbage collection deprecated? Is there an alternative recommended way to do container garbage collection?
  • [tjons] Run an initContainer only once per Deployment rollout, not on every scheduled pod.

Sep 24, 2024

Recording: https://www.youtube.com/watch?v=GkPrY56_gB4

Sep 17, 2024

Recording: https://www.youtube.com/watch?v=iH6KVk9B5DE

Sep 10, 2024

Recording: https://www.youtube.com/watch?v=9AfQA0DYR0E

Sep 3, 2024

Recording: https://www.youtube.com/watch?v=E8sw-fybnKc

  • OOM group -> Pod kill -> in the next iteration of the KEP

Aug 27, 2024

Recording: https://www.youtube.com/watch?v=wGbkByo_NBI

  • [lauralorenz] CrashLoopBackOff KEP (slides)
    • updates and changes since 1.31 [5 minutes]
    • some discussion on path forward [10-15 mins if I can get it]
  • [pehunt] KEP wrangler brainstorm

Aug 20, 2024

Recording: https://www.youtube.com/watch?v=KUw2kSFsf2U

Aug 13, 2024 (cancelled)

MEETING IS CANCELED TODAY due to lack of agenda and vacations

Aug 6, 2024

Recording: https://www.youtube.com/watch?v=K-eBDYfHiTM

July 30, 2024

Recording: https://www.youtube.com/watch?v=JGYTQbs6eJk

July 23, 2024

Recording: https://www.youtube.com/watch?v=Wc7yrCLILK8

July 16, 2024

Recording: https://www.youtube.com/watch?v=0iPCt_FZxSk

July 9, 2024

Recording: https://www.youtube.com/watch?v=RTEtVbZPB-E

July 2, 2024 [Canceled for July 4th week]

June 25, 2024

Recording: https://www.youtube.com/watch?v=ExmOu9Twp3A

  • [Filip Krepinsky] Creation of a new WG
  • [Pranav Pandey] Kubelet not releasing idle threads
    - discussed here
    - I think this issue is due to Go; could we confirm this?
    - Could we also confirm whether there is a direct way to set the kubelet's maximum number of threads, via a parameter or something similar?
  • [lubomir] Please review my small PR that makes a Windows/kubelet-related change:
    • kubernetes/kubernetes#123137
    • warn instead of error for unsupported options on Windows
    • we don't need to exit the kubelet with an error on Windows just because the user is using a config that works on Linux.
    • old PR where we discussed we should not have different defaults on Windows:
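
For the thread question above, a minimal sketch of the Go runtime knobs that exist for any Go binary (not kubelet-specific flags): `runtime.GOMAXPROCS` caps the number of OS threads executing Go code at the same time, `debug.SetMaxThreads` sets a crash threshold rather than a pool size, and the `threadcreate` profile counts threads created so far. Whether the kubelet should expose any of these is not something the notes decide; the `threadInfo` type and `currentThreadInfo` helper are hypothetical. The Go runtime's retention of idle OS threads is tracked upstream in golang/go#14592.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
	"runtime/debug"
	"runtime/pprof"
)

// threadInfo is a hypothetical snapshot of thread-related runtime settings.
type threadInfo struct {
	GOMAXPROCS     int // max OS threads executing Go code simultaneously
	MaxThreads     int // crash threshold from debug.SetMaxThreads, not a pool size
	ThreadsCreated int // OS threads created so far (threadcreate profile)
}

func currentThreadInfo() threadInfo {
	// SetMaxThreads returns the previous limit; raise it temporarily so the
	// read cannot trip the limit, then restore the original value.
	prev := debug.SetMaxThreads(math.MaxInt32)
	debug.SetMaxThreads(prev)
	return threadInfo{
		GOMAXPROCS:     runtime.GOMAXPROCS(0), // 0 queries without changing it
		MaxThreads:     prev,
		ThreadsCreated: pprof.Lookup("threadcreate").Count(),
	}
}

func main() {
	info := currentThreadInfo()
	fmt.Printf("GOMAXPROCS=%d maxThreads=%d threadsCreated=%d\n",
		info.GOMAXPROCS, info.MaxThreads, info.ThreadsCreated)
}
```

GOMAXPROCS can also be set through the `GOMAXPROCS` environment variable, and it does not bound threads blocked in system calls, so it limits concurrency rather than the total thread count.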

June 18, 2024

Recording: https://www.youtube.com/watch?v=REmtlcXma_M

June 11, 2024

Recording: https://www.youtube.com/watch?v=A1XwOJxBL0c

June 4, 2024

Recording: https://www.youtube.com/watch?v=3dyVRBR7K7k

May 28, 2024

Recording: https://www.youtube.com/watch?v=RDWC4rtQOCo

May 21, 2024

Recording: https://www.youtube.com/watch?v=eSYWzusEZiA

May 14, 2024

No agenda, canceling this week.

May 7, 2024

Recording: https://www.youtube.com/watch?v=_FPa0TVPoY4

Apr 30, 2024

Recording: https://www.youtube.com/watch?v=iuZCxtAeoQ8

  • [SergeyKanzhelev] Annual report last call for comments: https://github.com/kubernetes/community/pull/7831/

  • [lauralorenz] Intro to proposed changes to CrashLoopBackOff (slides); this stems from kubernetes/kubernetes#57291

  • [iholder101/Peter Hunt] #124060: Avoid swapping memory-backed volumes with tmpfs’ “noswap” option.

    • How to behave if the option is not supported?
    • If it is not supported, do we want to fallback to ramfs / BRD / zswap?
    • How should it be tested, given that CI runs an older kernel (5.15 < 6.4)?
    • Update KEP and issue with the current state
  • [iholder101/Peter Hunt]: In my time zone this meeting takes place at 20:00. Is it acceptable to reschedule this meeting to an earlier time? This might significantly help people from the EMEA region to join.

    • Defer to next week, hope for more consensus
    • in the meantime, ask the sig-node mailing list who would be able to attend at an earlier time but currently cannot
  • [ndixita]
    - kubelet archived logs permissions: kubernetes/kubernetes#124229
      - Solution: 1) config options for users, maybe as in kubernetes/kubernetes#124228 (comment); have a feature gate that is removed later.
      - Sergey: same issue with termination logs: kubernetes/kubernetes#108076
    - cadvisor enumerates memory and hugepages separately
      - Issue: kubernetes/kubernetes#84426
      - https://github.com/kubernetes/kubernetes/pull/119173/files#r1307246832

  • Is this option planned to be backported, and if so, to which version?
    • Recommended solution: fix in cadvisor, and assess backward compatibility (probably add a new field)
  • Question: How will this behave if huge pages are changed dynamically?

  • [Peter Hunt] Finish KEP planning
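
One question under #124060 was how to behave when tmpfs' "noswap" option is unsupported. tmpfs gained "noswap" in Linux 6.4, so a minimal sketch could gate the option on the kernel release string. The helpers below are hypothetical, and the policy of treating unparseable releases as unsupported is an assumption, not a meeting decision; version checks can also be wrong for distribution kernels with backports, so probing by attempting the mount may be more reliable.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// leadingInt parses the decimal digits at the start of s, e.g. "4-rc1" -> 4.
func leadingInt(s string) (int, error) {
	end := 0
	for end < len(s) && s[end] >= '0' && s[end] <= '9' {
		end++
	}
	return strconv.Atoi(s[:end])
}

// kernelAtLeast reports whether a release such as "5.15.0-91-generic"
// is at least major.minor.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected kernel release %q", release)
	}
	majVer, err := leadingInt(parts[0])
	if err != nil {
		return false, err
	}
	minVer, err := leadingInt(parts[1])
	if err != nil {
		return false, err
	}
	return majVer > major || (majVer == major && minVer >= minor), nil
}

// supportsTmpfsNoswap is a hypothetical helper: tmpfs "noswap" needs Linux 6.4+.
// Unparseable releases are treated as unsupported (an assumption).
func supportsTmpfsNoswap(release string) bool {
	ok, err := kernelAtLeast(release, 6, 4)
	return err == nil && ok
}

func main() {
	for _, r := range []string{"5.15.0-91-generic", "6.4.0", "6.8.0-31-generic"} {
		fmt.Printf("%s: noswap supported = %v\n", r, supportsTmpfsNoswap(r))
	}
}
```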

Apr 23, 2024

Recording: https://www.youtube.com/watch?v=-TEdQvF7kUE

Apr 16, 2024

Recording: https://www.youtube.com/watch?v=vjcRUX_vSbU

Apr 9, 2024

Recording: https://www.youtube.com/watch?v=o3AohYi9aQA

Apr 2, 2024

Recording: https://www.youtube.com/watch?v=Ho1kn-1p8Cg

Mar 26, 2024

Recording: https://www.youtube.com/watch?v=5TLp233Bisg

Mar 19, 2024 [Canceled for KubeCon]

Mar 12, 2024

Recording: https://www.youtube.com/watch?v=-435mh2GyGU

Mar 5, 2024

Recording: https://www.youtube.com/watch?v=yBmVPBO9Y9Y

Feb 27, 2024

Recording: https://www.youtube.com/watch?v=3IRepUPQ0CU

Feb 20, 2024

Recording: https://www.youtube.com/watch?v=vEbpXkhm73M

  • [Kevin Hannon] Discuss configuration for pod logs location
  • [Kevin Hannon] KEP-4191 blocked until we have a cadvisor release
    • With freeze coming, is it possible to get a cadvisor release before the freeze?
    • [AI: dawnchen@] Identify the new owner to help? - Done!
  • [Jeffwan/LingyanYin]
    • Need reviewers for this PR - Configure MemoryRequest for InPlace pod resize in cgroupv2 systems kubernetes/kubernetes#121218
    • Dixita Narang: drop a comment and doc link explaining why memory.min shouldn't be set yet
  • [AdrianReber] Graduate "Forensic Container Checkpointing" from Alpha to Beta PR
    • PR: kubernetes/kubernetes#123215
    • All changes in the PR are based on the KEP discussions
      • kubernetes/enhancements#4288
      • Mainly added tests for existing features as discussed during PRR
      • Switch from Alpha to Beta feature gate
      • Added separate sub-resource permission to better control access to the kubelet checkpoint API endpoint
    • Looking for reviewers
    • Will probably not be able to make it to the meeting
  • [fromani] Looking for approval review: kubernetes/kubernetes#121778 (for memory manager GA graduation, kubelet observability/visibility) thanks mrunal!
  • [jsturtevant] KEP 2371 - CRI container and pod stats - Issue with UsageNanoCores calculated in CRI kubernetes/kubernetes#122092 (comment)
  • [kevin hannon] PID stats issues in both containerd and CRI-O

Feb 13, 2024

Recording: https://www.youtube.com/watch?v=WLm7m-8T82A

  • kannon92: self-nominating to be a reviewer in sig-node

  • kubernetes/kubernetes#115201

    • KEP kubernetes/enhancements#4341

      - [Ritika] Discuss this issue
      - kubernetes/kubernetes#123176

      - Pranav: Kubelet thread management and resource cleanup after high workload
      - Discuss a scenario where the kubelet retains idle threads after a high workload, leading to unnecessary memory consumption.
      - Is there a way in Kubernetes to set the maximum number of threads? If not, can the k8s community implement a new parameter for it?
      kubernetes/kubernetes#123275

  • gathering pprofs of the kubelet would be useful to see if there are stuck goroutines

    • try restricting the kubelet process to cpuset:0 in its systemd unit file, to force the Go runtime to allocate fewer threads and kill them more aggressively, then repeat the test. This would help distinguish a Go library thread leak from a kubelet thread leak.

    golang/go#14592

  • pehunt: imageRef discussion round 2

    • Problem: the public pod API field container.ImageID is constructed from the container status ImageRef field.
    • This ImageID is used to compare against the image.ID of the CRI call for garbage collection.
    • The container.ImageID is considered to be a stable API, but is not compatible with the image.ID field.
    • Options to fix:
      • return the same value as image.ID in container.ImageRef (the resolved repoDigest)
        • problem: two images tagged with different repos but the same digest would thrash in GC
      • add a resolvedImageID or something to ContainerStatus and pod API for doing GC
        • both CRI and pod API update
      • In GC manager, compare image.RepoDigests in addition to image.ID to find a match
    • TODO:
      • check exactly what is returned for each field in cri-o and containerd
      • investigate if we can put together the needed info in image gc manager without CRI/pod API extension
      • extend them if not
  • kannon92: (if time) kubernetes/kubernetes#123247

    • Discovered reason for flake in eviction
    • Summary stats sometimes fail, and the first sort of activePods is ignored
  • ndixita: highlights from the SIG Node CI triage meeting (every Wednesday 10AM PST) kubernetes/kubernetes#122905
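
The third option from the imageRef discussion (in the GC manager, compare image.RepoDigests in addition to image.ID) can be sketched as below. The `image` type and `imageInUse` function are hypothetical simplifications, not the kubelet's image GC manager or the CRI types, and the IDs and digests are placeholder strings rather than real sha256 values.

```go
package main

import "fmt"

// image is a simplified, hypothetical stand-in for what the runtime reports
// about a local image: its ID and the repo digests it is known by.
type image struct {
	ID          string
	RepoDigests []string
}

// imageInUse reports whether any running container's ImageRef refers to img.
// Comparing only against img.ID misses containers whose ImageRef is a repo
// digest, so this sketch also checks img.RepoDigests.
func imageInUse(img image, containerImageRefs []string) bool {
	for _, ref := range containerImageRefs {
		if ref == img.ID {
			return true
		}
		for _, digest := range img.RepoDigests {
			if ref == digest {
				return true
			}
		}
	}
	return false
}

func main() {
	img := image{
		ID:          "sha256:1111",
		RepoDigests: []string{"example.com/app@sha256:2222"},
	}
	fmt.Println(imageInUse(img, []string{"sha256:1111"}))                   // true: matches the ID
	fmt.Println(imageInUse(img, []string{"example.com/app@sha256:2222"}))   // true: matches a repo digest
	fmt.Println(imageInUse(img, []string{"example.com/other@sha256:3333"})) // false: no match
}
```

Unlike the first option, this leaves the public container.ImageID value untouched and only widens the matching done during garbage collection.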

Feb 6, 2024

Recording: https://www.youtube.com/watch?v=WiYzo_knwfk

Jan 30, 2024

Recording: https://www.youtube.com/watch?v=LLS3qQgQJ6g

Jan 23, 2024

Recording:

Jan 16, 2024

Recording: https://youtu.be/NAIQGQHrlN0

Jan 9, 2024

Recording: https://youtu.be/b5jaZux0qCo

Agenda:

  • [pehunt] kubernetes/kubernetes#117793 ownership for 1.30?
    • tzneal to take on, no KEP needed
  • [kannon92] kubernetes/kubernetes#121834 looking for an approver
    • Can we consider backporting this?
    • Agreement
  • [rata]: UserNS KEP: beta migration in 1.30?
    • Open a PR to migrate to beta and reach out to gather more feedback
  • [tallclair]: Kubelet config clean up
    • Now that Dynamic Kubelet config is deprecated & removed, can we move the remaining flags into the Kubelet configuration object?
      • Derek: look into whether there are any differences in whether the Kubelet needs to be drained on update
      • Mrunal: Sync with folks working on conf.d
  • [rst0git] Forensic Container Checkpointing:
  • [fromani] Proposal to allow the kubelet to trigger the rescheduling of pods (redo from 20240102 because the audience was too small; presented at the batch WG meeting on 20240104). Expected 5-minute presentation plus at most about 10 minutes for questions/discussion.
    • Include a security section about restricting the node to unbind only its own pods.
  • [SergeyKanzhelev, Harche] kubernetes/kubernetes#122224: are the backward-compatibility concerns here valid?

Jan 2, 2024

Recording: https://youtu.be/BHGZs2HJMyU
Agenda:

  • [marquiz] QoS resources KEP, call for reviews, blockers from sig-node perspective(?)
  • [fromani] Proposal to allow the kubelet to trigger the rescheduling of pods. Looking for early feedback/possible concerns.
    • spinoff from DRA conversations; beneficial to improve UX with kubelet admission failures
    • will be presented at the batch WG/sig-scheduling meetings