Hosts:
Tests:
Bugs triaging
Agenda:
- Test-Infra Cleanup
- Dropped NodeSpecialFeature / NodeAlphaFeature
- Fall out: kubernetes/test-infra#33996
- Aiming to deprecate NodeFeature by replacing it with Feature
Hosts:
Tests:
Bugs triaging:
Agenda:
- https://github.com/kubernetes/enhancements/tree/master/keps/sig-testing/3041-node-conformance-and-features#goals
- kubernetes/kubernetes#128923
- kubernetes/test-infra#33828
- TODO:
- Nodefeature to feature
- Convert to label filter instead of using skip/focus
- kubernetes/kubernetes#128880
- kubernetes/kubernetes#128889
Action items:
- Follow up on kubernetes/test-infra#32567
Cancelled due to U.S. Holiday
Hosts:
Tests: Kevin
Bugs triaging:
Agenda:
Need to create tickets for this
- Swap Tests are bonked. Need to create a ticket to investigate.
- Huge Page Test Failures
- Device Plugin
Hosts:
Tests:
Bugs triaging:
Agenda:
- [anish] Why are the eviction and CPU/memory/topology manager e2e tests not considered release blocking, even though these are stable features?
Hosts:
Tests:
Bugs triaging:
Agenda:
Recording: https://www.youtube.com/watch?v=Y8bUTXy3FGs
Hosts:
Tests:
Bugs triaging:
Agenda:
- Combine sig node/sig node CI board?
- Originally it was separated to onboard new members to be able to do reviews without needing to worry about production code
- Generally, this meeting should be focused on CI, so maybe defer PR triage
- Add a special label to PRs, when it’s present remove from one board/add it to the other
- CRI proxy PR merged, now more tests can be added to test different CRI scenarios
- kubernetes/kubernetes#127495
- How it can be used: kubernetes/kubernetes#121604
Recording: https://www.youtube.com/watch?v=Y8bUTXy3FGs
Hosts:
Tests:
Bugs triaging:
Agenda:
- [KubeTest] Migration to kubetest
- kubernetes/test-infra#32567
- Presubmits first
- A lot of problems
- https://github.com/elieser1101 is the owner for this
- EventedPleg: kubernetes/test-infra#33666
Recording: https://www.youtube.com/watch?v=GKrlW1LDXz0
Hosts:
Tests: Anish
Bugs triaging:
Agenda:
Oct 2
Recording: https://www.youtube.com/watch?v=80260g3EEv8
Sep 25, 2024
Recording: https://www.youtube.com/watch?v=gcX6sDoibM4
Agenda:
- kubernetes/kubernetes#127610
- [ffromani] (can attend only 1st half) Chicken and egg: kubernetes/kubernetes#120661 and kubernetes/kubernetes#127506
- CRI proxy work started: kubernetes/kubernetes#127495 Mostly FYI
Sep 18, 2024
Recording: https://www.youtube.com/watch?v=Y6slxvO6Hv8
Hosts:
Tests:
Bugs triage:
Agenda:
- Another ping about the flake: https://kubernetes.slack.com/archives/C0BP8PW9G/p1726622928011249?thread_ts=1718369837.055379&cid=C0BP8PW9G
- CRI proxy
- Injecting failures is a good idea
- If easy to set up - maybe set up everywhere. If not - let’s only do it per test
- The main concern is to keep tests from leaking into each other
Sep 11, 2024
Recording: https://www.youtube.com/watch?v=vmH-6iWjWPM
Hosts:
Tests:
Bugs triaging:
Agenda:
Sep 4, 2024
Recording: https://www.youtube.com/watch?v=1DiDFkYhpi4
Hosts:
Tests:
Bugs triaging: anish
Agenda:
Aug 28, 2024
Recording: https://www.youtube.com/watch?v=tE4uO6Gj4sM
Hosts:
Tests:
Bugs triaging:
Agenda:
- [anish] will join in the second half of the meeting. Taking a shot at deflaking eviction tests - kubernetes/kubernetes#123591
- Cadvisor cache seems to be in sync
Aug 21, 2024
Recording: https://www.youtube.com/watch?v=EgqFB0PDb0g
Aug 7, 2024
Recording: https://www.youtube.com/watch?v=g6U40nR_tRU
Hosts:
Tests:
Bugs triaging:
Agenda:
- Kevin for approver: kubernetes/test-infra#33255
- Add test lanes for 1.31 - Kevin will take it
- Sergey: add workflows to auto-populate issues
Jul 31, 2024
Recording: https://www.youtube.com/watch?v=yfSd6ezXWIs
Hosts:
Tests: Peter
Bugs triaging: Anish
Agenda:
- [harche] - are we identifying cgroup v1 and v2 specific CI jobs?
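One rough way to approach the question is to classify jobs by name, sketched below. This assumes job names carry a `cgroupv1` or `cgroupv2` marker, which holds for several job names that appear in these notes but not for all of them; the function name is hypothetical, and a name without a marker says nothing about what the node image actually runs.

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupVersion guesses the cgroup version a job targets from its name.
// It returns "unknown" when the name carries no marker.
func cgroupVersion(job string) string {
	switch {
	case strings.Contains(job, "cgroupv1"):
		return "v1"
	case strings.Contains(job, "cgroupv2"):
		return "v2"
	default:
		return "unknown"
	}
}

func main() {
	// Job names taken from testgrid links elsewhere in these notes.
	jobs := []string{
		"node-kubelet-cgroupv1-serial-crio",
		"pr-node-kubelet-serial-crio-cgroupv2",
		"cos-cgroupv2-containerd-e2e",
		"node-kubelet-serial-containerd",
	}
	for _, j := range jobs {
		fmt.Printf("%-40s %s\n", j, cgroupVersion(j))
	}
}
```

Jobs reported as "unknown" are the ones that would need a closer look at their image or configuration.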
Jul 24, 2024
Recording: https://www.youtube.com/watch?v=Wz0Dzo_f4jg
Hosts:
Tests: Kevin
Bugs triaging: Peter
Agenda:
- AI: migrate to the new project boards.
- AI: ask about perf dashboard
- [harche] - kubernetes/kubernetes#125720
- Also: kubernetes/kubernetes#125409
- Also a potential fix for the behavior: https://biriukov.dev/docs/page-cache/6-cgroup-v2-and-page-cache/#writeback-and-io
Jul 17, 2024
Jul 10, 2024
Recording: https://www.youtube.com/watch?v=see5mwuN0YA
Host:
Tests:
Bugs triaging: anish
Agenda:
- [anishshah] New tests failing:
- PodPidsLimit - kubernetes/kubernetes#126007
- OOMKiller tests in EC2 jobs - kubernetes/kubernetes#126009
- Due to testsuite timeout - kubernetes/kubernetes#126008
- [sotiris] kubernetes/kubernetes#124296
- [Sergey] Device plugin failure injection tests: kubernetes/kubernetes#125753
Jul 3, 2024 [Cancelled]
Host:
Agenda:
Jun 26, 2024
Recording: https://www.youtube.com/watch?v=Cn-1k0U1kGw
Agenda:
- https://kubernetes.slack.com/archives/C0BP8PW9G/p1718369837055379
- [ffromani] Annotating pod to detect leftovers: kubernetes/kubernetes#125434
- Driven by: kubernetes/kubernetes#123468 (PTAL!)
- [alex] Test guidance compliance work
- [Sotiris] Can we do triage for kubernetes/test-infra#32765
E2eNode Suite.[It] [sig-node] CriticalPod [Serial] [Disruptive] [NodeFeature:CriticalPod] when we need to admit a critical pod should add DisruptionTarget condition to the preempted pod [NodeFeature:PodDisruptionConditions]
E2eNode Suite.[It] [sig-node] CriticalPod [Serial] [Disruptive] [NodeFeature:CriticalPod] when we need to admit a critical pod should be able to create and delete a critical pod
E2eNode Suite.[It] [sig-node] MirrorPodWithGracePeriod when create a mirror pod and the container runtime is temporarily down during pod termination [NodeConformance] [Serial] [Disruptive] the mirror pod should terminate successfully
https://testgrid.k8s.io/sig-node-kubelet#kubelet-gce-e2e-swap-ubuntu-serial
https://testgrid.k8s.io/sig-node-containerd#pull-e2e-serial-ec2-canary
Not working
https://testgrid.k8s.io/sig-node-cri-o#node-kubelet-cgroupv1-serial-crio
Not working
https://testgrid.k8s.io/sig-node-cri-o#pr-node-kubelet-serial-crio-cgroupv2
https://testgrid.k8s.io/sig-node-presubmits#pr-node-kubelet-serial-containerd
Should not run the NodeSwap
Jun 19, 2024 [Cancelled for holidays]
Jun 12, 2024 [Cancelled for KEP freeze reviews]
Hosts:
Tests:
Bugs triaging:
Agenda:
Follow up Items:
Jun 5, 2024
Recording: https://www.youtube.com/watch?v=L1rXfz5pJgQ
Hosts:
Tests: Anish
Bugs triage: Peter Hunt
Agenda:
- Release blocking?:
May 29, 2024
Recording: https://www.youtube.com/watch?v=PSq4VpMSlQ0
Hosts:
Tests:
Bugs triaging:
Agenda:
- [follow-up] https://testgrid.k8s.io/sig-node-containerd#ci-cgroupv1-containerd-node-arm64-e2e-serial-ec2-eks
- Filed kubernetes/kubernetes#125173
- Looks like the swap feature was enabled in cgroup v1 jobs, but it is a cgroup v2-only feature?
- Help to review kubernetes/kubernetes#124617
Follow up:
- This should have been done: https://testgrid.k8s.io/sig-node-release-blocking#node-kubelet-serial-containerd but still failing
- Follow up on sidecar meeting: E2eNode Suite.[It] [sig-node] [NodeFeature:SidecarContainers] Containers Lifecycle should terminate sidecars simultaneously if prestop doesn't exit
- The test is broken completely: https://testgrid.k8s.io/sig-node-containerd#cos-cgroupv1-containerd-node-e2e-serial&width=5
- Was green with no tests, now failing with timeout: https://testgrid.k8s.io/sig-node-cri-o#node-kubelet-cgroupv1-serial-crio&width=5
May 22, 2024
Recording: https://www.youtube.com/watch?v=rPoI3HrTkiM
Hosts:
Host: Peter
Bugs triaging: Peter
Agenda:
- kubernetes/kubernetes#125027 could use approval
- kubernetes/kubernetes#124743 still failing, need to bump cri-o version
- https://testgrid.k8s.io/sig-node-containerd#ci-cgroupv1-containerd-node-arm64-e2e-serial-ec2-eks has additional failures, needs an issue
Follow up Items:
- [Sergey] https://kubernetes.slack.com/archives/C0BP8PW9G/p1716308390271449
- Looks like something we changed for sidecars
- Matthyx was planning to work on a fix
May 15, 2024
- No meeting, no items on the agenda.
May 8, 2024
Recording: https://www.youtube.com/watch?v=ZlL0yVKJ_o8
Hosts:
Host: Peter
Bugs triaging: Dixita
Agenda:
Follow up Items:
- [Peter] Open an issue for https://testgrid.k8s.io/sig-node-kubelet#kubelet-gce-e2e-swap-fedora failures
- [Dixita] Memory usage beyond node allocatable tests failing again: kubernetes/kubernetes#120646
- kubernetes/kubernetes#124345 follow up with Swati and Francesco
May 1, 2024
Recording: https://www.youtube.com/watch?v=h89s_z-YmIU
Hosts:
Host:
Bugs triaging: Anish
Agenda:
Follow up Items:
Apr 24, 2024
Recording: https://www.youtube.com/watch?v=MlltvJWa1so
Hosts:
Host: Sergey
Bugs triaging: Anish
Agenda:
Apr 17, 2024
Recording: https://www.youtube.com/watch?v=MhMZJvLx3sg
Hosts:
Host: Sergey
Bugs triaging: Anish
Agenda:
- [Sotiris] Test PR needing approval
- [Kevin Hannon] NVIDIA K80 out of support in May
- [Anish] kubernetes/kubernetes#116965
- IIUC, pod status is not updated during graceful node shutdown. Does anyone have historical context on why the pod status is not updated?
- Ryan to reply on the issue to explain which part of this behavior is expected
- [Ed] ideally we need to extend the e2e test.
- [ryan] kubelet must be killed before networking is shut down
Followup
- [Sotiris] It seems worth improving CPU manager test coverage: kubernetes/kubernetes#100145. What do you think? How should we proceed with this?
- [anishshah] v1.30 release report
- github.com/AnishShah/sig-node-flaky-tests/tree/main
- 22/249 sig-node release blocking tests are flaky.
Apr 10, 2024
Recording: https://www.youtube.com/watch?v=NUzCEC4WuL0
Hosts:
Host: ndixita
Bugs triaging: Peter Hunt
Agenda:
- [kannon92] Test PRs needing approval
- Cgroup v2 crio jobs
- Deprecating cgroup v1 means that we should have 1:1 coverage for cgroup v1 and cgroup v2
- Add corresponding cgroup v2 jobs for node-crio-e2e-features and node-crio-flaky
- crio huge pages cgroup v2
- Resource managers crio cgroup v2
- [Ed] Can we consider triaging SIG-Node PRs in this meeting?
Followup
- Check which tests need to have coverage for cgroupv2
- Consider Sig Node PRs triaging : maybe once per month?
- kubernetes/kubernetes#124220
- Sig node : kubernetes/kubernetes#124229
Apr 3, 2024
Recording: https://www.youtube.com/watch?v=wZeHdf3PtMQ
Hosts:
Host: skanzhelev
Bugs triaging: anish
Agenda:
- Bugs dashboard
- [ndixita] Questions to get some context to help deflake the tests and cleanup
- kubernetes/test-infra#32271 why did we remove manager jobs from serial tests
- Duplicate coverage so fine to remove
- Presubmit and periodics are already running these tests
- Kubeadm version skew tests in sig-node-kubelet POC
- The tests are in sig-cluster-lifecycle and sig-node. Send a PR to remove them from sig-node?
- https://github.com/kubernetes/test-infra/blob/d3f9ee6f4d5b185a7b784533d6a36fab9c8409dc/config/jobs/kubernetes/sig-cluster-lifecycle/kubeadm-kinder-kubelet-x-on-y.yaml#L356
- Swap serial tests are flaky while parallel are not
- Ideally, effort should be put into deflaking these tests
- History of node-kubernetes-containerd-flaky dashboard - https://testgrid.k8s.io/sig-node-containerd#node-kubelet-containerd-flaky ?
Follow up Items:
Mar 27, 2024
Recording: https://www.youtube.com/watch?v=fkWV_mqcZzs
Hosts:
Host:
Bugs triaging: peter hunt
- [SergeyKanzhelev] kubernetes/kubernetes#124009 (comment)
- sig-node CI v1.30 release report
- [Flaking Test] [sig-node] ☂️ node-kubelet-serial-containerd job multiple flakes🌂 · Issue #120913
- These tests are flaky in these dashboards:
- [sig-node-release-blocking][node-kubelet-serial-containerd]
- [sig-node-kubelet]
- [sig-node-containerd]
- [sig-node-crio]
- Managers' tests: let's remove them from the Serial lane
- kubernetes/test-infra#32271
- Check CI jobs are working for managers
- kubernetes/kubernetes#123908
- Let's wait until the branch reopens
Mar 20, 2024
- Canceled due to Kubecon Week
Mar 13, 2024
Recording: https://www.youtube.com/watch?v=itj3vxg23nk
Hosts:
Host: Dixita (Dixi)
Bugs triaging: Anish
Agenda:
- [Dixi] Removing huge pages from allocatable/capacity kubernetes/kubernetes#119173
- Bugs with no priority
- Seeking help to debug Serial crio jobs failures
- kubernetes/kubernetes#123908 (from Sotiris)
Follow up Items:
- Talk about the kubernetes/kubernetes#119173 in Sig node.
- Why is /proc/meminfo used to report capacity?
- Change to priority/important-soon after assessing the impact.
Recording: https://www.youtube.com/watch?v=wCoCEAQqMOY
Hosts:
Host: Dixita
Bugs triaging: Anish
Agenda:
- Serial Jobs Failures
- OOM
- kubernetes/kubernetes#123589
- Jobs are OOMing due to dd oom.
- They also run twice
- [harche] - kubernetes/kubernetes#123027 (comment)
- Not sure if this is really a bug.
- Follow-up from last week:
- Triaging bugs since we are close to the 1.30 code freeze:
- Bugs with critical-urgent priority.
- Bugs with important-soon priority.
- Bugs with no priority labels and no owner.
Follow up Items:
Recording: https://www.youtube.com/watch?v=2fqfYwYwRkk
Hosts:
- Tests: ndixita
- Bugs: anish
Agenda:
- [esotsal]
- ndixita@ kubernetes/kubernetes#123313 : [Failing test] pull-kubernetes-local-e2e
- PR kubernetes/test-infra#32025
- [pehunt] quick review request kubernetes/test-infra#32096
Follow up
- ndixita@
- https://testgrid.k8s.io/sig-node-kubelet#kubelet-gce-e2e-swap-ubuntu-serial
- OOM killer test
- Reach out to Ed: https://testgrid.k8s.io/sig-node-containerd#e2e-cos-device-plugin-gpu
- kubernetes/kubernetes#123491
- Create an issue if it doesn’t exist: https://testgrid.k8s.io/sig-node-cri-o#node-kubelet-serial-crio
- Prioritize: Eviction tests: pid returning 0 process count issues: Find related issues
- David Porter: kubernetes/kubernetes#123369
- kubernetes/test-infra#32031 : Add the labels doc
Recording: https://www.youtube.com/watch?v=e4eRbWiIPN4
Hosts:
- Tests: ndixita
- Bugs: ndixita
Agenda:
- [kevin] A few PRs to review/approve
- [ndixita]
- Ubuntu-test-e2e failures: dims@: WIP kubernetes/kubernetes#123236
- Follow up with sig testing infra
- https://testgrid.k8s.io/sig-node-containerd#cos-cgroupv1-containerd-node-e2e-serial
- https://testgrid.k8s.io/sig-node-containerd#node-e2e-features
- @ndixita: I don’t think it’s a test-infra issue anymore. Both tests look flaky, but they’re not failing because of test-infra misconfiguration anymore. That issue seems to be fixed starting Feb 15.
- Bugs follow up
kubernetes/kubernetes#122903: Do we provide support for forked repos?
Todd Neal: kubernetes/kubernetes#122902 find and assign
- [esotsal]
- ndixita@ kubernetes/kubernetes#123313 : [Failing test] pull-kubernetes-local-e2e
Recording: https://www.youtube.com/watch?v=g2PgAwVXHwA
Hosts:
- Tests: ndixita
- Bugs: ndixita
Agenda:
- [chris] Adding prow jobs for e2e tests with containerd v2.0
- (ndixita): RC already shipped, in March release
- Issues: can’t start using containerd straightaway
- New features require containerd v2.0, so better to add new test tabs and have both old and new versions running
- Ndixita: SIG Node release testing registry-related failure: follow up with SIG Testing infra
- https://testgrid.k8s.io/sig-node-containerd#node-e2e-features
- Ed Bartosh: Device plugin test https://testgrid.k8s.io/sig-node-kubelet#kubelet-gce-e2e-swap-fedora-serial
- Ndixita kubernetes/kubernetes#122903
- kubernetes/kubernetes#122902 find and assign
Recording: https://www.youtube.com/watch?v=3sCGp_3uU2k
Hosts:
- Tests: ndixita
- Bugs: ndixita
Agenda:
- [Sergey] Are there serial tests for e2e node? Question is from Sidecar WG meeting
- e2e/node: missing tests need to be added
- Check https://testgrid.k8s.io/sig-node-containerd#cos-cgroupv1-inplace-pod-resize-containerd-e2e-serial
- [Ed] GPUDevicePlugin: which tests are targeted with this feature
- File a bug for testgrid failure
- Mark test as flaky and move to a less important tab
- https://testgrid.k8s.io/sig-node-containerd#cos-cgroupv1-containerd-e2e
- OOM killer tests failing forever
- https://testgrid.k8s.io/sig-node-containerd#cos-cgroupv1-inplace-pod-resize-containerd-e2e-serial
- https://testgrid.k8s.io/sig-node-containerd#cos-cgroupv2-containerd-e2e
- Kevin: https://testgrid.k8s.io/sig-node-containerd#node-kubelet-containerd-eviction
- [ndixita] confirm if Device plugin GA feature doesn’t have periodic jobs
- Graceful node shutdown doesn’t work with DaemonSets
- Sig node bugs to discuss
Recording: https://www.youtube.com/watch?v=i92MuHisqUw
Hosts:
- Tests: Sergey
- Bugs: Sergey
Agenda:
- [Kevin] PodReadyToStartContainers e2e test PR looking for approval
- [Kevin] Crio-cgroupv2 adding to release-informing
- [Kevin] ImageFs e2e tests: kubernetes/kubernetes#121832
- Running using gcp instance (remote=True) fine
- CI has node failure with SoftEviction
Test grid:
- https://testgrid.k8s.io/sig-node-containerd#containerd-e2e-ubuntu
- https://testgrid.k8s.io/sig-node-containerd#cos-cgroupv1-containerd-e2e
- https://testgrid.k8s.io/sig-node-cri-o#node-kubelet-serial-crio
- kubernetes/kubernetes#122828
Recording: https://youtu.be/A3y__Ivvo1c
Hosts:
- Tests: tzneal
- Bugs: peter hunt
Agenda:
- Adding CI tests for separate container runtime filesystem and split filesystem
- Debugging kubernetes/kubernetes#121832 has become quite difficult due to hard coding DiskPressure
- Have kubernetes/test-infra#31638 to help (needs review/approver). Will clean up once I finish debugging
- Added a presubmit for split disk work
- [harche] Should alpha-features blocking test skip Evented PLEG feature temporarily? kubernetes/kubernetes#122721 (comment)
- Tzneal - investigate single group OOM kill failure at https://testgrid.k8s.io/sig-node-containerd#cos-cgroupv1-containerd-node-e2e-serial
- Tzneal - ask sig testing how to cleanup the test grid from old removed test suites
- Kevin
Agenda:
- [swsehgal] Looking for some help in promoting sample-device-plugin image
- kubernetes/kubernetes#118534 was merged a while back but the sample device plugin image is still not promoted.
- In the past, I promoted the image (kubernetes/k8s.io#4862), but since then we have transitioned to registry.k8s.io, so I am not sure how to obtain the SHA of the image corresponding to the latest version of sample-device-plugin.
- Has anyone promoted a test image recently?
Recording: https://youtu.be/nw5IhScZGEY
Hosts:
- Tests: Sergey
- Bugs: Sergey