CI: Do not fail fast. #13120
Conversation
✅ Deploy Preview for kubernetes-ingress-nginx canceled.
/triage accepted
/cherry-pick release-1.12
@Gacko: once the present PR merges, I will cherry-pick it on top of release-1.12. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/cherry-pick release-1.11
@Gacko: once the present PR merges, I will cherry-pick it on top of release-1.11. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@Gacko the reason for it is that the tests may take a very long time to run even when they fail, and they consume resources that may at some point be exhausted (IIRC there's a limit on GitHub Actions). What you could probably do instead is make fail-fast a flag, and in the GitHub Actions CI allow a label or some other mechanism to set this flag for individual runs. EDIT: I was thinking this is also related to the fail-fast behavior of the e2e Ginkgo tests, but it is still relevant to why we usually fail fast.
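The "make fail-fast a flag" suggestion could be sketched roughly as follows, with a `workflow_dispatch` input toggling the matrix behavior. The input, job name, and matrix values here are hypothetical illustrations, not the actual workflow contents:

```yaml
on:
  workflow_dispatch:
    inputs:
      fail-fast:
        description: Cancel all matrix jobs as soon as one fails
        type: boolean
        default: true

jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      # Honor the dispatch input; for other triggers the input is
      # unset, so the expression falls back to true (fail fast).
      fail-fast: ${{ inputs.fail-fast != false }}
      matrix:
        k8s: [v1.30, v1.31, v1.32]  # illustrative versions
```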
E2E tests have mostly been failing due to flakes in the ExternalName service tests recently. So the chances that 4 out of 5 runs (one per Kubernetes version) complete in the end are high. If we make them fail fast, we always need to re-run all of them, even if 4 out of 5 would have completed.
This is what I was also talking about. |
Sure, I'm aware of that. This is making a single E2E run fail fast, and that absolutely makes sense. But we are spinning up 5 E2E runs per variation at the moment, and sometimes the nip.io backend we are using for ExternalName services seems not to reply in time; at least DNS requests are timing out. Normally one E2E run takes around 45 minutes. If one of them fails at 40 minutes, we kill all 5 (one per Kubernetes version we support), even though the other 4 could have completed successfully. With the current behavior you always need to re-run all 5. With my change you can wait until the other 4 complete successfully and only re-trigger the one that failed. So without my change, 200 minutes of GitHub Actions time are wasted; with my change it's only 40 minutes.
OK, makes sense. I am leaving the lgtm and the hold, and you can unhold as you wish. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: Gacko, rikatz. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@Gacko: new pull request created: #13130 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@Gacko: new pull request created: #13131 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What this PR does / why we need it:
Currently E2E tests for all Kubernetes versions get canceled as soon as E2E tests for one of them fails. Therefore one always needs to re-run 5 jobs instead of only one.
I know we should rather fix the flakes themselves, but this change is also particularly useful while doing so.
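For context, GitHub Actions by default cancels all in-progress jobs of a matrix as soon as one job fails. The change described above roughly corresponds to disabling that behavior in the e2e matrix; the job name, matrix values, and steps below are illustrative, not the actual workflow contents:

```yaml
jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      # Default is true: one failing matrix job cancels all others.
      # Setting it to false lets the remaining Kubernetes versions
      # finish, so only the failed job needs to be re-run.
      fail-fast: false
      matrix:
        k8s: [v1.28, v1.29, v1.30, v1.31, v1.32]  # illustrative versions
    steps:
      - uses: actions/checkout@v4
      - run: make e2e-test  # placeholder for the actual e2e invocation
```

With `fail-fast: false`, a flaky failure in one Kubernetes version no longer discards the runtime already spent on the other four.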
Types of changes
Checklist: