Handle errors when preparing lease for update #119661
Please note that we're already in Test Freeze for the … branch. Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Fri Jul 28 22:12:27 UTC 2023.
This only modifies the test; it seems to be WIP.
/retest
/test pull-kubernetes-e2e-gce
/retest
/priority important-soon
FYI, I've gone through the history and haven't found discussion on why these errors did not block the lease update. Initially added here: https://github.com/kubernetes/kubernetes/pull/70034/files#diff-53e6af5b2caffee205409a79f563db9456102ad124ec18a3de6d98553194c406R194
```go
// before every time the lease is created/refreshed(updated).
// Note that an error will block the lease operation.
newLeasePostProcessFunc ProcessLeaseFunc
```
Need to make sure changing this assumption is okay.
I only see two usages of this controller that set a `newLeasePostProcessFunc`:
kube-apiserver: kubernetes/pkg/controlplane/instance.go
Lines 653 to 682 in cc2f7b3

```go
func labelAPIServerHeartbeatFunc(identity string, peeraddress string) lease.ProcessLeaseFunc {
	return func(lease *coordinationapiv1.Lease) error {
		if lease.Labels == nil {
			lease.Labels = map[string]string{}
		}

		if lease.Annotations == nil {
			lease.Annotations = map[string]string{}
		}

		// This label indicates the identity of the lease object.
		lease.Labels[IdentityLeaseComponentLabelKey] = identity

		hostname, err := os.Hostname()
		if err != nil {
			return err
		}

		// convenience label to easily map a lease object to a specific apiserver
		lease.Labels[apiv1.LabelHostname] = hostname

		// Include apiserver network location <ip_port> used by peers to proxy requests between kube-apiservers
		if utilfeature.DefaultFeatureGate.Enabled(features.UnknownVersionInteroperabilityProxy) {
			if peeraddress != "" {
				lease.Annotations[apiv1.AnnotationPeerAdvertiseAddress] = peeraddress
			}
		}
		return nil
	}
}
```

kubelet: kubernetes/pkg/kubelet/util/nodelease.go
Lines 30 to 54 in cc2f7b3

```go
// SetNodeOwnerFunc helps construct a newLeasePostProcessFunc which sets
// a node OwnerReference to the given lease object
func SetNodeOwnerFunc(c clientset.Interface, nodeName string) func(lease *coordinationv1.Lease) error {
	return func(lease *coordinationv1.Lease) error {
		// Setting owner reference needs node's UID. Note that it is different from
		// kubelet.nodeRef.UID. When lease is initially created, it is possible that
		// the connection between master and node is not ready yet. So try to set
		// owner reference every time when renewing the lease, until successful.
		if len(lease.OwnerReferences) == 0 {
			if node, err := c.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{}); err == nil {
				lease.OwnerReferences = []metav1.OwnerReference{
					{
						APIVersion: corev1.SchemeGroupVersion.WithKind("Node").Version,
						Kind:       corev1.SchemeGroupVersion.WithKind("Node").Kind,
						Name:       nodeName,
						UID:        node.UID,
					},
				}
			} else {
				klog.ErrorS(err, "Failed to get node when trying to set owner ref to the node lease", "node", klog.KRef("", nodeName))
				return err
			}
		}
		return nil
	}
}
```
The change doesn't impact kube-apiserver, so we should be good 👍
What about outside of kube itself? https://grep.app/search?q=coordinationv1.Lease
The current behavior will always introduce buggy behavior, no?
@sttts I think we need to look for usages of this controller, not just the API type: https://grep.app/search?q=k8s.io/component-helpers/apimachinery/lease
I only see 1 result (karmada-io/karmada) that isn't k8s or vendored-k8s. It's using this `newLeasePostProcessFunc`, which has the same behavior as kubelet's (and so would probably encounter the same bug 😄): https://github.com/karmada-io/karmada/blob/e5277b6317ac1a4717f5fac4057caf51a5d248fc/pkg/util/clusterlease.go#L16-L37
This library does have the standard disclaimer that it has no compatibility guarantees, exists in direct support of Kubernetes, etc.: https://github.com/kubernetes/component-helpers#compatibility
If we're concerned about the subtlety, I could make a breaking cosmetic change in the name or something to flag this at compile time? But I think that'd be more annoying than helpful.
There was a previous attempt to fix this that stalled: #110834. I think the approach taken in that PR has issues.
/sig node
This is apimachinery for reasons, but it's kubelet-critical functionality.
```diff
-// before every time the lease is created/refreshed(updated). Note that an error will block
-// a lease CREATE, causing the controller to retry next time, but an error won't block a
-// lease UPDATE.
+// before every time the lease is created/refreshed(updated).
```
This is an important change in behavior in a public library. Another option could be to create a well-known error type, limit the change to asserting that specific error type, and modify the kubelet to return that error type.
Yeah, putting this code here was contentious in the original PR, because it probably shouldn't be a public library. 😄
I think it's kind of strange for the interface to return an error but have conditions around when the error is treated as an error. I can't exhaustively verify that this change doesn't break any consumers, but I have verified that it doesn't break the Kubernetes components that utilize it. I can't really think of a scenario in which this would break consumers, because the lease post-process func has no way to distinguish an update from a create (or, at least, this is not expressed in the interface).
Fully agree with you; I don't understand it either. If the postProcessFunc generates an error, why should we try to update the Lease nevertheless?
I didn't find an explanation in the original PR (#95428) either.
/lgtm
I agree with the author that it is difficult to understand why, if we fail to create a newLease, we still want to use it to update the existing one. Assigning people from the original PR; maybe they have more context: https://github.com/kubernetes/kubernetes/pull/119661/files#r1283162922
LGTM label has been added. Git tree hash: 03ea88ffdf34a90b80fa022c32aad9abb66198cd
/approve
This one spans across areas. I am happy with the review/engagement it has gotten so far, and it looks good to me.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: cartermckinnon, dims
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Fixes an issue in which kubelet will remove the owner references from its lease (via an update) if its `Node` does not exist, preventing garbage collection of the `Lease`.
Which issue(s) this PR fixes:
Fixes #109777
More context in dupe: #119660
Special notes for your reviewer:
Background:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: