Description
Is your feature request related to a problem?
We are currently in the middle of migrating from JOSDK 4 to JOSDK 5.
For us it is crucial that the CustomResource status passed to the main reconciliation loop is always the most up-to-date one.
As a best practice, our operator does not manipulate/modify anything in the CustomResource other than the status, which is then used in subsequent reconciliations to decide whether a particular action should be triggered based on that status (such as a rolling restart of the pods, for example).
What we have just noticed when migrating from JOSDK 4 to JOSDK 5 is that this is no longer guaranteed: we have seen that the CustomResource status is sometimes outdated. Apparently some informers trigger the reconciliation loop with an old version of the CustomResource status from the cache.
It seems that this was handled and cached internally in JOSDK 4 here, but was removed in JOSDK 5.
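For illustration, a minimal sketch of this pattern; MyCustomResource, the needsRollingRestart status field and the rollingRestart helper are placeholders, not our actual operator code:

import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.api.reconciler.Reconciler;
import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;

public class MyReconciler implements Reconciler<MyCustomResource> {

    @Override
    public UpdateControl<MyCustomResource> reconcile(
            MyCustomResource primary, Context<MyCustomResource> context) {
        // act upon the status written by a previous reconciliation;
        // this only works if the status we see here is up to date
        if (Boolean.TRUE.equals(primary.getStatus().getNeedsRollingRestart())) {
            rollingRestart(primary); // placeholder for the actual action
            primary.getStatus().setNeedsRollingRestart(false);
        }
        return UpdateControl.patchStatus(primary);
    }

    private void rollingRestart(MyCustomResource primary) {
        // trigger a rolling restart of the managed pods (elided)
    }
}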
Describe the solution you'd like
I guess there might be a couple of possible options here:
- JOSDK 5 could provide a new API option so we can, for instance, decide whether we want to update the underlying primary cache with the most up-to-date CustomResource status (see the sketch after this list). For example:
UpdateControl.forceCacheUpdate().patchStatus(platform); <-- only updates the status in the cache
UpdateControl.forceCacheUpdate().patchResource(platform); <-- only updates the spec in the cache
UpdateControl.forceCacheUpdate().patchResourceAndStatus(platform); <-- updates both in the cache
- An option to enable an internal caching mechanism similar to what was done in JOSDK 4.
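To make the first option concrete, a sketch of how it might be used inside a reconciler; forceCacheUpdate is the hypothetical API proposed above, not an existing JOSDK method:

@Override
public UpdateControl<MyCustomResource> reconcile(
        MyCustomResource primary, Context<MyCustomResource> context) {
    // ... reconcile and mutate primary.getStatus() ...
    // hypothetical: patch the status on the server and also write the
    // result into the primary informer cache, so the next reconciliation
    // is guaranteed to see this status
    return UpdateControl.forceCacheUpdate().patchStatus(primary);
}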
Describe alternatives you've considered
Since our operator only modifies the CustomResource status, at the moment we simply update the primary cache with the CustomResource status change in every reconciliation loop (nothing else: no metadata or spec change is written to this cache):
// imports assumed: io.javaoperatorsdk.operator.api.reconciler.Context,
// io.javaoperatorsdk.operator.processing.event.ResourceID
private static void updateCache(
        MyCustomResource primary,
        Context<MyCustomResource> context) {
    final var cache = context.getPrimaryCache();
    // look up the cached copy of the primary resource and overwrite
    // only its status with the one we just reconciled
    cache.get(new ResourceID(primary.getMetadata().getName(), primary.getMetadata().getNamespace()))
            .ifPresent(r -> r.setStatus(primary.getStatus()));
}
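For completeness, a sketch of how we invoke this from our reconciler (the actual status handling is elided):

@Override
public UpdateControl<MyCustomResource> reconcile(
        MyCustomResource primary, Context<MyCustomResource> context) {
    // ... reconcile and mutate primary.getStatus() ...
    // work around the stale informer cache before returning
    updateCache(primary, context);
    return UpdateControl.patchStatus(primary);
}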
Additional context
N/A.
Activity
csviri commented on Apr 14, 2025
Hi @afalhambra-hivemq, thx for creating an issue for this; it is already a duplicate, see #2746.
Just some remarks:
- There will be an UpdateControl.updateStatus specially for this case, to use the PUT / Update method of the Kubernetes API with optimistic locking. In this case, it is possible to do an overlay cache and pass the resource to the reconciler on the next reconciliation. We will also do some additional examples and supporting classes to showcase how to do status caching explicitly.
- Something like this:
UpdateControl.forceCacheUpdate().patchStatus(platform);
is not possible in general. Properly and deterministically caching resources (without parsing the resourceVersion, thus bending the contract of the k8s API) is not possible, and I would not like to bake it into the core.

See also: https://javaoperatorsdk.io/docs/faq/#im-managing-an-external-resource-that-has-a-generated-id-where-should-i-store-that
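As an illustration, an explicit overlay cache along these lines might look like the sketch below (our own sketch, not JOSDK code; MyCustomResource is a placeholder). Note the ambiguity in the last branch, which is exactly the non-determinism described above:

import java.util.concurrent.ConcurrentHashMap;
import io.javaoperatorsdk.operator.processing.event.ResourceID;

class StatusOverlayCache {
    // ResourceID -> resource returned by our last status update (PUT)
    private final ConcurrentHashMap<ResourceID, MyCustomResource> overlay =
            new ConcurrentHashMap<>();

    // store the server response of a successful status update
    void record(MyCustomResource updated) {
        overlay.put(ResourceID.fromResource(updated), updated);
    }

    // prefer our own update while the informer has not caught up yet
    MyCustomResource freshest(MyCustomResource fromInformer) {
        var cached = overlay.get(ResourceID.fromResource(fromInformer));
        if (cached == null) {
            return fromInformer;
        }
        if (cached.getMetadata().getResourceVersion()
                .equals(fromInformer.getMetadata().getResourceVersion())) {
            // the informer caught up with our update; drop the overlay entry
            overlay.remove(ResourceID.fromResource(fromInformer));
            return fromInformer;
        }
        // resourceVersions differ: without parsing them we cannot tell
        // whether the informer copy is older or newer than our cached one
        return cached;
    }
}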
csviri commented on Apr 14, 2025
Also feel free to join our community meeting tomorrow at 15:00 CEST, where we will discuss this topic.
afalhambra-hivemq commented on Apr 14, 2025
Thanks @csviri - I will join. Will that be via Zoom, if I'm not mistaken, as documented here?
csviri commented on Apr 14, 2025
Exactly
afalhambra-hivemq commented on Apr 14, 2025
Using optimistic locking is already problematic for us. We have an edge case with a race condition where the CR is updated twice at the same time, so optimistic locking fails straight away because the resourceVersions differ.
If we disable optimistic locking, then the patching works fine for this race condition.
If I'm not mistaken, the default retry mechanism didn't work here either, because the status of the CR was outdated there as well, but I need to double-check this.
csviri commented on Apr 14, 2025
You can always store the response from the first update, and do the second update using that resource.
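For illustration, a minimal sketch of this with the fabric8 client, assuming client is a KubernetesClient and phase is a made-up status field:

// first update: optimistic locking against the current resourceVersion
MyCustomResource afterFirst = client.resource(primary).updateStatus();

// second update: start from the response, which carries the fresh
// resourceVersion, so optimistic locking does not fail
afterFirst.getStatus().setPhase("SECOND"); // made-up status field
MyCustomResource afterSecond = client.resource(afterFirst).updateStatus();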
afalhambra-hivemq commented on Apr 14, 2025
I guess this would mean adding a custom retry mechanism for that? I haven't checked in detail how this is done by the default retry mechanism in JOSDK.