
Trigger primary Custom Resource delete from managed Dependent Resource #1896

Closed · rguillens opened this issue May 10, 2023 · 8 comments

rguillens commented May 10, 2023

Feature request

I have built a Kubernetes Operator based on Java Operator SDK v4.3.1 that handles a primary custom resource and several managed dependent resources.
One of the dependent resources is a Pod running a "critical" process.
If this dependent resource is deleted externally (by a user, another application, or a crash), the primary custom resource should be marked for deletion.

What did you do?

This is an implementation example to describe the scenario:

@ControllerConfiguration(
    name = "myresourcereconciler",
    dependents = {
        @Dependent(
            name = "configmapmependentresource",
            type = ConfigMapDependentResource.class,
            reconcilePrecondition = ConfigMapReconcileCondition.class
        ),
        @Dependent(
            name = "criticalpoddependentresource",
            type = CriticalPodDependentResource.class
        ),
        @Dependent(
            name = "servicedependentresource",
            type = ServiceDependentResource.class
        )
    }
)
public class MyResourceReconciler implements Reconciler<MyResource>, Cleaner<MyResource> {
    
    @Override
    public UpdateControl<MyResource> reconcile(MyResource resource, Context<MyResource> context) throws Exception {
        //Reconcile implementation
        return updatedResource != null ? UpdateControl.patchStatus(updatedResource) : UpdateControl.noUpdate();
    }
    
    @Override
    public DeleteControl cleanup(MyResource resource, Context<MyResource> context) {
        //Cleanup implementation
        return DeleteControl.defaultDelete();
    }
}

...

@KubernetesDependent(labelSelector = MyResource.LABEL_SELECTOR)
public class CriticalPodDependentResource extends CRUDKubernetesDependentResource<Pod, MyResource> {
    
    @Override
    protected Pod desired(MyResource primary, Context<MyResource> context) {
        // Desired Pod creation
        return pod;
    }
    
    @Override
    public void delete(MyResource primary, Context<MyResource> context) {
        //Expected this operation to be called when the dependent resource is deleted externally
        context.getClient().resource(primary).delete();
    }
}

What did you expect to see?

The CriticalPodDependentResource.delete() operation being called when the dependent resource is deleted externally.

What did you see instead? Under which circumstances?

After deleting the critical dependent resource externally, the primary custom resource's reconcile operation is triggered, which was kind of expected...

@csviri
Copy link
Collaborator

csviri commented May 11, 2023

Hi @rguillens,

there are multiple things here:

  1. delete() is called only if a reconcile precondition does not hold on a DR, or when the whole workflow is being cleaned up (that is, the custom resource is being deleted). If a resource is deleted by someone else, delete() is not called; the resource is simply reconciled and re-created.
  2. There is another aspect to this: how do you know whether the resource was already created before? For example, the reconciliation starts and creates the ConfigMap and Service DRs, but then the process/pod terminates suddenly; there will be two resources, but no Pod. When the operator starts the next reconciliation, how do you know whether the Pod was there and got deleted, or was never created at all?
    This means you need to store some state somewhere indicating that the Pod was already created. See how state is supported in DRs: https://javaoperatorsdk.io/docs/dependent-resources#external-state-tracking-dependent-resources although this is not necessarily your case.

So this is how I would solve it:

  • store the state after the pod is created
  • before the workflow is reconciled, check if the pod exists; if it does not, but the stored state says it was created before, simply call delete on the primary custom resource using the client and exit the reconciliation (see the sketch below). Currently you can do this only with standalone workflows.

Note that some teams store this state in the status (e.g. a flag that the pod was created); however, this has a caveat: if it is in the status, it might not be present in the next reconciliation (cache out of sync) in some rare cases. Therefore, they also maintain an in-memory cache of the status that always holds the latest version. Please study how it is implemented in the external state DR sample; you can easily manage this state correctly with a ConfigMap.
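A minimal sketch of this approach in the reconciler, assuming a standalone workflow; the pod naming convention and the stateStore helper (markPodCreated/wasPodCreated) are illustrative assumptions, not JOSDK API:

@Override
public UpdateControl<MyResource> reconcile(MyResource resource, Context<MyResource> context) {
    KubernetesClient client = context.getClient();
    // Assumed naming convention for the critical pod; adjust to match the desired() logic.
    String podName = resource.getMetadata().getName() + "-critical-pod";
    Pod pod = client.pods()
            .inNamespace(resource.getMetadata().getNamespace())
            .withName(podName)
            .get();

    // stateStore is a hypothetical helper field, e.g. backed by a ConfigMap.
    if (pod == null && stateStore.wasPodCreated(resource)) {
        // The critical pod existed before but is gone now: delete the primary and stop here.
        client.resource(resource).delete();
        return UpdateControl.noUpdate();
    }

    // ... reconcile the dependents with a standalone workflow here ...

    // Remember that the critical pod has been created at least once.
    stateStore.markPodCreated(resource);
    return UpdateControl.noUpdate();
}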

csviri commented May 11, 2023

I created an issue that will allow covering this with managed workflows: #1898

rguillens commented May 11, 2023

Thanks @csviri for your recommendations.

Actually, I do store some state related to some of the DRs, so I know when to delete the CR if something happens to a "critical" DR. This state is also reflected in the status at some point, but the state management doesn't rely on the CR status. Using something like a ConfigMap to store this state is a much better option in my scenario, as described in:
https://javaoperatorsdk.io/docs/dependent-resources#external-state-tracking-dependent-resources

I was also looking into the KubernetesDependent annotation implementation and its usages; I think this might be a good place to customize the DR reconcile lifecycle.
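
A rough sketch of such a ConfigMap-backed flag; the PodStateStore class name, the "-state" ConfigMap suffix, and the data key are hypothetical (not something the SDK provides), and serverSideApply assumes a fabric8 6.x client:

import io.fabric8.kubernetes.api.model.ConfigMap;
import io.fabric8.kubernetes.api.model.ConfigMapBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;

public class PodStateStore {

    private static final String CREATED_KEY = "criticalPodCreated";

    private final KubernetesClient client;

    public PodStateStore(KubernetesClient client) {
        this.client = client;
    }

    // Persist the "critical pod was created" flag in a ConfigMap named after the primary.
    public void markPodCreated(MyResource primary) {
        ConfigMap cm = new ConfigMapBuilder()
                .withNewMetadata()
                    .withName(primary.getMetadata().getName() + "-state")
                    .withNamespace(primary.getMetadata().getNamespace())
                .endMetadata()
                .addToData(CREATED_KEY, "true")
                .build();
        client.configMaps().resource(cm).serverSideApply();
    }

    // True if the flag ConfigMap records that the critical pod was created before.
    public boolean wasPodCreated(MyResource primary) {
        ConfigMap cm = client.configMaps()
                .inNamespace(primary.getMetadata().getNamespace())
                .withName(primary.getMetadata().getName() + "-state")
                .get();
        return cm != null && cm.getData() != null && "true".equals(cm.getData().get(CREATED_KEY));
    }
}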

github-actions bot commented Jul 11, 2023

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

csviri commented Mar 12, 2024

As far as I can see, the explicit invocation will cover this. Feel free to reopen if not.

csviri closed this as completed Mar 12, 2024