Describe the bug
I'm using Rook-Ceph v1.13.5 with a storage class dedicated to VolSync usage. While VolSync appears to be working correctly (creating PVCs, snapshots, restic backups to S3, and pruning old snapshots), I'm getting a lot of error messages in the Ceph csi-rbdplugin-provisioner pod's csi-rbdplugin container, like this:
E0304 20:00:19.186514 1 omap.go:79] ID: 284 Req-ID: 0001-0009-rook-ceph-000000000000000d-c093e953-1fe4-4682-a9e7-7ed660c24b8e omap not found (pool="csi-ceph-blockpool", namespace="", name="csi.volume.c093e953-1fe4-4682-a9e7-7ed660c24b8e"): rados: ret=-2, No such file or directory
W0304 20:00:19.186545 1 voljournal.go:729] ID: 284 Req-ID: 0001-0009-rook-ceph-000000000000000d-c093e953-1fe4-4682-a9e7-7ed660c24b8e unable to read omap keys: pool or key missing: key not found: rados: ret=-2, No such file or directory
E0304 20:00:19.190568 1 rbd_journal.go:689] ID: 284 Req-ID: 0001-0009-rook-ceph-000000000000000d-c093e953-1fe4-4682-a9e7-7ed660c24b8e failed to get image id csi-ceph-blockpool/csi-vol-c093e953-1fe4-4682-a9e7-7ed660c24b8e: image not found: RBD image not found
I do not see any errors or warnings in the volsync logs.
Steps to reproduce
Watch the csi-rbdplugin container logs when the volsync schedule is triggered.
Expected behavior
Not expecting error messages in the container logs.
Actual results
Everything appears to be working fine even with the error messages. As a test, I deleted the entire namespace of a test application; ArgoCD was able to rebuild it from GitHub, and VolSync created and populated the PVC. The application is back online with its data.
Additional context
I asked about these messages in the Rook-Ceph discussion area, and they suggested that multiple gRPC /DeleteVolume calls are being issued (by VolSync): rook/rook#13851. Is this expected behavior?
I also asked other volsync users with rook if they see these error messages logged, and they do. I can ask them to chime in here if that would be helpful.
VolSync doesn't call /DeleteVolume directly; it should be the CSI external-provisioner that does that. However, VolSync does create and delete PVC resources, which should then prompt the external-provisioner to call /DeleteVolume.
VolSync runs a controller-runtime client DeleteAllOf() with label selectors to delete the PVCs that were created temporarily, so with caching and multiple reconciles it could potentially invoke multiple deletes on the same resource. Multiple reconciles are pretty normal, and when it gets to the cleanup step you'll see something like this in the logs: deleting temporary objects. I guess I'm not too concerned about this, as we're trying to get to eventual consistency, and the main thing is that the temporary object is in fact deleted.
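For illustration, here's a minimal controller-runtime sketch of that cleanup pattern; the function name, namespace argument, and label set are made up for the example, not VolSync's actual code. Because the client reads from a cache that can lag behind the API server, a later reconcile can end up issuing the same deletes again:

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// cleanupTempPVCs (hypothetical) deletes all temporary PVCs matching the
// label selector in a single DeleteAllOf call. If the client's cache hasn't
// caught up with the API server, a subsequent reconcile may call this again
// and re-delete PVCs that are already gone.
func cleanupTempPVCs(ctx context.Context, c client.Client, ns string, tempLabels map[string]string) error {
	return c.DeleteAllOf(ctx, &corev1.PersistentVolumeClaim{},
		client.InNamespace(ns),
		client.MatchingLabels(tempLabels),
	)
}
```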
We could look at whether we're requeuing too often, but I don't think we can eliminate this entirely. Another option would be to look up and delete the PVCs individually so that we're sure the client cache is updated, but that seems like unnecessary overhead.
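For comparison, a hedged sketch of that look-up-and-delete-individually alternative, again with made-up names; it trades an extra List per reconcile for fewer redundant delete calls:

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteTempPVCsIndividually (hypothetical) lists the matching PVCs and
// deletes them one by one, ignoring not-found errors, instead of issuing a
// blanket DeleteAllOf against a possibly stale cache.
func deleteTempPVCsIndividually(ctx context.Context, c client.Client, ns string, tempLabels map[string]string) error {
	var pvcs corev1.PersistentVolumeClaimList
	if err := c.List(ctx, &pvcs,
		client.InNamespace(ns),
		client.MatchingLabels(tempLabels),
	); err != nil {
		return err
	}
	for i := range pvcs.Items {
		if err := client.IgnoreNotFound(c.Delete(ctx, &pvcs.Items[i])); err != nil {
			return err
		}
	}
	return nil
}
```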
I'm also not sure what role the external-provisioner plays in all this, and whether it's really due to multiple DeleteAllOf() calls on PVCs or not.
You could check whether you can recreate the same behaviour by doing a single delete on a PVC, or by running kubectl deletes with a label selector multiple times.