Restic mover job should fail if restic repo is locked #1429
Comments
I want to add that this issue is even worse. The sizing issue was my mistake, but the job shouldn't repeatedly restart and burn through my S3 funds either.
Yeah, it also feels like backoffLimit should be set to 0 (or at least be configurable). If it fails the first time, I highly doubt it will ever succeed.
Yeah, maybe 1 retry by default is excusable, but with the S3 backend users can certainly get into BOATLOADS of financial trouble if things go horribly wrong.
Currently the design of VolSync is essentially very Kubernetes-centric, preferring to retry until things work. If we just fail the job, then we have no chance to retry even for something like a network hiccup. If we could detect specific errors from restic, however, this is an interesting idea. If we did something, it would definitely need to be an opt-in feature. One issue is that simply stopping the job from retrying will not solve the scheduling issue, as VolSync will not schedule until the previous synchronization has completed (i.e. the job has completed). VolSync will still reschedule the job after the backoffLimit is hit.
@tesshuflower I agree, a few retries is not a bad thing.
Yeah, my S3 bill is how I noticed the whole problem after the fact.
That's good to hear and I am glad you agree. There's no point in retrying if we know from restic's exit codes that the job will never succeed. Hopefully there can be some improvements in this area.
Same here. Luckily for me it wasn't the bill but the quota warning.
Important
This request requires restic 0.17.0 for the unlock status code and Kubernetes 1.31 for the stable podFailurePolicy feature.
The restic mover should not try over and over to back up if the repo is locked. We could use the pod failure policy feature and restic's exit codes to achieve this.
That should mark the job as failed, and it should not be retried until the next VolSync schedule.
The current behavior is that the same job retries over and over on a locked restic repo until the backoffLimit is reached.
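To make the idea concrete, here is a rough sketch of what such a Job spec could look like. It is not VolSync's actual mover Job: the Job name, container name, image, and backoffLimit value are hypothetical placeholders. It assumes exit code 11, which, as far as I know, is what restic 0.17.0 returns when the repository is already locked. Note that podFailurePolicy only works with restartPolicy: Never.

```yaml
# Sketch only: names and image below are hypothetical, not VolSync's real mover Job.
apiVersion: batch/v1
kind: Job
metadata:
  name: restic-backup-example        # hypothetical name
spec:
  backoffLimit: 2                    # other failures (e.g. a network hiccup) may still retry
  podFailurePolicy:
    rules:
      # Fail the whole Job immediately, without retries, when restic reports a locked repo.
      - action: FailJob
        onExitCodes:
          containerName: restic      # hypothetical container name
          operator: In
          values: [11]               # assumed: restic >= 0.17.0 "repository is already locked"
  template:
    spec:
      restartPolicy: Never           # required when podFailurePolicy is set
      containers:
        - name: restic
          image: example.invalid/restic-mover:placeholder   # placeholder image
```

With a rule like this, a locked repo would end the Job on the first failed pod, while other exit codes would still fall under the normal backoffLimit handling.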
Originally posted by @onedr0p in #1415