dask: correctly handle failure to start workflow #617
When starting a new workflow, first the Dask cluster (and autoscaler) are deployed, and then the run-batch pod is started.
If the workflow start fails after the Dask cluster has been created (e.g. when creating the ingress for the Dask dashboard), the Dask cluster remains deployed. On the next retry, reana-workflow-controller tries to create the cluster again, but this fails with code 409 (Conflict).
REANA then retries starting the workflow indefinitely, failing every time because the Dask cluster is already deployed.
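
The failure mode can be reproduced with a small self-contained simulation. This is only a sketch: `FakeKubernetes`, `Conflict`, the resource names in `START_ORDER`, and both `start_workflow_*` functions are hypothetical stand-ins, not code from reana-workflow-controller or the Kubernetes client. It also shows one possible mitigation (rolling back already-created resources when a later step fails); the actual fix may well take a different approach.

```python
class Conflict(Exception):
    """Hypothetical stand-in for an HTTP 409 (Conflict) response."""


class FakeKubernetes:
    """In-memory stand-in for the cluster API (an assumption for illustration)."""

    def __init__(self, fail_ingress_times=0):
        self.resources = set()
        # Number of times the ingress creation should fail before succeeding.
        self.fail_ingress_times = fail_ingress_times

    def create(self, name):
        if name in self.resources:
            raise Conflict(f"{name} already exists")
        if name == "dask-ingress" and self.fail_ingress_times > 0:
            self.fail_ingress_times -= 1
            raise RuntimeError("ingress creation failed")
        self.resources.add(name)

    def delete(self, name):
        self.resources.discard(name)


# Assumed start order, following the description above:
# Dask cluster and autoscaler first, then the dashboard ingress, then run-batch.
START_ORDER = ["dask-cluster", "dask-autoscaler", "dask-ingress", "run-batch-pod"]


def start_workflow_naive(api):
    """Create everything in order; leaves earlier resources behind on failure."""
    for name in START_ORDER:
        api.create(name)


def start_workflow_with_cleanup(api):
    """Create everything in order; roll back what was created if a step fails."""
    created = []
    try:
        for name in START_ORDER:
            api.create(name)
            created.append(name)
    except Exception:
        # Delete in reverse order so that a retry starts from a clean state.
        for name in reversed(created):
            api.delete(name)
        raise
```

With `start_workflow_naive`, a single transient ingress failure leaves `dask-cluster` deployed, so every later retry raises `Conflict`. With `start_workflow_with_cleanup`, the same failure leaves no resources behind and the next retry can succeed.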