dask: correctly handle failure to start workflow #617

Open
mdonadoni opened this issue Nov 27, 2024 · 0 comments · May be fixed by #618


mdonadoni commented Nov 27, 2024

When starting a new workflow, first the Dask cluster (and autoscaler) are deployed, and then the run-batch pod is started.

If the workflow start fails after the Dask cluster has been created (e.g. while creating the ingress for the Dask dashboard), the Dask cluster remains deployed. On the next attempt to start the workflow, reana-workflow-controller tries to create the cluster again, but this fails with code 409 (Conflict).

REANA will then keep retrying to start the workflow indefinitely, failing every time because the Dask cluster is already deployed.
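
One possible way to avoid the stuck retry loop is to roll back whatever a failed start attempt created, so that the next attempt begins from a clean state. The sketch below only illustrates that idea and is not the reana-workflow-controller implementation: `FakeClusterAPI`, `ConflictError`, `STEPS`, `start_workflow` and the `fail_on` parameter are all hypothetical names, and the in-memory class merely stands in for the Kubernetes API.

```python
class ConflictError(Exception):
    """Stands in for an HTTP 409 (Conflict) response."""


class FakeClusterAPI:
    """In-memory stand-in for the Kubernetes API (hypothetical, not REANA code)."""

    def __init__(self):
        self.resources = set()

    def create(self, name):
        # Creating a resource that already exists is a conflict.
        if name in self.resources:
            raise ConflictError(f"{name} already exists")
        self.resources.add(name)

    def delete(self, name):
        self.resources.discard(name)


# Order follows the issue: Dask cluster and autoscaler first, run-batch pod last.
STEPS = ["dask-cluster", "dask-autoscaler", "dask-dashboard-ingress", "run-batch"]


def start_workflow(api, workflow_id, fail_on=None):
    """Create all resources for a workflow, rolling back on any failure.

    `fail_on` names a step that should fail, to simulate a broken start.
    """
    created = []
    try:
        for step in STEPS:
            if step == fail_on:
                raise RuntimeError(f"simulated failure while creating {step}")
            name = f"{step}-{workflow_id}"
            api.create(name)
            created.append(name)
    except Exception:
        # Delete what this attempt created, newest first, so that a retry
        # starts from a clean state instead of hitting a 409.
        for name in reversed(created):
            api.delete(name)
        raise
    return created
```

With the rollback in place, a start that fails at the ingress step leaves nothing behind, and a later `start_workflow(api, "wf1")` succeeds. Without the `except` block, the same retry would raise `ConflictError` on `dask-cluster-wf1`, which is the behaviour described above.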

mdonadoni added this to Dask Nov 27, 2024
mdonadoni moved this to Ready for work in Dask Nov 27, 2024
Alputer added a commit to Alputer/reana-workflow-controller that referenced this issue Nov 28, 2024
Alputer moved this from Ready for work to In review in Dask Nov 28, 2024