dask: correctly handle failure to start workflow #617

mdonadoni · 2024-11-27T11:34:06Z

When starting a new workflow, first the Dask cluster (and autoscaler) are deployed, and then the run-batch pod is started.
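The deployment order can be illustrated with a minimal sketch. This is not reana-workflow-controller code: `FakeK8sAPI`, `ConflictError`, `start_workflow` and the object names (`dask-<id>`, `run-batch-<id>`) are hypothetical. The in-memory API only mimics one behaviour of the Kubernetes API server: creating an object whose name already exists is rejected with HTTP 409 (Conflict).

```python
class ConflictError(Exception):
    """Hypothetical stand-in for a Kubernetes API error with status 409."""

    status = 409


class FakeK8sAPI:
    """Hypothetical in-memory stand-in for the Kubernetes API server."""

    def __init__(self):
        self.objects = {}  # (kind, name) -> spec
        self.log = []  # order in which objects were created

    def create(self, kind, name, spec=None):
        key = (kind, name)
        if key in self.objects:
            # Like the real API server, reject duplicates with 409 Conflict.
            raise ConflictError(f"{kind} {name!r} already exists")
        self.objects[key] = spec or {}
        self.log.append(key)


def start_workflow(api, workflow_id):
    """Deploy the Dask resources first, then the run-batch pod (hypothetical names)."""
    name = f"dask-{workflow_id}"
    api.create("DaskCluster", name)
    api.create("DaskAutoscaler", name)
    api.create("Ingress", f"{name}-dashboard")
    api.create("Pod", f"run-batch-{workflow_id}")
```

Calling `start_workflow(api, "w1")` twice on the same `api` raises `ConflictError` on the very first `create`. This mirrors the retry failure reported in this issue: the second attempt never gets past the Dask cluster.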

If, for some reason, starting the workflow fails after the Dask cluster has been created (e.g. while creating the ingress for the Dask dashboard), the Dask cluster remains deployed. On the next attempt to start the workflow, reana-workflow-controller tries to create the cluster again, but this fails with HTTP status 409 (Conflict) because the cluster already exists.

REANA then keeps retrying to start the workflow indefinitely, failing every time because the Dask cluster is already deployed.
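One possible way to break this loop is sketched below. It is only an illustration under stated assumptions, not the fix adopted in the linked pull request: all names are hypothetical, and the in-memory `FakeK8sAPI` merely simulates a one-off failure and the 409 behaviour. It combines two complementary ideas: roll back whatever this attempt created when a later step fails, and treat a 409 on creation as "already exists" so that a leftover from an earlier attempt does not block the retry.

```python
class ConflictError(Exception):
    """Hypothetical stand-in for a Kubernetes API error with status 409."""

    status = 409


class FakeK8sAPI:
    """Hypothetical in-memory API; can fail once when creating a given kind."""

    def __init__(self, fail_on=None):
        self.objects = {}  # (kind, name) -> spec
        self.fail_on = fail_on  # kind whose creation fails once (simulated error)

    def create(self, kind, name):
        key = (kind, name)
        if key in self.objects:
            raise ConflictError(f"{kind} {name!r} already exists")
        if kind == self.fail_on:
            self.fail_on = None  # fail only on the first attempt
            raise RuntimeError(f"failed to create {kind} {name!r}")
        self.objects[key] = {}

    def delete(self, kind, name):
        self.objects.pop((kind, name), None)


def create_ignoring_conflict(api, kind, name):
    """Create an object, treating 409 Conflict as 'already exists'."""
    try:
        api.create(kind, name)
    except ConflictError:
        pass  # left over from an earlier attempt; reuse it


def start_workflow(api, workflow_id):
    """Create resources in order; on any failure, delete what was created."""
    name = f"dask-{workflow_id}"
    steps = [
        ("DaskCluster", name),
        ("DaskAutoscaler", name),
        ("Ingress", f"{name}-dashboard"),
        ("Pod", f"run-batch-{workflow_id}"),
    ]
    created = []
    try:
        for kind, obj_name in steps:
            create_ignoring_conflict(api, kind, obj_name)
            created.append((kind, obj_name))
    except Exception:
        # Roll back in reverse order so the next retry starts from a clean state.
        for kind, obj_name in reversed(created):
            api.delete(kind, obj_name)
        raise
```

With `FakeK8sAPI(fail_on="Ingress")`, the first call raises `RuntimeError` and leaves no objects behind, and a second call succeeds. Reusing an object after a 409 assumes the leftover has the expected spec; a stricter variant would delete and recreate it instead.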

Closes reanahub#617

mdonadoni added type/bug workflow/dask labels Nov 27, 2024

mdonadoni added this to Dask Nov 27, 2024

mdonadoni moved this to Ready for work in Dask Nov 27, 2024

Alputer added a commit to Alputer/reana-workflow-controller that referenced this issue Nov 28, 2024

fix(k8s): handle 409 conflict errors for dask resources (reanahub#618)

f56f50c

Closes reanahub#617

Alputer added a commit to Alputer/reana-workflow-controller that referenced this issue Nov 28, 2024

fix(k8s): handle 409 conflict errors for dask resources (reanahub#618)

b9bd76d

Closes reanahub#617

Alputer linked a pull request Nov 28, 2024 that will close this issue

fix(k8s): handle exceptions when creating dask cluster #618

Open

Alputer added a commit to Alputer/reana-workflow-controller that referenced this issue Nov 28, 2024

fix(k8s): handle exceptions when creating dask cluster (reanahub#618)

71cf935

Closes reanahub#617

Alputer added a commit to Alputer/reana-workflow-controller that referenced this issue Nov 28, 2024

fix(k8s): handle exceptions when creating dask cluster (reanahub#618)

1efdce3

Closes reanahub#617

Alputer added a commit to Alputer/reana-workflow-controller that referenced this issue Nov 28, 2024

fix(k8s): handle exceptions when creating dask cluster (reanahub#618)

0eecea0

Closes reanahub#617

Alputer moved this from Ready for work to In review in Dask Nov 28, 2024
