Error related to parallelism when running snap.tl.macs3 on a cluster #298

Open
PauBadiaM opened this issue Apr 22, 2024 · 2 comments
Labels
documentation 📖 Improvements or additions to documentation

Comments

@PauBadiaM

PauBadiaM commented Apr 22, 2024

Recently I ran into a strange issue when running snap.tl.macs3 on a cluster: my script would hang for a while and then throw a parallelism-related error:

0%|          | 0/12 [00:19<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/sds-hd/sd22b002/projects/GRETA/greta_benchmark/callpeaks.py", line 18, in <module>
    snap.tl.macs3(adata, groupby='cell_type', n_jobs=n_jobs, tempdir=tempdir)
  File "/opt/conda/envs/env/lib/python3.10/site-packages/snapatac2/tools/_call_peaks.py", line 155, in macs3
    peaks = _par_map(_call_peaks, [(x,) for x in fragments.values()], n_jobs)
  File "/opt/conda/envs/env/lib/python3.10/site-packages/snapatac2/tools/_call_peaks.py", line 221, in _par_map
    raise RuntimeError("Some worker process has died unexpectedly.")
RuntimeError: Some worker process has died unexpectedly.

However, the same script worked fine on my local machine.

In case anyone else runs into the same issue: I found that the solution is to wrap the call to macs3 in an if __name__ == '__main__': conditional. Here is an example:

import snapatac2 as snap

# Guard required: snap.tl.macs3 starts worker processes under the hood.
if __name__ == '__main__':
    # Read data
    adata = snap.read(snap.datasets.pbmc5k(type='annotated_h5ad'), backed=None)

    # Subset to 50 cells per cell type to make things faster
    msk = adata.obs.groupby('cell_type', observed=False).head(50).index
    adata = adata[msk, :].copy()

    # Call ATAC-seq peaks using MACS3
    snap.tl.macs3(adata, groupby='cell_type', n_jobs=8)

    print('Done!')

Maybe this could also be added to the docs.

@kaizhang
Owner

Because tl.macs3 uses multiprocess for parallel execution under the hood, the call must be wrapped in an if __name__ == '__main__': statement. This constraint may be lifted once "nogil" is merged into Python.
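To illustrate why the guard matters, here is a generic sketch using Python's standard multiprocessing module. It is not SnapATAC2's actual code: parallel_factorials is a hypothetical helper, and forcing the "spawn" start method is an assumption made so the example behaves the same on every platform. Under "spawn", each worker re-imports the main script, so any process-starting code at module top level would run again inside every worker; the guard confines it to the parent process.

```python
import math
import multiprocessing as mp


def parallel_factorials(values, n_jobs=2):
    """Hypothetical helper: compute math.factorial for each value in a pool."""
    # Assumption for illustration: force "spawn", under which every worker
    # re-imports the main module before running its tasks.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=n_jobs) as pool:
        # math.factorial lives in an importable module, so it pickles by
        # reference and the workers can look it up after starting.
        return pool.map(math.factorial, values)


if __name__ == "__main__":
    # Only the parent process runs this block. Workers import this module
    # under a different name ("__mp_main__"), skip the block, and therefore
    # never try to start a pool of their own.
    print(parallel_factorials([3, 4, 5]))  # prints [6, 24, 120]
```

If the pool call were moved outside the if block, each spawned worker would try to start new processes while still bootstrapping, which the standard library rejects with a RuntimeError in the worker.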

@1010shane

@PauBadiaM Thank you! I had exactly the same problem, and it has been driving me nuts. Your solution fixed it. Previously, I was able to run my pipeline on macOS, but not on Ubuntu 22.04.
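When the same script behaves differently across machines (macOS vs. Ubuntu, laptop vs. cluster), one factor that may be worth checking is the multiprocessing start method, which differs between operating systems ("spawn" is the default on Windows and macOS). The sketch below only reports what the current interpreter uses and supports. It is a diagnostic idea, not a confirmed explanation for this issue: a library can also request a specific method through multiprocessing.get_context(), in which case the platform default does not apply. start_method_report is a hypothetical helper name.

```python
import multiprocessing as mp
import sys


def start_method_report():
    """Hypothetical helper: platform, start method in effect, supported methods."""
    # Calling get_start_method() without allow_none=True fixes the global
    # start method to the platform default if nothing has set it yet.
    return {
        "platform": sys.platform,
        "start_method": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    for key, value in start_method_report().items():
        print(f"{key}: {value}")
```

Running this on both machines and comparing the output is a cheap first step before digging further.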

kaizhang added the documentation 📖 label May 1, 2024