Options to read & decompress data in parallel #340
base: master
Conversation
Some benchmarks 🐎 I selected AGIPD runs with ~200 cells and loaded the first 1000 trains of a single module into memory.
This is loading a compressed uint16 AGIPD module from p3025 with 200 cells:
(same but with a linear scale)
And loading an uncompressed float32 AGIPD module from p3046 with 202 cells:
(same but with a linear scale)
Amusingly, it's faster to load the uncompressed data despite it being ~80.5GB on disk compared to ~2.34GB of compressed data 🙃 (both proc/ directories still being on GPFS).
But still a huge improvement 🎉
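For context, a rough sketch of how a timing comparison like this could be run with the new parameters. The run number and source name below are placeholders, not the exact datasets used above.

```python
# Rough benchmark sketch: time loading the first 1000 trains of one AGIPD
# module with different reader process counts. Run number and source name
# are placeholders.
import time
from extra_data import open_run, by_index

run = open_run(proposal=3025, run=1, data='proc').select_trains(by_index[:1000])
kd = run['SPB_DET_AGIPD1M-1/DET/4CH0:xtdf', 'image.data']

for read_procs in (1, 2, 4, 8, 16):
    t0 = time.perf_counter()
    arr = kd.ndarray(read_procs=read_procs, decomp_threads=-1)
    print(f"read_procs={read_procs}: {time.perf_counter() - t0:.1f} s "
          f"for {arr.nbytes / 1e9:.1f} GB")
```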
# Based on _alloc function in pasha
def zeros_shared(shape, dtype):
Could we expose this? Or add something like KeyData.allocate_out(shared=True)?
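For illustration, one way a helper like this can work - a minimal sketch backed by an anonymous mmap, not necessarily what pasha's _alloc or this PR actually does:

```python
# Minimal sketch of a shared zero-filled allocation: back a NumPy array with
# an anonymous mmap, so forked reader processes see the same pages.
# Not necessarily the implementation used in this PR.
import mmap
import numpy as np

def zeros_shared(shape, dtype):
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    buf = mmap.mmap(-1, nbytes)  # anonymous mapping, shared with forked children
    # Anonymous mmap pages start out zeroed, so no explicit fill is needed.
    return np.frombuffer(buf, dtype=dtype).reshape(shape)
```

An allocate_out(shared=True) helper as suggested above could presumably just wrap something like this.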
return out
def ndarray(self, roi=(), out=None, read_procs=1, decomp_threads=1):
This adds read_procs and decomp_threads parameters to KeyData.ndarray() and .xarray(). They control the number of processes used to read data from HDF5 files, and the number of threads used to decompress data in the specific pattern we use for gain/mask datasets in 2D data. They both default to 1, i.e. the status quo, and we avoid launching separate processes/threads when they're 1.

Testing with ~55 GB of JUNGFRAU data, I got a better-than-2x speedup reading uncompressed data with 10 processes (~1 minute -> ~24 seconds), and something like a 10x speedup reading compressed data with decomp_threads=-1, i.e. 1 thread per core, on a 72-core node (~1 min 40 s -> ~10 s). The timings are pretty variable - AFAICT, filesystem access always is.
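To make that concrete, usage would look something like this. The proposal/run and source/key names are made up for illustration:

```python
# Illustrative only: proposal/run and source/key names are placeholders.
from extra_data import open_run

run = open_run(proposal=2566, run=10, data='proc')
jf = run['SPB_IRDA_JF4M/DET/JNGFR01:daqOutput', 'data.adc']

# Uncompressed data: read with 10 worker processes.
data = jf.ndarray(read_procs=10)

# Compressed gain/mask-style data: one decompression thread per core.
gain = run['SPB_IRDA_JF4M/DET/JNGFR01:daqOutput', 'data.gain'].xarray(decomp_threads=-1)
```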
The read_procs option is kind of incompatible with passing in an out array, because the array needs to be in shared memory. I'm not sure how to deal with that in the API - we could reject using out and read_procs together, but you could also pass in an array in shared memory, and I don't know of any way to check for that.

Future work:
Closes #49.