Improved docs for zarr encoding options. #9987
Comments
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
Thanks for volunteering to contribute! On the xarray end you can pass whatever is accepted by Zarr. So perhaps the best thing to do is write out a line and example to that effect, and add more detailed docs over at Zarr.
Does the error I am seeing relate to this? It happens when I try to call I noticed that the same error appears in the code block in your documentation here.
I encountered an issue with that as well just now. This was my journey in trying to achieve this.

Using this dataset:

import xarray as xr
import numpy as np
ds = xr.Dataset(
{
# all zeros to verify by disk size whether it was compressed or not
"temperature": (("x", "y", "time"), np.zeros((50, 60, 1000))),
},
coords={
"x": np.arange(50),
"y": np.arange(60),
"time": np.arange(1000),
},
)

My initial attempt, based on a quick search on how to do this:

import zarr
ds.to_zarr("my_store", consolidated=False, mode="w", encoding={"temperature": {"compressors": [zarr.Blosc()]}})

gives:

Ok - it seems those examples were for

import numcodecs
ds.to_zarr("my_store", consolidated=False, mode="w", encoding={"temperature": {"compressors": [numcodecs.Blosc()]}})

gives the following error:

Ok - this is strange, after some research and messing around with the

import numcodecs.zarr3
ds.to_zarr("my_store", consolidated=False, mode="w", encoding={"temperature": {"compressors": [numcodecs.zarr3.Blosc()]}})

This does, in fact, write out a zarr store. However, I do get a very confusing

Checking whether it was compressed or not:

$ du -h my_store
4.0K my_store/time/c
8.0K my_store/time
4.0K my_store/temperature
4.0K my_store/x/c
8.0K my_store/x
4.0K my_store/y/c
8.0K my_store/y
32K my_store

Yes it was, otherwise

However, if I leave out the compressor

I still get the same result:

So I'm assuming it already uses a compressor by default if not otherwise specified? But the default compressor doesn't produce that user warning - so it must be a different one than one of I only found out about the
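As a rough cross-check of the `du` output above, the uncompressed size of the `temperature` variable can be worked out by hand and compared with what is on disk. The sketch below uses only the standard library. `raw_nbytes` and `store_nbytes` are hypothetical helper names, and the 8-byte item size assumes the default float64 dtype that `np.zeros` produces.

```python
import math
import os


def raw_nbytes(shape, itemsize=8):
    """Uncompressed size in bytes of an array with the given shape.

    itemsize=8 assumes float64, the default dtype of np.zeros.
    """
    return math.prod(shape) * itemsize


def store_nbytes(path):
    """Sum the sizes of all files under path (apparent size, not du's block usage)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


expected = raw_nbytes((50, 60, 1000))
print(f"uncompressed temperature data: {expected} bytes (~{expected / 2**20:.1f} MiB)")

# Only compare against the store if it exists in the working directory.
if os.path.isdir("my_store"):
    on_disk = store_nbytes("my_store")
    print(f"on disk: {on_disk} bytes, ratio ~{expected / max(on_disk, 1):.0f}x")
```

The raw data comes to 24,000,000 bytes, so a store of a few tens of kilobytes can only be the result of compression.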
What is your issue?
Hi
I've been trying to set zarr encoding options from xarray (zarr3).
Figuring out how to do this isn't straightforward. It wasn't too hard to get this working for most zarr compressors, but getting it working for the two array-to-bytes codecs (ArrayBytesCodecs), ZFPY and PCodec, was rather harder. It turned out that array-to-bytes codecs need to be specified as serializers, rather than as compressors, in the encoding object.
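The distinction matters because a Zarr v3 codec chain has three kinds of stage: array-to-array codecs (filters), exactly one array-to-bytes codec (the serializer), and bytes-to-bytes codecs (compressors). The stdlib-only sketch below imitates that chain to show why an array-to-bytes codec cannot sit in the compressor slot. It is an illustration under those assumptions, not Zarr's implementation, and every function name in it is hypothetical.

```python
import struct
import zlib


def delta_encode(values):
    """Array -> array stage (a 'filter'): keep the first value, then differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Inverse of delta_encode: running sum of the differences."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def serialize(values):
    """Array -> bytes stage (the 'serializer'): pack as little-endian int64."""
    return struct.pack(f"<{len(values)}q", *values)


def deserialize(raw):
    """Inverse of serialize: unpack little-endian int64 values."""
    return list(struct.unpack(f"<{len(raw) // 8}q", raw))


def compress(raw):
    """Bytes -> bytes stage (a 'compressor'): it only ever sees bytes."""
    return zlib.compress(raw)


def decompress(blob):
    return zlib.decompress(blob)


def encode_chunk(values):
    # Order is fixed: filters first, then the serializer, then compressors.
    return compress(serialize(delta_encode(values)))


def decode_chunk(blob):
    return delta_decode(deserialize(decompress(blob)))


chunk = list(range(0, 1000, 5))
blob = encode_chunk(chunk)
assert decode_chunk(blob) == chunk
print(len(serialize(chunk)), "bytes serialized ->", len(blob), "bytes stored")
```

A codec such as ZFPY or PCodec takes the array itself as input, so in this picture it replaces `serialize`, not `compress`.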
Anyway, to cut to the chase: I think some better documentation of the format of the encoding object would be useful. I've not been able to find any, and resorted to reading the source code to find the above parameter.
I'm happy to help write this if useful, but could use a pointer to the best place to put the doc. (I'm new to making xarray changes.)
I should say, though: the fact this works at all just a few days after the zarr3 release is great!
Thanks
Format strings that seem to be working for me are as follows. Arguably the details of codec naming belong more in zarr land, but the serializer keyword is, as far as I can see, an xarray invention, so it should be documented in xarray:
For ArrayBytesCodecs:
encoding = {"serializer": numcodecs.zarr3.()}
For BytesBytesCodecs (compressors):
if in numcodecs:
encoding = {"compressor": numcodecs.zarr3.()}
and if native zarr3:
(note different codec name format)
encoding = {"compressor": zarr.codecs.ZstdCodec()}