DataTree

H. Joe Lee edited this page Dec 16, 2022 · 2 revisions

to_zarr() failure

DataTree fails to create a Zarr store from ATL08_20181014084920_02400109_003_01.h5:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 100004: invalid continuation byte
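The error class itself can be reproduced without the HDF5 file. The sketch below is not the ATL08 data: it builds a hypothetical byte buffer whose byte at offset 100004 is 0xc6 followed by a byte that is not a UTF-8 continuation byte, then shows two standard ways to decode such bytes without raising.

```python
# Hypothetical buffer (NOT the real ATL08 content): 100004 ASCII bytes,
# then 0xc6 followed by a byte that is not a UTF-8 continuation byte.
payload = b"a" * 100004 + b"\xc6" + b"A"

caught = None
try:
    payload.decode("utf-8")
except UnicodeDecodeError as err:
    caught = err  # keep a reference; `err` is cleared after the except block
    print(err)

# Two ways to get text anyway:
# latin-1 maps every byte to one code point, so no byte is lost.
lossless = payload.decode("latin-1")
# errors="replace" substitutes U+FFFD for each undecodable byte.
lossy = payload.decode("utf-8", errors="replace")
```

Whether either fallback is appropriate for the failing dataset depends on what the bytes actually hold, which these notes do not establish.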

DataTree crashes with a segmentation fault on SMAP_L3_SM_P_20150406_R14010_001.h5.

tree() difference

DataTree reports string datasets with dtype object. Kerchunk reports a fixed-size byte string dtype with an explicit length, such as |S27.

bash-3.2$ diff dt.txt ker.txt 
40,42c40,42
<  │   ├── control (1,) object
<  │   ├── data_end_utc (1,) object
<  │   ├── data_start_utc (1,) object
---
>  │   ├── control (1,) |S100000
>  │   ├── data_end_utc (1,) |S27
>  │   ├── data_start_utc (1,) |S27
51,52c51,52
<  │   ├── granule_end_utc (1,) object
<  │   ├── granule_start_utc (1,) object
---
>  │   ├── granule_end_utc (1,) |S27
>  │   ├── granule_start_utc (1,) |S27
114c114
<  │   ├── release (1,) object
---
>  │   ├── release (1,) |S80
123c123
<  │   └── version (1,) object
---
>  │   └── version (1,) |S80
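The two notations in the diff correspond to two NumPy dtypes. As a small illustration, assuming NumPy is installed and using a made-up 27-character timestamp (not a value read from the file), the same string can be held either way:

```python
import numpy as np

# Hypothetical 27-character timestamp, for illustration only.
stamp = "2018-10-14T08:49:20.000000Z"

# Fixed-size byte string: the length is part of the dtype.
fixed = np.array([stamp.encode("ascii")], dtype="S27")
# Object array: each element is a Python object, no size in the dtype.
as_object = np.array([stamp], dtype=object)

print(fixed.shape, str(fixed.dtype))
print(as_object.shape, str(as_object.dtype))
print(fixed.dtype.itemsize)  # bytes reserved per element
```

With a fixed-size dtype the storage per element is known up front, while an object dtype leaves the encoding of each element to the writer.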

.zmetadata

DataTree writes all attributes into .zmetadata, so its file is much longer than the one from Kerchunk.

         Kerchunk   DataTree
Lines    165        6233
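Line counts are a rough measure. A slightly more direct comparison is to count what each .zmetadata contains. The sketch below assumes the Zarr v2 consolidated metadata layout (a top-level "metadata" object keyed by paths such as "group/.zattrs"); the helper name and the sample content are hypothetical.

```python
import json
from collections import Counter


def summarize_zmetadata(text):
    """Count entries per kind (.zgroup/.zarray/.zattrs) and total attributes."""
    meta = json.loads(text)["metadata"]
    kinds = Counter(key.rsplit("/", 1)[-1] for key in meta)
    n_attrs = sum(len(v) for k, v in meta.items() if k.endswith(".zattrs"))
    return dict(kinds), n_attrs


# Hypothetical miniature .zmetadata, not taken from the real stores.
sample = json.dumps({
    "zarr_consolidated_format": 1,
    "metadata": {
        ".zgroup": {"zarr_format": 2},
        "ancillary_data/.zgroup": {"zarr_format": 2},
        "ancillary_data/.zattrs": {"Description": "hypothetical"},
        "ds_geosegments/.zarray": {"shape": [5], "dtype": "|i1"},
        "ds_geosegments/.zattrs": {"units": "1", "long_name": "hypothetical"},
    },
})

kinds, n_attrs = summarize_zmetadata(sample)
print(kinds, n_attrs)
```

Running the helper on both real files would show whether the extra lines come from more .zattrs entries, more groups, or both.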

Groups

DataTree stores groups. The Zarr stores produced by Xarray and Kerchunk do not.

Only in ATL08dt.zarr: METADATA
Only in ATL08dt.zarr: ancillary_data
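The "Only in" lines come from a recursive directory diff. An equivalent top-level check can be sketched in Python with the standard filecmp module. The directory layout below is a throwaway stand-in built in a temporary folder, not the real stores, and the helper name is hypothetical.

```python
import filecmp
import os
import tempfile


def only_in_each(left, right):
    """Return (names only in left, names only in right) at the top level."""
    cmp = filecmp.dircmp(left, right)
    return sorted(cmp.left_only), sorted(cmp.right_only)


with tempfile.TemporaryDirectory() as root:
    plain_store = os.path.join(root, "ATL08.zarr")
    dt_store = os.path.join(root, "ATL08dt.zarr")
    layout = (
        (plain_store, ["ds_geosegments"]),
        (dt_store, ["ds_geosegments", "METADATA", "ancillary_data"]),
    )
    # Create empty stand-in directories for arrays and groups.
    for store, names in layout:
        for name in names:
            os.makedirs(os.path.join(store, name))
    only_left, only_right = only_in_each(plain_store, dt_store)

print(only_left, only_right)
```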

LZ4

DataTree records a Blosc compressor with the lz4 codec in .zarray, whereas ATL08.zarr has no compressor.

diff -r ATL08.zarr/ds_geosegments/.zarray ATL08dt.zarr/ds_geosegments/.zarray
5c5,11
<     "compressor": null,
---
>     "compressor": {
>         "blocksize": 0,
>         "clevel": 5,
>         "cname": "lz4",
>         "id": "blosc",
>         "shuffle": 1
>     },

The stored data might also differ.

Binary files ATL08.zarr/ds_geosegments/0 and ATL08dt.zarr/ds_geosegments/0 differ

DataTree uses lz4 compression for this dataset, which increases the size of the data from 5 to 21 bytes. The original HDF5 dataset is contiguous and uncompressed.
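Growth on very small inputs is typical of compressed formats, because the framing costs more than the payload. The sketch below uses zlib from the standard library as a stand-in (it is not Blosc or lz4) to show the effect on a hypothetical 5-byte chunk. The 16-byte figure is the Blosc 1 header size. That a 5-byte payload stored behind such a header accounts for the 21 bytes is an inference, not something checked against the actual chunk file.

```python
import zlib

# Hypothetical 5-byte chunk; the real chunk content is not reproduced here.
raw = bytes(5)

# zlib is a stand-in for Blosc/lz4: header and checksum alone exceed 5 bytes.
compressed = zlib.compress(raw, 5)
print(len(raw), len(compressed))

# Assumption: a 16-byte Blosc 1 header in front of the unmodified payload.
BLOSC_HEADER_BYTES = 16
expected_blosc_size = len(raw) + BLOSC_HEADER_BYTES
print(expected_blosc_size)  # 21
```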

Fill Value

DataTree sets fill_value to null instead of 0.

<     "fill_value": 0,
---
>     "fill_value": null,
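Since .zarray files are JSON, the same comparison can be made key by key instead of line by line. The following is a minimal sketch with a hypothetical helper. The two sample documents combine the compressor and fill_value differences quoted on this page and are not copies of the real files.

```python
import json


def diff_keys(left, right):
    """Map each key whose value differs to its (left, right) pair."""
    keys = sorted(set(left) | set(right))
    return {
        k: (left.get(k), right.get(k))
        for k in keys
        if left.get(k) != right.get(k)
    }


# Hypothetical .zarray contents assembled from the diffs above.
plain = json.loads('{"compressor": null, "fill_value": 0, "zarr_format": 2}')
datatree = json.loads(
    '{"compressor": {"blocksize": 0, "clevel": 5, "cname": "lz4",'
    ' "id": "blosc", "shuffle": 1}, "fill_value": null, "zarr_format": 2}'
)

changes = diff_keys(plain, datatree)
for key, (old, new) in changes.items():
    print(key, old, "->", new)
```

JSON null is loaded as Python None, so a fill_value of null and a fill_value of 0 compare as different values.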