Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systemd-sysext fails to install extensions with Direct I/O #17027

Open
ixhamza opened this issue Feb 5, 2025 · 5 comments
Open

systemd-sysext fails to install extensions with Direct I/O #17027

ixhamza opened this issue Feb 5, 2025 · 5 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@ixhamza
Copy link
Member

ixhamza commented Feb 5, 2025

System information

Type Version/Name
Distribution Name Fedora Linux
Distribution Version 41 (Server Edition)
Kernel Version 6.11.4-301.fc41.x86_64
Architecture x86_64
OpenZFS Version zfs-2.3.99-175_g390f6c119, zfs-kmod-2.3.99-175_g390f6c119

Describe the problem you're observing

Squashfs uses a loop device to read data via its AIO handler, which sets the O_DIRECT flag on the request. In zfs_setup_direct(), the conditions here do not account for direct=standard and ioflag & O_DIRECT. This causes most of the requests to fail due to unalignment with the page boundary.

While the Linux block layer typically rejects requests that are not aligned to the block device's logical block size, the loop device bypasses the block layer by directly invoking the file system's read/write handler without enforcing alignment checks. As a result, systemd-sysext fails reliably when attempting to install extensions on a ZFS file system.

Describe how to reproduce the problem

The following script reliably reproduces the issue by attempting to install a simple Hello World extension on a ZFS file system:

truncate -s 64M /tmp/f1
zpool create tank /tmp/f1 -O mountpoint=/root/custom_extensions
mkdir -p /root/custom_extensions/myext/usr/local/bin
echo -e '#!/bin/bash\nexec /usr/bin/echo "Hello World"' > /root/custom_extensions/myext/usr/local/bin/hello
chmod 755 /root/custom_extensions/myext/usr/local/bin/hello
mkdir -p /root/custom_extensions/myext/usr/lib/extension-release.d
echo -e "ID=_any\n" > /root/custom_extensions/myext/usr/lib/extension-release.d/extension-release.hello
mksquashfs /root/custom_extensions/myext /root/custom_extensions/hello.raw
mkdir -p /run/extensions
ln -s /root/custom_extensions/hello.raw /run/extensions/
systemd-sysext refresh

Expected Result:
The extension should install successfully, and the systemd-sysext refresh command should complete without errors.

Actual Result:
systemd-sysext refresh fails with error: Failed to read metadata for image hello: Invalid argument.

Include any warning/errors/backtraces from the system logs

[ 1073.145411] loop0: detected capacity change from 0 to 8
[ 1073.159480] invalid error, dev loop0, sector 0 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
[ 1073.171444] SQUASHFS error: Failed to read block 0x0: -22
[ 1073.174262] unable to read squashfs_super_block
@ixhamza ixhamza added the Type: Defect Incorrect behavior (e.g. crash, hang) label Feb 5, 2025
@useranon350
Copy link

useranon350 commented Feb 7, 2025

From the Direct IO PR:

For both O_DIRECT read and write request the offset and requeset sizes, at a minimum, must be PAGE_SIZE aligned.
In the event they are not, then EINVAL is returned except for in the event the direct property is set to always.
#10018

This seems to be expected behavior based on the reasoning below (from the same PR). It seems solving this would require adding an additional option to the directio property.

I imagine the majority of users will not using direct IO, and if they are, they will be cognizant of the record alignment. No matter what, people would not be happy if they asked to bypass caching and it still winds up in the ARC without them having any say (even if it was more performant to do so). We are trying to follow the fundamental concept of direct IO by bypassing caching and delivering the data directly to the file system by mapping in the user’s pages. I think leaving that decision in the users control is preferable over just deciding for them. That was the idea behind all the dataset properties. If the user does decide to write, read the data to, from the ARC when it is more beneficial to performance, even if direct IO was requested, then the properties will allow that to happen.

See also #16972 for a possible solution for applications using statx.

@ixhamza
Copy link
Member Author

ixhamza commented Feb 7, 2025

I totally agree. systemd does indeed try to open the loop device with the O_DIRECT flag. However, since we have no control over systemd, I think wee need some relaxation from ZFS side, similar to other filesystems. In its current state, systemd-sysext is unusable with ZFS, and we rely on it to install multiple extensions in TrueNAS.

@useranon350
Copy link

I totally agree. systemd does indeed try to open the loop device with the O_DIRECT flag. However, since we have no control over systemd, I think wee need some relaxation from ZFS side, similar to other filesystems. In its current state, systemd-sysext is unusable with ZFS, and we rely on it to install multiple extensions in TrueNAS.

Looking at https://github.com/systemd/systemd/blob/v257.2/src/shared/loop-util.c#L614, I wonder if this could be resolved by setting the environment variable SYSTEMD_LOOP_DIRECT_IO to off.

loop_configure_verify_direct_io is also of interest along with https://github.com/truenas/linux/blob/truenas/linux-6.12/drivers/block/loop.c#L169. It seems like there is logic to enforce a logical block size of the loopback device based on the logical block size of backing device, but this presumably isn't getting applied for some reason.

@ixhamza
Copy link
Member Author

ixhamza commented Feb 9, 2025

Looking at https://github.com/systemd/systemd/blob/v257.2/src/shared/loop-util.c#L614, I wonder if this could be resolved by setting the environment variable SYSTEMD_LOOP_DIRECT_IO to off.

Setting SYSTEMD_LOOP_DIRECT_IO=0 seems to workaround this issue. Thank you!

loop_configure_verify_direct_io is also of interest along with https://github.com/truenas/linux/blob/truenas/linux-6.12/drivers/block/loop.c#L169. It seems like there is logic to enforce a logical block size of the loopback device based on the logical block size of backing device, but this presumably isn't getting applied for some reason.

It appears that lo_bdev_can_use_dio() is only applicable when the backing device for the loop is a block device. It seems that __loop_update_dio() is being called, but lo_bdev_can_use_dio() is not invoked because the backing device for the loop is not a block device.

@IvanVolosyuk
Copy link
Contributor

IvanVolosyuk commented Feb 12, 2025

Stupid question from a person who failed to follow the details of the discussion, does the mentioned statx patch happens to fix the problem? From quick look at systemd sources it looks like alignment will be based on sector size. Also for zvol and loopback device backed by zfs file it seems logical sector size is 512b according to fdisk (in my case pool ashift=12, recordsize/volblocksize >= 16k).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

3 participants