live-iso: enable building with squashfs or erofs #4012

Merged
merged 6 commits into from
Feb 13, 2025

nikita-dubrovskii
Contributor

No description provided.

@nikita-dubrovskii
Contributor Author

f-c-c PR: coreos/fedora-coreos-config#3342

@nikita-dubrovskii nikita-dubrovskii changed the title from "DRAFT: live-iso: switch from squashfs to erofs" to "live-iso: switch from squashfs to erofs" Feb 5, 2025
@nikita-dubrovskii
Contributor Author

osbuild PR: osbuild/osbuild#2002

# Use erofs by default
live-rootfs-fstype: "erofs"
live-rootfs-fsoptions: "-Eall-fragments,fragdedupe=inode -C131072 --quiet"
Member

coreos/fedora-coreos-tracker#1852 (comment) mentions

-zlzma,6 -Eall-fragments,fragdedupe=inode -C1048576

as the best option I think.

@jlebon do you think it's a good idea to set the defaults here, or should we just put this info in the config repos?

Member

This is changing quite fast currently, so it might be premature to bake it here. I would leave everything off here to just --quiet and let it live in the config for now. Also because those changes are not in RHEL yet AFAIK. Once things stabilize we can add defaults here.

Contributor Author

On my system (F41), -zlzma,6 -Eall-fragments,fragdedupe=inode -C1048576 segfaults:

Creating erofs with -zlzma,6 -Eall-fragments,fragdedupe=inode -C1048576 --quiet

Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb1        30G  8.7G   22G  29% /srv/cache

ThreadSanitizer:DEADLYSIGNAL
==56==ERROR: ThreadSanitizer: SEGV on unknown address 0x000c0001000f (pc 0x7fda1608efb5 bp 0x7fd9d9c9d490 sp 0x7fd9d9c9d460 T73)
==56==The signal is caused by a READ memory access.
ThreadSanitizer:DEADLYSIGNAL
ThreadSanitizer: nested bug in the same thread, aborting.

Member

I would leave everything off here to just --quiet and let it live in the config for now.

Taking this a step further, I think I'm going to argue that we just not include defaults in COSA here at all for now. The OSBuild stage will (for now) default to squashfs/zstd, and I think it will be easier to ratchet in erofs in the streams we want if COSA isn't setting it too. My proposal for now is to just put these values here and comment them out:

# an example of setting ISO/PXE rootfs fstype and fsoptions
# live-rootfs-fstype: "erofs"
# live-rootfs-fsoptions: "--quiet"

Member

erofs/erofs-utils#13

Thanks for filing this! Hopefully we can sort this out and leave the supermin requirement alone (or only slightly bump it).

In practice, if we really have to... I think even if we did bump it to 6G it shouldn't affect our pipeline requests. We already request more there and the point at which we run osbuild, it runs alone. And additionally AIUI osbuild still currently runs pipelines serially even when there is no dependence between two branches.

Though we should at least make it dynamic; e.g. cmd-osbuild already loads image.json so it could trivially check whether EROFS was requested and size the supermin VM correspondingly. This helps ease memory pressure on the multi-arch builders for RHCOS where we'll be building squashfs also for a while and also the developer case. But again, let's see with the maintainer whether we can avoid it at all (or maybe just do it short-term).
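
The dynamic sizing idea above could look roughly like the following. This is only a sketch under stated assumptions, not the actual cmd-osbuild logic: the function names are hypothetical, image.json is assumed to expose the same `live-rootfs-fstype` key as image.yaml, and the two memory figures are illustrative (2048 mirrors the existing `memory_default` in cmdlib.sh, 6144 stands in for the "6G" mentioned above).

```python
import json

# Illustrative values in MiB (assumptions, not settled numbers).
DEFAULT_MEMORY_MB = 2048
EROFS_MEMORY_MB = 6144


def memory_for_image(image):
    """Pick a supermin VM memory size from an already-parsed image config."""
    # Assumption: the parsed config carries the same key as image.yaml.
    # When the key is absent we treat the build as squashfs.
    fstype = image.get("live-rootfs-fstype", "squashfs")
    return EROFS_MEMORY_MB if fstype == "erofs" else DEFAULT_MEMORY_MB


def memory_for_image_json(path):
    """Convenience wrapper: load image.json from disk, then size the VM."""
    with open(path) as f:
        return memory_for_image(json.load(f))
```

Keeping the decision in a small pure function means squashfs builds (for example RHCOS on the multi-arch builders) keep the smaller allocation, and only EROFS builds pay for the larger VM.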

@dustymabe
Member

dustymabe commented Feb 11, 2025

Can we make sure, as part of this, that cosa diff still works against the new erofs-based media?

One option may be to rename some of the options

  • --live-rootfs -> --live-rootfs-img
  • --live-rootfs-ls -> --live-rootfs-img-ls
  • --live-squashfs -> --live-rootfs
  • --live-squashfs-ls -> --live-rootfs-ls

Here --live-rootfs-img refers to the rootfs.img CPIO archive that contains the osmet and root.{squash,ero}fs, while --live-rootfs now refers to root.squashfs or root.erofs and abstracts over them (i.e. you can compare two builds, one built with squashfs and one built with erofs).

cc @jlebon since he wrote cosa diff.

@jlebon
Member

jlebon commented Feb 11, 2025

One option may be to rename some of the options

Those renames make sense to me!

@nikita-dubrovskii nikita-dubrovskii force-pushed the erofs branch 2 times, most recently from 37c49c2 to 202968c February 11, 2025 10:30
@jlebon
Member

jlebon commented Feb 12, 2025

The last commit there is interesting/potentially concerning. Can you go into more details of what you're seeing there? Was this just the rootfs image itself being larger and so needing to bump RAM requirements to even store it? Or was it e.g. mounting the EROFS itself permanently carving off a lot of RAM?

What was the minimum amount of additional RAM that made it work?

@nikita-dubrovskii
Contributor Author

The last commit there is interesting/potentially concerning. Can you go into more details of what you're seeing there? Was this just the rootfs image itself being larger and so needing to bump RAM requirements to even store it? Or was it e.g. mounting the EROFS itself permanently carving off a lot of RAM?

What was the minimum amount of additional RAM that made it work?

On my system testiso ... fails with ram < 5662:

[  OK  ] Finished systemd-tmpfiles-setup-de…Create Static Device Nodes in /dev.                                                                                
[    3.376682] systemd[1]: Finished systemd-tmpfiles-setup-dev.service - Create Static Device Nodes in /dev.                                                   
[    3.378492] systemd[1]: Reached target local-fs-pre.target - Preparation for Local File Systems.                                                            

[  OK  ] Reached target local-fs.target - Local File Systems.
[    3.382504] systemd-tmpfiles[389]: /usr/lib/tmpfiles.d/var.conf:14: Duplicate line for path "/var/log", ignoring.

[FAILED] Failed to start dbus-broker.service - D-Bus System Message Bus.                                                                                                                                                                                                                                                      
See 'systemctl status dbus-broker.service' for details.                                                                                                                                                                                                                                                                       
[    4.289897] dbus-broker-launch[733]:       main @ ../src/launch/main.c +178                                                                                                                                                                                                                                                
[FAILED] Failed to start nm-initrd.service.         

......

:/root# findmnt /sysroot
TARGET   SOURCE     FSTYPE OPTIONS
/sysroot /dev/loop1 erofs  ro,relatime,user_xattr,acl,cache_strategy=readaround

# systemctl list-units --failed
  UNIT                   LOAD   ACTIVE SUB    DESCRIPTION                      >
● dbus-broker.service    loaded failed failed D-Bus System Message Bus
● ignition-fetch.service loaded failed failed Ignition (fetch)
● multipathd.service     loaded failed failed Device-Mapper Multipath Device Co>
● dbus.socket            loaded failed failed D-Bus System Message Bus Socket


@hsiangkao

hsiangkao commented Feb 12, 2025

The last commit there is interesting/potentially concerning. Can you go into more details of what you're seeing there? Was this just the rootfs image itself being larger and so needing to bump RAM requirements to even store it? Or was it e.g. mounting the EROFS itself permanently carving off a lot of RAM?
What was the minimum amount of additional RAM that made it work?

On my system testiso ... fails with ram < 5662:

[  OK  ] Finished systemd-tmpfiles-setup-de…Create Static Device Nodes in /dev.                                                                                
[    3.376682] systemd[1]: Finished systemd-tmpfiles-setup-dev.service - Create Static Device Nodes in /dev.                                                   
[    3.378492] systemd[1]: Reached target local-fs-pre.target - Preparation for Local File Systems.                                                            

[  OK  ] Reached target local-fs.target - Local File Systems.
[    3.382504] systemd-tmpfiles[389]: /usr/lib/tmpfiles.d/var.conf:14: Duplicate line for path "/var/log", ignoring.

[FAILED] Failed to start dbus-broker.service - D-Bus System Message Bus.                                                                                                                                                                                                                                                      
See 'systemctl status dbus-broker.service' for details.                                                                                                                                                                                                                                                                       
[    4.289897] dbus-broker-launch[733]:       main @ ../src/launch/main.c +178                                                                                                                                                                                                                                                
[FAILED] Failed to start nm-initrd.service.         

......

:/root# findmnt /sysroot
TARGET   SOURCE     FSTYPE OPTIONS
/sysroot /dev/loop1 erofs  ro,relatime,user_xattr,acl,cache_strategy=readaround

# systemctl list-units --failed
  UNIT                   LOAD   ACTIVE SUB    DESCRIPTION                      >
● dbus-broker.service    loaded failed failed D-Bus System Message Bus
● ignition-fetch.service loaded failed failed Ignition (fetch)
● multipathd.service     loaded failed failed Device-Mapper Multipath Device Co>
● dbus.socket            loaded failed failed D-Bus System Message Bus Socket

Can you show me dmesg there too?

Also, for the LZMA and Zstd algorithms, the kernel allocates an internal decompression dictionary for each CPU (just like SQUASHFS_DECOMP_MULTI_PERCPU).

For LZMA, the dictionary size is 8 times the -C value by default, so -C1048576 will take 8MiB of vmalloc() memory for each CPU.
I tend to keep this aligned with squashfs, so if you really need -C1048576, limiting dictsize to 1048576 will reduce resident memory with little loss in compression ratio. (I'm now considering limiting dictsize to 1M by default in erofs-utils too, rather than just -C * 8.)
-zlzma,level=6,dictsize=1048576 -Eall-fragments,fragdedupe=inode -C1048576 --quiet will be better.

Also, if image sizes show little difference, I suggest using a smaller -C if possible.
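
To make the arithmetic in this comment concrete, here is a small worked sketch. It only encodes the rule of thumb stated above (dictionary = 8 x the -C value unless dictsize is given, one dictionary per CPU); the function names and the 4-CPU machine are hypothetical, and real kernel memory use will include other allocations.

```python
def lzma_dict_bytes(cluster_size, dictsize=None):
    """Per-CPU LZMA dictionary size in bytes, per the rule quoted above."""
    # Assumption from the comment: without an explicit dictsize, the
    # dictionary defaults to 8x the -C (cluster) size.
    return dictsize if dictsize is not None else cluster_size * 8


def total_dict_mib(cluster_size, ncpus, dictsize=None):
    """Total dictionary memory across all CPUs, in MiB."""
    return lzma_dict_bytes(cluster_size, dictsize) * ncpus / (1024 * 1024)


# -C1048576 with the default dictionary on a hypothetical 4-CPU machine
print(total_dict_mib(1048576, ncpus=4))                    # 32.0
# same cluster size, but with dictsize=1048576 as suggested
print(total_dict_mib(1048576, ncpus=4, dictsize=1048576))  # 4.0
```

Under these assumptions, capping dictsize cuts the per-CPU dictionary memory by a factor of eight, and the total scales linearly with the CPU count.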

@hsiangkao

The last commit there is interesting/potentially concerning. Can you go into more details of what you're seeing there? Was this just the rootfs image itself being larger and so needing to bump RAM requirements to even store it? Or was it e.g. mounting the EROFS itself permanently carving off a lot of RAM?

I'm wondering about the cause too. The issue is actually a mkfs issue, and increasing the tmpfs space will eventually generate a proper image.

But @nikita-dubrovskii's recent message is actually a runtime report. Are those two different reports? Using -C1048576 will take more memory than expected. If squashfs uses 128k or something similar, I tend to use a similar configuration if possible. Also, as I said above, it would be nice to limit the dictionary size too (currently *8 by default, so -C1MiB takes 8MiB for each CPU).

Also, would you mind sharing the rootfs you produced with me, if possible?

What was the minimum amount of additional RAM that made it work?

@nikita-dubrovskii
Contributor Author

@hsiangkao, I've built FCOS locally with -Eall-fragments,fragdedupe=inode -C1048576, and here is a log from the failed test run.

I will now rebuild and retest using -zlzma,level=6,dictsize=1048576 -Eall-fragments,fragdedupe=inode -C1048576 --quiet

console.txt

@hsiangkao

hsiangkao commented Feb 12, 2025

@hsiangkao , i've built locally FCOS with -Eall-fragments,fragdedupe=inode -C1048576, and here is a log from failed testrun.

Now will rebuild and retest using -zlzma,level=6,dictsize=1048576 -Eall-fragments,fragdedupe=inode -C1048576 --quiet

console.txt

But why do these errors appear:
systemd[1]: Failed to write /etc/machine-id: No space left on device
multipathd[351]: Cannot write header to file /etc/multipath/bindings : No space left on device
Does /etc land in memory rather than on disk too?

Also, I didn't find any dmesg line like erofs: (device xxx): mounted with root inode @ nid 37. Did EROFS actually mount?

@dustymabe
Member

dustymabe commented Feb 12, 2025

@hsiangkao I think maybe you should ignore this conversation for now. We're talking about something very internal to CoreOS (i.e. testing the ISO created that has the erofs baked inside) and it's probably misleading to what actual root causes are.

On my system testiso ... fails with ram < 5662

I think this is because your actual ISO is large, since you weren't using compression when creating the erofs (-Eall-fragments,fragdedupe=inode -C1048576). I saw the same thing yesterday, but when I finally got -zlzma,6 to work the ISO was back down to a reasonable size and my testiso tests didn't run out of memory.

@nikita-dubrovskii try with this patch and see if you get the ISO generation to work fine with -zlzma,6 and then testiso to pass tests without bumping the memory:

diff --git a/src/cmdlib.sh b/src/cmdlib.sh
index f83ae98a4..a089a46c7 100755
--- a/src/cmdlib.sh
+++ b/src/cmdlib.sh
@@ -767,7 +767,7 @@ EOF
 
     # There seems to be some false positives in shellcheck
     # https://github.com/koalaman/shellcheck/issues/2217
-    memory_default=2048
+    memory_default=6096
     # shellcheck disable=2031
     case $arch in
     # Power 8 page faults with 2G of memory in rpm-ostree

@nikita-dubrovskii
Contributor Author

@dustymabe thx, that memory_default=6096 helped

Currently `testiso` only uses 4Gb and ignores `--qemu-memory`, which leads
to a failure when testing new live-images with `/root.erofs`:
```
[    2.133469] dracut-cmdline[536]: /usr/sbin/initqueue: line 65: echo: write error: No space left on device
```

With this PR it's now possible to override default settings:
```
$ cosa kola testiso pxe* --pxe-append-rootfs --qemu-memory 8192
Running test: pxe-offline-install.bios
PASS: pxe-offline-install.bios (1m27.545s)
...
```
    Differ("live-sysroot-ls", "Diff live '/root.[ero|squash]fs' (embed into live-rootfs) listings",
           needs_ostree=False, function=diff_live_sysroot_tree),
    Differ("live-sysroot", "Diff live '/root.[ero|squash]fs' (embed into live-rootfs) content",
           needs_ostree=False, function=diff_live_sysroot),
Member

I think these could be named better, but I'll argue that point in a separate PR and see what other people think.

functionally this should work great!

Member

I'll hold off reviewing this hunk until your PR. :)

Comment on lines 36 to 38
# Use erofs by default
live-rootfs-fstype: "erofs"
live-rootfs-fsoptions: "-zlzma,level=6 -Eall-fragments,fragdedupe=inode -C1048576 --quiet"
Member

still think we should leave this out of COSA for now:

Suggested change
-# Use erofs by default
-live-rootfs-fstype: "erofs"
-live-rootfs-fsoptions: "-zlzma,level=6 -Eall-fragments,fragdedupe=inode -C1048576 --quiet"
+# Defaults for the root filesystem in the Live ISO/PXE artifacts. Left unset
+# for now as we ratchet in erofs support.
+# live-rootfs-fstype: "erofs"
+# live-rootfs-fsoptions: "-zlzma,level=6 -Eall-fragments,fragdedupe=inode -C1048576 --quiet"

Member

added a new commit here for this. This also happens to make CI pass again. Because of erofs/erofs-utils#13 it won't pass unless we bump the supermin VM memory and I'm not sure if we really want to do that just yet.

Member

Yeah agree, changing the defaults should come later.

Right now we need to ratchet in these changes using image.yaml
from the config repos rather than changing the default for
all consumers off the bat.

Note that the OSBuild stage defaults to squashfs/zstd, so this
just makes that what gets picked up and used anywhere
it's not otherwise configured.

This also has the side effect of making CI pass again, because
-zlzma,level=6 requires a lot of memory right now:
erofs/erofs-utils#13

squashfs-compression: zstd will become obsolete soon. Let's drop
a note about it so anyone happening by can clean it up.

@dustymabe dustymabe changed the title from "live-iso: switch from squashfs to erofs" to "live-iso: enable building with squashfs or erofs" Feb 12, 2025
@dustymabe
Member

I ran through the various options to cosa diff from a squashfs to an erofs built image and all looks good.

Member

@dustymabe dustymabe left a comment

LGTM

Member

@jlebon jlebon left a comment

Some minor comments but we can address them in follow-ups to not waste CI. Nice work!

@@ -98,3 +98,6 @@ fedora-repos-ostree

# For graphing manifest includes using `manifest_graph`
python-anytree

# For mkfs.erofs
erofs-utils
Member

Minor/optional: this is already included in vmdeps.txt so it doesn't need to be here as well.

Comment on lines +261 to +263
def diff_live_sysroot_tree(diff_from, diff_to):
    (dir_from, dir_to) = extract_live_sysroot_img(diff_from, diff_to)
    diff_cmd_outputs(['find', '{}', '-printf', "%P\n"], dir_from, dir_to)
Member

Hmm right, that's much more expensive now that we have to extract the full squashfs/erofs but there isn't really a listing equivalent for erofs we could use here (there's dump.erofs --ls, but there's no option to make it recurse) which I guess is how you ended up here. And I guess it's nice that it allows comparing across the squashfs -> erofs transition.

Member

That said, I think we want to sort the output of find since it doesn't guarantee lexicographic order which is important here for a good diff. We could add an option to diff_cmd_outputs to pipe the output of the cmd to sort.
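
One way to get a stable listing, sketched here independently of `diff_cmd_outputs` (whose signature is not shown in this thread), is to build the listing in Python and sort it before diffing. The helper name `sorted_listing` is hypothetical; it approximates `find -printf "%P\n"` by returning paths relative to the extracted root, minus the empty entry find emits for the root itself.

```python
import os


def sorted_listing(root):
    """Return every path under root, relative to root, in sorted order."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            entries.append(os.path.relpath(full, root))
    # Like find, os.walk yields entries in directory order, which is not
    # guaranteed to match between two extractions; sorting makes the two
    # listings directly comparable line by line.
    return sorted(entries)
```

Sorting after collection, rather than relying on traversal order, is what makes a squashfs-extracted tree and an erofs-extracted tree line up in the diff.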

Contributor Author

there's no option to make it recurse) which I guess is how you ended up here

Yes, and we unpack it in the other diff command anyway, so it's expensive, but we can probably stay with it. Another option: there are some other tools available for erofs, but I haven't tested them.

@jlebon jlebon merged commit e1943d6 into coreos:main Feb 13, 2025
5 checks passed
@nikita-dubrovskii nikita-dubrovskii deleted the erofs branch February 13, 2025 07:38