Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to latest musl 1.2.5 (via 1.1.24) #2251

Merged
merged 14 commits into from
Sep 7, 2024

Conversation

MagnusS
Copy link
Member

@MagnusS MagnusS commented Jun 17, 2024

This upgrades to musl 1.2.5, the latest release.

The update has been tested by building the chainloader and booting the example.

The upgrade is done via 1.1.24, using partial patches from the pthread branch (originally written by @fwsGonzo). See commit messages for more details.

musl has introduced _time64 versions of several functions to provide 64-bit time for 32-bit builds without breaking the existing API. It looks like IncludeOS is using 64 bit values already, so I have just kept the existing versions in IncludeOS. It would probably be useful with a read through by someone more familiar with the IncludeOS time code...

@MagnusS MagnusS requested review from alfreb and fwsGonzo June 17, 2024 18:06
@alfreb
Copy link
Contributor

alfreb commented Jun 19, 2024

Awesome! Great to get musl unpinned and up to date, thanks!

  • ✅ Builds
  • ✅ UDP test passes nix-shell --argstr unikernel ./test/net/integration/udp/ --run "./test.py"
  • ✅ SMP test passes nix-shell --argstr unikernel ./test/kernel/integration/smp/ --run "./test.py"
  • ❌ TCP test enters crash loop nix-shell --argstr unikernel ./test/net/integration/tcp/ --run "./test.py"

I'm pretty sure the TCP test failing is unrelated - it fails during heap initialisation, before initializing libc. Incidentally, changing the musl patch to make a_crash emit a message and cli;hlt instead of ud2, catches the error:

[ WARNING ] Running with sudo
<vm> SeaBIOS (version 1.16.3-debian-1.16.3-2)
<vm> Booting from ROM..* Multiboot begin: 0x9500
<vm> * Multiboot cmdline @ 0x23c035: /nix/store/74g3yzkxj9jbli24vhqg38q0psgng2cj-chainloader-static-i686-unknown-linux-musl-dev/bin/chainloader booted with vmrunner
<vm> * Multiboot end: 0x23c0b5
<vm> * Module list @ 0x23c000
<vm> 	 * Module @ 0x23d000
<vm> 	 * Args: net_tcp.elf.bin booted with vmrunner
<vm>  	 * End: 0x4a6d80
<vm>  * Multiboot end: 0x4a6d80
<vm> <Multiboot>Booted with multiboot
<vm> 	* Boot flags: 0x24f
<vm> 	* Valid memory (261631 Kib):
<vm> 	  0x00000000 - 0x0009fbff (639 Kib)
<vm> 	  0x00100000 - 0x0ffdffff (260992 Kib)
<vm> 
<vm> 	* Booted with parameters @ 0x23c035: /nix/store/74g3yzkxj9jbli24vhqg38q0psgng2cj-chainloader-static-i686-unknown-linux-musl-dev/bin/chainloader booted with vmrunner
<vm> 	* Multiboot provided memory map  (8 entries @ 0x9000)
<vm> 	  0x0000000000 - 0x000009fbff FREE (639 Kb.)
<vm> 	  0x000009fc00 - 0x000009ffff RESERVED (1 Kb.)
<vm> 	  0x00000f0000 - 0x00000fffff RESERVED (64 Kb.)
<vm> 	  0x0000100000 - 0x000ffdffff FREE (260992 Kb.)
<vm> 	  0x000ffe0000 - 0x000fffffff RESERVED (128 Kb.)
<vm> 	  0x00feffc000 - 0x00feffffff RESERVED (16 Kb.)
<vm> 	  0x00fffc0000 - 0x00ffffffff RESERVED (256 Kb.)
<vm> 	  0x0000000000 - 0x00ffffffff RESERVED (0 Kb.)
<vm> 
<vm> <Multiboot>OS loaded with 1 modules
<vm> 	* net_tcp.elf.bin booted with vmrunner @ 0x23d000 - 0x4a6d80, size: 2530688b
<vm> * Multiboot begin: 0x9500
<vm> * Multiboot end: 0x3bd2a0
<vm> [x86_64 PC] constructor
<vm> [ Machine ] Initializing heap
<vm> [ Machine ] Main memory detected as 264380224 b
<vm> [ Machine ] Reserving 1048576 b for machine use
<vm> 
<vm> *** Libc crashed 😵 ***

You can bring this in if you'd like, or we can discuss it in a separate PR:
alfreb@e81fb7c

I think this is a great opportunity to maybe find an old ghost; I'll try to find some time to investigate.

@alfreb
Copy link
Contributor

alfreb commented Jun 19, 2024

Here's a backgrace, for reference:

#0  0x0000000000329bc2 in a_crash ()
#1  0x000000000032895e in enframe ()
#2  0x0000000000329840 in alloc_group ()
#3  0x0000000000328853 in alloc_slot ()
#4  0x00000000003297df in alloc_group ()
#5  0x0000000000328853 in alloc_slot ()
#6  0x00000000003297df in alloc_group ()
#7  0x0000000000328853 in alloc_slot ()
#8  0x00000000003285eb in __libc_malloc_impl ()
#9  0x00000000003267a5 in malloc ()
#10 0x000000000023f36b in strdup ()
#11 0x0000000000246f1d in x86::init_libc (magic=<optimized out>, addr=<optimized out>) at /build/source/src/platform/x86_pc/init_libc.cpp:107
#12 0x000000000024769a in long_mode ()
#13 0x0000000000000000 in ?? ()

I've opened a bug for this here: #2252 and will try to chase it down from there.

@MagnusS
Copy link
Member Author

MagnusS commented Jun 28, 2024

Would you like to keep this open while you find the bug or should we merge? I you're (relatively) confident it's unrelated to the updated musl version I think we could merge this. It would be nice to track down before any new release though

@alfreb
Copy link
Contributor

alfreb commented Jun 29, 2024

We don't need the TCP test to work, but I do think we need the SMP test to pass also with all CPU's actually active, since there might be changes to how TLS is initialized in the new libc. I hadn't noticed before but when running the SMP test with the command I posted above it detects 4 CPU's, but doesn't actually bring them online. So we need to:

  • Enable SMP - we can consider having it on by default for now to have fewer code paths to maintain
  • Run the SMP test with both v0.16.0-release and this branch.
    • If v0.16.0 has working SMP and this PR has problems, we ned to investigate to make sure new musl does not change something important, like how TLS is initialized.
    • If v0.16.0 has problems with SMP enabled, we need to go back further, to v0.15.1 or earlier to find when it broke.

As a point of reference, the last time I saw all AP's coming online (although I didn't test them) was in the pthread-kernel branch where I get this output when starting the example with 16 cores:

      [ x86 ] Enabling interrupts
        [ SMP ] Initializing APs
        [ SMP ] Starting APs
                AP 8 started at 0x547000
                AP 9 started at 0x54b000
                AP 15 started at 0x563000
                AP 5 started at 0x53b000
                AP 10 started at 0x54f000
                AP 11 started at 0x553000
                AP 13 started at 0x55b000
                AP 6 started at 0x53f000
                AP 7 started at 0x543000
                AP 4 started at 0x537000
                AP 1 started at 0x52b000
                AP 2 started at 0x52f000
                AP 14 started at 0x55f000
                AP 12 started at 0x557000
                AP 3 started at 0x533000
        [ SMP ] All 16 APs are online now

Note that this is after a couple of reboots, so that particular image has problems - possibly related to #2252 . I'm not able to reproduce that with pthread-kernel-musl-1.1.24 which has a more recent musl; there we get an infinite crash loop, possibly related to the same issue, but it could also be an unresolved incompatibility wth more recent musl.

Steps to reproduce the last known working SMP setup:

git checkout origin/pthread-kernel
export INCLUDEOS_PACKAGE=$(nix-build)
nix-shell
make -j && grubify.sh hello_includeos.elf.bin
sudo qemu-system-x86_64 --nographic --enable-kvm -smp 16 -cpu host -hda hello_includeos.elf.bin.grub.img

@fwsGonzo
Copy link
Member

libc needs to enable locking for SMP to work properly, and that only happens when threads > 0 in musl. See:

https://github.com/includeos/IncludeOS/blob/master/src/kernel/threads.cpp#L19-L23

and

https://github.com/includeos/IncludeOS/blob/master/src/kernel/threads.cpp#L60-L70

This means supporting an arbitrary musl version is not straight-forward. We have to inspect the internal struct for each version in order to set current threads, and to also enable locking.

@MagnusS
Copy link
Member Author

MagnusS commented Jul 2, 2024

Yes, there's definitely an SMP issue here. I've added a few more commits to make it easier to debug:

  • enable SMP by default. We may want to add a parameter to the nix-package later so it's configurable again.
  • disable FOR_PRODUCTION by default and update the warning
  • include Alfred's cli;hlt commit in a_crash
  • add os::shutdown to example to save some pkill qemu invocations

It's now easy to reproduce with nix-build ./example.nix && qemu-system-x86_64 -kernel ./chainloader -initrd result/bin/hello_includeos.elf.bin -smp 4 -nographic:

[...]
      [ xAPIC ] Enabling xAPIC
                ID: 0  Ver: 50014
        [ x86 ] Enabling interrupts
__libc.can_do_threads: 65537  __libc.threaded: 0
__libc.threads_minus_1: 0 -> 1
        [ SMP ] Initializing APs
        [ SMP ] Starting APs
                AP 3 started at 0xa84000
                AP 2 started at 0xa04000
                AP 1 started at 0x984000



***** Lib** Libc crashed 😵 c ***
**crashed 😵 *** Libc crashed 😵 ***

With -smp 1 we get the expected:

================================================================================
 IncludeOS VERY_DIRTY (x86_64 / 64-bit)
 +--> Running [ Hello world - OS included ]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 +--> WARNING: No good random source found: RDRAND/RDSEED instructions not available.
 +-->        To make this warning fatal, re-compile with FOR_PRODUCTION=ON.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Args = result/bin/hello_includeos.elf.bin
Try giving the service less memory, eg. 10MB in vm.json
Service done. Shutting down...
     [ Kernel ] Stopping service
     [ Kernel ] Powering off

And with -enable-kvm -cpu host -smp 4. So far I haven't seen the libc crash message here.

     [ x2APIC ] Enabling x2APIC @ 0xfee00900
                APIC id: 0  ver: 50014
        [ x86 ] Enabling interrupts
__libc.can_do_threads: 65537  __libc.threaded: 0
__libc.threads_minus_1: 0 -> 1
        [ SMP ] Initializing APs
        [ SMP ] Starting APs
                AP 1 started at 0x984000
                AP 3 started at 0xa84000
                AP 2 started at 0xa04000

>>>> !!! CPU 1 EXCEPTION !!! <<<<
General Protection Fault (13)   EIP  0x2c5bb8   CODE 0

  RAX:  f000ff53f000ff53  R 8:  0000000000000080
  RBX:  0000000000000080  R 9:  0000000000000000
  RCX:  0000000000000000  R10:  8080808080808080
  RDX:  0000000000000000  R11:  0000000000a03a60

  RBP:  0000000000000000  R12:  00000000003ec614
  RSP:  0000000000843fd0  R13:  0000000000000000
  RSI:  0000000000a03e04  R14:  0000000000000000
  RDI:  0000000000000001  R15:  0000000080000010
  RIP:  00000000002c5bb8  FLA:  0000000000010202

  CR0:  0000000080000013  CR4:  0000000000000620
  CR1:  0000000000000000  CR8:  0000000000000000
  CR2:  0000000000000000  GDT:  0000000000323790 (41)
  CR3:  0000000000001000  IDT:  00000000003383c0 (2559)

@MagnusS
Copy link
Member Author

MagnusS commented Jul 3, 2024

I've experimented a bit more with this, but no solution yet. I'll include some notes below.

@fwsGonzo do you think changes are required for tls/threads setup in addition to enabling locks in the internal __libc struct? The code you linked uses the new threads code, which is not (yet) available v15/16.

In 57bdaf7 I updated the __libc struct to match changes in musl. With need_locks = 1 it still crashes with a CPU exception. I also get this crash with -enable-kvm -cpu host.

   [ I/O APIC ] Initializing ID=0
                Addr: 0xfec00000  Version: 0x20  Intr: 0  Entries: 24
      [ xAPIC ] Enabling xAPIC
                ID: 0  Ver: 50014
        [ x86 ] Enabling interrupts
__libc.can_do_threads: 1
__libc.threaded: 0
__libc.threads_minus_1: 0 -> 1
__libc.need_locks: 0 -> 1        [ SMP ] Initializing APs
        [ SMP ] Starting APs
                AP 1 started at 0x784000

>>>> !!! CPU 1 EXCEPTION !!! <<<<
General Protection Fault (13)   EIP  0x2c9407   CODE 0

  RAX:  f000ff53f000ff87  R 8:  0000000000000000
  RBX:  0000000000000026  R 9:  00000000002e0ae3
  RCX:  4b4d564b00000000  R10:  0000000000783db0
  RDX:  0000000000000010  R11:  0000000000783a60

  RBP:  0000000000783f00  R12:  00000000003ec610
  RSP:  0000000000943fd0  R13:  0000000000000000
  RSI:  4743544743544743  R14:  000000000037acc0
  RDI:  ffffffffffffffda  R15:  00000000003ec5e0
  RIP:  00000000002c9407  FLA:  0000000000000286

  CR0:  00000000e0000013  CR4:  0000000000000620
  CR1:  0000000000000000  CR8:  0000000000000000
  CR2:  f000ff53f000ff87  GDT:  0000000000323790 (41)
  CR3:  0000000000001000  IDT:  00000000003383c0 (2559)

With -smp 2 and gdb I've tracked it down to RNG::get().init() on my machine (in revenant_main on CPU 1), but I suspect it is actually caused by something not being set up properly in musl (based on @fwsGonzo's comment above).

If I add more cores (e.g. -smp 4) I get "No timeout" messages (which look like they come from the stubbed sys_futex) and a libc-crash. Perhaps we get contention on a lock when we add more cores and the crash just happens elsewhere. Or maybe we have multiple issues.

        [ x86 ] Enabling interrupts
__libc.can_do_threads: 1
__libc.threaded: 0
__libc.threads_minus_1: 0 -> 1
__libc.need_locks: 0 -> 1        [ SMP ] Initializing APs
        [ SMP ] Starting APs
                AP 1 started at 0x984000
                AP 2 started at 0xa04000
                AP 3 started at 0xa84000
No timeout

*** Libc crashed 😵 ***
No timeout
No timeout

I see a similar crash with -smp 2 on thev0.16.0-release branch with SMP enabled and musl 1.1.18:

        [ x86 ] Enabling interrupts
__libc.can_do_threads: 1  __libc.threaded: 0
__libc.threads_minus_1: 0 -> 1
        [ SMP ] Initializing APs
        [ SMP ] Starting APs
                AP 1 started at 0x77e000

>>>> !!! CPU 1 EXCEPTION !!! <<<<
General Protection Fault (13)   EIP  0x267e87   CODE 0

  RAX:  f000ff53f000ff97  R 8:  0000000000000000
  RBX:  0000000000000026  R 9:  00000000002d9786
  RCX:  4b4d564b00000000  R10:  000000000077ddb0
  RDX:  0000000000000010  R11:  000000000000001c

  RBP:  000000000077df00  R12:  000000000036a350
  RSP:  000000000063dfd0  R13:  0000000000000000
  RSI:  0000000000000000  R14:  000000000036fb40
  RDI:  ffffffffffffffda  R15:  000000000036a320
  RIP:  0000000000267e87  FLA:  0000000000010282

  CR0:  0000000080000013  CR4:  0000000000000620
  CR1:  0000000000000000  CR8:  0000000000000000
  CR2:  0000000000000000  GDT:  0000000000318a90 (41)
  CR3:  0000000000001000  IDT:  000000000032a4c0 (2559)

Testing with v0.15.x would require a working conan setup, which I don't have at the moment unfortunately.

@MagnusS
Copy link
Member Author

MagnusS commented Aug 15, 2024

I rebased on v0.16.0-release and reran the tests. The following tests are failing on this branch:

step 5.9: nix-shell --argstr unikernel ./test/kernel/integration/smp/ --run ./test.py
step 6.1: nix-shell --argstr unikernel ./test/net/integration/bufstore/ --run ./test.py
step 6.3: nix-shell --argstr unikernel ./test/net/integration/dns/ --run ./test.py
step 6.7: nix-shell --argstr unikernel ./test/net/integration/tcp/ --run ./test.py

@MagnusS
Copy link
Member Author

MagnusS commented Aug 16, 2024

With SMP=OFF there are other failures:

step 5.10: nix-shell --argstr unikernel ./test/kernel/integration/term/ --run ./test.py
step 6.2: nix-shell --argstr unikernel ./test/net/integration/configure/ --run ./test.py
step 6.3: nix-shell --argstr unikernel ./test/net/integration/dns/ --run ./test.py
step 6.4: nix-shell --argstr unikernel ./test/net/integration/icmp/ --run ./test.py
step 6.5: nix-shell --argstr unikernel ./test/net/integration/icmp6/ --run ./test.py
step 6.6: nix-shell --argstr unikernel ./test/net/integration/slaac/ --run ./test.py
step 6.7: nix-shell --argstr unikernel ./test/net/integration/tcp/ --run ./test.py

It seems that these failures are related to the new malloc-implementation in musl. Compiling with --with-malloc=oldmalloc makes all tests pass (with SMP=OFF).

With SMP=ON now only two tests fail:

step 5.9: nix-shell --argstr unikernel ./test/kernel/integration/smp/ --run ./test.py
step 6.1: nix-shell --argstr unikernel ./test/net/integration/bufstore/ --run ./test.py

Source for oldmalloc and mallocng (the default): https://git.musl-libc.org/cgit/musl/tree/src/malloc

I've pushed a new commit to use oldmalloc for now -- all tests should be passing with SMP=OFF

@alfreb
Copy link
Contributor

alfreb commented Aug 17, 2024

That's interesting!

I'm noticing that they have different locking mechanics, at least on the surface. oldmalloc calls __wait, in src/thread/__wait.c, which calls __syscall(SYS_futex, addr, FUTEX_WAIT ...), if libc.need_locks (callsite in src/malloc/oldmalloc/malloc.c).

It also calls __wake defined in src/internal/phread_impl.h and is an inline wrapper around __syscall(SYS_futex, addr, FUTEX_WAKE. (Callsite in oldmalloc/malloc.c)

The new mallocng calls

  • rdlock, wrlock, unlock from src/mallocng/glue.h
  • Which are wrappers around LOCK / UNLOCK, which are macros wrapping __lock, __unlock, from src/internal/lock.h
    • __lock calls __futexwait in src/thread/__lock.c, but again also only if libc.need_locks.
      • __futexwait wraps __syscall(SYS_futex, addr, FUTEX_WAIT, in a similar manner to __wait
    • __unlock implemented in the same file, calls __wake, if multithreaded, checked by dereferencing the lock and checking if it has a negative value. See this comment, /* Check l[0] to see if we are multi-threaded. */

So while these look very different, apart from the lock implementation being separated out in the new malloc, while the old one had them defined in malloc.c, I can't see any obvious difference in how they call into futex. Both implementations also call both brk and mmap.

@alfreb
Copy link
Contributor

alfreb commented Aug 17, 2024

Going back to the callstack I posted above, we see that strdup is called here:
https://github.com/MagnusS/IncludeOS/blob/musl1-2-5/src/platform/x86_pc/init_libc.cpp#L107

This is before libc is initialized, when we're setting things up for calling into it for the first time. So it's possible that the new malloc has dependency on some state that the old didn't have, and that is not ready before libc is initialized, causing one of the asserts in enframe to fail. We might want to check that malloc_context is properly zero-initialized as it seems to expect here: https://git.musl-libc.org/cgit/musl/tree/src/malloc/mallocng/malloc.c#n40

@MagnusS
Copy link
Member Author

MagnusS commented Aug 22, 2024

The bufstore-test only fails when spinlocks are used - both scoped_spinlock and regular lock/unlock spinlocks will crash. Removing the alignment here fixes the issue: https://github.com/MagnusS/IncludeOS/blob/musl1-2-5/api/smp_utils#L28

If I add the alignment in BufferStore class' private section instead with alignas(128) it also works, here: https://github.com/MagnusS/IncludeOS/blob/musl1-2-5/api/net/buffer_store.hpp#L89

Should we remove the alignment from the spinlock_t typedef or could there be other ways to fix this? How much do we gain by always aligning spinlock_t?

From the discussion here I'm not sure if __attribute__(aligned) has well defined behaviour for typedef's: https://stackoverflow.com/questions/15788947/where-can-i-use-alignas-in-c11/34006668#34006668

@MagnusS
Copy link
Member Author

MagnusS commented Aug 28, 2024

#2266 should fix the bufstore test.

The SMP failure is likely due to this issue, but may be other issues as well: #2268

@fwsGonzo
Copy link
Member

Modern Linux has some new system calls that may not be implemented. I don't know if that's related here, but a few that I can tihnk of:

clone3
futex64_time-something
futex with WAIT_BITSET && WAKE_BITSET

I've never really understood how to implement futex properly, and I suspect it's just not working right. Any code that leads to futex should be assumed to be problematic with newer libc's.

fwsGonzo and others added 10 commits September 7, 2024 18:02
- Copy patches from github.com/IncludeOS/musl, commit 4f331b7.
- Copy src/musl/nanosleep.ccp with clock_nanosleep from commit 9ceff78 in this repo.

Original author of both patches is fwsGonzo.
adds getrandom + stubs for fchdir, getrusage, sched_setaffinity, sched_setscheduler, sigaltstack
- stub faccessat, faccessat2, fchmodat2, pwritev2, preadv2, statx
- include time64 versions of calls to time functions.

As it looks like IncludeOS is already using 64 bit time on 32 bit,
the *_time64 calls just redirect to the existing calls. On 64 bit systems
there should be no change, as the regular calls are 64 bit.
This makes crash loop detection easier
Missing RDSEED/RDRAND will only emit warning, not terminate
Enable SMP by default.
@MagnusS
Copy link
Member Author

MagnusS commented Sep 7, 2024

I rebased on master and fixed the stack trace test to work with newer musl. With the latest fixes for SMP and the early calls into libc the tests are now passing. I tested both with SMP enabled and disabled.

$ ./test.sh
[...]
🌈✨ Everything is awesome ✨

@MagnusS MagnusS marked this pull request as ready for review September 7, 2024 16:44
@alfreb
Copy link
Contributor

alfreb commented Sep 7, 2024

Fantastic news! I'm adding cores to a few more integration tests to get a run where they not only have SMP enabled but also many cores to wake up. Will get back with review tomorrow.

@alfreb
Copy link
Contributor

alfreb commented Sep 7, 2024

Couldn't wait 😀 The extended test set from #2276 passes without issues. Everything is indeed awesome 💪

The only lingering question for me is why we need to use the old malloc implementation - I couldn't find any good reasons for it when reading the code, but we can get to that later. I've spawned an issue for it here #2277.

@alfreb alfreb merged commit 7d7f30d into includeos:v0.16.0-release Sep 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants