mkosi v15+: mkosi.extra/boot/ files missing in /boot, breaks incremental update_existing_rootfs() #88

Open
marc-hb opened this issue Dec 18, 2024 · 10 comments
@marc-hb
Collaborator

marc-hb commented Dec 18, 2024

update_existing_rootfs() currently relies on /boot/System.map-N.M being located on the main partition. When it's not, the "incremental" build fails like this:

not found: ./qbuild/mnt/boot/System.map-6.12.0. Try rebuilding with '-r img'

The -r img workaround is correct but obviously much slower.

Note there are multiple places where the ESP partition can be mounted: notably /efi or /boot. Fedora+mkosi seems to always use /efi by default?

https://wiki.archlinux.org/title/EFI_system_partition#Typical_mount_points

@marc-hb marc-hb self-assigned this Dec 18, 2024
@marc-hb
Collaborator Author

marc-hb commented Dec 18, 2024

I can reproduce as early as mkosi v15. This was likely caused by the v15 switch to systemd-repart, see giant commit

systemd/mkosi@8bbbd836078a2 "Migrate disk image building to systemd-repart"

> Because we don't know up-front anymore where the ESP partition will be
> mounted, all boot loader files are installed to /boot. So to populate
> an ESP partition, you'd use "CopyFiles=/boot:/" in the partition
> definition file of the ESP partition.

@marc-hb
Collaborator Author

marc-hb commented Dec 18, 2024

I think we just need a systemd-repart configuration. It felt great to avoid one and just rely entirely on mkosi defaults but that's just too "volatile" and unpredictable for something like update_existing_rootfs(). Even if update_existing_rootfs() could get smarter and dynamically adjust its System.map logic now to various partition schemes, it would break again somewhere else or for some other, random mkosi version. So let's just bite the systemd-repart configuration bullet. I took a look and it does not look like rocket science. Also, it's still possible to leave a lot of things as default in such a configuration.
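For illustration only, such a configuration could be as small as a single ESP definition. The sketch below writes one using the `CopyFiles=/boot:/` line quoted from the mkosi commit message. The helper name `write_esp_definition`, the file name `00-esp.conf` and the 512M size are assumptions made up for this example; they are not taken from run_qemu.

```shell
#!/bin/sh
# Hypothetical helper (not part of run_qemu.sh): write a minimal ESP
# partition definition for systemd-repart into the given directory.
# File name and size are illustrative assumptions.
write_esp_definition() {
    repart_dir=$1
    mkdir -p "$repart_dir" || return 1
    cat > "$repart_dir/00-esp.conf" <<'EOF'
[Partition]
Type=esp
Format=vfat
CopyFiles=/boot:/
SizeMinBytes=512M
EOF
}
```

Something like `write_esp_definition qbuild/mkosi.repart` would then be called before mkosi, assuming mkosi picks up partition definitions from a `mkosi.repart/` directory. Everything not listed in the file stays at the systemd-repart defaults.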

@marc-hb
Collaborator Author

marc-hb commented Dec 18, 2024

> I think we just need a systemd-repart configuration.

... or maybe not. Maybe that's not required after all...

One burning question is: what is the -F System.map argument trying to achieve? It came with the addition of the depmod invocation in commit 2ed0ed3. man depmod says:

       -F, --filesyms System.map
           Supplied with the System.map produced when the kernel was built, this allows the -e
           option to report unresolved symbols. This option is mutually incompatible with -E.

But -e is not currently used! So does -F do nothing at all?

Also: when invoked by update_existing_rootfs(), setup_depmod() seems to look at the OLD System.map file? This reinforces the suspicion that it does nothing :-D

Could this -F be another instance of trying to port to mkosi v15+ another update_existing_rootfs() feature that never actually worked with v14- in the first place? Like #76. If yes then let's just (temporarily) delete it to unblock the migration to v15+
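To make the -e / -F relationship concrete, here is a rough sketch of how a depmod command line could be assembled. `depmod_cmdline` is a hypothetical helper, not the actual setup_depmod() from run_qemu.sh. It only prints the command, and it pairs -F with -e because the man page excerpt above says -F exists to let -e report unresolved symbols.

```shell
# Hypothetical helper, NOT the real setup_depmod(): print a depmod command
# line for a staging root ($1), a kernel version ($2) and an optional
# System.map path ($3).
depmod_cmdline() {
    basedir=$1
    kver=$2
    system_map=$3
    if [ -n "$system_map" ] && [ -f "$system_map" ]; then
        # Per depmod(8), -F is what lets -e report unresolved symbols,
        # so the two options are passed together.
        printf 'depmod -a -b %s -e -F %s %s\n' "$basedir" "$system_map" "$kver"
    else
        # No System.map available: plain dependency generation.
        printf 'depmod -a -b %s %s\n' "$basedir" "$kver"
    fi
}
```

With this shape a missing System.map degrades to a plain `depmod -a -b` instead of failing the incremental build. Whether that is the desired behavior is exactly the open question here.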

Generally speaking, porting to mkosi v15+ is really hard without a clear picture of 1) what the code was supposed to do with mkosi v14- in the first place and 2) what it was actually achieving with v14-.

Other complications:
The kernel and the initrd live in potentially 3 different places. Even with a fresh build from scratch, each of these places has a different initrd file :-(

Status with mkosi v14- and Fedora 40 (v15+ has significant differences)

  • mkosi.extra/usr/lib/modules/6.12.0-dirty/vmlinuz # used when booting with --direct-kernel = the default option
  • mkosi.extra/boot/vmlinuz-6.12.0-dirty # yet another duplicate, yeah! Staging for /boot/
  • ESP partition # usually mounted at /efi, used when booting with --no-direct-kernel
    • 5248fff44e974fce9cc88b89875eb063/6.12.0/linux # usual bzImage. This copy is NOT updated by the update_existing_rootfs() shortcut. Gone or moved with v15+
    • EFI/Linux/linux.efi # copy of the above. systemd-boot default. NOT actually a UKI! Not even an .EFI binary! This generates a bootctl warning. Created and updated by update_rootfs_boot_kernel(): still there with v15+ (with a slightly different name) and still the systemd-boot default. Fixes and renames submitted in EFI System Partition cleanups #98
    • EFI/Linux/mkosi-fedora-6.12.efi # all-in-one UKI with initrd included. Unreliable with mkosi v14? Can be just ignored.
  • /boot on the root partition: the usual vmlinuz+initrd with v14- thanks to install_build_initrd() / make_install_kernel(); EMPTY with mkosi v15+!! Never used at boot time, only at later modprobe time? vmlinuz does get updated by update_existing_rootfs()

The situation with modules is similar but even more varied because in addition to being embedded in initrd files, modules are also in /lib/modules/. Business as usual.
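A quick way to see which of the kernel copies listed above are in sync is to checksum them side by side. The helper below is a hypothetical sketch; `compare_kernel_copies` does not exist in run_qemu.sh.

```shell
# Hypothetical helper: print a sha256 checksum for every kernel copy that
# exists among the given candidate paths. Missing files are reported but
# are not treated as an error.
compare_kernel_copies() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            sha256sum "$f"
        else
            printf 'missing  %s\n' "$f"
        fi
    done
}
```

For example, `compare_kernel_copies qbuild/mkosi.extra/usr/lib/modules/6.12.0-dirty/vmlinuz qbuild/mkosi.extra/boot/vmlinuz-6.12.0-dirty qbuild/mnt/boot/vmlinuz-6.12.0-dirty`, with paths adapted from the ones mentioned in this thread. Identical checksums in the first column mean the copies match.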

marc-hb added a commit to marc-hb/run_qemu that referenced this issue Dec 18, 2024
@marc-hb
Collaborator Author

marc-hb commented Dec 19, 2024

Simply dropping the -F System.map argument is enough to build and boot with mkosi v15 https://github.com/pmem/run_qemu/actions/runs/12402741008/job/34624880942?pr=90

@stellarhopper, @weiny2 could you test dropping -F System.map more extensively? I mean with some actual kernel and module changes...

@marc-hb marc-hb changed the title from "mkosi v15+: mkosi.extra/boot/ files moved to not mounted ESP partition, breaks incremental update_existing_rootfs()" to "mkosi v15+: mkosi.extra/boot/ files missing in /boot, breaks incremental update_existing_rootfs()" Dec 19, 2024
@marc-hb
Collaborator Author

marc-hb commented Dec 20, 2024

I did a lot more testing and dropping "-F System.map" is not good enough. It's just shooting the messenger. It's an "also guilty" messenger, but still just a messenger. Dropping "-F System.map" fixes the build but hides a bigger missing /boot problem.

Here's the situation with mkosi v15+ if we drop "-F System.map"

  1. run_qemu.sh from scratch; invokes mkosi:
    /boot/ is totally empty
  2. run_qemu.sh not from scratch: mkosi not used, update_existing_rootfs() run instead:
    /boot/ has the latest vmlinuz

The above was tested with both v15 and v23.

I think it's better to keep failing with this "System.map" error message until the real /boot/ problem is actually fixed, because the message can lead people to this issue. The alternative is to silently give them an empty /boot/ (from scratch) and then a mostly empty one (incremental) while pretending everything looks fine.
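If the -F argument were ever dropped anyway, an explicit check could keep the problem visible. A minimal sketch, assuming the image is mounted under qbuild/mnt as in the error message at the top of this issue; `boot_is_populated` is a hypothetical name.

```shell
# Hypothetical check: succeed only if the given /boot directory exists
# and contains at least one entry (hidden files included).
boot_is_populated() {
    boot_dir=$1
    [ -d "$boot_dir" ] && [ -n "$(ls -A "$boot_dir" 2>/dev/null)" ]
}
```

It could be used as `boot_is_populated qbuild/mnt/boot || echo "WARNING: /boot is empty in the image, see issue #88" >&2`, so that an empty /boot never goes unnoticed.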

@marc-hb marc-hb assigned stellarhopper and unassigned marc-hb Dec 20, 2024
marc-hb added a commit to marc-hb/run_qemu that referenced this issue Jan 3, 2025
This does not fix pmem#88 at all but clears some of the confusion around it.

Fixes:

```
$ file qbuild/mkosi.extra/boot/*

qbuild/mkosi.extra/boot/System.map:
    broken symbolic link to ./qbuild/mkosi.extra/boot/System.map-6.12.0-dirty
qbuild/mkosi.extra/boot/vmlinuz:
    broken symbolic link to ./qbuild/mkosi.extra/boot/vmlinuz-6.12.0-dirty
```

Fixes commit f9d7330 ("run_qemu: work around new systemd-based
installkernel") which added these symbolic links. They never worked.

Signed-off-by: Marc Herbert <[email protected]>
@stellarhopper
Member

Hm, didn't mean to close this - I guess it auto-closed because of the mention in #98

@stellarhopper stellarhopper reopened this Jan 3, 2025
@marc-hb
Collaborator Author

marc-hb commented Jan 3, 2025

> I guess it auto-closed because of the mention in #98

Most likely yes, please upvote https://github.com/orgs/community/discussions/17308 (and duplicates...)

@marc-hb
Collaborator Author

marc-hb commented Jan 4, 2025

> Dropping "-F System.map" fixes the build but hides a bigger missing /boot problem.

So the key question is: does anyone or anything use /boot?

/boot inside the image is currently used by neither --direct-kernel nor --no-direct-kernel. The former uses the kernel and initrd outside the image. The latter uses the /efi partition.

Maybe /boot/ was used in older, GRUB times but not anymore? @stellarhopper, @weiny2, any memories?

If /boot is not used or not used anymore, then we can drop /boot entirely, point -F System.map somewhere else and the problem should be solved!
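One possible shape for "somewhere else" is to take the first System.map that exists among a few candidates outside the image's /boot. This is only a sketch: `find_system_map` is a hypothetical helper and the candidate list is an assumption.

```shell
# Hypothetical helper: print the first existing file among the candidate
# System.map paths given as arguments. Returns non-zero if none exists.
find_system_map() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

A caller could try the kernel build tree first, for example `find_system_map "$kernel_build_dir/System.map" qbuild/mkosi.extra/boot/System.map-6.12.0-dirty`, where `$kernel_build_dir` is a placeholder variable for this example. The result would then be passed to depmod -F only when one is found.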

@stellarhopper
Member

@marc-hb yeah I'm pretty sure this is true - /boot is just a holdover from grub days, and likely can be removed now.
