Looking at the grub.cfg in the EFI partition, I can see that there is provision for using PARTUUIDs instead of device names. However, in the mender-convert code I cannot see anything that actually sets the mender_rootfsa_uuid and mender_rootfsb_uuid variables in grub.cfg. I assume this is because these options are driven from meta-mender and there is simply no direct option in mender-convert to activate them yet. Is my assessment correct?
I assume I could just add these variables with a mender script hook, although it might be better to support a config option that generates these variables as part of the build.
It looks like any required changes to work/boot and work/rootfs will need to happen as part of the mender-convert-package script, as it’s only then, once the image partition layout (GPT) is written out, that the PARTUUIDs will be known.
So this looks like it will involve changes to grub.cfg in work/boot, as already discussed, and also to fstab in work/rootfs, as it currently references /boot/efi and /data by device name.
Presumably, to be complete, the references in /etc/fstab are going to need to be changed as well, as I am seeing the ESP and data partitions referenced in mine. However, this probably means that we need to allow the user to supply partuuids to the build, so that Mender artifact updates contain the same partuuids across successive builds; otherwise they change on each build.
As the partuuids need to be provided in the config to facilitate reproducible builds, this removes the restriction of having to make the change in the mender-convert-package script.
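
One possible way to come up with fixed values to put in the config is to derive them deterministically rather than generate them at random. The sketch below is only an illustration of that idea, not something mender-convert does: the namespace string, the partition labels and the `stable_partuuid` helper are all assumptions made up for this example.

```python
import uuid

# Hypothetical, project-specific namespace. Any fixed string works, as long
# as it never changes between builds.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid/my-device-image")


def stable_partuuid(partition_label: str) -> str:
    """Return the same UUID string for the same label on every build."""
    return str(uuid.uuid5(NAMESPACE, partition_label))


if __name__ == "__main__":
    # Illustrative labels only; use whatever names match your layout.
    for label in ("boot", "rootfsa", "rootfsb", "data"):
        print(label, stable_partuuid(label))
```

The printed values could then be pasted into the build config once and left alone, which is what keeps successive artifacts consistent.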
In meta-mender it appears that the following variables are used to hold paths to partuuids as well as devices, so I propose we use the same ones in mender-convert.
I seem to have uncovered some sort of systemd issue with the data.mount unit now that I’m using partuuid device paths in fstab. On the very first boot of a freshly created image, the data.mount unit fails to mount, with an error saying it cannot find the device listed in fstab. On subsequent boots everything works fine.
Initially I thought this was some kind of race condition between udev creating the device path symlinks and the data.mount systemd unit. However, I have a service of my own that runs much earlier in the boot sequence and uses the exact same fstab entry to check and fix GPT header errors and to extend the data partition to fill the remaining space on the block device, and it runs fine every single time, irrespective of whether it’s the first boot or a subsequent one.
As most examples of using partuuid don’t use the udev device path symlink in fstab but instead use the key/value pair notation, I re-tested with that, and the problem no longer presented itself. So the udev symlink paths in fstab seem not to be fully supported under some scenarios/implementations. My hunch is that there is possibly a bug in systemd/mount-generator that fails to resolve the symlink before using it internally, as fewer eyes are on that use case than on the key/value notation.
My proposal is that I provide a pull request to update the partuuid support recently merged into master so that the fstab entries use the key/value notation for partuuids.
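
To make the proposed change concrete, here is a minimal sketch of the rewrite from the by-partuuid device path to the key/value notation. It is not the actual mender-convert change; the `to_key_value` helper and the sample fstab line (mount point, type and options) are assumptions for illustration.

```python
import re

# Matches a first fstab field of the form /dev/disk/by-partuuid/<value>.
BY_PARTUUID = re.compile(r"^/dev/disk/by-partuuid/(\S+)$")


def to_key_value(fstab_text: str) -> str:
    """Rewrite by-partuuid device paths in fstab text to PARTUUID=<value>."""
    out = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        # Leave blank lines and comments untouched.
        if not stripped or stripped.startswith("#"):
            out.append(line)
            continue
        fields = line.split()
        match = BY_PARTUUID.match(fields[0])
        if match:
            fields[0] = "PARTUUID=" + match.group(1)
            # Note: rewritten lines are re-joined with single spaces.
            out.append(" ".join(fields))
        else:
            out.append(line)
    return "\n".join(out) + ("\n" if fstab_text.endswith("\n") else "")


if __name__ == "__main__":
    sample = (
        "/dev/disk/by-partuuid/26445670-f37c-408b-be2c-3ef419866623 "
        "/data auto defaults 0 0\n"
    )
    print(to_key_value(sample), end="")
```

Entries that already use a plain device name or another notation pass through unchanged.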
Will this cause an issue for any of your tooling that may parse fstab for any reason? The reason I initially went with device paths was in case any tooling expected this field to be a path.
To further support the hunch, compare the status of the data.mount unit in the two scenarios. With the udev by-partuuid symlink notation, the ‘What’ is /dev/disk/by-partuuid/26445670-f37c-408b-be2c-3ef419866623 when it fails; with the key/value pair notation, the ‘What’ is /dev/sda4, meaning it has correctly mapped the partuuid in fstab to the device.
A possible explanation for the difference is that systemd-fstab-generator, which creates the mount units dynamically, runs before either of the services, and at that point the udev symlinks don’t exist. When a symlink is used in the fstab entry, it cannot be resolved when systemd-fstab-generator runs, hence the symlink itself is set as the ‘What’. At run time it seems the symlink is simply used as an absolute path to the block device, without any further resolution of the link. When a PARTUUID key/value pair is used, a different code path is taken that yields the resolved absolute path to the real block device.
Thanks for the investigation. Could it be that systemd-fstab-generator is missing a dependency on the udev unit file? Could be a possible upstream fix, but I’m fine with fixing it the way you proposed as well. Mender does not use the information inside the fstab file, we get all our info from mender.conf.
Looks like I have been given the complete run-around by a spurious race condition: after many runs I managed to reproduce the same error with the key/value notation as well. After more investigation it’s starting to look like a dependency issue indeed. The /etc/fstab file created by mender-convert doesn’t set fs_passno, whereas all my servers, virtual or physical, set fs_passno by default to run fsck checks, together with the UUID= notation for partitions.
When fs_passno is set, the mount units created by systemd-fstab-generator have a dependency on the /lib/systemd/system/systemd-fsck@.service instance, which in turn has BindsTo= and After= dependencies on the .device unit for the block device in question. This is why I am seeing it work on all my servers.
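
For anyone wanting to check whether their own image is affected, the following is a rough sketch that lists fstab entries with no fsck pass number, i.e. the ones that would not pick up the systemd-fsck@.service dependency described above. The `entries_without_fsck` helper and the sample text are made up for this example; it relies on the fstab(5) rule that a missing sixth field counts as zero.

```python
def entries_without_fsck(fstab_text: str) -> list:
    """Return mount points whose fs_passno (sixth field) is 0 or omitted."""
    result = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments and malformed entries.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        # Per fstab(5), an absent sixth field is treated as zero.
        passno = int(fields[5]) if len(fields) > 5 else 0
        if passno == 0:
            result.append(fields[1])
    return result


if __name__ == "__main__":
    # Illustrative content only, not the fstab mender-convert generates.
    sample = (
        "# device mountpoint type options dump pass\n"
        "UUID=1234-ABCD /boot/efi vfat defaults 0 2\n"
        "PARTUUID=26445670-f37c-408b-be2c-3ef419866623 /data auto defaults 0 0\n"
    )
    print(entries_without_fsck(sample))
```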
It looks like the solution is to also add something like x-systemd.requires= with the by-partuuid device as the parameter, which the documentation implies should give us the dependency between the mount unit and the device unit without imposing fs_passno on developers. I’ll give it a test and report back.
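
As a sketch of what that could look like in practice, the snippet below appends an x-systemd.requires= option pointing at the by-partuuid device to a single fstab line. This is untested against the first-boot failure and is not the mender-convert implementation; the `add_device_requirement` helper and the sample line are assumptions for illustration.

```python
def add_device_requirement(fstab_line: str) -> str:
    """Append x-systemd.requires=<by-partuuid device> to the mount options."""
    fields = fstab_line.split()
    # Leave comments and incomplete entries untouched.
    if len(fields) < 4 or fields[0].startswith("#"):
        return fstab_line
    spec = fields[0]
    if spec.startswith("PARTUUID="):
        device = "/dev/disk/by-partuuid/" + spec[len("PARTUUID="):]
    elif spec.startswith("/dev/disk/by-partuuid/"):
        device = spec
    else:
        # Not a partuuid-based entry; nothing to do.
        return fstab_line
    option = "x-systemd.requires=" + device
    options = fields[3].split(",")
    if option not in options:  # keep the operation idempotent
        options.append(option)
    fields[3] = ",".join(options)
    # Note: the rewritten line is re-joined with single spaces.
    return " ".join(fields)


if __name__ == "__main__":
    line = "PARTUUID=26445670-f37c-408b-be2c-3ef419866623 /data auto defaults 0 0"
    print(add_device_requirement(line))
```

The fs_passno field is left as it was, so fsck behaviour is unchanged.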