Can anyone elaborate on that? I would like to learn how Mender interfaces with and affects the A/B slot mechanism in the NVIDIA bootloader chain. I would also like to learn how system boot works in general on that platform. I didn’t quite grok the vanilla A/B slot system, let alone the changes due to Mender.
We are observing the following symptoms:
The Mender update fails when nvbootctrl get-current-slot returns the wrong slot.
The values returned by the nvbootctrl commands change with every boot. They follow this cyclic pattern:
Boot 1: nvbootctrl get-current-slot : 0 ; Priority of slot 0: 14 ; Priority of slot 1: 14 ;
Boot 2: nvbootctrl get-current-slot : 0 ; Priority of slot 0: 14 ; Priority of slot 1: 15 ;
Boot 3: nvbootctrl get-current-slot : 1 ; Priority of slot 0: 13 ; Priority of slot 1: 15 ;
Boot 4: nvbootctrl get-current-slot : 1 ; Priority of slot 0: 15 ; Priority of slot 1: 14 ;
On every one of these reboots:
the machine booted from slot 0 according to both fw_printenv mender_boot_part and findmnt /
retry_count as reported by nvbootctrl dump-slots-info was 7 for both slots
boot_successful as reported by nvbootctrl dump-slots-info was 1 for both slots
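To make it easier to compare these values across reboots, here is a minimal Python sketch that collects the three readings mentioned above and checks whether the slot reported by nvbootctrl agrees with mender_boot_part. The helper names (run, collect, slot_matches_rootfs) and the slot-to-partition mapping passed in as part_for_slot are assumptions made for illustration; the real partition numbers depend on how the image was built, so adjust the mapping to your layout.

```python
import subprocess


def run(cmd):
    """Return the stripped stdout of cmd, or None if it is missing or fails."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def collect():
    """Gather the values discussed in this thread for the current boot."""
    return {
        "nvbootctrl_slot": run(["nvbootctrl", "get-current-slot"]),
        "mender_boot_part": run(["fw_printenv", "-n", "mender_boot_part"]),
        "root_source": run(["findmnt", "-n", "-o", "SOURCE", "/"]),
    }


def slot_matches_rootfs(nv_slot, mender_boot_part, part_for_slot):
    """True if the partition expected for nv_slot equals mender_boot_part.

    part_for_slot is a hypothetical mapping from slot number to rootfs
    partition number; it is an assumption and must match your own layout.
    """
    expected = part_for_slot.get(str(nv_slot).strip())
    return expected == str(mender_boot_part).strip()


if __name__ == "__main__":
    # Assumed layout: slot 0 -> partition 1, slot 1 -> partition 2.
    mapping = {"0": "1", "1": "2"}
    info = collect()
    print(info)
    if info["nvbootctrl_slot"] is None or info["mender_boot_part"] is None:
        print("could not read slot or mender_boot_part on this machine")
    elif slot_matches_rootfs(info["nvbootctrl_slot"], info["mender_boot_part"], mapping):
        print("bootloader slot and rootfs partition agree")
    else:
        print("MISMATCH between bootloader slot and rootfs partition")
```

Running this once per boot (for example from a login shell) and saving the output gives a log that can be compared between a Mender build and a stock build.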
I don’t really understand how Mender would be involved in standalone reboots, since the scripts here should only execute as part of an update or rollback. So I think the issue to understand is why the slot numbers are changing across normal reboots.
It would be interesting to try to reproduce this phenomenon on a build without mender installed at all, for instance stock L4T from NVIDIA or tegra demo distro with tegrademo instead of tegrademo-mender.
Incidentally, we just noticed this same issue on one device today after using mender updates successfully on several devices for months.
A bit more about the difference between u-boot and cboot interaction:
With u-boot, the boot partition is chosen here, based on the Mender variables in the u-boot environment, which the Mender client reads and writes with fw_printenv and fw_setenv.
So in a u-boot based implementation, if software outside Mender (i.e. the NVIDIA bootloader) decides to switch boot slots using the logic in the Update Engine State Machine, there will be a mismatch between the bootloader and the root filesystem, and this is what the error is telling you. This would obviously be an issue if it happened across any update that attempted to update the bootloader.
In a cboot based system, the bootloader and root filesystem will always be in sync, so if the Update Engine State Machine decides to roll back to a different boot slot it will also roll back the root filesystem. You won’t get the mismatch error you see with uboot in this case but you will be running a rootfs and bootloader version you likely didn’t expect.
So in both cases it’s bad that the update engine is deciding to switch boot slots on its own, but it’s likely more problematic for u-boot.
Created an issue at https://github.com/OE4T/meta-mender-community/issues/7 to track how to deal with this. I’d like to understand whether PREFERRED_PROVIDER_virtual/bootloader = "cboot-prebuilt" is a possible solution, or at least makes the problem easier to deal with.