There is an extra reboot during ArtifactReboot and it causes update failure

Hi,

System information
Board: Raspberry Pi 4 Model B
OS: Raspbian
mender-client: 2.6.0
mender-convert: 2.4.0

Description
I have been using mender-convert with my golden device based on Raspbian Buster Lite 2020-02-14. Recently some of the Raspberry Pi 4 I got require new firmware to boot. So I did a “apt upgrade” from the golden device. The following are some of the packages may be relevant.

raspberrypi-bootloader/testing,now 1:1.20210527-1 armhf [installed]
raspberrypi-kernel/testing,now 1:1.20210527-1 armhf [installed]
raspberrypi-sys-mods/testing,now 20210310 armhf [installed]
raspbian-archive-keyring/stable,now 20120528.2 all [installed]
raspi-config/testing,now 20210212 all [installed]

Based on the updated golden device, I integrated mender-client with mender-convert. Then I deployed a new device with the newly Mender-integrated OS image. Everything works fine. The device can boot normally.

The second test I did is to deploy a system update to this new device. I created the update based the newly upgraded golden device after adding a couple dummy files.

During ArtifactReboot, instead of booting to the other image like it supposed to do, I found a unclear dialog box (attached below) on the screen with “please reboot” text below it. And then it starts reboot the second time. The second reboot goes all the way to the user login without issue, then it starts reboot the third time and rollbacks to the old image. My understanding is the extra reboot make mender-client think ArtifactReboot has failed so the rollback is triggered.

I would really appreciate any help in understanding where this extra reboot come from and how to prevent it.

Lincoln Wang

Here is the mender deployment log

2021-06-12 05:06:35 +0000 UTC info: Running Mender client version: 2.6.0
2021-06-12 05:06:35 +0000 UTC info: State transition: update-fetch [Download_Enter] -> update-store [Download_Enter]
2021-06-12 05:06:35 +0000 UTC info: No public key was provided for authenticating the artifact
2021-06-12 05:06:35 +0000 UTC info: Opening device "/dev/mmcblk0p3" for writing
2021-06-12 05:06:35 +0000 UTC info: Native sector size of block device /dev/mmcblk0p3 is 512 bytes. Mender will write in chunks of 1048576 bytes
2021-06-12 05:25:46 +0000 UTC info: All bytes were successfully written to the new partition
2021-06-12 05:25:46 +0000 UTC info: The optimized block-device writer wrote a total of 8651 frames, where 8095 frames did need to be rewritten (i.e., skipped)
2021-06-12 05:25:47 +0000 UTC info: Wrote 9070298624/9070298624 bytes to the inactive partition
2021-06-12 05:25:47 +0000 UTC info: State transition: update-store [Download_Enter] -> update-after-store [Download_Leave]
2021-06-12 05:25:47 +0000 UTC info: State transition: update-after-store [Download_Leave] -> update-install [ArtifactInstall]
2021-06-12 05:25:47 +0000 UTC info: Enabling partition with new image installed to be a boot candidate: 3
2021-06-12 05:25:47 +0000 UTC info: State transition: update-install [ArtifactInstall] -> reboot [ArtifactReboot_Enter]
2021-06-12 05:25:47 +0000 UTC info: Rebooting device(s)
2021-06-12 05:25:47 +0000 UTC info: Mender rebooting from active partition: /dev/mmcblk0p2
2021-06-12 05:25:48 +0000 UTC info: Daemon terminated with SIGTERM
2021-06-12 05:26:00 +0000 UTC info: Running Mender client version: 2.6.0
2021-06-12 05:26:00 +0000 UTC info: State transition: init [none] -> after-reboot [ArtifactReboot_Leave]
2021-06-12 05:26:00 +0000 UTC error: transient error: Reboot to the new update failed. Expected "upgrade_available" flag to be true but it was false. Either the switch to the new partition was unsuccessful, or the bootloader rolled back
2021-06-12 05:26:00 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> rollback [ArtifactRollback]
2021-06-12 05:26:00 +0000 UTC info: Performing rollback
2021-06-12 05:26:00 +0000 UTC info: No update available, so no rollback needed.
2021-06-12 05:26:00 +0000 UTC info: State transition: rollback [ArtifactRollback] -> rollback-reboot [ArtifactRollbackReboot_Enter]
2021-06-12 05:26:00 +0000 UTC info: Rebooting device(s) after rollback
2021-06-12 05:26:00 +0000 UTC info: Mender rebooting from inactive partition: /dev/mmcblk0p2
2021-06-12 05:26:00 +0000 UTC info: Daemon terminated with SIGTERM
2021-06-12 05:26:13 +0000 UTC info: Running Mender client version: 2.6.0
2021-06-12 05:26:13 +0000 UTC info: Mender shut down in state: rollback-reboot
2021-06-12 05:26:13 +0000 UTC info: State transition: init [none] -> verify-rollback-reboot [ArtifactRollbackReboot_Leave]
2021-06-12 05:26:13 +0000 UTC info: State transition: verify-rollback-reboot [ArtifactRollbackReboot_Leave] -> after-rollback-reboot [ArtifactRollbackReboot_Leave]
2021-06-12 05:26:13 +0000 UTC info: State transition: after-rollback-reboot [ArtifactRollbackReboot_Leave] -> update-error [ArtifactFailure]
2021-06-12 05:26:13 +0000 UTC info: State transition: update-error [ArtifactFailure] -> cleanup [Error]
2021-06-12 05:26:13 +0000 UTC info: State transition: cleanup [Error] -> update-status-report [none]

It turns out the extra reboot has nothing to do with updated raspberry pi firmware.

After I cloned the OS image from the golden device, I used pishrink to shrink the image (I usually don’t do this but my hard drive is running out of space) before running mender-convert. Without passing any argument, pishrink will modify /etc/rc.local to expand rootfs in first boot, and a reboot is triggered in the function (PiShrink/pishrink.sh at 9d9c0dd32cd988f9c3bd9953ca245d33a52fc9e3 · Drewsif/PiShrink · GitHub). Therefore, the Mender artifact created from this modified golden image will reboot an extra time during ArtifactReboot and cause Mender update failure.

So if someone need to shrink their OS image with pishrink before running mender-convert, use option -s to avoid calling the rootfs expansion function.

2 Likes

Thanks for the report