Failure to boot after installation of standalone artifact

I have a working mender image of Debian Stretch 9.4, which is running on a Raspberry Pi 3 with a 128 GB SD card. The mender image was created using mender-convert 2.5.0, and is running client 2.6.1.
After installing a Debian Bullseye mender artifact in standalone mode, the system fails to boot from the new rootfs partition. It hangs at the rainbow screen.

I’ve done a lot of investigating and have determined a few things:

  • At boot of the new rootfs partition, the Pi is not reading the SD card. There are 2 or 3 quick flashes of the ACT LED, but then nothing.
  • I examined the artifact with 7zip, and directory structure looked like what I expected.
  • Same for the rootfs partition containing the installed artifact: I mounted it and the directory structure looked good.
  • The rootfs partition showed no system log entries at all after the failure, which is consistent with the ACT LED observation that the SD card was not read by the Pi.
  • Examined the uboot partition and did not see any logging to indicate a problem.
  • As an experiment, I committed the artifact install, and it failed to boot in the same way as previously.
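
For anyone repeating the directory-structure check above, here is a rough sketch of how it could be scripted. The function name, the list of directories, and the /mnt/inactive mount point are my own assumptions for illustration, not anything mender provides.

```shell
#!/bin/sh
# Sketch only: check that a mounted rootfs contains the usual top-level
# directories. The directory list is an assumption about a typical Debian
# rootfs, not something mender defines.
check_rootfs_layout() {
    root=$1
    missing=""
    for d in bin boot dev etc lib proc sbin sys usr var; do
        # -L also accepts symlinks (e.g. /bin -> usr/bin on merged-usr systems)
        if [ ! -e "$root/$d" ] && [ ! -L "$root/$d" ]; then
            missing="$missing $d"
        fi
    done
    if [ -z "$missing" ]; then
        echo "layout ok"
    else
        echo "missing:$missing"
        return 1
    fi
}

# Example use on the device (hypothetical mount point):
#   sudo mkdir -p /mnt/inactive
#   sudo mount -o ro /dev/mmcblk0p3 /mnt/inactive
#   check_rootfs_layout /mnt/inactive
#   sudo umount /mnt/inactive
```

This only confirms that the directories exist. It says nothing about whether the kernel or device tree in that rootfs can actually boot.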

One thing I have not done is burn the full Debian Bullseye mender image, and verify it will boot up successfully. That is my next test.

Is this boot failure a familiar issue? Could it be related to the mender-convert or client versions I am running? The behavior suggests the Pi cannot read the new rootfs partition from the SD card, which may point to the Bullseye artifact itself or the Debian partition as the cause. I appreciate any suggestions, thanks.

The Debian filesystem looks like this:

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
mmcblk0     179:0    0 119.1G  0 disk
├─mmcblk0p1 179:1    0    44M  0 part /uboot
├─mmcblk0p2 179:2    0   8.3G  0 part /
├─mmcblk0p3 179:3    0   8.3G  0 part
└─mmcblk0p4 179:4    0 102.5G  0 part /data

The Debian Bullseye artifact install:

test@im3-rack1:~ $ sudo mender -install /media/usb0/bullseye_image-raspberrypi3-mender.mender
INFO[0000] Loaded configuration file: /var/lib/mender/mender.conf
WARN[0000] No server URL(s) specified in mender configuration.
WARN[0000] Server entry 1 has no associated server URL.
INFO[0000] Mender running on partition: /dev/mmcblk0p2
INFO[0000] Start updating from local image file: [/media/usb0/bullseye_image-raspberrypi3-mender.mender]
Installing Artifact of size 929918976...
INFO[0000] No public key was provided for authenticating the artifact
INFO[0000] Opening device "/dev/mmcblk0p3" for writing
INFO[0000] Native sector size of block device /dev/mmcblk0p3 is 512 bytes. Mender will write in chunks of 1048576 bytes
.............................................................. - 100 %
INFO[0376] All bytes were successfully written to the new partition
INFO[0376] The optimized block-device writer wrote a total of 4026 frames, where 12 frames did need to be rewritten (i.e., skipped)
INFO[0377] Wrote 4221370368/4221370368 bytes to the inactive partition
INFO[0377] Executing script: Download_Leave_01
INFO[0379] Collected output (stderr) while running script /etc/mender/scripts/Download_Leave_01
artifact_name=rack1_data_v6_runonce: Running Download_Leave_01
/etc/wpa_supplicant/* to newroot partition
/etc/dhcpcd.conf to newroot partition
/etc/hostname to newroot partition
/etc/hosts to newroot partition

---------- end of script output
INFO[0379] Enabling partition with new image installed to be a boot candidate: 3
Use -commit to update, or -rollback to roll back the update.
At least one payload requested a reboot of the device it updated.
test@im3-rack1:~ $

These are the changes I made to mender_convert_config for the Stretch conversion (the Bullseye artifact is using client 3.0.0):

 - MENDER_ENABLE_SYSTEMD=n
 - MENDER_STORAGE_TOTAL_SIZE_MB="121940" 
 - MENDER_DATA_PART_SIZE_MB="104904" 
 - MENDER_CLIENT_INSTALL="y"  
 - MENDER_CLIENT_VERSION="2.5.0"

I have done some follow-up testing and confirmed the Debian Bullseye mender image is good. I was able to boot and run that image with no issues. So, I would expect the Bullseye artifact to be good as well.

I also re-tested artifact installation with a new SD card, to rule out a hardware issue. The system (a mender converted Buster image) booted and ran fine from its original /dev/mmcblk0p2. But once I installed the Bullseye artifact, the same boot failure occurred. A subsequent reboot brought the system back up on mmcblk0p2, so U-Boot is doing its job of recovering from a failed boot.

After the fallback to mmcblk0p2, I mounted mmcblk0p3 to examine the rootfs written to the partition by the artifact install. I saw that the /proc directory in this rootfs was empty. I guess that makes sense, as this virtual filesystem is only populated while the system is running. It got me wondering whether there is a race condition of some kind. So I am going to increase the U-Boot bootdelay so I can interrupt the boot and examine the boot variables; maybe something is not right in that area.
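
Before interrupting U-Boot on the console, the same variables can usually be read from the running system. This is a minimal sketch, assuming the U-Boot environment tools (fw_printenv and fw_setenv) are installed and configured on the device. The variable names are the ones I understand the mender U-Boot integration to use, so treat them as assumptions and check them against your own environment.

```shell
#!/bin/sh
# Sketch only: print the U-Boot variables that are relevant to a
# dual-rootfs boot. The variable names are assumptions about the mender
# U-Boot integration, not verified against this particular image.
MENDER_BOOT_VARS="mender_boot_part upgrade_available bootcount bootdelay"

print_boot_vars() {
    if ! command -v fw_printenv >/dev/null 2>&1; then
        echo "fw_printenv not found; install the U-Boot environment tools first" >&2
        return 1
    fi
    for var in $MENDER_BOOT_VARS; do
        # fw_printenv exits non-zero for an unset variable; report it and go on
        fw_printenv "$var" 2>/dev/null || echo "$var is not set"
    done
}

print_boot_vars || true

# To lengthen the boot delay (in seconds), something like this may work,
# provided the U-Boot build honours the bootdelay variable:
#   sudo fw_setenv bootdelay 5
```

Running it once before the artifact install and once after the fallback should show whether the boot partition variable and the boot counter changed as expected.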

This boot failure after an artifact install is happening every time for me. I’ve tested installing a Buster artifact on a Debian system, a Bullseye artifact on a Debian system, and a Bullseye artifact on a Buster system. The original mender converted images run fine, but after installing the artifact, boot from mmcblk0p3 always fails and I’m stuck at the rainbow screen. I appreciate any suggestions for other next steps. Thx.

So I have narrowed things down somewhat. I decided to do a test where the artifact I use for the install is of the same OS as the running image. A Bullseye image and artifact were handy, so I used them. I burned and booted up the Bullseye mender image, and then did a mender update using the Bullseye artifact. To my surprise, the system rebooted and is now running from partition 3.

It seems the issue occurs when the artifact and the running image were built from different OS releases.
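
To make that comparison repeatable, here is a rough sketch that compares the release of the running rootfs with the one written to the inactive partition. It assumes both root filesystems carry an os-release file with a VERSION_ID field. The function names and the /mnt/inactive mount point are made up for illustration.

```shell
#!/bin/sh
# Sketch only: compare the OS release of two root filesystems by reading
# VERSION_ID from their os-release files.
os_version_id() {
    # $1: path to an os-release file; prints VERSION_ID without quotes
    sed -n 's/^VERSION_ID=//p' "$1" | tr -d '"'
}

compare_releases() {
    # $1: os-release of the running rootfs
    # $2: os-release of the inactive rootfs
    active=$(os_version_id "$1")
    inactive=$(os_version_id "$2")
    if [ "$active" = "$inactive" ]; then
        echo "same release: $active"
    else
        echo "release mismatch: active=$active inactive=$inactive"
    fi
}

# Example use on the device (hypothetical mount point):
#   sudo mount -o ro /dev/mmcblk0p3 /mnt/inactive
#   compare_releases /etc/os-release /mnt/inactive/etc/os-release
#   sudo umount /mnt/inactive
```

A mismatch here does not prove the cause, but it flags the combinations that failed to boot in my tests.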