Migrating from yocto dunfell to kirkstone causes RPI3B+ devices to hang on OTA update

I’m having trouble migrating in-field Raspberry Pi 3B+ units from Dunfell to Kirkstone Yocto builds. They are currently running kernel 4.14.114 and I’m trying to migrate them to 5.15.36. I’m using U-Boot as my bootloader and have not had any issues when flashing a fresh Pi with the Kirkstone-built image.

I’ve enabled UART console output on my RPi and found that the boot hangs as soon as the kernel is started:

U-Boot> boot
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1…
Found U-Boot script /boot.scr
300 bytes read in 1 ms (293 KiB/s)
## Executing script at 02400000
switch to partitions #0, OK
mmc0 is current device
7253840 bytes read in 302 ms (22.9 MiB/s)
## Booting kernel from Legacy Image at 00080000 …
Image Name: Linux-5.15.34-v7
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 7253776 Bytes = 6.9 MiB
Load Address: 00008000
Entry Point: 00008000
Verifying Checksum … OK
## Flattened Device Tree blob at 2eff9600
Booting using the fdt blob at 0x2eff9600
Loading Kernel Image
Using Device Tree in place at 2eff9600, end 2f002f6d

Starting kernel …

I believe the issue lies in some sort of uboot configuration in the Dunfell image that isn’t compatible with the kernel in the Kirkstone image. I’m not familiar with the boot sequence of u-boot and Linux so I’d like to see if anyone else has been able to reproduce this or has any insight as to what might be going wrong.

In case it’s relevant, I’ve also diffed the U-Boot environments of the Dunfell image and the Kirkstone image when each is freshly flashed. Here’s what I found:
< shows Dunfell U-Boot env, > shows Kirkstone U-Boot env

10c10,11
< boot_efi_binary=if fdt addr ${fdt_addr_r}; then bootefi bootmgr ${fdt_addr_r};else bootefi bootmgr ${fdtcontroladdr};fi;load ${devtype} ${devnum}:${distro_bootpart} ${kernel_addr_r} efi/boot/bootarm.efi; if fdt addr ${fdt_addr_r}; then bootefi ${kernel_addr_r} ${fdt_addr_r};else bootefi ${kernel_addr_r} ${fdtcontroladdr};fi
---
> boot_efi_binary=load ${devtype} ${devnum}:${distro_bootpart} ${kernel_addr_r} efi/boot/bootarm.efi; if fdt addr ${fdt_addr_r}; then bootefi ${kernel_addr_r} ${fdt_addr_r};else bootefi ${kernel_addr_r} ${fdtcontroladdr};fi
> boot_efi_bootmgr=if fdt addr ${fdt_addr_r}; then bootefi bootmgr ${fdt_addr_r};else bootefi bootmgr;fi

17,18c18
< boot_targets=mmc0 mmc1 usb0 pxe dhcp
< bootargs=8250.nr_uarts=1 bcm2708_fb.fbwidth=656 bcm2708_fb.fbheight=416 bcm2708_fb.fbswap=1 vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000 dwc_otg.lpm_enable=0 console=ttyS0,115200 rootfstype=ext4 rootwait root=${mender_kernel_root}
---
> boot_targets=mmc0 mmc1 mmc2 usb0 pxe dhcp

20c20
< bootcmd_dhcp=run boot_net_usb_start; if dhcp ${scriptaddr} ${boot_script_dhcp}; then source ${scriptaddr}; fi;setenv efi_fdtfile ${fdtfile}; if test -z “${fdtfile}” -a -n “${soc}”; then setenv efi_fdtfile ${soc}-${board}${boardver}.dtb; fi; setenv efi_old_vci ${bootp_vci};setenv efi_old_arch ${bootp_arch};setenv bootp_vci PXEClient:Arch:00010:UNDI:003000;setenv bootp_arch 0xa;if dhcp ${kernel_addr_r}; then tftpboot ${fdt_addr_r} dtb/${efi_fdtfile};if fdt addr ${fdt_addr_r}; then bootefi ${kernel_addr_r} ${fdt_addr_r}; else bootefi ${kernel_addr_r} ${fdtcontroladdr};fi;fi;setenv bootp_vci ${efi_old_vci};setenv bootp_arch ${efi_old_arch};setenv efi_fdtfile;setenv efi_old_arch;setenv efi_old_vci;
---
> bootcmd_dhcp=devtype=dhcp; run boot_net_usb_start; if dhcp ${scriptaddr} ${boot_script_dhcp}; then source ${scriptaddr}; fi;setenv efi_fdtfile ${fdtfile}; if test -z “${fdtfile}” -a -n “${soc}”; then setenv efi_fdtfile ${soc}-${board}${boardver}.dtb; fi; setenv efi_old_vci ${bootp_vci};setenv efi_old_arch ${bootp_arch};setenv bootp_vci PXEClient:Arch:00010:UNDI:003000;setenv bootp_arch 0xa;if dhcp ${kernel_addr_r}; then tftpboot ${fdt_addr_r} dtb/${efi_fdtfile};if fdt addr ${fdt_addr_r}; then bootefi ${kernel_addr_r} ${fdt_addr_r}; else bootefi ${kernel_addr_r} ${fdtcontroladdr};fi;fi;setenv bootp_vci ${efi_old_vci};setenv bootp_arch ${efi_old_arch};setenv efi_fdtfile;setenv efi_old_arch;setenv efi_old_vci;

22a23
> bootcmd_mmc2=devnum=2; run mmc_boot

27d27
< bootfstype=fat

30d29
< devplist=1

35c34
< fdt_addr=2eff9600
---
> fdt_addr=2eff7b00

38,39c37
< fdtaddr=2eff9600
< fdtcontroladdr=3b3cd030
---
> fdtcontroladdr=3b3db5b0

41,42d38
< fileaddr=2400000
< filesize=12c

46c42
< loadaddr=0x00200000
---
> loadaddr=0x1000000

54d49
< mender_saveenv_canary=1

66,67c61,62
< scan_dev_for_efi=setenv efi_fdtfile ${fdtfile}; if test -z “${fdtfile}” -a -n “${soc}”; then setenv efi_fdtfile ${soc}-${board}${boardver}.dtb; fi; for prefix in ${efi_dtb_prefixes}; do if test -e ${devtype} ${devnum}:${distro_bootpart} ${prefix}${efi_fdtfile}; then run load_efi_dtb; fi;done;if test -e ${devtype} ${devnum}:${distro_bootpart} efi/boot/bootarm.efi; then echo Found EFI removable media binary efi/boot/bootarm.efi; run boot_efi_binary; echo EFI LOAD FAILED: continuing…; fi; setenv efi_fdtfile
< scan_dev_for_extlinux=if test -e ${devtype} ${devnum}:${distro_bootpart} ${prefix}${boot_syslinux_conf}; then echo Found ${prefix}${boot_syslinux_conf}; run boot_extlinux; echo SCRIPT FAILED: continuing…; fi
---
> scan_dev_for_efi=setenv efi_fdtfile ${fdtfile}; if test -z “${fdtfile}” -a -n “${soc}”; then setenv efi_fdtfile ${soc}-${board}${boardver}.dtb; fi; for prefix in ${efi_dtb_prefixes}; do if test -e ${devtype} ${devnum}:${distro_bootpart} ${prefix}${efi_fdtfile}; then run load_efi_dtb; fi;done;run boot_efi_bootmgr;if test -e ${devtype} ${devnum}:${distro_bootpart} efi/boot/bootarm.efi; then echo Found EFI removable media binary efi/boot/bootarm.efi; run boot_efi_binary; echo EFI LOAD FAILED: continuing…; fi; setenv efi_fdtfile
> scan_dev_for_extlinux=if test -e ${devtype} ${devnum}:${distro_bootpart} ${prefix}${boot_syslinux_conf}; then echo Found ${prefix}${boot_syslinux_conf}; run boot_extlinux; echo SCRIPT FAILED: continuing…; fi
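
For reference, a comparison like the one above could be produced roughly as follows. This is only a sketch: it assumes each environment was first dumped to a text file (for example with fw_printenv from Linux, or by capturing printenv at the U-Boot prompt), and the function name and file names are made up for illustration.

```shell
#!/bin/sh
# Compare two U-Boot environment dumps after sorting them by variable name.
# Both arguments are plain text files with one "name=value" entry per line.
diff_uboot_env() {
    old_dump=$1
    new_dump=$2
    tmp_old=$(mktemp) || return 2
    tmp_new=$(mktemp) || { rm -f "$tmp_old"; return 2; }
    sort "$old_dump" > "$tmp_old"
    sort "$new_dump" > "$tmp_new"
    # diff exits with 0 when the dumps are identical and 1 when they differ.
    diff "$tmp_old" "$tmp_new"
    status=$?
    rm -f "$tmp_old" "$tmp_new"
    return "$status"
}
```

Usage would be something like `diff_uboot_env dunfell.env kirkstone.env`, where lines starting with < come from the first file and lines starting with > from the second.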

Hello @ygu,

Maybe it is related to the workaround we have described here?

Greetz,
Josef

Hi @TheYoctoJester,

Thanks for the response. I tried it with rpi-update-firmware and it works. However, as the disclaimer says, it’s a risky operation: when I rebooted the device before it had a chance to check in to Mender and commit the update, the unit could not boot back into the old rootfs partition.

I was hoping there was a way to upgrade that didn’t involve modifying U-Boot, but based on the link you shared it sounds like it’s unavoidable?

Best,
Yutong

Hi @ygu,

Yes, in this particular case it is unavoidable. The fact that this stuff has to sit in the boot partition is one of the prime reasons why the Raspberry Pi is not exactly well suited for industrial use cases. The CM versions that bring an eMMC can already improve things, and you can then fine-tune things quite a bit, but in the end the RasPi’s boot process just is not designed for updates and fallback. Sorry.

Greetz,
Josef

From what I’ve gathered, there are DTBs in the boot partition of the Raspberry Pi that were packaged with the original kernel it was built with. There are also DTBs in the rootfs partition of the newly built image that should be used with the new kernel. However, they aren’t being loaded: U-Boot always goes for the ones in the boot partition, which never get updated.

This is where it gets unclear for me: is it not possible to tell U-Boot to load the DTB from the rootfs partition instead of the boot partition?

Technically speaking, I don’t see why it should not be possible (I have not tried it myself), but you will certainly need to patch U-Boot to get this behavior.

We try not to differ too much from upstream: the best practice is to stick as closely as possible to the project’s official repositories and keep patching to a minimum.

I was not part of the group that decided how to patch the bootloader for Dunfell or Kirkstone, but I would imagine this was the logic when it was implemented, and also behind the advice @TheYoctoJester gave here: Migrating from yocto dunfell to kirkstone causes RPI3B+ devices to hang on OTA update - #3 by TheYoctoJester

Did you already try this flag?

Yes, the flag works, but I don’t like the idea of updating the bootloader/DTB in the boot partition since it increases the chances that the unit may brick.

For instance, when I reboot the unit before it’s had a chance to check in to Mender, the device reverts, which (I assume) means the new DTB no longer matches the old kernel, so it fails to boot. Additionally, I’m worried that rebooting the device in the middle of updating the DTB/U-Boot will brick it too.

Is there any way around this to get the chances of bricking down to zero (or as close to zero as possible)?

Hi ygu,

Did you find a way to reduce the risk of turning your RPi3 into a brick? I’m facing the exact same issue here, and have some beta clients who are using our products with an RPi3 inside, so I’m a bit stressed by the idea of breaking their products… At the same time, we need to develop a new image, as we already did for our main customers, whose products use the RPi CM4 and don’t have this problem.

Unfortunately no, I haven’t. I’m in a similar position and I can certainly relate to your pain. For now we’ve decided to keep using Dunfell and manage two branches of our Yocto project: one on Dunfell for our current customers and another on Kirkstone in case we ever decide to switch over and make it available to newer customers. It’s certainly not ideal, so I’m still interested in figuring out a better way to migrate.

One idea I had is to create an ArtifactInstall_Leave script that copies the DTB from the rootfs of the new firmware to the boot partition under a new name such as ‘bcm2837-rpi-3-b.dtb.new’. Then it would modify the U-Boot environment scripts to attempt to load the DTB with the ‘.new’ suffix if it exists (this can be done using the fw_setenv utility available in Linux). If a DTB with the ‘.new’ suffix does exist, it would also set an environment variable flag to indicate that a DTB migration is in progress.
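
To make the staging step a bit more concrete, here is a rough sketch of what that part of an ArtifactInstall_Leave script might look like. It is untested on real hardware and rests on assumptions that are not from this thread: the caller has already mounted both the newly written rootfs and the boot partition, and dtb_migration is a made-up name for the flag variable. The FW_SETENV indirection exists only so the function can be exercised without a real U-Boot environment.

```shell
#!/bin/sh
# Staging step for a DTB migration (sketch, hypothetical names throughout).
FW_SETENV=${FW_SETENV:-fw_setenv}

stage_new_dtb() {
    src_dtb=$1      # DTB inside the mounted new rootfs
    boot_dir=$2     # mount point of the boot partition
    dest="$boot_dir/$(basename "$src_dtb").new"

    # Copy under a temporary name first and rename afterwards, so that an
    # interrupted copy is less likely to leave a truncated ".new" file.
    cp "$src_dtb" "$dest.tmp" || return 1
    sync
    mv "$dest.tmp" "$dest" || return 1
    sync

    # Mark the migration as in progress (hypothetical variable name).
    "$FW_SETENV" dtb_migration 1
}
```

The boot script would still need matching logic to prefer the ‘.new’ file while the flag is set; that part is not shown here.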

If the new kernel boots up properly, it will execute an ArtifactCommit_Leave script that would 1) copy the bcm2837-rpi-3-b.dtb.new to the old bcm2837-rpi-3-b.dtb and delete the ‘.new’ dtb, 2) unset the environment variable flag that was previously set by the old kernel.
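
The commit step could be sketched along these lines. Again this is only an illustration under assumptions: dtb_migration is the same hypothetical flag name, the boot partition is already mounted, and a rename is used in place of the copy-then-delete described above. Calling fw_setenv with a variable name and no value deletes that variable.

```shell
#!/bin/sh
# Promotion step for a DTB migration (sketch, hypothetical names throughout).
FW_SETENV=${FW_SETENV:-fw_setenv}

commit_new_dtb() {
    dtb=$1    # path of the live DTB on the mounted boot partition

    # Nothing was staged, so there is nothing to promote.
    [ -f "$dtb.new" ] || return 0

    # The rename replaces the old DTB and removes the ".new" file together.
    mv "$dtb.new" "$dtb" || return 1
    sync

    # Clear the in-progress flag: a name without a value deletes the variable.
    "$FW_SETENV" dtb_migration
}
```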

If U-Boot boots up again and sees that the environment variable flag is still set, it unsets it and uses the original DTB, since that would mean the device did not successfully boot into the new kernel. The ‘.new’ DTB should also be deleted, either by U-Boot or by the original rootfs’ Mender on ArtifactFailure_Enter.

Of course, this approach makes changes to the uboot environment which can introduce risk. To mitigate, you can take extra precautionary steps by storing the original value in some of the environment scripts, checking that edits are made properly by comparing hashes, and reverting if not. I suppose that there is a risk that the device is rebooted in the middle of updating the uboot environment. I’m not sure how significant that is though, perhaps someone wiser than me can answer that one.
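
As a rough illustration of the save, verify and revert idea for a single environment variable, something like the following might work. It is a sketch only: it compares the read-back value directly rather than hashing it, the function name is made up, and the FW_PRINTENV/FW_SETENV indirection is there purely so it can be tried without touching a real environment. It says nothing about what happens if power is lost during the write itself.

```shell
#!/bin/sh
# Set a U-Boot variable, read it back, and try to restore the previous value
# if the read-back does not match (sketch, hypothetical function name).
FW_PRINTENV=${FW_PRINTENV:-fw_printenv}
FW_SETENV=${FW_SETENV:-fw_setenv}

set_env_verified() {
    name=$1
    new_value=$2

    # Remember the current value; empty if the variable is not set.
    old_value=$("$FW_PRINTENV" -n "$name" 2>/dev/null) || old_value=

    "$FW_SETENV" "$name" "$new_value" || :
    readback=$("$FW_PRINTENV" -n "$name" 2>/dev/null) || readback=
    if [ "$readback" = "$new_value" ]; then
        return 0
    fi

    # Read-back differs: try to put the previous state back.
    if [ -n "$old_value" ]; then
        "$FW_SETENV" "$name" "$old_value" || :
    else
        "$FW_SETENV" "$name" || :
    fi
    return 1
}
```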

I’m also not sure if making such changes to the uboot environment variables would break anything in Mender so it’s best to keep changes to a minimum (which this approach takes into account).

I’m curious to get someone from Mender’s take on this workaround. @lramirez , @TheYoctoJester, Could this work?

Hi @ygu,

Thanks a lot for your input. It seems to be a complicated process to make sure that the RPi won’t turn into a brick, and as you mentioned, what if something happens during the modification of the U-Boot files (the chances are very low I think, but still). In our case, I think I will take the risk of just adding INHERIT += 'rpi-update-firmware' as suggested by Mender. It only concerns 30-40 devices, and all of them are beta users that we know personally, so it won’t be too difficult to help them if necessary.
Thanks for your help anyway, I hope you’ll find a good solution for your problem.