Device going into reboot loop after upgrade

During the upgrade process, one of my devices ended up in a reboot loop.
Mender seems to be stuck in a state that it doesn’t recover from:

time="2025-02-24T22:33:49+01:00" level=info msg="Mender running on partition: /dev/mmcblk0p7"
time="2025-02-24T22:33:50+01:00" level=info msg="State transition: init [none] -> init [none]"
time="2025-02-24T22:33:50+01:00" level=info msg="Handling loaded state: reboot"
time="2025-02-24T22:33:50+01:00" level=info msg="Running Mender client version: 3.5.1-dirty"
time="2025-02-24T22:33:50+01:00" level=info msg="State transition: init [none] -> after-reboot [ArtifactReboot_Leave]"
time="2025-02-24T22:33:50+01:00" level=error msg="transient error: Reboot to the new update failed. Expected \"upgrade_available\" flag to be true but it was false. Either the switch to the new partition was unsuccessful, or the bootloader rolled back"
time="2025-02-24T22:33:50+01:00" level=info msg="State transition: after-reboot [ArtifactReboot_Leave] -> rollback [ArtifactRollback]"
time="2025-02-24T22:33:50+01:00" level=info msg="Performing rollback"
time="2025-02-24T22:33:50+01:00" level=info msg="No update available, so no rollback needed."
time="2025-02-24T22:33:50+01:00" level=info msg="State transition: rollback [ArtifactRollback] -> rollback-reboot [ArtifactRollbackReboot_Enter]"
time="2025-02-24T22:33:50+01:00" level=info msg="Rebooting device(s) after rollback"
time="2025-02-24T22:33:50+01:00" level=info msg="Mender rebooting from inactive partition: /dev/mmcblk0p7"
time="2025-02-24T22:34:02+01:00" level=info msg="Daemon terminated with SIGTERM"

The log reports Mender running on partition p7 and also Mender rebooting from inactive partition p7, so the same partition is named as both active and inactive.
That looks contradictory.
Any idea how this state can be avoided?
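
A quick way to spot this contradiction in a longer journal is to scan for the two messages and compare the partitions they name. The following is only a minimal sketch: the message texts are copied from the log above, while the function name `find_partition_conflict` and the `sample` list are made up for illustration and are not part of Mender.

```python
import re

# Message texts taken from the log excerpt above.
RUNNING = re.compile(r'msg="Mender running on partition: ([^"\s]+)"')
REBOOTING = re.compile(r'msg="Mender rebooting from inactive partition: ([^"\s]+)"')


def find_partition_conflict(lines):
    """Return the partition reported as both running and inactive, else None.

    Hypothetical helper, not a Mender API.
    """
    running = None
    for line in lines:
        match = RUNNING.search(line)
        if match:
            running = match.group(1)
            continue
        match = REBOOTING.search(line)
        if match and running is not None and match.group(1) == running:
            return running
    return None


sample = [
    'time="2025-02-24T22:33:49+01:00" level=info msg="Mender running on partition: /dev/mmcblk0p7"',
    'time="2025-02-24T22:33:50+01:00" level=info msg="Mender rebooting from inactive partition: /dev/mmcblk0p7"',
]
print(find_partition_conflict(sample))
```

A result other than `None` means the log names one partition in both roles, which is the symptom described here.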

mender --version
3.5.1-dirty	runtime: go1.22.2

This is running on a custom board integration.

Thanks,
Stephan

Hi @stwirth,

Can you share a few more details? Especially, what partitions are configured as A and B, plus what gets identified and set during the upgrade process? This should be visible in the log before the given snippet.
Plus, which kind of boot integration? Standard-ish u-boot or grub, or is there something special involved?

Greetz,
Josef

Hi @TheYoctoJester,

A/B partitions are p7/p8.
There is not much more information in the logs before “Mender running on partition…”:

time="2025-02-25T20:05:43+01:00" level=info msg="Loaded configuration file: /etc/mender/mender.conf"
time="2025-02-25T20:05:43+01:00" level=info msg="'UpdateControlMapExpirationTimeSeconds' is not set in the Mender configuration file. Falling back to the default of 2*UpdatePollIntervalSeconds"
time="2025-02-25T20:05:43+01:00" level=info msg="'UpdateControlMapBootExpirationTimeSeconds' is not set in the Mender configuration file. Falling back to the default of 600 seconds"

We’re using U-Boot. This is the U-Boot environment:

# printenv
altbootcmd=run mender_altbootcmd; run bootcmd
baudrate=115200
bootargs=loglevel=7 console=ttyS1,115200 mem=128M@0x0  rootfstype=ext4 root=${mender_kernel_root} rootdelay=3 rw
bootcmd=gpio set 53; gpio set 35; gpio set 13; run mender_setup; ext2load ${mender_uboot_root} 0x80a00000 /boot/kernel; bootm 0x80a00000
bootcount=1
bootdelay=1
bootlimit=1
ethact=GMAC-9161
ethaddr=00:11:22:33:44:55
gatewayip=192.168.4.1
ipaddr=192.168.4.145
loads_echo=1
mender_altbootcmd=if test ${mender_boot_part} = 7; then setenv mender_boot_part 8; setenv mender_boot_part_hex 8; else setenv mender_boot_part 7; setenv mender_boot_part_hex 7; fi; setenv upgrade_available 0; saveenv; run mender_setup
mender_boot_kernel_type=bootm
mender_boot_part=7
mender_boot_part_hex=7
mender_check_saveenv_canary=1
mender_kernel_name=kernel
mender_saveenv_canary=1
mender_setup=if test "${mender_saveenv_canary}" != "1"; then setenv mender_saveenv_canary 1; saveenv; fi; if test "${mender_pre_setup_commands}" != ""; then run mender_pre_setup_commands; fi; setenv mender_kernel_root /dev/mmcblk0p${mender_boot_part}; if test ${mender_boot_part} = 7; then setenv mender_boot_part_name /dev/mmcblk0p7; else setenv mender_boot_part_name /dev/mmcblk0p8; fi; setenv mender_kernel_root_name ${mender_boot_part_name}; setenv mender_uboot_root mmc 0:${mender_boot_part_hex}; setenv mender_uboot_root_name ${mender_boot_part_name}; setenv expand_bootargs "setenv bootargs \\"${bootargs}\\""; run expand_bootargs; setenv expand_bootargs; if test "${mender_post_setup_commands}" != ""; then run mender_post_setup_commands; fi
mender_try_to_recover=if test ${upgrade_available} = 1; then reset; fi
mender_uboot_boot=mmc 0:0
mender_uboot_dev=0
mender_uboot_if=mmc
netmask=255.255.255.0
serverip=192.168.4.13
softburn=mw 0x100000cc 0x42575302;reset
stderr=serial
stdin=serial
stdout=serial
upgrade_available=0

Environment size: 1928/32763 bytes
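
For readers less used to U-Boot scripting, the `mender_altbootcmd` entry in the dump above can be read roughly as the following sketch. It only models the variable changes; the `saveenv` and `run mender_setup` steps are left out, and the Python function name and the dictionary representation of the environment are illustrative assumptions. In U-Boot, `altbootcmd` is what runs in place of `bootcmd` once `bootcount` exceeds `bootlimit`.

```python
def mender_altbootcmd(env):
    """Model the variable changes made by mender_altbootcmd in the dump above.

    `env` is a plain dict standing in for the U-Boot environment
    (an assumption for illustration; saveenv and mender_setup are not modelled).
    """
    new_env = dict(env)
    # "if test ${mender_boot_part} = 7; then ... 8 ...; else ... 7 ...; fi"
    if env.get("mender_boot_part") == "7":
        new_env["mender_boot_part"] = "8"
        new_env["mender_boot_part_hex"] = "8"
    else:
        new_env["mender_boot_part"] = "7"
        new_env["mender_boot_part_hex"] = "7"
    # "setenv upgrade_available 0"
    new_env["upgrade_available"] = "0"
    return new_env


before = {"mender_boot_part": "8", "mender_boot_part_hex": "8", "upgrade_available": "1"}
print(mender_altbootcmd(before))
```

The point to note is that this path always clears `upgrade_available`, so after it has run the client can no longer tell from the flag alone that an update was in progress.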

I’m still investigating but I’m pretty sure that I made some mistake that messed up credentials or device ID in the update and that created this state.
I’ll try to reproduce to get a clean log.

OK, I think I understand what is happening.
My upgrade contained instructions to move the persistent mender directory from one partition (/data/mender) to another (/auth/mender) which created an inconsistent state:

  1. firmware version A is running on partition 7
  2. mender downloads the upgrade (firmware version B)
  3. mender changes the boot env to boot from partition 8 and issues a reboot
  4. the boot of firmware version B on partition 8 succeeds
  5. my modified mender init script moves /data/mender/ contents into /auth/mender/ (including mender-store) and starts mender
  6. wifi fails to connect (separate issue) and mender can’t commit the upgrade
  7. mender decides the upgrade failed and starts a rollback, changing boot env to boot from partition 7
  8. mender stores state in /auth/mender/ and reboots
  9. partition 7 is booted
  10. when mender starts, it restores its state from /data/mender (not from /auth/mender) and finds an inconsistent state: the stored state is still reboot (recorded in step 3), but upgrade_available is 0.
  11. mender decides to roll back but also reports that no rollback is needed because upgrade_available=0, so it doesn’t change the boot env and just reboots. this state then loops.
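
The last two steps can be sketched as a toy model. This is not Mender's actual state machine: the function names, the action strings, and the dictionary standing in for the boot environment are all made up for illustration, and the model assumes, as the hypothesis above implies, that the same stale "reboot" record is read again on every boot.

```python
def handle_boot(stored_state, env):
    """Return the action taken on one start of the client (simplified model)."""
    if stored_state != "reboot":
        return "idle"
    if env["upgrade_available"] == "1":
        return "commit"
    # An update reboot is on record but the flag is 0: the update is treated
    # as failed. The rollback finds nothing pending, so the boot environment
    # is left untouched and the device reboots anyway.
    return "rollback-reboot"


def count_reboots(stored_state, env, max_boots=5):
    """Count consecutive rollback reboots, giving up after max_boots."""
    reboots = 0
    for _ in range(max_boots):
        # Assumption: neither the stored state nor env changes between boots.
        if handle_boot(stored_state, env) != "rollback-reboot":
            break
        reboots += 1
    return reboots


print(count_reboots("reboot", {"mender_boot_part": "7", "upgrade_available": "0"}))
```

Because nothing in this model changes between boots, the count always reaches `max_boots`; with `upgrade_available` set to `"1"`, or with a stored state other than `"reboot"`, it stays at 0.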

@TheYoctoJester could you confirm that this makes sense?

Thanks,
Stephan

@stwirth

Hmm, interesting situation. I’m not sure I have understood all the implications just from reading and thinking about it, but my first conclusion is that your interpretation sounds correct.

Greetz,
Josef
