Mender Rollback - Reset occurs before successful rollback

Hi,

I’m using Mender 1.7, latest version of the client software on the board.
The board is an i.MX6Q custom board. I’ve manually ported U-Boot for this board using the freescale community version (u-boot-fslc) and manually added support for Mender.
The OS is a Yocto-based distribution using the Rocko release. The kernel is version 4.1.15 from the Freescale community tree.

Mender is working well, but while testing rollbacks by patching the kernel to lock up, I noticed something odd. The board reset and started to boot from the other partition, then reset again during boot, and only on the second attempt booted successfully from the other partition. Here’s the console output, with my annotations between the log sections:

Mender has updated the image with the dodgy kernel that locks up and reboots

Hit any key to stop autoboot:  0 
Booting from mmc ...
53561 bytes read in 175 ms (298.8 KiB/s)
7017656 bytes read in 359 ms (18.6 MiB/s)
## Booting kernel from Legacy Image at 12000000 ...
   Image Name:   Linux-4.1.15-rocko-1.1
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    7017592 Bytes = 6.7 MiB
   Load Address: 10080000
   Entry Point:  10080000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 13000000
   Booting using the fdt blob at 0x13000000
   Loading Kernel Image ... OK
   Using Device Tree in place at 13000000, end 13010138

Starting kernel ...

Kernel locks up and watchdog fires as expected

U-Boot SPL 2017.11+fslc+ga07698f (Feb 28 2019 - 11:58:11)
Booted from eMMC/SD
Trying to boot from MMC1
U-Boot 2017.11+fslc+ga07698f (Feb 28 2019 - 11:58:11 +0000)

CPU:   Freescale i.MX6D rev1.5 996 MHz (running at 792 MHz)
CPU:   Extended Commercial temperature grade (-20C to 105C) at 38C
Reset cause: WDOG
       Watchdog enabled
I2C:   ready
DRAM:  2 GiB
PF3000 initialisation
PMIC: PFUZE3000 DEV_ID=0x10 REV_ID=0x21
MMC:   FSL_SDHC: 0
In:    serial
Out:   serial
Err:   serial
Net:   FEC0
Writing to MMC(0)... done
Warning: Bootlimit (1) exceeded. Using altbootcmd.
Hit any key to stop autoboot:  0 
Saving Environment to MMC...
Writing to redundant MMC(0)... done
Booting from mmc ...
53561 bytes read in 174 ms (299.8 KiB/s)
7019224 bytes read in 358 ms (18.7 MiB/s)
## Booting kernel from Legacy Image at 12000000 ...
   Image Name:   Linux-4.1.15-rocko-1.1
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    7019160 Bytes = 6.7 MiB
   Load Address: 10080000
   Entry Point:  10080000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 13000000
   Booting using the fdt blob at 0x13000000
   Loading Kernel Image ... OK
   Using Device Tree in place at 13000000, end 13010138

Starting kernel ...

[    0.229788] SANTVEND hardware revision 1.2
[    0.248419] anatop_regulator 20c8000.anatop:regulator-3p0@120: Failed to resolve vin-supply for vdd3p0
[    0.248974] anatop_regulator 20c8000.anatop:regulator-3p0@120: Failed to resolve vin-supply for vdd3p0
[    0.267578] anatop_regulator 20c8000.anatop:regulator-vddpu@140: Failed to resolve vin-supply for vddpu
[    0.267612] imx-gpc 20dc000.gpc: failed to get pu regulator: -517
[    0.286863] 2000000.aips-bus:usbphy_nop1 supply vcc not found, using dummy regulator
[    0.287091] 2000000.aips-bus:usbphy_nop2 supply vcc not found, using dummy regulator
[    0.437254] CPU PMU: Failed to parse /soc/pmu/interrupt-affinity[0]
[    0.476108] pwm-backlight.1 supply power not found, using dummy regulator
[    0.681126] pnglogo: No generic logo license found in device tree
[    0.779797] imx-sdma 20ec000.sdma: no event needs to be remapped
[    0.793505] DDR_1V35: Failed to create debugfs directory
[    1.071165] 2184800.usbmisc supply vbus-wakeup not found, using dummy regulator
[    1.079766] imx6q-pinctrl 20e0000.iomuxc: pin MX6Q_PAD_GPIO_0 already requested by regulators:regulator@3; cannot claim for 2184200.usb
[    1.092035] imx6q-pinctrl 20e0000.iomuxc: pin-136 (2184200.usb) status -22
[    1.098985] imx6q-pinctrl 20e0000.iomuxc: could not request pin 136 (MX6Q_PAD_GPIO_0) from group usbh1_vbusgrp  on device 20e0000.iomuxc
[    1.111283] imx_usb 2184200.usb: Error applying setting, reverse things back
[    1.130249] lm85 1-002e: Device configuration is locked
[    1.135522] lm85 1-002e: Device is not ready
[    1.140935] tc74 1-0048: unable to read config register
[    1.146220] tc74: probe of 1-0048 failed with error -5
[    1.151693] tc74 1-004a: unable to read config register
[    1.156972] tc74: probe of 1-004a failed with error -5
[    1.162457] tc74 1-004b: unable to read config register
[    1.167734] tc74: probe of 1-004b failed with error -5
[    1.173361] tmp116 1-0049: tmp116_probe unable to read config register.
[garbled characters on the serial console]

This is where the spurious reset occurs

U-Boot 2017.11+fslc+ga07698f (Feb 28 2019 - 11:58:11 +0000)

CPU:   Freescale i.MX6D rev1.5 996 MHz (running at 792 MHz)
CPU:   Extended Commercial temperature grade (-20C to 105C) at 38C
Reset cause: POR
       Watchdog enabled
I2C:   ready
DRAM:  2 GiB
PF3000 initialisation
PMIC: PFUZE3000 DEV_ID=0x10 REV_ID=0x21
MMC:   FSL_SDHC: 0
In:    serial
Out:   serial
Err:   serial
Net:   FEC0
Hit any key to stop autoboot:  0 
Booting from mmc ...
53561 bytes read in 173 ms (301.8 KiB/s)
7019224 bytes read in 358 ms (18.7 MiB/s)
## Booting kernel from Legacy Image at 12000000 ...
   Image Name:   Linux-4.1.15-rocko-1.1
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    7019160 Bytes = 6.7 MiB
   Load Address: 10080000
   Entry Point:  10080000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 13000000
   Booting using the fdt blob at 0x13000000
   Loading Kernel Image ... OK
   Using Device Tree in place at 13000000, end 13010138

Starting kernel ...

and the board then goes on to boot fine

I’m not 100% sure, but it looks like the watchdog has fired again during the rollback. Have I missed something in the setting up of U-Boot for rollback? Out of interest, do the Mender changes to U-Boot do anything particular with the setup of the watchdog during the rollback scenario?

Many Thanks,
Martin.


Out of interest, do the Mender changes to U-Boot do anything particular with the setup of the watchdog during the rollback scenario?

Mender does not configure the watchdog, neither in U-Boot nor in the Linux kernel.

I’m not 100% sure but it looks like the watchdog has fired again during the rollback. Have I missed something in the setting up of U-Boot for rollback?

Have you gone through the integration checklist?

https://docs.mender.io/1.7/devices/yocto-project/bootloader-support/u-boot/integration-checklist

This would be my first step to try to identify if there are any issues with the integration.

Thanks for the reply. I have gone through the checklist already and everything has passed. I’ll take a look at the altbootcmd and also see what U-Boot is doing with the Watchdog in the rollback scenario.

Keep us posted, I am curious now.

Btw, do you have a watchdog reset command in bootcmd which might be missing in altbootcmd?

Will do. Nothing special in bootcmd or altbootcmd that I can see that is watchdog related.
=> env print bootcmd
bootcmd=run mender_setup; run mmcboot; run mender_try_to_recover
=> env print altbootcmd
altbootcmd=run mender_altbootcmd; run bootcmd
=> env print mender_altbootcmd
mender_altbootcmd=if test ${mender_boot_part} = 2; then setenv mender_boot_part 3; setenv mender_boot_part_hex 3; else setenv mender_boot_part 2; setenv mender_boot_part_hex 2; fi; setenv upgrade_available 0; saveenv; run mender_setup
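
For readers less used to U-Boot's shell syntax, the partition swap in mender_altbootcmd above can be written as plain shell. This is only an illustration of the logic in that variable; other_boot_part is a hypothetical helper name, not a Mender or U-Boot command. After the swap, the real command also clears upgrade_available, saves the environment and runs mender_setup.

```shell
#!/bin/sh
# Sketch of the partition swap performed by mender_altbootcmd.
# other_boot_part is a hypothetical name used only for this illustration.
other_boot_part() {
    # $1: current value of mender_boot_part (2 or 3 in the environment shown above)
    if [ "$1" = "2" ]; then
        echo 3
    else
        echo 2
    fi
}

other_boot_part 2   # prints 3
other_boot_part 3   # prints 2
```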

I’ll put some printf’s in the Watchdog setup and the kick macro to see if this uncovers anything.

What is in your mmcboot command?

=> env print mmcboot
mmcboot=echo Booting from mmc ...; run mmcargs; load ${mender_uboot_root} ${fdt_addr_r} /boot/${mender_dtb_name}; load ${mender_uboot_root} ${kernel_addr_r} /boot/${mender_kernel_name}; ${mender_boot_kernel_type} ${kernel_addr_r} - ${fdt_addr_r}

=> env print mmcargs
mmcargs=setenv bootargs console=${console},${baudrate} loglevel=${loglevel} root=${mender_kernel_root} rootwait rw

Just an update: I can’t find any reason why this is occurring. Weirdly, it doesn’t happen if I run the following commands:
fw_setenv upgrade_available 1
fw_setenv bootcount 0

and then reboot and pull the power before Mender starts. When power is reapplied, it runs altbootcmd fine with no reset.
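
Why those two fw_setenv commands plus a power cut lead to altbootcmd can be sketched with a simplified model of U-Boot's boot counter. The model assumes the counter is kept in the environment and is only incremented while upgrade_available is 1. The bootlimit of 1 is taken from the "Bootlimit (1) exceeded" line in the log. next_boot is a hypothetical helper name, not a real command.

```shell
#!/bin/sh
# Simplified model of the bootcount/bootlimit decision (an assumption-based sketch,
# not U-Boot's actual code). Prints "<new bootcount> <command that would run>".
next_boot() {
    # $1: upgrade_available, $2: stored bootcount, $3: bootlimit
    if [ "$1" != "1" ]; then
        # No update pending: the stored counter is ignored and bootcmd runs.
        echo "$2 bootcmd"
        return
    fi
    count=$(($2 + 1))
    if [ "$3" -gt 0 ] && [ "$count" -gt "$3" ]; then
        echo "$count altbootcmd"
    else
        echo "$count bootcmd"
    fi
}

next_boot 1 0 1   # first boot after the fw_setenv commands: prints "1 bootcmd"
next_boot 1 1 1   # boot after the power cut: prints "2 altbootcmd"
```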

I put some debug prints in the watchdog code, but the setup of the i.MX hardware watchdog looks fine.
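
U-Boot prints a "Reset cause:" line on every start, so a captured console log can be filtered to see what kind of reset preceded each boot. In the log earlier in this thread, the boot after the kernel lockup reports WDOG, while the boot after the unexpected reset reports POR. A minimal sketch follows; reset_causes is a hypothetical helper name, and it expects the log on standard input.

```shell
#!/bin/sh
# Print the reset cause U-Boot reported for each boot in a captured console log.
# reset_causes is a hypothetical helper; it reads the log from standard input.
reset_causes() {
    # Serial captures often contain carriage returns, so strip them first.
    tr -d '\r' | sed -n 's/^Reset cause: *//p'
}

# Example using lines copied from the log above; prints WDOG, then POR.
printf '%s\n' 'Reset cause: WDOG' 'DRAM:  2 GiB' 'Reset cause: POR' | reset_causes
```

On a saved capture this would be run as `reset_causes < console.log`, where console.log is whatever file the serial terminal wrote.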

For the moment I’ll leave it, as it does eventually boot to the correct partition, so the rollback works even with this odd reset. I’m going to test the watchdog from userspace next, which may turn something up.

I have a question. Once it has booted to a new image, is there a period before it declares the update successful, i.e. before it commits the update?

The default criterion for a successful update (committing it) is that the client can report the update status to the Mender server, meaning that it must be able to connect to it.

You can influence this behavior using state scripts, e.g. to add custom sanity checks before the commit.

https://docs.mender.io/1.7/artifacts/state-scripts

Thanks for the info on state scripts, I missed that when reading the documentation. So I should be able to add something to the “Enter” of the ArtifactCommit state that basically delays it until a certain period of time has passed. Awesome, I’ll try that.
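
A delay of that kind might look roughly like the sketch below. It assumes a state script named along the lines of ArtifactCommit_Enter_10_min-uptime, following the state-script naming scheme. The MIN_UPTIME variable, the 120 second default and both helper names are made up for this example. It reads /proc/uptime, so it is Linux only.

```shell
#!/bin/sh
# Hypothetical state script, e.g. ArtifactCommit_Enter_10_min-uptime.
# Holds the commit back until the system has been up for MIN_UPTIME seconds.
# MIN_UPTIME and the helper names are assumptions made for this sketch.
MIN_UPTIME="${MIN_UPTIME:-120}"

# Seconds still to wait, given the current uptime ($1) and the required minimum ($2).
remaining_wait() {
    if [ "$1" -ge "$2" ]; then
        echo 0
    else
        echo $(($2 - $1))
    fi
}

# Whole seconds since boot, taken from the first field of /proc/uptime.
uptime_seconds() {
    cut -d. -f1 /proc/uptime
}

# Only wait when invoked under a state-script name, so the helpers
# can be sourced and tried on their own without sleeping.
case "$0" in
    *ArtifactCommit_Enter_*)
        sleep "$(remaining_wait "$(uptime_seconds)" "$MIN_UPTIME")"
        exit 0
        ;;
esac
```

As I understand the state-script documentation, a non-zero exit status from a script in this state marks the update as failed, so a real sanity check could replace the plain sleep and exit with an error when the system does not look healthy.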