Environment
Client: Mender version 2.1.1
Client Mode: Managed
Server: 2.1
Bootloader: U-boot - ver=U-Boot 2019.01
HW: Custom - based on BBB core reference.
OS: Bespoke/linux based
Build System: Yocto
Background
We have ~48,000 sensors in the field that are currently orphaned due to mender_setup failing to ‘initialize’ u-boot, resulting in a failed canary check when attempting an OTA update.
This can be fixed by simply touching u-boot, for example, set an arbitrary envvar, i.e.
fw_setenv random_var 1
and upon reboot the mender_setup will run successfully, and OTA updates will succeed as expected.
Sadly, the failed canary check exception is raised before the Install_Enter state, so we are unable to implement corrective measures using a state script, so we cant deploy the solution.
Update Modules are not implemented.
Given the cost to roll a truck for all the afflicted devices, we are asking for recommendations, ideas, cheats, voodoo magic, anything to help us bring these orphaned devices back into the family.
Notes
How did this happen?
So, a sad circumstance of the recommended integration testing, which is the first smoke test we perform on new release candidates, is that it prescribes writing an envvar to u-boot, rebooting, and reading the written variable to insure fw_* works. This very action masked this bug, because when our CI system moved on to test OTA updates, they succeed, because u-boot had been ‘touched’, resulting in mender_setup running successfully on subsequent boots.
Unfortunately, in manufacturing, the image is written to the mmc while its on the real, and no subsequent ‘touch’ of the u-boot happens, so these units were all manufactured into inventory and then deployed to the field without detection.
Any help or suggestions are greatly appreciated!
SLR-