Hi,
I’m working for a customer for whom I set up Mender (Yocto). We use it in stand-alone mode, as the target is in a specific industrial environment, where we want to control the update workflow.
One of the problems we need to solve is preventing a new installation when the device has booted new firmware for the first time and that firmware has not yet been committed. Mender currently puts no obstacle in the way: you can start a new install from firmware on which “mender commit” has not yet been run. That install overwrites the inactive partition, which still holds the old firmware, so you lose your failover/rollback target if the new firmware later turns out to need a rollback. This scenario is very unlikely (especially now that we have an ArtifactCommit_Enter state script that checks whether the new firmware seems all right, and we run it just after the ssh service starts), but my customer wants to be completely sure it can never happen.
This is why the idea was to make installation conditional on the existence of a temporary file (/var/run/mender_ok). This file is created at a late boot stage, either when no commit is necessary or when the commit has succeeded.
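To make the idea concrete, here is a rough sketch of the late-boot step (not our exact script). The names mark_install_allowed, commit_pending and MENDER_OK_FLAG are made up for this post, and reading upgrade_available through fw_printenv is an assumption about the U-Boot integration on our board; other bootloader integrations would need a different check.

```sh
#!/bin/sh
# Late-boot sketch: create the flag file only when no commit is pending
# or when the pending commit succeeded.
# Hypothetical names: MENDER_OK_FLAG, commit_pending, mark_install_allowed.
FLAG="${MENDER_OK_FLAG:-/var/run/mender_ok}"

# Assumption: with the U-Boot integration, upgrade_available=1 means the
# running firmware has not been committed yet. If the environment cannot
# be read, treat the commit as pending so that the check fails closed.
commit_pending() {
    value=$(fw_printenv -n upgrade_available 2>/dev/null) || return 0
    [ "$value" = "1" ]
}

# Commit if needed, then create the flag. The flag is never created
# when a commit was pending and "mender commit" failed.
mark_install_allowed() {
    if commit_pending; then
        mender commit || return 1
    fi
    : > "$FLAG"
}

# A late boot service would end with a call such as:
# mark_install_allowed
```

Because /var/run is normally a tmpfs, the flag disappears at every reboot and has to be earned again by each boot.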
However, I observed that the ArtifactInstall_Enter script, which checks for this file, is triggered AFTER the firmware has been flashed. When it fails, it prevents the switch to the new firmware (all right!), but the old firmware has already been overwritten, so we still lose the opportunity to roll back!
Here are the logs when ArtifactInstall_Enter fails:
root@smarcimx8mq4g:/data# mender install firmware-smarcimx8mq4g-20220515111029.mender
INFO[0000] Loaded configuration file: /var/lib/mender/mender.conf
INFO[0000] Loaded configuration file: /etc/mender/mender.conf
INFO[0000] Mender running on partition: /dev/mmcblk0p2
INFO[0000] Start updating from local image file: [firmware-smarcimx8mq4g-20220515111029.mender]
Installing Artifact of size 32771584...
INFO[0000] No public key was provided for authenticating the artifact
INFO[0000] Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
INFO[0000] Opening device "/dev/mmcblk0p3" for writing
INFO[0000] Native sector size of block device /dev/mmcblk0p3 is 512 bytes. Mender will write in chunks of 1048576 bytes
.............................................................. - 100 %
INFO[0004] All bytes were successfully written to the new partition
INFO[0004] The optimized block-device writer wrote a total of 201 frames, where 37 frames did need to be rewritten (i.e., skipped)
INFO[0004] Wrote 209715200/209715200 bytes to the inactive partition
INFO[0004] Executing script: ArtifactInstall_Enter_01
ERRO[0004] ArtifactInstall_Enter script failed: statescript: error executing 'ArtifactInstall_Enter_01': 1 : exit status 1
Rolling back Artifact...
INFO[0004] No update available, so no rollback needed.
ERRO[0004] statescript: error executing 'ArtifactInstall_Enter_01': 1 : exit status 1
root@smarcimx8mq4g:/data#
And here are the logs when ArtifactInstall_Enter succeeds:
INFO[0000] Opening device "/dev/mmcblk0p3" for writing
INFO[0000] Native sector size of block device /dev/mmcblk0p3 is 512 bytes. Mender will write in chunks of 1048576 bytes
.............................................................. - 100 %
INFO[0004] All bytes were successfully written to the new partition
INFO[0004] The optimized block-device writer wrote a total of 201 frames, where 2 frames did need to be rewritten (i.e., skipped)
INFO[0004] Wrote 209715200/209715200 bytes to the inactive partition
INFO[0004] Executing script: ArtifactInstall_Enter_01
INFO[0004] Enabling partition with new image installed to be a boot candidate: 3
Use -commit to update, or -rollback to roll back the update.
At least one payload requested a reboot of the device it updated
In both cases, the other firmware partition is flashed. This doesn’t seem logical at all… I suspect this situation was never handled because there is no rollback option after a successful commit (which would be great to have, instead of doing it manually by hacking the U-Boot environment). Sadly, as we are in standalone mode, we cannot use the Download state script either (which would otherwise solve our problem).
One solution would be to encapsulate the mender command in a wrapper script of our own, but that’s rather ugly. Is there something we missed here? What would prevent you from executing ArtifactInstall_Enter BEFORE overwriting the inactive firmware partition? If there is no obstacle, would it be possible to fix this? I could write and submit the patch. Mender is great, and this would be our contribution to the project.
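For reference, the wrapper workaround would look roughly like the sketch below. guarded_install and MENDER_OK_FLAG are names invented for this post; only "mender install" is the real command, as used in the logs above. The point is that the check runs before Mender opens the inactive partition for writing.

```sh
#!/bin/sh
# Wrapper sketch: refuse to start an install unless the flag file exists,
# so the inactive partition is never written from uncommitted firmware.
# Hypothetical names: MENDER_OK_FLAG, guarded_install.
FLAG="${MENDER_OK_FLAG:-/var/run/mender_ok}"

guarded_install() {
    artifact="$1"
    if [ ! -e "$FLAG" ]; then
        echo "Refusing to install $artifact: $FLAG is missing" >&2
        return 1
    fi
    # Only reached when the running firmware has been validated.
    mender install "$artifact"
}

# The wrapper script would end with a call such as:
# guarded_install "$@"
```

The weak point is obvious: nothing stops someone from calling mender install directly, which is why a check inside Mender itself would be much cleaner.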
Thank you very much,
Gilles