ArtifactInstall_Enter strange workflow behavior

Hi,

I’m working for a customer for whom I set up Mender (Yocto). We use it in standalone mode, because the target sits in a specific industrial environment where we want to control the update workflow.

One of the problems we need to solve is preventing an installation when the device has booted for the first time on new firmware that has not yet been committed. Mender does nothing to stop this: you can start a new install from firmware on which no “mender commit” has been done yet, and so you could lose the failover/rollback to the old firmware, since the new firmware might still need to roll back. Even if this scenario is very unlikely (above all now that we have an ArtifactCommit_Enter state script that checks whether the new firmware seems alright, and we run it just after the ssh service launches), my customer wants to be completely sure it will never happen.

This is why the idea was to make the installation conditional on the presence of a temporary file (/var/run/mender_ok). This file is created either if no commit is necessary or if the commit succeeded, which is decided at a late boot stage.
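To make the idea concrete, here is a minimal sketch of that check, assuming the state script is written in Python (any executable would do; the function names `install_allowed` and `main` are only illustrative and not part of Mender). The marker path is the one given above.

```python
import os
import sys

# Marker created late in the boot sequence, either when no commit is
# pending or when the commit succeeded (path taken from the post).
MARKER = "/var/run/mender_ok"


def install_allowed(marker: str = MARKER) -> bool:
    """Return True only if the marker file exists."""
    return os.path.isfile(marker)


def main(marker: str = MARKER) -> int:
    """Return the exit status the state script would report.

    0 lets the install continue; a non-zero status makes Mender treat
    the script as failed, as in the "exit status 1" log further down.
    """
    if install_allowed(marker):
        return 0
    print(f"{marker} is missing: refusing to install", file=sys.stderr)
    return 1
```

Deployed as a real state script (for example ArtifactInstall_Enter_01), the file would also need a shebang line and would end by calling `sys.exit(main())`.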

However, I observed that the ArtifactInstall_Enter script, which checks for the existence of this file, is triggered AFTER the firmware is flashed. When it fails, this prevents the switch to the new firmware (alright!), but the inactive partition has already been overwritten, so we still lose the opportunity to roll back! :frowning:

Here are the logs when ArtifactInstall_Enter fails:

root@smarcimx8mq4g:/data# mender install firmware-smarcimx8mq4g-20220515111029.mender  
INFO[0000] Loaded configuration file: /var/lib/mender/mender.conf  
INFO[0000] Loaded configuration file: /etc/mender/mender.conf  
INFO[0000] Mender running on partition: /dev/mmcblk0p2   
INFO[0000] Start updating from local image file: [firmware-smarcimx8mq4g-20220515111029.mender]  
Installing Artifact of size 32771584...
INFO[0000] No public key was provided for authenticating the artifact  
INFO[0000] Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available  
INFO[0000] Opening device "/dev/mmcblk0p3" for writing   
INFO[0000] Native sector size of block device /dev/mmcblk0p3 is 512 bytes. Mender will write in chunks of 1048576 bytes  
.............................................................. - 100 %
INFO[0004] All bytes were successfully written to the new partition  
INFO[0004] The optimized block-device writer wrote a total of 201 frames, where 37 frames did need to be rewritten (i.e., skipped)  
INFO[0004] Wrote 209715200/209715200 bytes to the inactive partition  
INFO[0004] Executing script: ArtifactInstall_Enter_01    
ERRO[0004] ArtifactInstall_Enter script failed: statescript: error executing 'ArtifactInstall_Enter_01': 1 : exit status 1  
Rolling back Artifact...
INFO[0004] No update available, so no rollback needed.   
ERRO[0004] statescript: error executing 'ArtifactInstall_Enter_01': 1 : exit status 1  
root@smarcimx8mq4g:/data#

And here are the logs when ArtifactInstall_Enter succeeds:

INFO[0000] Opening device "/dev/mmcblk0p3" for writing   
INFO[0000] Native sector size of block device /dev/mmcblk0p3 is 512 bytes. Mender will write in chunks of 1048576 bytes  
.............................................................. - 100 %
INFO[0004] All bytes were successfully written to the new partition  
INFO[0004] The optimized block-device writer wrote a total of 201 frames, where 2 frames did need to be rewritten (i.e., skipped)  
INFO[0004] Wrote 209715200/209715200 bytes to the inactive partition  
INFO[0004] Executing script: ArtifactInstall_Enter_01    
INFO[0004] Enabling partition with new image installed to be a boot candidate: 3  
Use -commit to update, or -rollback to roll back the update.
At least one payload requested a reboot of the device it updated

In both cases, the other firmware partition is flashed. This doesn’t sound logical at all… I suspect this situation was never dealt with because there is no rollback option after a successful commit (which would be so great to have, instead of doing it manually by hacking the U-Boot environment). Sadly, as we are in standalone mode, we cannot use the download state script either (which would otherwise solve our problem).

One solution would be to encapsulate the mender command in our own wrapper, but that’s rather ugly. Is there something that we missed here? What would prevent you from executing ArtifactInstall_Enter BEFORE overwriting the inactive firmware partition? If there is no obstacle, is it possible to fix this? I could write and submit the patch. Mender is great, and that would be our contribution to the project. :slight_smile:
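For illustration, the encapsulation workaround could look roughly like the sketch below: a small wrapper that checks the marker file before Mender is started at all, so nothing is written to the inactive partition when the check fails. The function name `guarded_install` and its `runner` parameter are hypothetical (the latter only exists so the command can be replaced in a test); the only Mender call used is the `mender install <artifact>` command shown in the logs above.

```python
import os
import subprocess
import sys

MARKER = "/var/run/mender_ok"  # marker path from the post


def guarded_install(artifact, marker=MARKER, runner=subprocess.call):
    """Run `mender install <artifact>` only if the marker file exists.

    Returns the exit status of the command, or 1 if the install was
    refused before Mender was started.
    """
    if not os.path.isfile(marker):
        print(f"{marker} is missing: not starting mender", file=sys.stderr)
        return 1
    # Same command line as in the logs: mender install <file>.mender
    return runner(["mender", "install", artifact])
```

The obvious weakness is that this only protects installs that go through the wrapper; calling `mender install` directly bypasses the check, which is why a check inside Mender's own workflow would be preferable.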

Thank you very much,
Gilles

Hi @Gilles,

After a quick check in [1], it seems the Download state scripts are executed in standalone mode.
Also, in case you missed it, there are some examples in [2].

[1] - State scripts | Mender documentation
[2] - https://github.com/mendersoftware/mender/tree/master/examples/state-scripts

/PJ

Hi PJ @texierp! (Small world! :wink: )

You’re right. I was confused because, with the workflow we decided on, we push the firmware to the device ourselves with scp (when the WiFi is not overloaded, so possibly one device at a time), run the update later (through an ssh command), and reboot maybe even later (in the middle of the night). Maybe we could use the Mender download process as a workaround. But I already see the problem: the whole update workflow would then be tied together, whereas we want to decouple every step and just be sure not to install at the wrong time (so it is this very state that we want to secure).

Gilles

Hi @Gilles,

Yes indeed :wink:

Ah, I see, I better understand your use case now.

I am not so familiar with the standalone mode with state scripts, but I think this is the expected behavior (even in managed mode) … I agree it is confusing :confused: .

@kacf any thoughts?

/PJ