State Script runs successfully but Update Module fails -> Deployment marked as failed

Good day,

I’m making a demo with State Scripts together with the single-file Update Module, and I wanted to clarify something about how failures are handled.

Here’s what happens in my setup:

  • I create an Artifact with a dummy payload (/etc/hostname) and attach a State Script.

  • The State Script ArtifactInstall_Enter_10_run_move.sh runs fine. Inside it I run a Python program (python3 1.move.py) to control some hardware (a small robot car on Raspberry Pi), and the hardware reacts exactly as expected.

  • Please note that this State Script runs at ArtifactInstall_Enter, before the Update Module installation logic itself is executed.

  • After the State Script finishes, the Update Module fails due to missing payload metadata:

2025-09-01 11:09:58.516 +0000 UTC info: Running State Script: /var/lib/mender/scripts/ArtifactInstall_Enter_10_run_move.sh
2025-09-01 11:09:58.52 +0000 UTC info: Collected output (stderr) while running script: + cd /home/tqcs/picar-x/example
2025-09-01 11:09:58.521 +0000 UTC info: Collected output (stderr) while running script: + /usr/bin/python3 1.move.py
2025-09-01 11:10:10.005 +0000 UTC info: Update Module output (stderr): cat: /var/lib/mender/modules/v3/payloads/0000/tree/files/dest_dir{…}
2025-09-01 11:10:10.005 +0000 UTC info: Update Module output (stderr): : No such file or directory
2025-09-01 11:10:10.005 +0000 UTC error: Process returned non-zero exit status: ArtifactInstall: Process exited with status 1
2025-09-01 11:10:10.023 +0000 UTC info: Update Module output (stderr): cat: /var/lib/mender/modules/v3/payloads/0000/tree/files/filename{…}
2025-09-01 11:10:10.023 +0000 UTC info: Update Module output (stderr): : No such file or directory
2025-09-01 11:10:10.023 +0000 UTC error: Process returned non-zero exit status: ArtifactRollback: Process exited with status 1
2025-09-01 11:10:10.044 +0000 UTC info: Sending status update to server

In this case, the error was missing metadata in the Artifact, but the same would apply to any other error in the Update Module. I believe this should be clarified in the documentation, so I opened a pull request on mendersoftware/mender-docs: “docs: clarify State Script success + Update Module failure scenario” (Pull Request #2660 by Hoonydony).

  1. Does it make sense to clarify this in the Mender docs?
  2. I’m also wondering whether this affects Mender’s “atomicity”, because even a rollback cannot undo the effects of a State Script that ran before the Update Module. My hardware had already reacted to the State Script, yet Mender marked the deployment as failed.

Thanks,

Hi @Hoonydony,

Thanks for reaching out! I don’t have a clear opinion on the PR yet, but I think one thing needs clarification here.

You’re implementing potentially non-revertible behavior in a state script. That is of course possible; you can basically do whatever you want in there. But then it’s also your responsibility to either handle failures accordingly later on, or to accept that no rollback will happen.

An extreme example would be putting this into the ArtifactInstall_Enter script:

rm -fR /

This will happily wipe your root filesystem (if it is not read-only) and cause irreversible damage. How would you expect a rollback to handle that?

So in a nutshell: whenever you put actions into state scripts, whether persistent or non-persistent ones (and driving actuators usually counts as persistent), you are also responsible for handling errors and rollback in further scripts. The obvious alternative is to postpone all such actions until AFTER the artifact has been installed successfully.
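A minimal sketch of the second option, assuming Python 3 is available on the device and that the script is packaged as an executable with a shebang line like any other state script: the demo program from the log is moved out of ArtifactInstall_Enter into a hypothetical ArtifactCommit_Leave_10_run_move script, which should only be reached once the Update Module has installed and committed the Artifact. The directory and program name come from the log; the script name and the file-name check are illustrative assumptions.

```python
#!/usr/bin/env python3
# Hypothetical ArtifactCommit_Leave_10_run_move: runs the same demo program,
# but only after the Artifact has been installed and committed.
import os
import subprocess
import sys

# Directory and program taken from the State Script log; adjust as needed.
WORKDIR = "/home/tqcs/picar-x/example"
PROGRAM = "1.move.py"


def run_move(workdir=WORKDIR, program=PROGRAM):
    """Run the hardware program with this interpreter and return its exit code."""
    result = subprocess.run([sys.executable, program], cwd=workdir)
    return result.returncode


if __name__ == "__main__":
    # Only drive the hardware when invoked under a Commit_Leave script name,
    # so a copy stored or run under any other name does nothing.
    if os.path.basename(sys.argv[0]).startswith("ArtifactCommit_Leave"):
        sys.exit(run_move())
```

Because nothing touches the hardware before the Update Module has succeeded, a failed ArtifactInstall leaves the car untouched and there is nothing to roll back.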

Greetz,
Josef

@TheYoctoJester

Thank you for your response. I fully agree that an extreme case like the one you mentioned, wiping the root filesystem, would obviously be a serious problem and should be the user’s responsibility.

But even with less risky, reasonable or minor changes made by state scripts in preparation for an update, users may be surprised when these cannot be rolled back after an unexpected update failure.

My main point is that a state script may still succeed even if the Mender update itself fails. If, as you said, this behavior is ultimately the user’s responsibility, then I believe it is important to make users clearly aware of it.

In addition, when the actions in a state script are reversible (for example, changing the motor status from A to B), it might also be worth recommending that users provide corresponding counter-scripts to handle rollback or failure scenarios (changing the motor status from B back to A). This could be the suggested approach where strict atomicity is required.
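One way such a counter-script could be structured, sketched in Python under several assumptions: the same executable is packaged under hypothetical names such as ArtifactInstall_Enter_10_motor, ArtifactInstall_Error_10_motor and ArtifactCommit_Leave_10_motor and picks its role from its own file name; read_motor_state/write_motor_state are placeholders for real hardware calls; STATE_FILE is an arbitrary path; "B" stands for the target motor status. Which Error, Rollback and Failure states are actually entered after an Update Module failure should be checked against the Mender state-script documentation.

```python
#!/usr/bin/env python3
# Hypothetical state script pairing a forward action with its counter-action.
# The same file is packaged under several names and picks its role from its
# own file name (all names below are illustrative assumptions).
import json
import os
import sys

# Assumed location for remembering the pre-update motor state.
STATE_FILE = "/var/lib/motor-demo/previous_state.json"


def read_motor_state():
    # Placeholder: replace with a call that queries your hardware.
    raise NotImplementedError("query the motor here")


def write_motor_state(state):
    # Placeholder: replace with a call that drives your hardware.
    raise NotImplementedError("drive the motor here")


def forward(new_state, read_state, write_state, state_file=STATE_FILE):
    """Record the current state first, then switch the hardware (A -> B)."""
    previous = read_state()
    os.makedirs(os.path.dirname(state_file), exist_ok=True)
    with open(state_file, "w") as f:
        json.dump({"previous": previous}, f)
    write_state(new_state)


def revert(write_state, state_file=STATE_FILE):
    """Restore the recorded state (B -> A); return False if nothing was recorded."""
    try:
        with open(state_file) as f:
            previous = json.load(f)["previous"]
    except FileNotFoundError:
        return False
    write_state(previous)
    os.remove(state_file)  # a second revert becomes a no-op
    return True


def discard(state_file=STATE_FILE):
    """Forget the recorded state once the update has been committed."""
    try:
        os.remove(state_file)
    except FileNotFoundError:
        pass


def main(argv):
    name = os.path.basename(argv[0])
    if name.startswith("ArtifactInstall_Enter"):
        forward("B", read_motor_state, write_motor_state)
    elif name.startswith(("ArtifactInstall_Error", "ArtifactRollback_Enter",
                          "ArtifactFailure_Enter")):
        revert(write_motor_state)
    elif name.startswith("ArtifactCommit_Leave"):
        discard()
    # An uncaught exception makes Python exit non-zero, i.e. the script fails.


if __name__ == "__main__":
    main(sys.argv)
```

Recording the previous status before driving the hardware means a failure halfway through can still be undone, and deleting the record after restoring makes the revert idempotent, so packaging it under several Error/Rollback/Failure names does not undo the change twice.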