State script returning exit status 1 leads to retry loop instead of abort

Hi,

I’m working on a state script (Download_Enter in my case) that should return 0, 1, or 21, depending on whether I want the upgrade to proceed, abort, or be delayed until later (following the docs here).
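
For illustration, the decision logic of such a script might look roughly like the sketch below. The exit codes 0, 1 and 21 are the ones described above; the function name `decide_exit_code` and both inputs (battery level, busy flag) are hypothetical placeholders, not anything taken from the Mender docs.

```shell
#!/bin/bash
# Sketch of the decision logic for a Download_Enter state script.
# Exit codes as discussed in this thread: 0 = proceed, 1 = abort, 21 = retry later.
# The conditions are hypothetical placeholders.

# Print the exit code the state script should return.
#   $1: battery charge in percent (hypothetical input)
#   $2: "yes" if the device is currently busy (hypothetical input)
decide_exit_code() {
    if [ "$1" -lt 20 ]; then
        echo 1      # abort the update
    elif [ "$2" = "yes" ]; then
        echo 21     # ask the client to retry later
    else
        echo 0      # let the update proceed
    fi
}

# Demo calls; a real state script would instead end with something like
#   exit "$(decide_exit_code "$battery" "$busy")"
decide_exit_code 80 no
decide_exit_code 80 yes
decide_exit_code 10 no
```

Keeping the decision in a function that only prints the code makes it easy to check each branch by hand before installing the script under /etc/mender/scripts/.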

I mostly experimented with a standalone upgrade (running mender install ./local_file.mender), which showed the desired behaviour. However, when triggering an upgrade from hosted.mender.io, the code branch that returns 1 seems to lead to a retry loop. I have set the number of tries to 1 in the web UI, so that’s not where the retries come from. To simplify debugging, I reduced the state script to

#!/bin/bash

exit 1

and I tested

/etc/mender/scripts/Download_Enter_00_MyTest
echo $?

which prints 1.

In the journal (journalctl -f -t mender) I see the following repeating in a loop (minor editing applied):

Feb 15 16:19:59 <hostname> mender[5275]: time="2023-02-15T16:19:59+01:00" level=error msg="transient error: error calling enter script for (error) update-fetch state: error running enter state script(s) for Download_Enter state: statescript: error executing 'Download_Enter_00_MyTest': 1 : exit status 1"
Feb 15 16:19:59 <hostname> mender[5275]: time="2023-02-15T16:19:59+01:00" level=error msg="transient error: error calling enter script for (error) update-fetch state: error running enter state script(s) for Download_Enter state: statescript: error executing 'Download_Enter_00_MyTest': 1 : exit status 1"
Feb 15 16:19:59 <hostname> mender[5275]: time="2023-02-15T16:19:59+01:00" level=info msg="State transition: update-fetch [Download_Enter] -> error [Error]"
Feb 15 16:19:59 <hostname> mender[5275]: time="2023-02-15T16:19:59+01:00" level=info msg="State transition: update-fetch [Download_Enter] -> error [Error]"
Feb 15 16:19:59 <hostname> mender[5275]: time="2023-02-15T16:19:59+01:00" level=info msg="Handling error state, current error: transient error: error calling enter script for (error) update-fetch state: error running enter state script(s) for Download_Enter state: statescript: error executing 'Download_Enter_00_MyTest': 1 : exit status 1"
Feb 15 16:19:59 <hostname> mender[5275]: time="2023-02-15T16:19:59+01:00" level=info msg="State transition: error [Error] -> idle [Idle]"
Feb 15 16:19:59 <hostname> mender[5275]: time="2023-02-15T16:19:59+01:00" level=info msg="Handling error state, current error: transient error: error calling enter script for (error) update-fetch state: error running enter state script(s) for Download_Enter state: statescript: error executing 'Download_Enter_00_MyTest': 1 : exit status 1"
Feb 15 16:19:59 <hostname> mender[5275]: time="2023-02-15T16:19:59+01:00" level=info msg="State transition: error [Error] -> idle [Idle]"
Feb 15 16:19:59 <hostname> mender[5275]: time="2023-02-15T16:19:59+01:00" level=info msg="State transition: idle [Idle] -> check-wait [Idle]"
Feb 15 16:19:59 <hostname> mender[5275]: time="2023-02-15T16:19:59+01:00" level=info msg="State transition: idle [Idle] -> check-wait [Idle]"
Feb 15 16:20:04 <hostname> mender[5275]: time="2023-02-15T16:20:04+01:00" level=info msg="State transition: check-wait [Idle] -> inventory-update [Sync]"
Feb 15 16:20:04 <hostname> mender[5275]: time="2023-02-15T16:20:04+01:00" level=info msg="State transition: check-wait [Idle] -> inventory-update [Sync]"
Feb 15 16:20:04 <hostname> mender[5275]: time="2023-02-15T16:20:04+01:00" level=info msg="State transition: inventory-update [Sync] -> check-wait [Idle]"
Feb 15 16:20:04 <hostname> mender[5275]: time="2023-02-15T16:20:04+01:00" level=info msg="State transition: inventory-update [Sync] -> check-wait [Idle]"
Feb 15 16:20:04 <hostname> mender[5275]: time="2023-02-15T16:20:04+01:00" level=info msg="State transition: check-wait [Idle] -> update-check [Sync]"
Feb 15 16:20:04 <hostname> mender[5275]: time="2023-02-15T16:20:04+01:00" level=info msg="State transition: check-wait [Idle] -> update-check [Sync]"
Feb 15 16:20:04 <hostname> mender[5275]: time="2023-02-15T16:20:04+01:00" level=info msg="Validating the Update Info: https://s3.amazonaws.com/hosted-mender-artifacts/... [name: <artifactname>; devices: [<device-type>]]"
Feb 15 16:20:04 <hostname> mender[5275]: time="2023-02-15T16:20:04+01:00" level=info msg="State transition: update-check [Sync] -> update-fetch [Download_Enter]"
Feb 15 16:20:04 <hostname> mender[5275]: time="2023-02-15T16:20:04+01:00" level=info msg="Validating the Update Info: https://s3.amazonaws.com/hosted-mender-artifacts/... [name: <artifactname>; devices: [<device-type>]]"
Feb 15 16:20:04 <hostname> mender[5275]: time="2023-02-15T16:20:04+01:00" level=info msg="State transition: update-check [Sync] -> update-fetch [Download_Enter]"
Feb 15 16:20:04 <hostname> mender[5275]: time="2023-02-15T16:20:04+01:00" level=info msg="Executing script: Download_Enter_00_MyTest"
Feb 15 16:20:04 <hostname> mender[5275]: time="2023-02-15T16:20:04+01:00" level=info msg="Executing script: Download_Enter_00_MyTest"
Feb 15 16:20:04 <hostname> mender[5275]: time="2023-02-15T16:20:04+01:00" level=error msg="transient error: error calling enter script for (error) update-fetch state: error running enter state script(s) for Download_Enter state: statescript: error executing 'Download_Enter_00_MyTest': 1 : exit status 1"
Feb 15 16:20:04 <hostname> mender[5275]: time="2023-02-15T16:20:04+01:00" level=error msg="transient error: error calling enter script for (error) update-fetch state: error running enter state script(s) for Download_Enter state: statescript: error executing 'Download_Enter_00_MyTest': 1 : exit status 1"

My understanding from the docs (“If a script returns 0 Mender proceeds, but if it returns 1 the update is aborted and rolled back.”) was that 1 should abort the upgrade and not keep retrying (in fact, I can confirm that the logs are different when I return 21). Am I overlooking something about how to abort a hosted.mender.io-triggered upgrade from a Download_Enter script?

PS: running mender client version

3.4.0	runtime: go1.17.13

Hello @pseyfert,

I think this is actually a bug. Still, let me double check with my dev colleagues internally.
I can reproduce your scenario.

Have a nice day!
Luis

Hi @lramirez, I found this thread and some follow-up here.
I’d be very interested in any patch to fix this, since the application I’m working on needs to be able to abort the download if a number of conditions aren’t met.
We have Mender v3.2.1 on runtime: go1.14.7 running on Yocto Linux.
Thanks and best regards,
Tom

Hello @tidley,

Returning 1 in any of the Download_Enter scripts will cause the client to actually abort the update and go back to Idle, but it will not report failure to the server. Therefore the deployment will be retried on the next polling cycle. This behavior may change in the future.

Have a nice day!
Luis

Thanks for the quick response, @lramirez. Is there a way to report the failure to the server so that it stops retrying in a continuous loop? I’ve tried “RetryPollAttemptsDownload”: 5 in mender.conf, but I think that’s deprecated/unused.
Kind regards,
Tom

Hello @tidley ,

Do you mean RetryPollCount? I don’t know if it will work, as the client does not report the error to the server. I have not tested it myself, but I encourage you to try. I can test it myself in the coming days.

Have a nice day!
Luis

Hi @lramirez ,
I tried this flag, but sadly it doesn’t stop retrying after the count is reached.
Does this mean I have to create a sort of watchdog that starts the Mender client only when a compatible update is pushed to certain devices?
Hopefully you’re having a relaxing weekend.
Kind regards,
Tom
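
To make the watchdog idea above a bit more concrete, here is a minimal dry-run sketch. It is only an illustration under loud assumptions: the marker file /run/update-allowed, the function name `desired_action` and the unit name mender-client are all hypothetical and not confirmed anywhere in this thread, and the sketch gates on a local precondition rather than on what the server has deployed.

```shell
#!/bin/bash
# Hypothetical gate around the Mender client service (dry run only).
# ASSUMPTIONS: the marker file path and the unit name "mender-client"
# are placeholders, not confirmed in this thread.

# Print "start" if the marker file given as $1 exists, otherwise "stop".
desired_action() {
    if [ -e "$1" ]; then
        echo start
    else
        echo stop
    fi
}

# Dry run: print the command that would be executed instead of running it.
echo systemctl "$(desired_action /run/update-allowed)" mender-client
```

Some other process on the device would create or remove the marker file when its own preconditions change; dropping the leading `echo` would turn the dry run into a real service start or stop.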