System Software Version after using mender-convert

Hello again.

Thanks for an update.

I do agree that there was a slight start time difference between the the first round of failed deployments (15:56:34) and a second round of successful deployments(16:17:49). Time difference is approx. 20 minutes, with failed one starting earlier. However, on such a huge span of time, this will be equal to only 20 extra retry later requests, while the overall count of retry later requests is already over 500.

I have investigated the exact number of the amount of retry later requests:

  • Failed deployment device requested retry later 593 times
  • Successful deployment devices requested retry later:
    • 573 times for device 1
    • 577 times for device 2

From this round of testing, it appears that the loop counter should not be involved. Otherwise, all 3 of them would have failed.

Now onto last night deployment test (3rd round). We have put 3 devices to a test:

  • One of them was put into a scenario where retry later will NOT be requested - the update will be applied on boot up.
  • Two other devices were put into a scenario where retry later will be requested - the update will be applied at 2am local time.

And here are the findings:

  • The device that had a deployment applied on boot up (no retry later mechanism involved) completed successfully.
  • The two devices that were put into a scenario to update overnight failed.

For two failed devices, here are the statistics:

  • 1st device requested retry later 564 times
  • 2nd device requested retry later 563 times

I have separated logical points within the logs.

1st device log
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-15T01:59:51Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-15T01:59:51Z"}
{"level":"info","message":"Output (stdout) from command \"/var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\": <redacted>-<redacted>-app: signalled","timestamp":"2024-02-15T02:00:53Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nUpdate window reached.\n<redacted> App shutdown signal file NOT found.\nSending SIGUSR1 to <redacted> <redacted> app.\nSleeping 15 seconds before checking for <redacted> <redacted> App status again within the same script execution.\n<redacted> App shutdown signal file found.\nRemoving <redacted> App termianted and termination pending file.\nContinuing update process.\n\n---------- end of script output","timestamp":"2024-02-15T02:01:08Z"}

{"level":"info","message":"State transition: update-install [ArtifactInstall] -\u003e mender-update-control-refresh-maps [none]","timestamp":"2024-02-15T02:01:12Z"}
{"level":"info","message":"State transition: mender-update-control-refresh-maps [none] -\u003e mender-update-control [none]","timestamp":"2024-02-15T02:01:12Z"}
{"level":"info","message":"State transition: mender-update-control [none] -\u003e update-commit [ArtifactCommit_Enter]","timestamp":"2024-02-15T02:01:12Z"}
{"level":"error","message":"Could not write state data to persistent storage: State data stored and retrieved maximum number of times","timestamp":"2024-02-15T02:01:12Z"}
{"level":"error","message":"fatal error: State data stored and retrieved maximum number of times","timestamp":"2024-02-15T02:01:12Z"}
{"level":"error","message":"State transition loop detected in state update-commit: Forcefully aborting update. The system is likely to be in an inconsistent state after this.","timestamp":"2024-02-15T02:01:12Z"}
{"level":"info","message":"State transition: update-commit [ArtifactCommit_Enter] -\u003e update-status-report [none]","timestamp":"2024-02-15T02:01:12Z"}

{"level":"info","message":"Running Mender client version: 3.5.2","timestamp":"2024-02-15T02:01:12Z"}
{"level":"info","message":"State transition: update-fetch [Download_Enter] -\u003e update-store [Download_Enter]","timestamp":"2024-02-15T02:01:13Z"}
{"level":"info","message":"Installer: authenticated digital signature of artifact","timestamp":"2024-02-15T02:01:13Z"}
{"level":"info","message":"State transition: update-store [Download_Enter] -\u003e update-after-store [Download_Leave]","timestamp":"2024-02-15T02:01:19Z"}
{"level":"info","message":"State transition: update-after-store [Download_Leave] -\u003e mender-update-control-refresh-maps [none]","timestamp":"2024-02-15T02:01:19Z"}
{"level":"info","message":"State transition: mender-update-control-refresh-maps [none] -\u003e mender-update-control [none]","timestamp":"2024-02-15T02:01:19Z"}
{"level":"info","message":"State transition: mender-update-control [none] -\u003e update-install [ArtifactInstall]","timestamp":"2024-02-15T02:01:19Z"}
{"level":"info","message":"Executing script: ArtifactInstall_Enter_00_retry_later","timestamp":"2024-02-15T02:01:19Z"}
{"level":"info","message":"Output (stdout) from command \"/var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\": <redacted>-<redacted>-app: ERROR (not  running)","timestamp":"2024-02-15T02:01:21Z"}
2nd device log
{"level":"info","message":"Output (stdout) from command \"/var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\": <redacted>-<redacted>-app: signalled","timestamp":"2024-02-15T02:00:12Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nUpdate window reached.\n<redacted> App shutdown signal file NOT found.\nSending SIGUSR1 to <redacted> <redacted> app.\nSleeping 15 seconds before checking for <redacted> <redacted> App status again within the same script execution.\n<redacted> App shutdown signal file found.\nRemoving <redacted> App termianted and termination pending file.\nContinuing update process.\n\n---------- end of script output","timestamp":"2024-02-15T02:00:28Z"}

{"level":"info","message":"State transition: update-install [ArtifactInstall] -\u003e mender-update-control-refresh-maps [none]","timestamp":"2024-02-15T02:00:31Z"}
{"level":"info","message":"State transition: mender-update-control-refresh-maps [none] -\u003e mender-update-control [none]","timestamp":"2024-02-15T02:00:31Z"}
{"level":"info","message":"State transition: mender-update-control [none] -\u003e update-commit [ArtifactCommit_Enter]","timestamp":"2024-02-15T02:00:31Z"}

{"level":"info","message":"Device unauthorized; attempting reauthorization","timestamp":"2024-02-15T02:00:31Z"}
{"level":"info","message":"Output (stderr) from command \"/usr/share/mender/identity/mender-device-identity\": using interface /sys/class/net/eth0","timestamp":"2024-02-15T02:00:31Z"}
{"level":"info","message":"successfully received new authorization data from server https://eu.hosted.mender.io","timestamp":"2024-02-15T02:00:31Z"}
{"level":"info","message":"shutting down websocket connections","timestamp":"2024-02-15T02:00:31Z"}
{"level":"info","message":"Local proxy stopped","timestamp":"2024-02-15T02:00:31Z"}
{"level":"warning","message":"error while sending close message: websocket: close sent","timestamp":"2024-02-15T02:00:31Z"}
{"level":"error","message":"Proxy Shutdown failed: context canceled\n","timestamp":"2024-02-15T02:00:31Z"}
{"level":"warning","message":"error while sending close message: websocket: close sent","timestamp":"2024-02-15T02:00:31Z"}
{"level":"info","message":"Local proxy started","timestamp":"2024-02-15T02:00:31Z"}
{"level":"info","message":"Reauthorization successful","timestamp":"2024-02-15T02:00:31Z"}

{"level":"info","message":"State transition: update-commit [ArtifactCommit_Enter] -\u003e update-after-first-commit [none]","timestamp":"2024-02-15T02:00:31Z"}
{"level":"error","message":"Could not write state data to persistent storage: State data stored and retrieved maximum number of times","timestamp":"2024-02-15T02:00:31Z"}
{"level":"error","message":"fatal error: State data stored and retrieved maximum number of times","timestamp":"2024-02-15T02:00:31Z"}
{"level":"error","message":"State transition loop detected in state update-after-first-commit: Forcefully aborting update. The system is likely to be in an inconsistent state after this.","timestamp":"2024-02-15T02:00:31Z"}
{"level":"info","message":"State transition: update-after-first-commit [none] -\u003e update-status-report [none]","timestamp":"2024-02-15T02:00:31Z"}

At the moment of writing this reply, here are the provides of:

  • 1st device:
artifact_name=<redacted>-<redacted>-app-4.1.0_INCONSISTENT
rootfs-image.checksum=15ab7a622a92d125d111b9772d95439a84f4d205a4995b6c0ef9d4c4e3979005
rootfs-image.<redacted>-<redacted>-app.version=4.0.0
rootfs-image.version=2024.01.1
  • 2nd device:
artifact_name=<redacted>-<redacted>-app-4.1.0_INCONSISTENT
rootfs-image.checksum=15ab7a622a92d125d111b9772d95439a84f4d205a4995b6c0ef9d4c4e3979005
rootfs-image.<redacted>-<redacted>-app.version=4.1.0
rootfs-image.version=2024.01.1

According to the logs, 1st device failed a deployment, then deployment retry mechanism kicked in (part of Mender Professional), hence the deployment is now pending again as part of the retry.

On the other hand, 2nd device failed a deployment but deployment retry mechanism did not kick in to restart the deployment.

When we first implemented this, we were on a hosted trial plan. As part of a trial plan, we had configure module included in the client (current rootfs version). Now, we transitioned from trial plan to Professional without the configure addon and we are seeing failures. Might that be related?

When we go onto a specific device page within the dashboard, we are constantly presented with a notification that configure module is not present.


The request ID changes every time the pop up shows.

Please note, that all devices used in the tests started from the same rootfs image.

Would it be helpful if I share the state scripts we attach o the artifacts and share the ideas of the deployment methodologies we use (boot time vs night time)?

Ps considering our plan, is there a better way to get support, or does forum remain the best approach?

Considering how elusive the problem is, I’m not convinced you’ll get better help in support unless you have an enterprise subscription.

But at least you’ve confirmed that the problem is recurring. I have some suggestions:

  1. Try to set StateScriptRetryIntervalSeconds to two minutes instead of one, or maybe even higher. This should reduce the number of repetitions of the retry exit code, will still being reasonably responsive; maybe that’s enough to work around the problem.
  2. See if you can try the new Mender-client-4.0. This has been completely rewritten in C++, and the problem might be gone simply because it is implemented slightly differently.

Regarding the configure error message, this is apparently a known issue.

1 Like

Thanks for a response.

I have spent some time testing various scenarios.

So, I have tried increasing StateScriptRetryIntervalSeconds to a much higher number (from 1 min to 15 min). Unfortunately, still no luck. Let me elaborate a bit:

  • I have scheduled a deployment via Mender Dashboard (part of Mender Professional package) to be released outside of the client device update window to make client devices postpone the deployment for later
  • Deployment was retried later for:
    • time-wise - 71213 seconds (19h46m53s)
    • retry-wise - 79 times
  • Both devices failed the deployment during ArtifactCommit (after ArtifactCommit_Leave in particular - see logs below) resulting in INCONSISTENT state

As a result, drastic reduction in retry count (from ~563 to only ~80) did not result in any improvements, hence, ruling out the retry count out of the equation. Unfortunately, it is not possible to increase StateScriptRetryIntervalSeconds any further due to design and requirements from our end. Therefore, we will have to leave it at that.

Device log 1
2024-02-22 06:14:01 +0000 UTC info: Running Mender client version: 3.5.2
2024-02-22 06:14:02 +0000 UTC info: State transition: update-fetch [Download_Enter] -> update-store [Download_Enter]
2024-02-22 06:14:02 +0000 UTC info: Installer: authenticated digital signature of artifact
2024-02-22 06:14:06 +0000 UTC info: State transition: update-store [Download_Enter] -> update-after-store [Download_Leave]
2024-02-22 06:14:06 +0000 UTC info: State transition: update-after-store [Download_Leave] -> mender-update-control-refresh-maps [none]
2024-02-22 06:14:06 +0000 UTC info: State transition: mender-update-control-refresh-maps [none] -> mender-update-control [none]
2024-02-22 06:14:06 +0000 UTC info: State transition: mender-update-control [none] -> update-install [ArtifactInstall]
2024-02-22 06:14:06 +0000 UTC info: Executing script: ArtifactInstall_Enter_00_retry_later
2024-02-22 06:14:07 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
<redacted> App shutdown signal file NOT found.
Postponing to retry update install process later.

---------- end of script output
2024-02-22 06:14:07 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-22 06:29:08 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
<redacted> App shutdown signal file NOT found.
Postponing to retry update install process later.

---------- end of script output
2024-02-22 06:29:08 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-22 06:44:09 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
<redacted> App shutdown signal file NOT found.
Postponing to retry update install process later.

---------- end of script output
2024-02-22 06:44:09 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
.
.
.
<74 retries later truncated>
.
.
.
2024-02-23 01:30:29 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
<redacted> App shutdown signal file NOT found.
Postponing to retry update install process later.

---------- end of script output
2024-02-23 01:30:29 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-23 01:45:30 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
<redacted> App shutdown signal file NOT found.
Postponing to retry update install process later.

---------- end of script output
2024-02-23 01:45:30 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-23 02:00:32 +0000 UTC info: Output (stdout) from command "/var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later": <redacted>-<redacted>-app: signalled
2024-02-23 02:00:47 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Update window reached.
<redacted> App shutdown signal file NOT found.
Sending SIGUSR1 to <redacted> <redacted> app.
Sleeping 15 seconds before checking for <redacted> <redacted> App status again within the same script execution.
<redacted> App shutdown signal file found.
Removing <redacted> App termianted and termination pending file.
Continuing update process.

---------- end of script output
2024-02-23 02:00:50 +0000 UTC info: State transition: update-install [ArtifactInstall] -> mender-update-control-refresh-maps [none]
2024-02-23 02:00:50 +0000 UTC info: State transition: mender-update-control-refresh-maps [none] -> mender-update-control [none]
2024-02-23 02:00:50 +0000 UTC info: State transition: mender-update-control [none] -> update-commit [ArtifactCommit_Enter]
2024-02-23 02:00:50 +0000 UTC info: Device unauthorized; attempting reauthorization
2024-02-23 02:00:50 +0000 UTC info: Output (stderr) from command "/usr/share/mender/identity/mender-device-identity": using interface /sys/class/net/eth0
2024-02-23 02:00:50 +0000 UTC info: successfully received new authorization data from server https://eu.hosted.mender.io
2024-02-23 02:00:50 +0000 UTC info: shutting down websocket connections
2024-02-23 02:00:50 +0000 UTC info: Local proxy stopped
2024-02-23 02:00:50 +0000 UTC warning: error while sending close message: websocket: close sent
2024-02-23 02:00:50 +0000 UTC error: Proxy Shutdown failed: context canceled
2024-02-23 02:00:50 +0000 UTC warning: error while sending close message: websocket: close sent
2024-02-23 02:00:50 +0000 UTC info: Local proxy started
2024-02-23 02:00:50 +0000 UTC info: Reauthorization successful
2024-02-23 02:00:50 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-after-first-commit [none]
2024-02-23 02:00:50 +0000 UTC info: State transition: update-after-first-commit [none] -> update-after-commit [ArtifactCommit_Leave]
2024-02-23 02:00:50 +0000 UTC info: State transition: update-after-commit [ArtifactCommit_Leave] -> cleanup [none]
2024-02-23 02:00:51 +0000 UTC info: Executing script: ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app
2024-02-23 02:00:53 +0000 UTC info: Output (stdout) from command "/var/lib/mender/scripts/ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app": <redacted>-<redacted>-app: started
2024-02-23 02:00:53 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app
<redacted>-<redacted>-app-4.1.0: Running ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app

---------- end of script output
2024-02-23 02:00:54 +0000 UTC info: State transition: cleanup [none] -> update-status-report [none]
2024-02-23 02:00:54 +0000 UTC error: Could not write state data to persistent storage: State data stored and retrieved maximum number of times
2024-02-23 02:00:54 +0000 UTC error: fatal error: State data stored and retrieved maximum number of times
2024-02-23 02:00:54 +0000 UTC error: State transition loop detected in state update-status-report: Forcefully aborting update. The system is likely to be in an inconsistent state after this.
2024-02-23 02:00:54 +0000 UTC info: State transition: update-status-report [none] -> update-status-report [none]
Device log 2
2024-02-22 06:14:28 +0000 UTC info: Running Mender client version: 3.5.2
2024-02-22 06:14:28 +0000 UTC info: State transition: update-fetch [Download_Enter] -> update-store [Download_Enter]
2024-02-22 06:14:28 +0000 UTC info: Installer: authenticated digital signature of artifact
2024-02-22 06:14:31 +0000 UTC info: State transition: update-store [Download_Enter] -> update-after-store [Download_Leave]
2024-02-22 06:14:31 +0000 UTC info: State transition: update-after-store [Download_Leave] -> mender-update-control-refresh-maps [none]
2024-02-22 06:14:31 +0000 UTC info: State transition: mender-update-control-refresh-maps [none] -> mender-update-control [none]
2024-02-22 06:14:31 +0000 UTC info: State transition: mender-update-control [none] -> update-install [ArtifactInstall]
2024-02-22 06:14:31 +0000 UTC info: Executing script: ArtifactInstall_Enter_00_retry_later
2024-02-22 06:14:33 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
<redacted> App shutdown signal file NOT found.
Postponing to retry update install process later.

---------- end of script output
2024-02-22 06:14:33 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-22 06:29:34 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
<redacted> App shutdown signal file NOT found.
Postponing to retry update install process later.

---------- end of script output
2024-02-22 06:29:34 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-22 06:44:35 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
<redacted> App shutdown signal file NOT found.
Postponing to retry update install process later.

---------- end of script output
2024-02-22 06:44:35 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
.
.
.
<74 retries later truncated>
.
.
.
2024-02-23 01:30:54 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
<redacted> App shutdown signal file NOT found.
Postponing to retry update install process later.

---------- end of script output
2024-02-23 01:30:54 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-23 01:45:55 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
<redacted> App shutdown signal file NOT found.
Postponing to retry update install process later.

---------- end of script output
2024-02-23 01:45:55 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-23 02:00:57 +0000 UTC info: Output (stdout) from command "/var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later": <redacted>-<redacted>-app: signalled
2024-02-23 02:01:12 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Update window reached.
<redacted> App shutdown signal file NOT found.
Sending SIGUSR1 to <redacted> <redacted> app.
Sleeping 15 seconds before checking for <redacted> <redacted> App status again within the same script execution.
<redacted> App shutdown signal file found.
Removing <redacted> App termianted and termination pending file.
Continuing update process.

---------- end of script output
2024-02-23 02:01:16 +0000 UTC info: State transition: update-install [ArtifactInstall] -> mender-update-control-refresh-maps [none]
2024-02-23 02:01:16 +0000 UTC info: State transition: mender-update-control-refresh-maps [none] -> mender-update-control [none]
2024-02-23 02:01:16 +0000 UTC info: State transition: mender-update-control [none] -> update-commit [ArtifactCommit_Enter]
2024-02-23 02:01:16 +0000 UTC info: Device unauthorized; attempting reauthorization
2024-02-23 02:01:16 +0000 UTC info: Output (stderr) from command "/usr/share/mender/identity/mender-device-identity": using interface /sys/class/net/eth0
2024-02-23 02:01:16 +0000 UTC info: successfully received new authorization data from server https://eu.hosted.mender.io
2024-02-23 02:01:16 +0000 UTC info: shutting down websocket connections
2024-02-23 02:01:16 +0000 UTC warning: error while sending close message: websocket: close sent
2024-02-23 02:01:16 +0000 UTC info: Local proxy stopped
2024-02-23 02:01:16 +0000 UTC warning: error while sending close message: websocket: close sent
2024-02-23 02:01:16 +0000 UTC error: Proxy Shutdown failed: context canceled
2024-02-23 02:01:16 +0000 UTC info: Local proxy started
2024-02-23 02:01:16 +0000 UTC info: Reauthorization successful
2024-02-23 02:01:16 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-after-first-commit [none]
2024-02-23 02:01:16 +0000 UTC info: State transition: update-after-first-commit [none] -> update-after-commit [ArtifactCommit_Leave]
2024-02-23 02:01:16 +0000 UTC info: State transition: update-after-commit [ArtifactCommit_Leave] -> cleanup [none]
2024-02-23 02:01:16 +0000 UTC info: Executing script: ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app
2024-02-23 02:01:19 +0000 UTC info: Output (stdout) from command "/var/lib/mender/scripts/ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app": <redacted>-<redacted>-app: started
2024-02-23 02:01:19 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app
<redacted>-<redacted>-app-4.1.0: Running ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app

---------- end of script output
2024-02-23 02:01:19 +0000 UTC info: State transition: cleanup [none] -> update-status-report [none]
2024-02-23 02:01:19 +0000 UTC error: Could not write state data to persistent storage: State data stored and retrieved maximum number of times
2024-02-23 02:01:19 +0000 UTC error: fatal error: State data stored and retrieved maximum number of times
2024-02-23 02:01:19 +0000 UTC error: State transition loop detected in state update-status-report: Forcefully aborting update. The system is likely to be in an inconsistent state after this.
2024-02-23 02:01:19 +0000 UTC info: State transition: update-status-report [none] -> update-status-report [none]

In a meanwhile, I was going through this thread multiple times and I was thinking quite a lot about this message:

And that still stands true for the test, completed now - it was retried later for 19h46m53s. Again, this is so much longer then the one done previously here:

This made me perform an analysis based on time perspective. Here are the analysis results:

  • Deployments retried for at least 36234 seconds (10h3m54s) were all failed
  • Deployments retried for 35014 seconds (9h43m34s) and 35260 seconds (9h47m40s) were successful

I will run another set of tests to confirm whether 10 hours (or as close as I can make it) is a magical number.

In a meanwhile, do you believe there are any limitations/constants/defaults that are being set to 10 hours (36000 seconds) that can affect the retry later mechanism?

Hi @kacf,

First of all, thank you for trying to shed some light on the situation. We appreciate the effort.

@dmitrijsan has continued with the scenario testings, and posted the findings above. I do however want to clarify the situation, more from a commercial perspective than anything else.

  1. Updating to Mender 4.0 - unfortunately that is not currently an option, as mender-convert doesn’t seem to be ready to handle it - unless we misunderstand the version “auto” flag description. Maybe we are just unfortunate that our implementation window overlaps this transition moment between Go and C++

  2. Mender 3.6 is deemed as LTS, with support up to July 2024. If we have discovered a real issue, we would assume the fix will be applied as part of the LTS programme.

  3. Regarding the issue at hand, nothing so far points to us doing something wrong. We are heavily inclined to believe that this is an unexpected behaviour/bug given the error raised should never have been reached given the update modules used in our process.

  4. I would like to understand what is the support level we should expect, and how we should “officialise” a support request as part of the Professional plan (happy to share company details privately or via a private email to support).

  5. In the current situation we are finding ourselves in a partly broken deployment strategy. This has caused a hard pause on adopting any more devices on Mender, and leaves us in a limbo state with regards to the reliability of the system we chose.

Don’t take this the wrong way, I am not pointing any fingers, and I am ready to accept that maybe be did something wrong, however I am not sure what are the steps needed to reach such a conclusion. I am also not entirely sure that a public thread on the forum will allow either of us to share more implementation specific details.

I look forward to hearing from you

Any further feedback from @lluiscampos or @TheYoctoJester would also be greatly appreciated

Rob

1 Like

Hey @robertalexa, totally understandable. On Mender Hub we provide help on a voluntary best effort basis, and unfortunately I don’t think I’m able to help further without a deeper investigation. So I think that a request through official support is the best avenue from here.

@kacf thanks for a reply.

Following my last message, I have now got the results from yet another deployment test.

This time, my aim was to see if the deployment is going to fail in case the deployment is retried later for less then 10 hours. So, outcomes are following:

Deployment was scheduled to be released on 23rd Feb at 16:30:00.

Device 1:
Polls an update on 23rd Feb at 16:42:41
Requests retry later from 23rd Feb at 16:42:46 till 24th Feb at 01:59:27
Continues the update process at on 24th Feb at 02:00:44
Overall time spent retrying later: 33478 seconds (9h17m58s)
Update status: Success

Device 1 log
{"level":"info","message":"Running Mender client version: 3.5.2","timestamp":"2024-02-23T16:42:41Z"}
{"level":"info","message":"State transition: update-fetch [Download_Enter] -\u003e update-store [Download_Enter]","timestamp":"2024-02-23T16:42:41Z"}
{"level":"info","message":"Installer: authenticated digital signature of artifact","timestamp":"2024-02-23T16:42:41Z"}
{"level":"info","message":"State transition: update-store [Download_Enter] -\u003e update-after-store [Download_Leave]","timestamp":"2024-02-23T16:42:46Z"}
{"level":"info","message":"State transition: update-after-store [Download_Leave] -\u003e mender-update-control-refresh-maps [none]","timestamp":"2024-02-23T16:42:46Z"}
{"level":"info","message":"State transition: mender-update-control-refresh-maps [none] -\u003e mender-update-control [none]","timestamp":"2024-02-23T16:42:46Z"}
{"level":"info","message":"State transition: mender-update-control [none] -\u003e update-install [ArtifactInstall]","timestamp":"2024-02-23T16:42:46Z"}
{"level":"info","message":"Executing script: ArtifactInstall_Enter_00_retry_later","timestamp":"2024-02-23T16:42:46Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-23T16:42:47Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-23T16:42:47Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-23T16:43:48Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-23T16:43:48Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-23T16:44:49Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-23T16:44:49Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-23T16:45:50Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-23T16:45:50Z"}
.
.
.
.
<541 retry later requests truncated>
.
.
.
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-24T01:57:25Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-24T01:57:25Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-24T01:58:26Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-24T01:58:26Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-24T01:59:27Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-24T01:59:27Z"}
{"level":"info","message":"Output (stdout) from command \"/var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\": <redacted>-<redacted>-app: signalled","timestamp":"2024-02-24T02:00:29Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nUpdate window reached.\n<redacted> App shutdown signal file NOT found.\nSending SIGUSR1 to <redacted> <redacted> app.\nSleeping 15 seconds before checking for <redacted> <redacted> App status again within the same script execution.\n<redacted> App shutdown signal file found.\nRemoving <redacted> App termianted and termination pending file.\nContinuing update process.\n\n---------- end of script output","timestamp":"2024-02-24T02:00:44Z"}
{"level":"info","message":"State transition: update-install [ArtifactInstall] -\u003e mender-update-control-refresh-maps [none]","timestamp":"2024-02-24T02:00:48Z"}
{"level":"info","message":"State transition: mender-update-control-refresh-maps [none] -\u003e mender-update-control [none]","timestamp":"2024-02-24T02:00:48Z"}
{"level":"info","message":"State transition: mender-update-control [none] -\u003e update-commit [ArtifactCommit_Enter]","timestamp":"2024-02-24T02:00:48Z"}
{"level":"info","message":"Device unauthorized; attempting reauthorization","timestamp":"2024-02-24T02:00:48Z"}
{"level":"info","message":"Output (stderr) from command \"/usr/share/mender/identity/mender-device-identity\": using interface /sys/class/net/eth0","timestamp":"2024-02-24T02:00:48Z"}
{"level":"info","message":"successfully received new authorization data from server https://eu.hosted.mender.io","timestamp":"2024-02-24T02:00:48Z"}
{"level":"info","message":"shutting down websocket connections","timestamp":"2024-02-24T02:00:48Z"}
{"level":"info","message":"Local proxy stopped","timestamp":"2024-02-24T02:00:48Z"}
{"level":"warning","message":"error while sending close message: websocket: close sent","timestamp":"2024-02-24T02:00:48Z"}
{"level":"error","message":"Proxy Shutdown failed: context canceled\n","timestamp":"2024-02-24T02:00:48Z"}
{"level":"warning","message":"error while sending close message: websocket: close sent","timestamp":"2024-02-24T02:00:48Z"}
{"level":"info","message":"Local proxy started","timestamp":"2024-02-24T02:00:48Z"}
{"level":"info","message":"Reauthorization successful","timestamp":"2024-02-24T02:00:48Z"}
{"level":"info","message":"State transition: update-commit [ArtifactCommit_Enter] -\u003e update-after-first-commit [none]","timestamp":"2024-02-24T02:00:48Z"}
{"level":"info","message":"State transition: update-after-first-commit [none] -\u003e update-after-commit [ArtifactCommit_Leave]","timestamp":"2024-02-24T02:00:48Z"}
{"level":"info","message":"State transition: update-after-commit [ArtifactCommit_Leave] -\u003e cleanup [none]","timestamp":"2024-02-24T02:00:48Z"}
{"level":"info","message":"Executing script: ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app","timestamp":"2024-02-24T02:00:48Z"}
{"level":"info","message":"Output (stdout) from command \"/var/lib/mender/scripts/ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app\": <redacted>-<redacted>-app: started","timestamp":"2024-02-24T02:00:51Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app\n<redacted>-<redacted>-app-4.1.0: Running ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app\n\n---------- end of script output","timestamp":"2024-02-24T02:00:51Z"}
{"level":"info","message":"State transition: cleanup [none] -\u003e update-status-report [none]","timestamp":"2024-02-24T02:00:51Z"}

Device 2:
Polls an update on 23rd Feb at 16:42:21
Requests retry later from 23rd Feb at 16:42:27 till 24th Feb at 01:59:06
Continues the update process on 24th Feb at 02:00:08
Overall time spent retrying later: 33461 seconds (9h17m41s)
Update status: Success

Device 2 log
{"level":"info","message":"Running Mender client version: 3.5.2","timestamp":"2024-02-23T16:42:21Z"}
{"level":"info","message":"State transition: update-fetch [Download_Enter] -\u003e update-store [Download_Enter]","timestamp":"2024-02-23T16:42:21Z"}
{"level":"info","message":"Installer: authenticated digital signature of artifact","timestamp":"2024-02-23T16:42:21Z"}
{"level":"info","message":"State transition: update-store [Download_Enter] -\u003e update-after-store [Download_Leave]","timestamp":"2024-02-23T16:42:25Z"}
{"level":"info","message":"State transition: update-after-store [Download_Leave] -\u003e mender-update-control-refresh-maps [none]","timestamp":"2024-02-23T16:42:25Z"}
{"level":"info","message":"State transition: mender-update-control-refresh-maps [none] -\u003e mender-update-control [none]","timestamp":"2024-02-23T16:42:25Z"}
{"level":"info","message":"State transition: mender-update-control [none] -\u003e update-install [ArtifactInstall]","timestamp":"2024-02-23T16:42:25Z"}
{"level":"info","message":"Executing script: ArtifactInstall_Enter_00_retry_later","timestamp":"2024-02-23T16:42:25Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-23T16:42:27Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-23T16:42:27Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-23T16:43:28Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-23T16:43:28Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-23T16:44:29Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-23T16:44:29Z"}
.
.
.
<542 retry later requests truncated>
.
.
.
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-24T01:57:04Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-24T01:57:04Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-24T01:58:05Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-24T01:58:05Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-24T01:59:06Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-24T01:59:06Z"}
{"level":"info","message":"Output (stdout) from command \"/var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\": <redacted>-<redacted>-app: signalled","timestamp":"2024-02-24T02:00:08Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nUpdate window reached.\n<redacted> App shutdown signal file NOT found.\nSending SIGUSR1 to <redacted> <redacted> app.\nSleeping 15 seconds before checking for <redacted> <redacted> App status again within the same script execution.\n<redacted> App shutdown signal file found.\nRemoving <redacted> App termianted and termination pending file.\nContinuing update process.\n\n---------- end of script output","timestamp":"2024-02-24T02:00:23Z"}
{"level":"info","message":"State transition: update-install [ArtifactInstall] -\u003e mender-update-control-refresh-maps [none]","timestamp":"2024-02-24T02:00:27Z"}
{"level":"info","message":"State transition: mender-update-control-refresh-maps [none] -\u003e mender-update-control [none]","timestamp":"2024-02-24T02:00:27Z"}
{"level":"info","message":"State transition: mender-update-control [none] -\u003e update-commit [ArtifactCommit_Enter]","timestamp":"2024-02-24T02:00:27Z"}
{"level":"info","message":"Device unauthorized; attempting reauthorization","timestamp":"2024-02-24T02:00:27Z"}
{"level":"info","message":"Output (stderr) from command \"/usr/share/mender/identity/mender-device-identity\": using interface /sys/class/net/eth0","timestamp":"2024-02-24T02:00:27Z"}
{"level":"info","message":"successfully received new authorization data from server https://eu.hosted.mender.io","timestamp":"2024-02-24T02:00:27Z"}
{"level":"info","message":"shutting down websocket connections","timestamp":"2024-02-24T02:00:27Z"}
{"level":"info","message":"Local proxy stopped","timestamp":"2024-02-24T02:00:27Z"}
{"level":"warning","message":"error while sending close message: websocket: close sent","timestamp":"2024-02-24T02:00:27Z"}
{"level":"error","message":"Proxy Shutdown failed: context canceled\n","timestamp":"2024-02-24T02:00:27Z"}
{"level":"info","message":"Local proxy started","timestamp":"2024-02-24T02:00:27Z"}
{"level":"warning","message":"error while sending close message: websocket: close sent","timestamp":"2024-02-24T02:00:27Z"}
{"level":"info","message":"Reauthorization successful","timestamp":"2024-02-24T02:00:27Z"}
{"level":"info","message":"State transition: update-commit [ArtifactCommit_Enter] -\u003e update-after-first-commit [none]","timestamp":"2024-02-24T02:00:27Z"}
{"level":"info","message":"State transition: update-after-first-commit [none] -\u003e update-after-commit [ArtifactCommit_Leave]","timestamp":"2024-02-24T02:00:27Z"}
{"level":"info","message":"State transition: update-after-commit [ArtifactCommit_Leave] -\u003e cleanup [none]","timestamp":"2024-02-24T02:00:27Z"}
{"level":"info","message":"Executing script: ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app","timestamp":"2024-02-24T02:00:27Z"}
{"level":"info","message":"Output (stdout) from command \"/var/lib/mender/scripts/ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app\": <redacted>-<redacted>-app: started","timestamp":"2024-02-24T02:00:30Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app\n<redacted>-<redacted>-app-4.1.0: Running ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app\n\n---------- end of script output","timestamp":"2024-02-24T02:00:30Z"}
{"level":"info","message":"State transition: cleanup [none] -\u003e update-status-report [none]","timestamp":"2024-02-24T02:00:30Z"}

So, this leaves me thinking there is some kind of a hard limit of 10 hours somewhere in the system because correlation is quite clear:

  • Updates, retrying later for less than 10 hours are successful
  • Updates retrying later for more than 10 hours are failing

With all this in mind, I will raise a support ticket with a link to this forum thread.

1 Like

Hello everyone, its me again.

By the looks of it, my previous reply consumed all of my luck. Let me elaborate.

I performed some more tests. Now, I changed some of the fields within mender.conf file, in particular:

  • StateScriptRetryTimeoutSeconds which I have decreased from 36hrs to 8hrs as per my previous findings which stated that retrying later under 10hrs is safe
  • StateScriptRetryIntervalSeconds which was reverted to a default of 60 seconds.

All of that would allow me to purposefully expire a deployment within 8hrs to avoid mentioned above 10hrs threshold. Then, thanks to Mender Professional - it will be retried (up to a 100 times max, set manually). This approach could work for us.

Here is my approach angle now:

  • I start a deployment via Mender Dashboard at 11am.
  • The update is retried later for a maximum of 8hrs
  • Because it will be 7pm after 8hrs of retrying later, it will still be outside of the update window, hence StateScriptRetryTimeoutSeconds will be reached, failing a deployment on purpose
  • A deployment will fail and revert gracefully
  • Thanks to Mender Professional, a deployment will be retried, consuming 2nd retry attempt (from 100 that I have set manually)
  • Due to a retry later timeout of 8hrs while deployment being (re)scheduled at ~7pm, it will safely reach an update window at 2am as timeout will be at 3am this time
  • Update installs successfully
  • Deployment for a device marked as successful

Well, the expectation did not quite align with reality:

  • After reaching StateScriptRetryTimeoutSeconds timeout (at ~7:30pm) within the first try, device gave the exact same error as before:
{"level":"error","message":"Could not write state data to persistent storage: State data stored and retrieved maximum number of times","timestamp":"2024-02-28T19:26:24Z"}
{"level":"error","message":"fatal error: State data stored and retrieved maximum number of times","timestamp":"2024-02-28T19:26:24Z"}
{"level":"error","message":"State transition loop detected in state cleanup: Forcefully aborting update. The system is likely to be in an inconsistent state after this.","timestamp":"2024-02-28T19:26:24Z"}
  • This resulted in device being in the INCONSISTENT state
  • However, after server rescheduling a deployment for 2nd time, it went fine at 2am - deployment was applied successfully
Log
{"level":"info","message":"Running Mender client version: 3.5.2","timestamp":"2024-02-28T11:25:56Z"}
{"level":"info","message":"State transition: update-fetch [Download_Enter] -\u003e update-store [Download_Enter]","timestamp":"2024-02-28T11:25:56Z"}
{"level":"info","message":"Installer: authenticated digital signature of artifact","timestamp":"2024-02-28T11:25:57Z"}
{"level":"info","message":"State transition: update-store [Download_Enter] -\u003e update-after-store [Download_Leave]","timestamp":"2024-02-28T11:26:01Z"}
{"level":"info","message":"State transition: update-after-store [Download_Leave] -\u003e mender-update-control-refresh-maps [none]","timestamp":"2024-02-28T11:26:01Z"}
{"level":"info","message":"State transition: mender-update-control-refresh-maps [none] -\u003e mender-update-control [none]","timestamp":"2024-02-28T11:26:01Z"}
{"level":"info","message":"State transition: mender-update-control [none] -\u003e update-install [ArtifactInstall]","timestamp":"2024-02-28T11:26:01Z"}
{"level":"info","message":"Executing script: ArtifactInstall_Enter_00_retry_later","timestamp":"2024-02-28T11:26:01Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-28T11:26:02Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-28T11:26:02Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-28T11:27:03Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-28T11:27:03Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-28T11:28:04Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-28T11:28:04Z"}
.
.
.
< truncated >
.
.
.
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-28T19:23:19Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-28T19:23:19Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-28T19:24:20Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-28T19:24:20Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-28T19:25:21Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-28T19:25:21Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\nrootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-28T19:26:22Z"}
{"level":"error","message":"transient error: error calling enter script for (error) update-install state: error running enter state script(s) for ArtifactInstall state: statescript: retry time-limit exceeded exit status 21","timestamp":"2024-02-28T19:26:22Z"}
{"level":"info","message":"State transition: update-install [ArtifactInstall] -\u003e rollback [ArtifactRollback]","timestamp":"2024-02-28T19:26:22Z"}
{"level":"info","message":"Performing rollback","timestamp":"2024-02-28T19:26:22Z"}
{"level":"info","message":"State transition: rollback [ArtifactRollback] -\u003e update-error [ArtifactFailure]","timestamp":"2024-02-28T19:26:22Z"}
{"level":"info","message":"State transition: update-error [ArtifactFailure] -\u003e cleanup [Error]","timestamp":"2024-02-28T19:26:22Z"}
{"level":"info","message":"Executing script: ArtifactFailure_Leave_00_start_<redacted>_<redacted>_app","timestamp":"2024-02-28T19:26:22Z"}
{"level":"info","message":"Output (stdout) from command \"/var/lib/mender/scripts/ArtifactFailure_Leave_00_start_<redacted>_<redacted>_app\": <redacted>-<redacted>-app: ERROR (already started)","timestamp":"2024-02-28T19:26:24Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactFailure_Leave_00_start_<redacted>_<redacted>_app\nrootfs-2024.01.1: Running ArtifactFailure_Leave_00_start_<redacted>_<redacted>_app\n\n---------- end of script output","timestamp":"2024-02-28T19:26:24Z"}
{"level":"error","message":"Could not write state data to persistent storage: State data stored and retrieved maximum number of times","timestamp":"2024-02-28T19:26:24Z"}
{"level":"error","message":"fatal error: State data stored and retrieved maximum number of times","timestamp":"2024-02-28T19:26:24Z"}
{"level":"error","message":"State transition loop detected in state cleanup: Forcefully aborting update. The system is likely to be in an inconsistent state after this.","timestamp":"2024-02-28T19:26:24Z"}
{"level":"info","message":"State transition: cleanup [Error] -\u003e update-status-report [none]","timestamp":"2024-02-28T19:26:24Z"}
{"level":"info","message":"Running Mender client version: 3.5.2","timestamp":"2024-02-28T19:26:24Z"}
{"level":"info","message":"State transition: update-fetch [Download_Enter] -\u003e update-store [Download_Enter]","timestamp":"2024-02-28T19:26:25Z"}
{"level":"info","message":"Installer: authenticated digital signature of artifact","timestamp":"2024-02-28T19:26:25Z"}
{"level":"info","message":"State transition: update-store [Download_Enter] -\u003e update-after-store [Download_Leave]","timestamp":"2024-02-28T19:26:29Z"}
{"level":"info","message":"State transition: update-after-store [Download_Leave] -\u003e mender-update-control-refresh-maps [none]","timestamp":"2024-02-28T19:26:29Z"}
{"level":"info","message":"State transition: mender-update-control-refresh-maps [none] -\u003e mender-update-control [none]","timestamp":"2024-02-28T19:26:29Z"}
{"level":"info","message":"State transition: mender-update-control [none] -\u003e update-install [ArtifactInstall]","timestamp":"2024-02-28T19:26:29Z"}
{"level":"info","message":"Executing script: ArtifactInstall_Enter_00_retry_later","timestamp":"2024-02-28T19:26:29Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\n<redacted>-<redacted>-app-4.1.0_INCONSISTENT: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-28T19:26:30Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-28T19:26:30Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\n<redacted>-<redacted>-app-4.1.0_INCONSISTENT: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-28T19:27:31Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-28T19:27:31Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\n<redacted>-<redacted>-app-4.1.0_INCONSISTENT: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-28T19:28:32Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-28T19:28:32Z"}
.
.
.
< truncated >
.
.
.
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\n<redacted>-<redacted>-app-4.1.0_INCONSISTENT: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-29T01:57:17Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-29T01:57:17Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\n<redacted>-<redacted>-app-4.1.0_INCONSISTENT: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-29T01:58:18Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-29T01:58:18Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\n<redacted>-<redacted>-app-4.1.0_INCONSISTENT: Running ArtifactInstall_Enter_00_retry_later\nOutside of update window.\n<redacted> App shutdown signal file NOT found.\nPostponing to retry update install process later.\n\n---------- end of script output","timestamp":"2024-02-29T01:59:19Z"}
{"level":"info","message":"statescript: ArtifactInstall_Enter_00_retry_later requested a retry","timestamp":"2024-02-29T01:59:19Z"}
{"level":"info","message":"Output (stdout) from command \"/var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\": <redacted>-<redacted>-app: signalled","timestamp":"2024-02-29T02:00:21Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later\n<redacted>-<redacted>-app-4.1.0_INCONSISTENT: Running ArtifactInstall_Enter_00_retry_later\nUpdate window reached.\n<redacted> App shutdown signal file NOT found.\nSending SIGUSR1 to <redacted> <redacted> app.\nSleeping 15 seconds before checking for <redacted> <redacted> App status again within the same script execution.\n<redacted> App shutdown signal file found.\nRemoving <redacted> App termianted and termination pending file.\nContinuing update process.\n\n---------- end of script output","timestamp":"2024-02-29T02:00:36Z"}
{"level":"info","message":"State transition: update-install [ArtifactInstall] -\u003e mender-update-control-refresh-maps [none]","timestamp":"2024-02-29T02:00:40Z"}
{"level":"info","message":"State transition: mender-update-control-refresh-maps [none] -\u003e mender-update-control [none]","timestamp":"2024-02-29T02:00:40Z"}
{"level":"info","message":"State transition: mender-update-control [none] -\u003e update-commit [ArtifactCommit_Enter]","timestamp":"2024-02-29T02:00:40Z"}
{"level":"info","message":"Device unauthorized; attempting reauthorization","timestamp":"2024-02-29T02:00:40Z"}
{"level":"info","message":"Output (stderr) from command \"/usr/share/mender/identity/mender-device-identity\": using interface /sys/class/net/eth0","timestamp":"2024-02-29T02:00:40Z"}
{"level":"info","message":"successfully received new authorization data from server https://eu.hosted.mender.io","timestamp":"2024-02-29T02:00:40Z"}
{"level":"info","message":"shutting down websocket connections","timestamp":"2024-02-29T02:00:40Z"}
{"level":"info","message":"Local proxy stopped","timestamp":"2024-02-29T02:00:40Z"}
{"level":"warning","message":"error while sending close message: websocket: close sent","timestamp":"2024-02-29T02:00:40Z"}
{"level":"error","message":"Proxy Shutdown failed: context canceled\n","timestamp":"2024-02-29T02:00:40Z"}
{"level":"warning","message":"error while sending close message: websocket: close sent","timestamp":"2024-02-29T02:00:40Z"}
{"level":"info","message":"Local proxy started","timestamp":"2024-02-29T02:00:40Z"}
{"level":"info","message":"Reauthorization successful","timestamp":"2024-02-29T02:00:40Z"}
{"level":"info","message":"State transition: update-commit [ArtifactCommit_Enter] -\u003e update-after-first-commit [none]","timestamp":"2024-02-29T02:00:40Z"}
{"level":"info","message":"State transition: update-after-first-commit [none] -\u003e update-after-commit [ArtifactCommit_Leave]","timestamp":"2024-02-29T02:00:40Z"}
{"level":"info","message":"State transition: update-after-commit [ArtifactCommit_Leave] -\u003e cleanup [none]","timestamp":"2024-02-29T02:00:40Z"}
{"level":"info","message":"Executing script: ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app","timestamp":"2024-02-29T02:00:40Z"}
{"level":"info","message":"Output (stdout) from command \"/var/lib/mender/scripts/ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app\": <redacted>-<redacted>-app: started","timestamp":"2024-02-29T02:00:43Z"}
{"level":"info","message":"Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app\n<redacted>-<redacted>-app-4.1.0: Running ArtifactCommit_Leave_00_start_<redacted>_<redacted>_app\n\n---------- end of script output","timestamp":"2024-02-29T02:00:43Z"}
{"level":"info","message":"State transition: cleanup [none] -\u003e update-status-report [none]","timestamp":"2024-02-29T02:00:44Z"}

Following this testing round, it appears the initial suspicion that 10hrs can send the device into an INCONSISTENT state does not hold anymore. It is clear now that the same error will be thrown even after 8hrs, less than our initial successful tests at 9hrs. The more curious thing is why this error is thrown, once the client already decided that the timeout was reached and the deployment should be marked as failed. At that point, the device would be expected to carry on working as before the failed deployment.

@kacf any thoughts on my latest discovery?

Also, could you please share your thoughts on us transitioning to Mender client 4?

Our device is based on the raspberry pi 3. We base our images on official raspberry pi images, which in turns get prepared using mender-convert. Once prepared, the resulting image is used to commission new clients. In a similar manner, A/B updates are dealt with the same way.

Looking at the latest version of mender-convert, the MENDER_CLIENT_VERSION config seems to suggest that Mender client 4 will NOT be installed by default on debian bullseye and older. In our case, we base our images on bullseye, this decision being taken based on the supported images on your forum (Raspberry Pi 3 Model B/B+ Raspbian), as well as the image you guys provide in the Docs under Prepare a Raspberry Pi Device (Prepare a Raspberry Pi device | Mender documentation).

With all the information above, what is your suggestion for us to start using Mender client 4, and of course attempt to test if the new client is also affected by the issue we are facing?

Thank you very much, we appreciate your patience in dealing with this.

Hello,

With the new “auto” option is true that the default for bullseye and older will be Mender 3.5.x client, but you can override it to “latest” or an specific version to flavor the 4.x client. See also this reply in GitHub.

Note that we have not finalized the release of Mender client 4 and hence has not been officially announced. We are working on it as we speak :slight_smile:

Lluis