Hello again,
We have implemented the structure of the application as discussed above. A trial deployment was released, but it ended up in an INCONSISTENT
state. Below are some details of the deployment:
- Update module used:
directory
- Mender client version: 3.5.2
- Mender server version: hosted (Professional), latest
Device deployment logs
2024-02-12 15:56:33 +0000 UTC info: Running Mender client version: 3.5.2
2024-02-12 15:56:34 +0000 UTC info: State transition: update-fetch [Download_Enter] -> update-store [Download_Enter]
2024-02-12 15:56:34 +0000 UTC info: Installer: authenticated digital signature of artifact
2024-02-12 15:56:38 +0000 UTC info: State transition: update-store [Download_Enter] -> update-after-store [Download_Leave]
2024-02-12 15:56:38 +0000 UTC info: State transition: update-after-store [Download_Leave] -> mender-update-control-refresh-maps [none]
2024-02-12 15:56:38 +0000 UTC info: State transition: mender-update-control-refresh-maps [none] -> mender-update-control [none]
2024-02-12 15:56:38 +0000 UTC info: State transition: mender-update-control [none] -> update-install [ArtifactInstall]
2024-02-12 15:56:38 +0000 UTC info: Executing script: ArtifactInstall_Enter_00_retry_later
2024-02-12 15:56:39 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
Postponing to retry update install process later.
---------- end of script output
2024-02-12 15:56:39 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-12 15:57:40 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
Postponing to retry update install process later.
---------- end of script output
2024-02-12 15:57:40 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-12 15:58:41 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
Postponing to retry update install process later.
---------- end of script output
2024-02-12 15:58:41 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-12 15:59:43 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
Postponing to retry update install process later.
---------- end of script output
.
.
.
.
<It was retried for approx. 500 more times. So, truncated.>
.
.
.
.
2024-02-13 01:57:04 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-13 01:58:05 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
Postponing to retry update install process later.
---------- end of script output
2024-02-13 01:58:05 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-13 01:59:06 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
Postponing to retry update install process later.
---------- end of script output
2024-02-13 01:59:06 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-13 02:00:23 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Update window reached.
Continuing update process.
---------- end of script output
2024-02-13 02:00:27 +0000 UTC info: State transition: update-install [ArtifactInstall] -> mender-update-control-refresh-maps [none]
2024-02-13 02:00:27 +0000 UTC info: State transition: mender-update-control-refresh-maps [none] -> mender-update-control [none]
2024-02-13 02:00:27 +0000 UTC info: State transition: mender-update-control [none] -> update-commit [ArtifactCommit_Enter]
2024-02-13 02:00:27 +0000 UTC info: Device unauthorized; attempting reauthorization
2024-02-13 02:00:27 +0000 UTC info: Output (stderr) from command "/usr/share/mender/identity/mender-device-identity": using interface /sys/class/net/eth0
2024-02-13 02:00:27 +0000 UTC info: successfully received new authorization data from server https://eu.hosted.mender.io
2024-02-13 02:00:27 +0000 UTC info: shutting down websocket connections
2024-02-13 02:00:27 +0000 UTC info: Local proxy stopped
2024-02-13 02:00:27 +0000 UTC warning: error while sending close message: websocket: close sent
2024-02-13 02:00:27 +0000 UTC error: Proxy Shutdown failed: context canceled
2024-02-13 02:00:27 +0000 UTC warning: error while sending close message: websocket: close sent
2024-02-13 02:00:27 +0000 UTC info: Local proxy started
2024-02-13 02:00:27 +0000 UTC info: Reauthorization successful
2024-02-13 02:00:27 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-after-first-commit [none]
2024-02-13 02:00:27 +0000 UTC error: Could not write state data to persistent storage: State data stored and retrieved maximum number of times
2024-02-13 02:00:27 +0000 UTC error: fatal error: State data stored and retrieved maximum number of times
2024-02-13 02:00:27 +0000 UTC error: State transition loop detected in state update-after-first-commit: Forcefully aborting update. The system is likely to be in an inconsistent state after this.
2024-02-13 02:00:27 +0000 UTC info: State transition: update-after-first-commit [none] -> update-status-report [none]
Artifact details
Mender artifact:
Name: app-4.1.0
Format: mender
Version: 3
Signature: signed and verified correctly
Compatible devices: '[raspberrypi3]'
Provides group:
Depends on one of artifact(s): []
Depends on one of group(s): []
State scripts:
ArtifactCommit_Leave_00_start_app
ArtifactFailure_Leave_00_start_app
ArtifactInstall_Enter_00_retry_later
Updates:
0:
Type: directory
Provides:
rootfs-image.app.version: 4.1.0
Depends:
rootfs-image.version: 2024.01.1
Clears Provides: ["rootfs-image.app.*"]
Metadata: Nothing
Files:
name: update.tar
size: 26685440
modified: 2024-02-12 15:48:43 +0000 GMT
checksum: <redacted>
name: dest_dir
size: 17
modified: 2024-02-12 15:48:43 +0000 GMT
checksum: <redacted>
mender.conf settings
{
"ArtifactVerifyKey": "<redacted>",
"Connectivity": {
"DisableKeepAlive": false,
"IdleConnTimeoutSeconds": 30
},
"RetryPollIntervalSeconds": 300,
"InventoryPollIntervalSeconds": 28800,
"UpdatePollIntervalSeconds": 1800,
"ModuleTimeoutSeconds": 14400,
"ServerURL": "<redacted>",
"TenantToken": "<redacted>",
"UpdateLogPath": "/data/logs/mender/updates.log",
"DeviceTypeFile": "/data/mender/device_type",
"StateScriptRetryIntervalSeconds": 60,
"StateScriptRetryTimeoutSeconds": 129600
}
I am unsure of the reason device ended up in the inconsistent state. I can clearly see that ArtifactInstall
step succeeded (new files were placed on a device successfully, confirmed that manually), however it failed at ArtifactCommit_Enter
. As a result, no state scripts were run at states ArtifactCommit_Leave
and ArtifactFailure_Leave
leaving a device with a non-running application.
I have researched the issue a bit and came across following post: Artifact installs failing on "Tried maximum amount of times"
In particular, those two comments:
- Artifact installs failing on "Tried maximum amount of times" - #2 by kacf
- Artifact installs failing on "Tried maximum amount of times" - #12 by kacf
Does that mean I must increase RetryPollCount
? Currently, as per config above, it is not specified, so defaults to 3 * ceil(log2(RetryPollIntervalSeconds) + 1)
. As per mender.conf
above, RetryPollIntervalSeconds
is 300
, meaning that RetryPollCount
=3 * ceil(log2(300) + 1) => RetryPollCount
=30. However, it was approx. 550 requests to retry later from a script (so approx. 550 minutes/9 hours of “delay”). Could this be the reason?
If not, what could be a potential cause of a failure in delivering an update as per scenario above?
Looking forward to hearing from you soon.
LE Could it be ModuleTimeoutSeconds
?