System Software Version after using mender-convert

Hello again,

We have implemented the structure of the application as discussed above. A trial deployment was released, but it ended up in an INCONSISTENT state. Below are some details of the deployment:

  • Update module used: directory
  • Mender client version: 3.5.2
  • Mender server version: hosted (Professional), latest
Device deployment logs
2024-02-12 15:56:33 +0000 UTC info: Running Mender client version: 3.5.2
2024-02-12 15:56:34 +0000 UTC info: State transition: update-fetch [Download_Enter] -> update-store [Download_Enter]
2024-02-12 15:56:34 +0000 UTC info: Installer: authenticated digital signature of artifact
2024-02-12 15:56:38 +0000 UTC info: State transition: update-store [Download_Enter] -> update-after-store [Download_Leave]
2024-02-12 15:56:38 +0000 UTC info: State transition: update-after-store [Download_Leave] -> mender-update-control-refresh-maps [none]
2024-02-12 15:56:38 +0000 UTC info: State transition: mender-update-control-refresh-maps [none] -> mender-update-control [none]
2024-02-12 15:56:38 +0000 UTC info: State transition: mender-update-control [none] -> update-install [ArtifactInstall]
2024-02-12 15:56:38 +0000 UTC info: Executing script: ArtifactInstall_Enter_00_retry_later
2024-02-12 15:56:39 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
Postponing to retry update install process later.

---------- end of script output
2024-02-12 15:56:39 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-12 15:57:40 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
Postponing to retry update install process later.

---------- end of script output
2024-02-12 15:57:40 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-12 15:58:41 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
Postponing to retry update install process later.

---------- end of script output
2024-02-12 15:58:41 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-12 15:59:43 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
Postponing to retry update install process later.

---------- end of script output
.
.
. 
.
<It was retried for approx. 500 more times. So, truncated.>
.
.
.
.
2024-02-13 01:57:04 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-13 01:58:05 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
Postponing to retry update install process later.

---------- end of script output
2024-02-13 01:58:05 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-13 01:59:06 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Outside of update window.
Postponing to retry update install process later.

---------- end of script output
2024-02-13 01:59:06 +0000 UTC info: statescript: ArtifactInstall_Enter_00_retry_later requested a retry
2024-02-13 02:00:23 +0000 UTC info: Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Enter_00_retry_later
rootfs-2024.01.1: Running ArtifactInstall_Enter_00_retry_later
Update window reached.
Continuing update process.

---------- end of script output
2024-02-13 02:00:27 +0000 UTC info: State transition: update-install [ArtifactInstall] -> mender-update-control-refresh-maps [none]
2024-02-13 02:00:27 +0000 UTC info: State transition: mender-update-control-refresh-maps [none] -> mender-update-control [none]
2024-02-13 02:00:27 +0000 UTC info: State transition: mender-update-control [none] -> update-commit [ArtifactCommit_Enter]
2024-02-13 02:00:27 +0000 UTC info: Device unauthorized; attempting reauthorization
2024-02-13 02:00:27 +0000 UTC info: Output (stderr) from command "/usr/share/mender/identity/mender-device-identity": using interface /sys/class/net/eth0
2024-02-13 02:00:27 +0000 UTC info: successfully received new authorization data from server https://eu.hosted.mender.io
2024-02-13 02:00:27 +0000 UTC info: shutting down websocket connections
2024-02-13 02:00:27 +0000 UTC info: Local proxy stopped
2024-02-13 02:00:27 +0000 UTC warning: error while sending close message: websocket: close sent
2024-02-13 02:00:27 +0000 UTC error: Proxy Shutdown failed: context canceled
2024-02-13 02:00:27 +0000 UTC warning: error while sending close message: websocket: close sent
2024-02-13 02:00:27 +0000 UTC info: Local proxy started
2024-02-13 02:00:27 +0000 UTC info: Reauthorization successful
2024-02-13 02:00:27 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-after-first-commit [none]
2024-02-13 02:00:27 +0000 UTC error: Could not write state data to persistent storage: State data stored and retrieved maximum number of times
2024-02-13 02:00:27 +0000 UTC error: fatal error: State data stored and retrieved maximum number of times
2024-02-13 02:00:27 +0000 UTC error: State transition loop detected in state update-after-first-commit: Forcefully aborting update. The system is likely to be in an inconsistent state after this.
2024-02-13 02:00:27 +0000 UTC info: State transition: update-after-first-commit [none] -> update-status-report [none]
Artifact details
Mender artifact:
  Name: app-4.1.0
  Format: mender
  Version: 3
  Signature: signed and verified correctly
  Compatible devices: '[raspberrypi3]'
  Provides group: 
  Depends on one of artifact(s): []
  Depends on one of group(s): []
  State scripts:
    ArtifactCommit_Leave_00_start_app
    ArtifactFailure_Leave_00_start_app
    ArtifactInstall_Enter_00_retry_later

Updates:
    0:
    Type:   directory
    Provides:
	rootfs-image.app.version: 4.1.0
    Depends:
	rootfs-image.version: 2024.01.1
    Clears Provides: ["rootfs-image.app.*"]
    Metadata: Nothing
    Files:
      name:     update.tar
      size:     26685440
      modified: 2024-02-12 15:48:43 +0000 GMT
      checksum: <redacted>
      name:     dest_dir
      size:     17
      modified: 2024-02-12 15:48:43 +0000 GMT
      checksum: <redacted>
mender.conf settings
{
    "ArtifactVerifyKey": "<redacted>",
    "Connectivity": {
    "DisableKeepAlive": false,
    "IdleConnTimeoutSeconds": 30
    },
    "RetryPollIntervalSeconds": 300,
    "InventoryPollIntervalSeconds": 28800,
    "UpdatePollIntervalSeconds": 1800,
    "ModuleTimeoutSeconds": 14400,
    "ServerURL": "<redacted>",
    "TenantToken": "<redacted>",
    "UpdateLogPath": "/data/logs/mender/updates.log",
    "DeviceTypeFile": "/data/mender/device_type",
    "StateScriptRetryIntervalSeconds": 60,
    "StateScriptRetryTimeoutSeconds": 129600
}

I am unsure of the reason device ended up in the inconsistent state. I can clearly see that ArtifactInstall step succeeded (new files were placed on a device successfully, confirmed that manually), however it failed at ArtifactCommit_Enter. As a result, no state scripts were run at states ArtifactCommit_Leave and ArtifactFailure_Leave leaving a device with a non-running application.

I have researched the issue a bit and came across following post: Artifact installs failing on "Tried maximum amount of times"

In particular, those two comments:

Does that mean I must increase RetryPollCount? Currently, as per config above, it is not specified, so defaults to 3 * ceil(log2(RetryPollIntervalSeconds) + 1). As per mender.conf above, RetryPollIntervalSeconds is 300, meaning that RetryPollCount=3 * ceil(log2(300) + 1) => RetryPollCount=30. However, it was approx. 550 requests to retry later from a script (so approx. 550 minutes/9 hours of “delay”). Could this be the reason?

If not, what could be a potential cause of a failure in delivering an update as per scenario above?

Looking forward to hearing from you soon.

LE Could it be ModuleTimeoutSeconds?