Update fails because device can't report

Good Day Friends!

We have an issue where an update is considered failed if the device cannot report back. This particular test device is in a restrictive network, and the report needs several attempts. If the report fails, a rollback is triggered, resulting in INCONSISTENT.

Is there a way to decouple this behaviour, i.e. to make the report to the backend optional?

2023-12-06 13:58:55 +0000 UTC info: Device unauthorized; attempting reauthorization
2023-12-06 13:59:06 +0000 UTC error: Failure occurred while executing authorization request: Method: Post, URL: https://mender.<url>.com/api/devices/v1/authentication/auth_requests
2023-12-06 13:59:06 +0000 UTC error: Failed to authorize with "https://mender.<url>.com": Unknown url.Error type: dial tcp: lookup mender.<url>.com on 8.8.8.8:53: dial udp 8.8.8.8:53: connect: network is unreachable
2023-12-06 13:59:06 +0000 UTC warning: Reauthorization failed with error: transient error: authorization request failed
2023-12-06 13:59:06 +0000 UTC error: Failed to report status: transient error: authorization request failed
2023-12-06 13:59:06 +0000 UTC error: error reporting update status: reporting status failed: transient error: authorization request failed
2023-12-06 13:59:06 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: transient error: authorization request failed
2023-12-06 13:59:06 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
2023-12-06 13:59:36 +0000 UTC info: State transition: update-pre-commit-status-report-retry [ArtifactCommit_Enter] -> update-commit [ArtifactCommit_Enter]
2023-12-06 13:59:36 +0000 UTC info: Device unauthorized; attempting reauthorization
2023-12-06 13:59:29 +0000 UTC info: Running Mender client version: 3.4.0
2023-12-06 13:59:29 +0000 UTC error: Mender shut down in state: update-commit
2023-12-06 13:59:29 +0000 UTC info: State transition: init [none] -> update-error [ArtifactFailure]
2023-12-06 13:59:29 +0000 UTC info: Output (stdout) from command "/usr/share/mender/modules/v3/install-wifi-bridge": ArtifactFailure: restore old binary
2023-12-06 13:59:30 +0000 UTC info: State transition: update-error [ArtifactFailure] -> cleanup [Error]
2023-12-06 13:59:30 +0000 UTC info: State transition: cleanup [Error] -> update-status-report [none]
2023-12-06 13:59:30 +0000 UTC info: Device unauthorized; attempting reauthorization
2023-12-06 13:59:30 +0000 UTC error: Failure occurred while executing authorization request: Method: Post, URL: https://mender.<url>.com/api/devices/v1/authentication/auth_requests
2023-12-06 13:59:30 +0000 UTC error: Failed to authorize with "https://mender.<url>.com": Unknown url.Error type: dial tcp: lookup mender.<url>.com on 8.8.8.8:53: dial udp 8.8.8.8:53: connect: network is unreachable
2023-12-06 13:59:30 +0000 UTC warning: Reauthorization failed with error: transient error: authorization request failed
2023-12-06 13:59:30 +0000 UTC error: Failed to report status: transient error: authorization request failed
2023-12-06 13:59:30 +0000 UTC error: error reporting update status: reporting status failed: transient error: authorization request failed
2023-12-06 13:59:30 +0000 UTC error: Failed to send status to server: transient error: reporting status failed: transient error: authorization request failed
2023-12-06 13:59:30 +0000 UTC info: State transition: update-status-report [none] -> update-retry-report [none]
2023-12-06 14:00:25 +0000 UTC info: State transition: update-retry-report [none] -> update-status-report [none]
2023-12-06 14:00:25 +0000 UTC info: Device unauthorized; attempting reauthorization
2023-12-06 14:00:26 +0000 UTC info: successfully received new authorization data from server https://mender.<url>.com
2023-12-06 14:00:26 +0000 UTC info: Local proxy started
2023-12-06 14:00:26 +0000 UTC info: Reauthorization successful

I can post the whole deployment log on request.

There was some talk about adding an AllowCommitWithoutContactingServer configuration option, but this is still on the backlog (private customer ticket), and it hasn’t seen a lot of activity lately. It would make sense though.

I noticed that when the poll intervals are high enough, this is no longer a problem:

mender setup \
--device-type $DEVICE_TYPE \
--server-url $SERVER_URL \
--server-cert /etc/mender/server.crt \
--config /mnt/mender/mender.conf \
--data /mnt/mender \
--update-poll 120 \
--inventory-poll 120 \
--retry-poll 30

So with --update-poll 120 --inventory-poll 120 --retry-poll 30 the intervals are high enough that the reporting works as expected.
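For anyone who edits the config file directly instead of running mender setup: as far as I know, these three flags end up as the following keys in mender.conf (here /mnt/mender/mender.conf, as passed via --config). This is only a sketch of the relevant fragment, assuming the standard key names of the 3.x client; the other keys that mender setup writes (server URL, certificate, device type handling) are left out. All values are in seconds.

```json
{
  "UpdatePollIntervalSeconds": 120,
  "InventoryPollIntervalSeconds": 120,
  "RetryPollIntervalSeconds": 30
}
```

It is worth checking the file after running mender setup to confirm the values actually match what was passed on the command line.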

Considered resolved from my side.