Mender client report 14 hours later successful update

Good Day, Friends!

After a software update (that actually is a system update) the mender client sometimes reports 14 hours later. The update is successful but the mender client hangs on the reporting .

mender client 3.4.0 runtime: go1.19.3

Performing update, before reboot

time="2024-07-22T15:49:28Z" level=info msg="State transition: update-check [Sync] -> update-fetch [Download_Enter]"
time="2024-07-22T15:49:28Z" level=info msg="Running Mender client version: 3.4.0"
time="2024-07-22T15:49:29Z" level=info msg="State transition: update-fetch [Download_Enter] -> update-store [Download_Enter]"
time="2024-07-22T15:49:29Z" level=info msg="Installer: authenticated digital signature of artifact"
time="2024-07-22T15:49:52Z" level=info msg="State transition: update-store [Download_Enter] -> update-after-store [Download_Leave]"
time="2024-07-22T15:49:52Z" level=info msg="State transition: update-after-store [Download_Leave] -> mender-update-control-refresh-maps [none]"
time="2024-07-22T15:49:52Z" level=info msg="State transition: mender-update-control-refresh-maps [none] -> mender-update-control [none]"
time="2024-07-22T15:49:52Z" level=info msg="State transition: mender-update-control [none] -> update-install [ArtifactInstall]"
time="2024-07-22T15:49:52Z" level=info msg="Output (stdout) from command \"/usr/share/mender/modules/v3/install-system-update\": pre-install.sh present, executing..."
time="2024-07-22T15:49:52Z" level=info msg="Output (stdout) from command \"/usr/share/mender/modules/v3/install-system-update\": Changing password for root"
time="2024-07-22T15:49:52Z" level=info msg="Output (stdout) from command \"/usr/share/mender/modules/v3/install-system-update\": New password: "
time="2024-07-22T15:49:52Z" level=info msg="Output (stderr) from command \"/usr/share/mender/modules/v3/install-system-update\": passwd: password for root changed by root"
time="2024-07-22T15:50:22Z" level=info msg="Output (stdout) from command \"/usr/share/mender/modules/v3/install-system-update\": Retype password: "
time="2024-07-22T15:50:22Z" level=info msg="Output (stdout) from command \"/usr/share/mender/modules/v3/install-system-update\": Resizing the filesystem on /dev/mmcblk1p2 to 3418112 (1k) blocks."
time="2024-07-22T15:50:22Z" level=info msg="Output (stdout) from command \"/usr/share/mender/modules/v3/install-system-update\": The filesystem on /dev/mmcblk1p2 is now 3418112 (1k) blocks long.\n\n"
time="2024-07-22T15:50:22Z" level=info msg="State transition: update-install [ArtifactInstall] -> mender-update-control-refresh-maps [none]"
time="2024-07-22T15:50:22Z" level=info msg="State transition: mender-update-control-refresh-maps [none] -> mender-update-control [none]"
time="2024-07-22T15:50:22Z" level=info msg="State transition: mender-update-control [none] -> reboot [ArtifactReboot_Enter]"
time="2024-07-22T15:50:22Z" level=info msg="Rebooting device(s)"
time="2024-07-22T15:50:22Z" level=info msg="Daemon terminated with SIGTERM"

After reboot there is a huge time gap.

time="2024-07-22T15:50:26Z" level=info msg="Loaded configuration file: /var/lib/mender/mender.conf"
time="2024-07-22T15:50:26Z" level=info msg="Loaded configuration file: /etc/mender/mender.conf"
time="2024-07-22T15:50:26Z" level=info msg="'UpdateControlMapExpirationTimeSeconds' is not set in the Mender configuration file. Falling back to the default of 2*UpdatePollIntervalSeconds"
time="2024-07-22T15:50:26Z" level=info msg="'UpdateControlMapBootExpirationTimeSeconds' is not set in the Mender configuration file. Falling back to the default of 600 seconds"
time="2024-07-22T15:50:26Z" level=info msg="No dual rootfs configuration present"
time="2024-07-22T15:50:27Z" level=info msg="State transition: init [none] -> init [none]"
time="2024-07-22T15:50:27Z" level=info msg="Handling loaded state: reboot"
time="2024-07-22T15:50:27Z" level=info msg="Running Mender client version: 3.4.0"
time="2024-07-22T15:50:27Z" level=info msg="State transition: init [none] -> after-reboot [ArtifactReboot_Leave]"
time="2024-07-22T15:50:27Z" level=info msg="State transition: after-reboot [ArtifactReboot_Leave] -> after-reboot [ArtifactReboot_Leave]"
time="2024-07-22T15:50:27Z" level=info msg="State transition: after-reboot [ArtifactReboot_Leave] -> mender-update-control-refresh-maps [none]"
time="2024-07-22T15:50:27Z" level=info msg="State transition: mender-update-control-refresh-maps [none] -> mender-update-control [none]"
time="2024-07-22T15:50:27Z" level=info msg="State transition: mender-update-control [none] -> update-commit [ArtifactCommit_Enter]"
time="2024-07-22T15:50:27Z" level=info msg="Device unauthorized; attempting reauthorization"
time="2024-07-22T15:50:27Z" level=error msg="Failure occurred while executing authorization request: Method: Post, URL: https://<url>.com/api/devices/v1/authentication/auth_requests"
time="2024-07-22T15:50:27Z" level=error msg="Failed to authorize with \"https://<url>.com\": Unknown url.Error type: dial tcp: lookup <url>.com on 8.8.4.4:53: dial udp 8.8.4.4:53: connect: network is unreachable"
time="2024-07-22T15:50:27Z" level=warning msg="Reauthorization failed with error: transient error: authorization request failed"
time="2024-07-22T15:50:27Z" level=error msg="Failed to report status: transient error: authorization request failed"
time="2024-07-22T15:50:27Z" level=error msg="error reporting update status: reporting status failed: transient error: authorization request failed"
time="2024-07-22T15:50:27Z" level=error msg="Failed to send status report to server: transient error: reporting status failed: transient error: authorization request failed"
time="2024-07-22T15:50:27Z" level=info msg="State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]"
time="2024-07-23T06:14:24Z" level=info msg="State transition: update-pre-commit-status-report-retry [ArtifactCommit_Enter] -> update-commit [ArtifactCommit_Enter]"
time="2024-07-23T06:14:24Z" level=info msg="Device unauthorized; attempting reauthorization"
time="2024-07-23T06:14:25Z" level=info msg="successfully received new authorization data from server https://<url>.com"
time="2024-07-23T06:14:25Z" level=info msg="Local proxy started"
time="2024-07-23T06:14:25Z" level=info msg="Reauthorization successful"
time="2024-07-23T06:14:25Z" level=info msg="State transition: update-commit [ArtifactCommit_Enter] -> update-after-first-commit [none]"
time="2024-07-23T06:14:25Z" level=info msg="State transition: update-after-first-commit [none] -> update-after-commit [ArtifactCommit_Leave]"
time="2024-07-23T06:14:25Z" level=info msg="State transition: update-after-commit [ArtifactCommit_Leave] -> cleanup [none]"
time="2024-07-23T06:14:25Z" level=info msg="State transition: cleanup [none] -> update-status-report [none]"
time="2024-07-23T06:14:25Z" level=info msg="State transition: update-status-report [none] -> idle [Idle]"

How can I handle this? Is there a mender client configuration for this?

I think a similar question is, how to surpress ArtifactFailure to be triggered if the device comes up after an update and has not internet connection.
Especially bothers me this line that sometimes causing trouble.

time="2024-07-22T15:50:27Z" level=error msg="Failed to authorize with \"https://<url>.com\": Unknown url.Error type: dial tcp: lookup <url>.com on 8.8.4.4:53: dial udp 8.8.4.4:53: connect: network is unreachable"

Hi @mister_kanister,

I guess that this is two effects combining here.

  1. the client tries to reach the server, and fails because of network not reachable or not ready
  2. a polling cycle being in the range of 12hrs would re-trigger after that time and be successful then.

So the things I’d look at is:

  • why is the network connection failing? Is there maybe some kind of requirement to be up and running? If so, make sure that this is started before the Mender Client
  • the configurations, how do the polling cycles look like?

Greets,
Josef

Thank you for your answer!

This is how mender is configured in production. 14h gap is one of the biggest gaps. I think the polling is ok.

mender setup \
--device-type $DEVICE_TYPE \
--server-url $SERVER_URL \
--server-cert /etc/mender/server.crt \
--config /mnt/mender/mender.conf \
--data /mnt/mender \
--update-poll 120 \
--inventory-poll 120 \
--retry-poll 30

What we recently discovered is, that this behaviour occurs if the internet provider of the customer is forcing using IPv6 (e.g. Vodafone, Germany). How could this affect the mender client?

The OS is using SystemV and init scripts.