Mender 2.0 update failure because wifi network not immediately available

Hi,

I just upgraded from mender version 1.6 to 2.0 but I noticed something different and caused the update to fail.

Everything looks good until rebooting, mender client failed to connect to the mender server initially because the wifi is not available when it is trying to report status. The wifi sometimes take some time to connect, but it will be available a bit later. Since mender client cannot report to the server and decided to roll back and the whole update fails.

I tried with hardwired ethernet, update succeeded without any issue.

mender 1.6 seems not having this problem, I always updated well with the wifi network.

Do I miss anything? Is there a configuration to modify the behavior? if yes, how?

Thanks,
Hunter

You should probably look over your polling intervals in mender.conf which could result in such behavior as you have you explained if they are configured to aggressively.

Alternatively you could setup a post-reboot or pre-sync state script to explicitly delay until wifi connectivity is established.

I have changed the polling interval to 180 seconds from default 60 seconds, but still failed.

Here are the full logs, it does successfully write the other partition, after reboot, mender client decides to roll back.

{                                                                                                                       
  "ClientProtocol": "http",
  "HttpsClient": {
    "Certificate": "",
    "Key": ""
  },
  "RootfsPartA": "/dev/mmcblk0p2",
  "RootfsPartB": "/dev/mmcblk0p3",
  "UpdatePollIntervalSeconds": 180,
  "InventoryPollIntervalSeconds": 180,
  "RetryPollIntervalSeconds": 180,
  "ServerURL": "mender.paydia.com",
  "ServerCertificate": "/etc/mender/server.crt"
}

Logs:

2019-06-17 23:32:51 +0000 UTC info: Running Mender version 2.0.0
2019-06-17 23:32:51 +0000 UTC debug: handle update fetch state
2019-06-17 23:32:51 +0000 UTC debug: status reported, response 204 No Content
2019-06-17 23:32:51 +0000 UTC debug: Received fetch update response &{200 OK 200 HTTP/1.1 1 1 map[Accept-Ranges:[bytes] Connection:[keep-alive] Content-Length:[624986624] Content-Security-Policy:[block-all-mixed-content] Content-Type:[application/vnd.mender-artifact] Date:[Mon, 17 Jun 2019 23:28:33 GMT] Etag:["13fe5398f1d41eef2f7a0536e556f68f"] Last-Modified:[Mon, 17 Jun 2019 23:22:48 GMT] Server:[openresty/1.13.6.2] Strict-Transport-Security:[max-age=63072000; includeSubdomains; preload] Vary:[Origin] X-Amz-Request-Id:[15A9202AC83DF205] X-Content-Type-Options:[nosniff] X-Frame-Options:[DENY] X-Xss-Protection:[1; mode=block]] 0xdce7a0 624986624 [] false false map[] 0xfb8300 0xf4aba0}+
2019-06-17 23:32:51 +0000 UTC info: State transition: update-fetch [Download_Enter] -> update-store [Download_Enter]
2019-06-17 23:32:51 +0000 UTC debug: handle update install state
2019-06-17 23:32:51 +0000 UTC debug: status reported, response 204 No Content
2019-06-17 23:32:51 +0000 UTC debug: Read data from device manifest file: device_type=paydia-a20-none
2019-06-17 23:32:51 +0000 UTC debug: Current manifest data: paydia-a20-none
2019-06-17 23:32:51 +0000 UTC info: no public key was provided for authenticating the artifact
2019-06-17 23:32:51 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2019-06-17 23:32:52 +0000 UTC debug: checking if device [paydia-a20-none] is on compatibile device list: [paydia-a20-none]
2019-06-17 23:32:52 +0000 UTC debug: installer: successfully read artifact [name: paydia-a20-none-release-mender-1007-3-180s-polling; version: 3; compatible devices: [paydia-a20-none]]
2019-06-17 23:32:52 +0000 UTC debug: Trying to install update of size: 4294967296
2019-06-17 23:32:52 +0000 UTC debug: Active partition: /dev/mmcblk0p2
2019-06-17 23:32:52 +0000 UTC debug: Detected inactive partition /dev/mmcblk0p3, based on active partition /dev/mmcblk0p2
2019-06-17 23:32:52 +0000 UTC info: native sector size of block device /dev/mmcblk0p3 is 512, we will write in chunks of 1048576
2019-06-17 23:32:52 +0000 UTC info: opening device /dev/mmcblk0p3 for writing
2019-06-17 23:32:52 +0000 UTC info: partition /dev/mmcblk0p3 size: 4294967296
2019-06-17 23:56:33 +0000 UTC info: wrote 4294967296/4294967296 bytes of update to device /dev/mmcblk0p3
2019-06-17 23:56:35 +0000 UTC debug: status reported, response 204 No Content
2019-06-17 23:56:35 +0000 UTC info: State transition: update-store [Download_Enter] -> update-after-store [Download_Leave]
2019-06-17 23:56:35 +0000 UTC info: State transition: update-after-store [Download_Leave] -> update-install [ArtifactInstall]
2019-06-17 23:56:35 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:56:35 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:56:35 +0000 UTC debug: status reported, response 204 No Content
2019-06-17 23:56:35 +0000 UTC debug: Inactive partition: /dev/mmcblk0p3
2019-06-17 23:56:35 +0000 UTC debug: Marking inactive partition (/dev/mmcblk0p3) as the new boot candidate.
2019-06-17 23:56:35 +0000 UTC info: Enabling partition with new image installed to be a boot candidate: 3
2019-06-17 23:56:35 +0000 UTC debug: Marking inactive partition as a boot candidate successful.
2019-06-17 23:56:35 +0000 UTC info: State transition: update-install [ArtifactInstall] -> reboot [ArtifactReboot_Enter]
2019-06-17 23:56:35 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:56:35 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:56:35 +0000 UTC debug: handling reboot state
2019-06-17 23:56:35 +0000 UTC debug: status reported, response 204 No Content
2019-06-17 23:56:35 +0000 UTC info: rebooting device(s)
2019-06-17 23:56:35 +0000 UTC info: Mender rebooting from active partition: /dev/mmcblk0p2
2019-06-17 23:57:29 +0000 UTC info: Running Mender version 2.0.0
2019-06-17 23:57:29 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2019-06-17 23:57:29 +0000 UTC info: State transition: init [none] -> after-reboot [ArtifactReboot_Leave]
2019-06-17 23:57:29 +0000 UTC debug: Have U-Boot variable: upgrade_available=1
2019-06-17 23:57:29 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:1]
2019-06-17 23:57:29 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> after-reboot [ArtifactReboot_Leave]
2019-06-17 23:57:30 +0000 UTC debug: handling state after reboot
2019-06-17 23:57:30 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> update-commit [ArtifactCommit_Enter]
2019-06-17 23:57:30 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:57:30 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:57:30 +0000 UTC debug: handle update commit state
2019-06-17 23:57:30 +0000 UTC error: failed to report status: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
2019-06-17 23:59:05 +0000 UTC info: Running Mender version 2.0.0
2019-06-17 23:59:05 +0000 UTC error: Update was interrupted in state: update-commit
2019-06-17 23:59:05 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2019-06-17 23:59:05 +0000 UTC info: State transition: init [none] -> rollback [ArtifactRollback]
2019-06-17 23:59:05 +0000 UTC debug: transitioning to error state
2019-06-17 23:59:05 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:05 +0000 UTC info: performing rollback
2019-06-17 23:59:05 +0000 UTC debug: Have U-Boot variable: upgrade_available=0
2019-06-17 23:59:05 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:0]
2019-06-17 23:59:05 +0000 UTC debug: will try to rollback reboot the device
2019-06-17 23:59:05 +0000 UTC info: State transition: rollback [ArtifactRollback] -> rollback-reboot [ArtifactRollbackReboot_Enter]
2019-06-17 23:59:05 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:05 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:06 +0000 UTC info: rebooting device(s) after rollback
2019-06-17 23:59:06 +0000 UTC info: Mender rebooting from inactive partition: /dev/mmcblk0p2
2019-06-17 23:59:48 +0000 UTC info: Running Mender version 2.0.0
2019-06-17 23:59:48 +0000 UTC info: Update was interrupted in state: rollback-reboot
2019-06-17 23:59:48 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2019-06-17 23:59:48 +0000 UTC info: State transition: init [none] -> verify-rollback-reboot [ArtifactRollbackReboot_Leave]
2019-06-17 23:59:48 +0000 UTC debug: transitioning to error state
2019-06-17 23:59:48 +0000 UTC debug: Have U-Boot variable: upgrade_available=0
2019-06-17 23:59:48 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:0]
2019-06-17 23:59:48 +0000 UTC info: State transition: verify-rollback-reboot [ArtifactRollbackReboot_Leave] -> after-rollback-reboot [ArtifactRollbackReboot_Leave]
2019-06-17 23:59:48 +0000 UTC debug: handling state after rollback reboot
2019-06-17 23:59:48 +0000 UTC info: State transition: after-rollback-reboot [ArtifactRollbackReboot_Leave] -> update-error [ArtifactFailure]
2019-06-17 23:59:48 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:48 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:48 +0000 UTC debug: handle update error state
2019-06-17 23:59:48 +0000 UTC info: State transition: update-error [ArtifactFailure] -> cleanup [Error]
2019-06-17 23:59:48 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:48 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:48 +0000 UTC debug: Handling Cleanup state
2019-06-17 23:59:48 +0000 UTC info: State transition: cleanup [Error] -> update-status-report [none]
2019-06-17 23:59:48 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:48 +0000 UTC debug: handle update status report state
2019-06-17 23:59:48 +0000 UTC error: failed to report status: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:59:48 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:59:48 +0000 UTC error: failed to send status to server: transient error: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:59:48 +0000 UTC info: State transition: update-status-report [none] -> update-retry-report [none]
2019-06-18 00:00:48 +0000 UTC debug: wait complete
2019-06-18 00:00:48 +0000 UTC info: State transition: update-retry-report [none] -> update-status-report [none]
2019-06-18 00:00:48 +0000 UTC debug: handle update status report state
2019-06-18 00:00:48 +0000 UTC debug: status reported, response 204 No Content
2019-06-18 00:00:48 +0000 UTC debug: attempting to upload deployment logs for failed update

Thanks for any insight,
Hunter

Edit: @mirzak: apply syntax highlighting for code/logs

2019-06-17 23:56:35 +0000 UTC debug: handling reboot state
2019-06-17 23:56:35 +0000 UTC debug: status reported, response 204 No Content
2019-06-17 23:56:35 +0000 UTC info: rebooting device(s)
2019-06-17 23:56:35 +0000 UTC info: Mender rebooting from active partition: /dev/mmcblk0p2
2019-06-17 23:57:29 +0000 UTC info: Running Mender version 2.0.0
2019-06-17 23:57:29 +0000 UTC info: Update Module path “/usr/share/mender/modules/v3” could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2019-06-17 23:57:29 +0000 UTC info: State transition: init [none] -> after-reboot [ArtifactReboot_Leave]
2019-06-17 23:57:29 +0000 UTC debug: Have U-Boot variable: upgrade_available=1
2019-06-17 23:57:29 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:1]
2019-06-17 23:57:29 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> after-reboot [ArtifactReboot_Leave]
2019-06-17 23:57:30 +0000 UTC debug: handling state after reboot
2019-06-17 23:57:30 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> update-commit [ArtifactCommit_Enter]
2019-06-17 23:57:30 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:57:30 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:57:30 +0000 UTC debug: handle update commit state
2019-06-17 23:57:30 +0000 UTC error: failed to report status: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
2019-06-17 23:59:05 +0000 UTC info: Running Mender version 2.0.0
2019-06-17 23:59:05 +0000 UTC error: Update was interrupted in state: update-commit
2019-06-17 23:59:05 +0000 UTC info: Update Module path “/usr/share/mender/modules/v3” could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2019-06-17 23:59:05 +0000 UTC info: State transition: init [none] -> rollback [ArtifactRollback]
2019-06-17 23:59:05 +0000 UTC debug: transitioning to error state

This is the interesting part which triggers the rollback, and specifically these lines,

2019-06-17 23:57:30 +0000 UTC error: failed to report status: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
2019-06-17 23:59:05 +0000 UTC info: Running Mender version 2.0.0
2019-06-17 23:59:05 +0000 UTC error: Update was interrupted in state: update-commit

But it looks like the Mender client is restarted?

@kacf, maybe you have additional insights

Not really any additional insights, other than confirming that yes, it looks like the client restarts in between those lines. There is a good minute and a half in between those lines, so maybe the whole system restarted.

I think I need a post-reboot state script to wait for a connection, any pointers or examples?

After writing the script, where do I put it?

Thanks,
Hunter

We needed to connect to a VPN to get to our mender server and were having the same problem. We ended up adding a Sync Enter script that basically waits for the tunnel to come up (which is handled by a different systemd service). See below for example. This lives at “/etc/mender/scripts/Sync_Enter_00_netup-wait”. Make sure its executable too

#!/bin/sh
maxtm=60
cnt=0

set -e

while [ $(ifconfig | grep tun0 | wc -l) -eq 0 ]; do
  cnt=$((($cnt + 1)))
  if [ $cnt -gt $maxtm ]; then
    echo "[$(date)] Reached max timeout! ($maxtm)" >> /data/mender/netlog
    exit 1
  fi
  echo $cnt
  sleep 1
done

exit 0
1 Like

Thanks nick. mender log says a version file has to be present like /etc/mender/script/version, I am not sure what should be put inside it? what is the purpose of this file?

Indeed, after I put a /etc/mender/scripts/ArtifactReboot_Leave_00 in the Artifact file, the update process succeeded without any problem, the script is actually from latest master branch commit

commit 6049b7666022eedc143bf99e4d225008d329eb45
Author: Mirza Krak <mirza.krak@northern.tech>
Date:   Tue May 21 13:20:48 2019 +0200

1 #!/bin/sh
2 # Wait until the network is connected and the time is synced
3 # before commiting an update
4
5 RETRY_LATER=21
6 OK=0
7
8 ping 8.8.8.8 -c1 > /dev/null 2>&1
9 if [ ? != 0 ]; then 10 exit {RETRY_LATER}
11 fi
12
13 TIME_SYNC_STATUS=$(timedatectl | grep NTP | awk '{print NF}') 14 if [ "{TIME_SYNC_STATUS}" != “yes” ] && [ "{TIME_SYNC_STATUS}" != "active" ]; then 15 exit {RETRY_LATER}
16 fi
17
18 exit ${OK}

Thanks Nick for the hint, make the script executable is another hiccup during the testing :-).

I put 2 inside the /etc/mender/script/version, and it works with mender 2.0 client, not sure if the version file has to match the mender client version, I didn’t find any documentation on it. But hey it works.

Thanks for all your helpful replies, I appreciate them.

-Hunter

1 Like