I just upgraded from mender version 1.6 to 2.0 but I noticed something different and caused the update to fail.
Everything looks good until rebooting, mender client failed to connect to the mender server initially because the wifi is not available when it is trying to report status. The wifi sometimes take some time to connect, but it will be available a bit later. Since mender client cannot report to the server and decided to roll back and the whole update fails.
I tried with hardwired ethernet, update succeeded without any issue.
mender 1.6 seems not having this problem, I always updated well with the wifi network.
Do I miss anything? Is there a configuration to modify the behavior? if yes, how?
You should probably look over your polling intervals in mender.conf which could result in such behavior as you have you explained if they are configured to aggressively.
2019-06-17 23:32:51 +0000 UTC info: Running Mender version 2.0.0
2019-06-17 23:32:51 +0000 UTC debug: handle update fetch state
2019-06-17 23:32:51 +0000 UTC debug: status reported, response 204 No Content
2019-06-17 23:32:51 +0000 UTC debug: Received fetch update response &{200 OK 200 HTTP/1.1 1 1 map[Accept-Ranges:[bytes] Connection:[keep-alive] Content-Length:[624986624] Content-Security-Policy:[block-all-mixed-content] Content-Type:[application/vnd.mender-artifact] Date:[Mon, 17 Jun 2019 23:28:33 GMT] Etag:["13fe5398f1d41eef2f7a0536e556f68f"] Last-Modified:[Mon, 17 Jun 2019 23:22:48 GMT] Server:[openresty/1.13.6.2] Strict-Transport-Security:[max-age=63072000; includeSubdomains; preload] Vary:[Origin] X-Amz-Request-Id:[15A9202AC83DF205] X-Content-Type-Options:[nosniff] X-Frame-Options:[DENY] X-Xss-Protection:[1; mode=block]] 0xdce7a0 624986624 [] false false map[] 0xfb8300 0xf4aba0}+
2019-06-17 23:32:51 +0000 UTC info: State transition: update-fetch [Download_Enter] -> update-store [Download_Enter]
2019-06-17 23:32:51 +0000 UTC debug: handle update install state
2019-06-17 23:32:51 +0000 UTC debug: status reported, response 204 No Content
2019-06-17 23:32:51 +0000 UTC debug: Read data from device manifest file: device_type=paydia-a20-none
2019-06-17 23:32:51 +0000 UTC debug: Current manifest data: paydia-a20-none
2019-06-17 23:32:51 +0000 UTC info: no public key was provided for authenticating the artifact
2019-06-17 23:32:51 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2019-06-17 23:32:52 +0000 UTC debug: checking if device [paydia-a20-none] is on compatibile device list: [paydia-a20-none]
2019-06-17 23:32:52 +0000 UTC debug: installer: successfully read artifact [name: paydia-a20-none-release-mender-1007-3-180s-polling; version: 3; compatible devices: [paydia-a20-none]]
2019-06-17 23:32:52 +0000 UTC debug: Trying to install update of size: 4294967296
2019-06-17 23:32:52 +0000 UTC debug: Active partition: /dev/mmcblk0p2
2019-06-17 23:32:52 +0000 UTC debug: Detected inactive partition /dev/mmcblk0p3, based on active partition /dev/mmcblk0p2
2019-06-17 23:32:52 +0000 UTC info: native sector size of block device /dev/mmcblk0p3 is 512, we will write in chunks of 1048576
2019-06-17 23:32:52 +0000 UTC info: opening device /dev/mmcblk0p3 for writing
2019-06-17 23:32:52 +0000 UTC info: partition /dev/mmcblk0p3 size: 4294967296
2019-06-17 23:56:33 +0000 UTC info: wrote 4294967296/4294967296 bytes of update to device /dev/mmcblk0p3
2019-06-17 23:56:35 +0000 UTC debug: status reported, response 204 No Content
2019-06-17 23:56:35 +0000 UTC info: State transition: update-store [Download_Enter] -> update-after-store [Download_Leave]
2019-06-17 23:56:35 +0000 UTC info: State transition: update-after-store [Download_Leave] -> update-install [ArtifactInstall]
2019-06-17 23:56:35 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:56:35 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:56:35 +0000 UTC debug: status reported, response 204 No Content
2019-06-17 23:56:35 +0000 UTC debug: Inactive partition: /dev/mmcblk0p3
2019-06-17 23:56:35 +0000 UTC debug: Marking inactive partition (/dev/mmcblk0p3) as the new boot candidate.
2019-06-17 23:56:35 +0000 UTC info: Enabling partition with new image installed to be a boot candidate: 3
2019-06-17 23:56:35 +0000 UTC debug: Marking inactive partition as a boot candidate successful.
2019-06-17 23:56:35 +0000 UTC info: State transition: update-install [ArtifactInstall] -> reboot [ArtifactReboot_Enter]
2019-06-17 23:56:35 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:56:35 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:56:35 +0000 UTC debug: handling reboot state
2019-06-17 23:56:35 +0000 UTC debug: status reported, response 204 No Content
2019-06-17 23:56:35 +0000 UTC info: rebooting device(s)
2019-06-17 23:56:35 +0000 UTC info: Mender rebooting from active partition: /dev/mmcblk0p2
2019-06-17 23:57:29 +0000 UTC info: Running Mender version 2.0.0
2019-06-17 23:57:29 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2019-06-17 23:57:29 +0000 UTC info: State transition: init [none] -> after-reboot [ArtifactReboot_Leave]
2019-06-17 23:57:29 +0000 UTC debug: Have U-Boot variable: upgrade_available=1
2019-06-17 23:57:29 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:1]
2019-06-17 23:57:29 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> after-reboot [ArtifactReboot_Leave]
2019-06-17 23:57:30 +0000 UTC debug: handling state after reboot
2019-06-17 23:57:30 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> update-commit [ArtifactCommit_Enter]
2019-06-17 23:57:30 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:57:30 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:57:30 +0000 UTC debug: handle update commit state
2019-06-17 23:57:30 +0000 UTC error: failed to report status: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
2019-06-17 23:59:05 +0000 UTC info: Running Mender version 2.0.0
2019-06-17 23:59:05 +0000 UTC error: Update was interrupted in state: update-commit
2019-06-17 23:59:05 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2019-06-17 23:59:05 +0000 UTC info: State transition: init [none] -> rollback [ArtifactRollback]
2019-06-17 23:59:05 +0000 UTC debug: transitioning to error state
2019-06-17 23:59:05 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:05 +0000 UTC info: performing rollback
2019-06-17 23:59:05 +0000 UTC debug: Have U-Boot variable: upgrade_available=0
2019-06-17 23:59:05 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:0]
2019-06-17 23:59:05 +0000 UTC debug: will try to rollback reboot the device
2019-06-17 23:59:05 +0000 UTC info: State transition: rollback [ArtifactRollback] -> rollback-reboot [ArtifactRollbackReboot_Enter]
2019-06-17 23:59:05 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:05 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:06 +0000 UTC info: rebooting device(s) after rollback
2019-06-17 23:59:06 +0000 UTC info: Mender rebooting from inactive partition: /dev/mmcblk0p2
2019-06-17 23:59:48 +0000 UTC info: Running Mender version 2.0.0
2019-06-17 23:59:48 +0000 UTC info: Update was interrupted in state: rollback-reboot
2019-06-17 23:59:48 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2019-06-17 23:59:48 +0000 UTC info: State transition: init [none] -> verify-rollback-reboot [ArtifactRollbackReboot_Leave]
2019-06-17 23:59:48 +0000 UTC debug: transitioning to error state
2019-06-17 23:59:48 +0000 UTC debug: Have U-Boot variable: upgrade_available=0
2019-06-17 23:59:48 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:0]
2019-06-17 23:59:48 +0000 UTC info: State transition: verify-rollback-reboot [ArtifactRollbackReboot_Leave] -> after-rollback-reboot [ArtifactRollbackReboot_Leave]
2019-06-17 23:59:48 +0000 UTC debug: handling state after rollback reboot
2019-06-17 23:59:48 +0000 UTC info: State transition: after-rollback-reboot [ArtifactRollbackReboot_Leave] -> update-error [ArtifactFailure]
2019-06-17 23:59:48 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:48 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:48 +0000 UTC debug: handle update error state
2019-06-17 23:59:48 +0000 UTC info: State transition: update-error [ArtifactFailure] -> cleanup [Error]
2019-06-17 23:59:48 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:48 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:48 +0000 UTC debug: Handling Cleanup state
2019-06-17 23:59:48 +0000 UTC info: State transition: cleanup [Error] -> update-status-report [none]
2019-06-17 23:59:48 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:59:48 +0000 UTC debug: handle update status report state
2019-06-17 23:59:48 +0000 UTC error: failed to report status: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:59:48 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:59:48 +0000 UTC error: failed to send status to server: transient error: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:59:48 +0000 UTC info: State transition: update-status-report [none] -> update-retry-report [none]
2019-06-18 00:00:48 +0000 UTC debug: wait complete
2019-06-18 00:00:48 +0000 UTC info: State transition: update-retry-report [none] -> update-status-report [none]
2019-06-18 00:00:48 +0000 UTC debug: handle update status report state
2019-06-18 00:00:48 +0000 UTC debug: status reported, response 204 No Content
2019-06-18 00:00:48 +0000 UTC debug: attempting to upload deployment logs for failed update
Thanks for any insight,
Hunter
Edit: @mirzak: apply syntax highlighting for code/logs
2019-06-17 23:56:35 +0000 UTC debug: handling reboot state
2019-06-17 23:56:35 +0000 UTC debug: status reported, response 204 No Content
2019-06-17 23:56:35 +0000 UTC info: rebooting device(s)
2019-06-17 23:56:35 +0000 UTC info: Mender rebooting from active partition: /dev/mmcblk0p2
2019-06-17 23:57:29 +0000 UTC info: Running Mender version 2.0.0
2019-06-17 23:57:29 +0000 UTC info: Update Module path “/usr/share/mender/modules/v3” could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2019-06-17 23:57:29 +0000 UTC info: State transition: init [none] -> after-reboot [ArtifactReboot_Leave]
2019-06-17 23:57:29 +0000 UTC debug: Have U-Boot variable: upgrade_available=1
2019-06-17 23:57:29 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:1]
2019-06-17 23:57:29 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> after-reboot [ArtifactReboot_Leave]
2019-06-17 23:57:30 +0000 UTC debug: handling state after reboot
2019-06-17 23:57:30 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> update-commit [ArtifactCommit_Enter]
2019-06-17 23:57:30 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:57:30 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-06-17 23:57:30 +0000 UTC debug: handle update commit state
2019-06-17 23:57:30 +0000 UTC error: failed to report status: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
2019-06-17 23:59:05 +0000 UTC info: Running Mender version 2.0.0
2019-06-17 23:59:05 +0000 UTC error: Update was interrupted in state: update-commit
2019-06-17 23:59:05 +0000 UTC info: Update Module path “/usr/share/mender/modules/v3” could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2019-06-17 23:59:05 +0000 UTC info: State transition: init [none] -> rollback [ArtifactRollback]
2019-06-17 23:59:05 +0000 UTC debug: transitioning to error state
This is the interesting part which triggers the rollback, and specifically these lines,
2019-06-17 23:57:30 +0000 UTC error: failed to report status: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put https://mender.paydia.com/api/devices/v1/deployments/device/deployments/b5741723-a919-45e5-86d4-7b4911cabb54/status: dial tcp 192.168.1.135:443: connect: network is unreachable
2019-06-17 23:57:30 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
2019-06-17 23:59:05 +0000 UTC info: Running Mender version 2.0.0
2019-06-17 23:59:05 +0000 UTC error: Update was interrupted in state: update-commit
Not really any additional insights, other than confirming that yes, it looks like the client restarts in between those lines. There is a good minute and a half in between those lines, so maybe the whole system restarted.
We needed to connect to a VPN to get to our mender server and were having the same problem. We ended up adding a Sync Enter script that basically waits for the tunnel to come up (which is handled by a different systemd service). See below for example. This lives at “/etc/mender/scripts/Sync_Enter_00_netup-wait”. Make sure its executable too
#!/bin/sh
maxtm=60
cnt=0
set -e
while [ $(ifconfig | grep tun0 | wc -l) -eq 0 ]; do
cnt=$((($cnt + 1)))
if [ $cnt -gt $maxtm ]; then
echo "[$(date)] Reached max timeout! ($maxtm)" >> /data/mender/netlog
exit 1
fi
echo $cnt
sleep 1
done
exit 0
Thanks nick. mender log says a version file has to be present like /etc/mender/script/version, I am not sure what should be put inside it? what is the purpose of this file?
Indeed, after I put a /etc/mender/scripts/ArtifactReboot_Leave_00 in the Artifact file, the update process succeeded without any problem, the script is actually from latest master branch commit
1 #!/bin/sh
2 # Wait until the network is connected and the time is synced
3 # before commiting an update
4
5 RETRY_LATER=21
6 OK=0
7
8 ping 8.8.8.8 -c1 > /dev/null 2>&1
9 if [ ? != 0 ]; then
10 exit {RETRY_LATER}
11 fi
12
13 TIME_SYNC_STATUS=$(timedatectl | grep NTP | awk '{print NF}')
14 if [ "{TIME_SYNC_STATUS}" != “yes” ] && [ "{TIME_SYNC_STATUS}" != "active" ]; then
15 exit {RETRY_LATER}
16 fi
17
18 exit ${OK}
Thanks Nick for the hint, make the script executable is another hiccup during the testing :-).
I put 2 inside the /etc/mender/script/version, and it works with mender 2.0 client, not sure if the version file has to match the mender client version, I didn’t find any documentation on it. But hey it works.
Thanks for all your helpful replies, I appreciate them.