Hi,
Our devices lack a hardware clock, so a network-based time sync is required on each boot to bring the system time forward from the Unix Epoch. In some rare cases when rebooting to newly-installed firmware during a deployment, the system clock will not update quickly enough (due to uncontrollable network conditions) - which can result in the maximum number of attempts for status report submission to be hit across the nine minutes following boot and a firmware rollback to be triggered.
We have planned adjustments to our Polling Intervals (previously set to 180 seconds) to help with these situations: InventoryPollingInterval to 3600s, UpdatePollingInterval to 600s, RetryPollingInterval to 600s.
Our logs indicate that the maximum number of retries being attempted are three. Can anyone advise on the procedure to increasing the number of status report retries? Is this a function of the StateScriptRetryInterval and StateScriptRetryTimeout values? Should there be concern of the StateScriptRetryTimeout being triggered when the system clock jumps from Unix Epoch to the actual time?
We’ve made other efforts to await timesync prior to launching the mender service, but I would greatly appreciate any other guidance on best practices for improving mender update behavior in systems lacking a hardware clock.
Here is an example log from a system which rolled back when the system clock did not sync to actual time before hitting the maximum number of status report retries:
2020-07-07 14:06:54 +0000 UTC debug: handle update fetch state
2020-07-07 14:06:55 +0000 UTC debug: status reported, response &{204 No Content 204 HTTP/2.0 2 0 map[Strict-Transport-Security:[max-age=63072000; includeSubdomains; preload] X-Frame-Options:[DENY] X-Content-Type-Options:[nosniff] X-Xss-Protection:[1; mode=block] Pragma:[no-cache] Date:[Tue, 07 Jul 2020 14:06:55 GMT] X-Men-Requestid:[3a51ef80-64e4-43ef-89d5-4b4d7ec44740] Vary:[Accept-Encoding] X-Deployments-Version:[unknown] Cache-Control:[no-cache, no-store] Server:[openresty/1.13.6.2] Content-Encoding:[gzip]] 0x4420106d80 0 [] false false map[] 0x442046c800 0x44200a48f0}
2020-07-07 14:06:55 +0000 UTC debug: Received fetch update response &{200 OK 200 HTTP/1.1 1 1 map[X-Amz-Id-2:[o9anXSxLoS3Mlmz+horocsmkklXufThtJIaKYpUGue4A4HzxtexguUuP/s9OS0cR+QLWLvMj2Fk=] X-Amz-Request-Id:[6AFCBF68B8C4CA3B] Last-Modified:[Thu, 28 May 2020 14:53:30 GMT] Accept-Ranges:[bytes] Date:[Tue, 07 Jul 2020 14:06:56 GMT] Etag:["15ef3ea5de189c45bbfd657bf2584cdb"] Content-Type:[application/vnd.mender-artifact] Content-Length:[714749440] Server:[AmazonS3]] 0x44201703a0 714749440 [] false false map[] 0x4420064b00 0x4420252420}+
2020-07-07 14:06:55 +0000 UTC info: State transition: update-fetch [Download] -> update-store [Download]
2020-07-07 14:06:55 +0000 UTC debug: handle update install state
2020-07-07 14:06:55 +0000 UTC debug: status reported, response &{204 No Content 204 HTTP/2.0 2 0 map[Date:[Tue, 07 Jul 2020 14:06:55 GMT] Strict-Transport-Security:[max-age=63072000; includeSubdomains; preload] X-Frame-Options:[DENY] X-Xss-Protection:[1; mode=block] Pragma:[no-cache] Server:[openresty/1.13.6.2] Vary:[Accept-Encoding] X-Deployments-Version:[unknown] X-Men-Requestid:[3a366de0-50d8-4190-b87d-c1ffc712b239] X-Content-Type-Options:[nosniff] Cache-Control:[no-cache, no-store] Content-Encoding:[gzip]] 0x4420170c40 0 [] false false map[] 0x442000a800 0x44200a48f0}
2020-07-07 14:06:55 +0000 UTC debug: Read data from device manifest file: device_type=dragonboard-410c
2020-07-07 14:06:55 +0000 UTC debug: Found needed line: device_type=dragonboard-410c
2020-07-07 14:06:55 +0000 UTC debug: Current manifest data: dragonboard-410c
2020-07-07 14:06:55 +0000 UTC debug: checking if device [dragonboard-410c] is on compatibile device list: [dragonboard-410c]
2020-07-07 14:06:55 +0000 UTC debug: installing update rpb-console-image-dragonboard-410c-20200528133351-IMM.LE.1.0-8x16_36900.rootfs.ext4 of size 3078619136
2020-07-07 14:06:55 +0000 UTC debug: Trying to install update of size: 3078619136
2020-07-07 14:06:55 +0000 UTC debug: Have U-Boot variable: mender_boot_part=12
2020-07-07 14:06:55 +0000 UTC debug: List of U-Boot variables:map[mender_boot_part:12]
2020-07-07 14:06:55 +0000 UTC debug: Setting active partition from mount candidate: /dev/mmcblk0p12
2020-07-07 14:06:55 +0000 UTC debug: Detected inactive partition /dev/mmcblk0p11, based on active partition /dev/mmcblk0p12
2020-07-07 14:06:55 +0000 UTC info: opening device /dev/mmcblk0p11 for writing
2020-07-07 14:06:55 +0000 UTC info: partition /dev/mmcblk0p11 size: 3221225472
2020-07-07 14:19:18 +0000 UTC info: wrote 3078619136/3078619136 bytes of update to device /dev/mmcblk0p11
2020-07-07 14:19:19 +0000 UTC debug: installer: successfully read artifact [name: Miku_03.04.14; version: 2; compatible devices: [dragonboard-410c]]
2020-07-07 14:19:19 +0000 UTC debug: status reported, response &{204 No Content 204 HTTP/2.0 2 0 map[Content-Encoding:[gzip] Strict-Transport-Security:[max-age=63072000; includeSubdomains; preload] X-Frame-Options:[DENY] X-Xss-Protection:[1; mode=block] Cache-Control:[no-cache, no-store] Server:[openresty/1.13.6.2] Date:[Tue, 07 Jul 2020 14:19:19 GMT] Vary:[Accept-Encoding] X-Deployments-Version:[unknown] X-Men-Requestid:[9ea46fa3-8dfb-40cc-b86a-0334ca047b92] X-Content-Type-Options:[nosniff] Pragma:[no-cache]] 0x442019f940 0 [] false false map[] 0x442046c200 0x44200a4630}
2020-07-07 14:19:19 +0000 UTC info: State transition: update-store [Download] -> update-install [ArtifactInstall]
2020-07-07 14:19:19 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:19 +0000 UTC debug: executing script: Download_Leave_03_pipe-copy
2020-07-07 14:19:19 +0000 UTC debug: executing script: Download_Leave_04_net-copy
2020-07-07 14:19:19 +0000 UTC debug: executing script: Download_Leave_05_update-request
2020-07-07 14:19:25 +0000 UTC debug: executing script: Download_Leave_06_install-uboot-gpio-toggle
2020-07-07 14:19:25 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:25 +0000 UTC debug: status reported, response &{204 No Content 204 HTTP/2.0 2 0 map[Strict-Transport-Security:[max-age=63072000; includeSubdomains; preload] X-Xss-Protection:[1; mode=block] Pragma:[no-cache] Content-Encoding:[gzip] Vary:[Accept-Encoding] X-Deployments-Version:[unknown] X-Frame-Options:[DENY] X-Content-Type-Options:[nosniff] Cache-Control:[no-cache, no-store] Server:[openresty/1.13.6.2] Date:[Tue, 07 Jul 2020 14:19:25 GMT] X-Men-Requestid:[6293daeb-baf4-4723-8abf-d2283faec275]] 0x4420190ea0 0 [] false false map[] 0x4420065800 0x44200a4630}
2020-07-07 14:19:25 +0000 UTC debug: Inactive partition: /dev/mmcblk0p11
2020-07-07 14:19:25 +0000 UTC debug: Marking inactive partition (/dev/mmcblk0p11) as the new boot candidate.
2020-07-07 14:19:25 +0000 UTC info: Enabling partition with new image installed to be a boot candidate: 11
2020-07-07 14:19:25 +0000 UTC debug: Marking inactive partition as a boot candidate successful.
2020-07-07 14:19:25 +0000 UTC info: State transition: update-install [ArtifactInstall] -> reboot [ArtifactReboot_Enter]
2020-07-07 14:19:25 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:25 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:25 +0000 UTC debug: handling reboot state
2020-07-07 14:19:25 +0000 UTC debug: status reported, response &{204 No Content 204 HTTP/2.0 2 0 map[Strict-Transport-Security:[max-age=63072000; includeSubdomains; preload] X-Frame-Options:[DENY] X-Xss-Protection:[1; mode=block] Server:[openresty/1.13.6.2] Vary:[Accept-Encoding] X-Deployments-Version:[unknown] X-Men-Requestid:[d26a63f6-a90b-4cf2-8274-3c9f7417bf39] X-Content-Type-Options:[nosniff] Cache-Control:[no-cache, no-store] Pragma:[no-cache] Date:[Tue, 07 Jul 2020 14:19:25 GMT] Content-Encoding:[gzip]] 0x4420170680 0 [] false false map[] 0x4420065b00 0x44200a4630}
2020-07-07 14:19:25 +0000 UTC info: rebooting device
1970-01-01 00:00:12 +0000 UTC info: Running Mender version 2.1.2
1970-01-01 00:00:12 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
1970-01-01 00:00:12 +0000 UTC info: State transition: init [none] -> after-reboot [ArtifactReboot_Leave]
1970-01-01 00:00:12 +0000 UTC debug: Have U-Boot variable: upgrade_available=1
1970-01-01 00:00:12 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:1]
1970-01-01 00:00:12 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> after-reboot [ArtifactReboot_Leave]
1970-01-01 00:00:12 +0000 UTC debug: handling state after reboot
1970-01-01 00:00:12 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> update-commit [ArtifactCommit_Enter]
1970-01-01 00:00:12 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
1970-01-01 00:00:12 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
1970-01-01 00:00:12 +0000 UTC debug: handle update commit state
1970-01-01 00:00:12 +0000 UTC error: failed to report status: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:00:12 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:00:12 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:00:12 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
1970-01-01 00:03:12 +0000 UTC debug: wait complete
1970-01-01 00:03:12 +0000 UTC info: State transition: update-pre-commit-status-report-retry [ArtifactCommit_Enter] -> update-commit [ArtifactCommit_Enter]
1970-01-01 00:03:12 +0000 UTC debug: handle update commit state
1970-01-01 00:03:12 +0000 UTC error: failed to report status: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:03:12 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:03:12 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:03:12 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
1970-01-01 00:06:12 +0000 UTC debug: wait complete
1970-01-01 00:06:12 +0000 UTC info: State transition: update-pre-commit-status-report-retry [ArtifactCommit_Enter] -> update-commit [ArtifactCommit_Enter]
1970-01-01 00:06:12 +0000 UTC debug: handle update commit state
1970-01-01 00:06:12 +0000 UTC error: failed to report status: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:06:12 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:06:12 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:06:12 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
1970-01-01 00:09:12 +0000 UTC debug: wait complete
1970-01-01 00:09:12 +0000 UTC info: State transition: update-pre-commit-status-report-retry [ArtifactCommit_Enter] -> update-commit [ArtifactCommit_Enter]
1970-01-01 00:09:12 +0000 UTC debug: handle update commit state
1970-01-01 00:09:12 +0000 UTC error: failed to report status: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:09:12 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:09:12 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:09:12 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
1970-01-01 00:09:12 +0000 UTC error: transient error: Tried sending status report maximum number of times.
1970-01-01 00:09:12 +0000 UTC info: State transition: update-pre-commit-status-report-retry [ArtifactCommit_Enter] -> rollback [ArtifactRollback]
1970-01-01 00:09:12 +0000 UTC debug: transitioning to error state
1970-01-01 00:09:12 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
1970-01-01 00:09:12 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
1970-01-01 00:09:12 +0000 UTC info: performing rollback
1970-01-01 00:09:12 +0000 UTC debug: Have U-Boot variable: upgrade_available=1
1970-01-01 00:09:12 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:1]
1970-01-01 00:09:12 +0000 UTC debug: Active partition: /dev/mmcblk0p11
1970-01-01 00:09:12 +0000 UTC debug: Detected inactive partition /dev/mmcblk0p12, based on active partition /dev/mmcblk0p11
1970-01-01 00:09:12 +0000 UTC debug: Marking inactive partition (/dev/mmcblk0p12) as the new boot candidate.
1970-01-01 00:09:12 +0000 UTC info: setting partition for rollback: 12
1970-01-01 00:09:12 +0000 UTC debug: Marking inactive partition as a boot candidate successful.
1970-01-01 00:09:12 +0000 UTC debug: will try to rollback reboot the device
1970-01-01 00:09:12 +0000 UTC info: State transition: rollback [ArtifactRollback] -> rollback-reboot [ArtifactRollbackReboot_Enter]
1970-01-01 00:09:12 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
1970-01-01 00:09:12 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
1970-01-01 00:09:12 +0000 UTC info: rebooting device(s) after rollback
1970-01-01 00:09:12 +0000 UTC info: Mender rebooting from inactive partition: /dev/mmcblk0p11
2020-07-07 14:19:31 +0000 UTC debug: handling state after reboot
2020-07-07 14:19:31 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> update-verify [ArtifactReboot_Leave]
2020-07-07 14:19:31 +0000 UTC debug: handle update verify state
2020-07-07 14:19:31 +0000 UTC debug: Have U-Boot variable: upgrade_available=0
2020-07-07 14:19:31 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:0]
2020-07-07 14:19:31 +0000 UTC error: update info for deployment bc642fc6-86a4-4636-8420-94dea3975cf6 present, but update flag is not set; running rollback image (previous active partition)
2020-07-07 14:19:31 +0000 UTC info: State transition: update-verify [ArtifactReboot_Leave] -> rollback [ArtifactRollback]
2020-07-07 14:19:31 +0000 UTC debug: transitioning to error state
2020-07-07 14:19:31 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:31 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:31 +0000 UTC info: performing rollback
2020-07-07 14:19:31 +0000 UTC info: State transition: rollback [ArtifactRollback] -> update-error [ArtifactFailure]
2020-07-07 14:19:31 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:31 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:31 +0000 UTC debug: handle update error state
2020-07-07 14:19:31 +0000 UTC info: State transition: update-error [ArtifactFailure] -> update-status-report [none]
2020-07-07 14:19:31 +0000 UTC debug: handle update status report state
2020-07-07 14:19:31 +0000 UTC error: failed to report status: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
2020-07-07 14:19:31 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
2020-07-07 14:19:31 +0000 UTC error: failed to send status to server: transient error: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
2020-07-07 14:19:31 +0000 UTC info: State transition: update-status-report [ArtifactFailure] -> update-retry-report [none]
2020-07-07 14:32:05 +0000 UTC debug: wait complete
2020-07-07 14:32:05 +0000 UTC info: State transition: update-retry-report [ArtifactFailure] -> update-status-report [ArtifactFailure]
2020-07-07 14:32:05 +0000 UTC debug: handle update status report state
2020-07-07 14:32:05 +0000 UTC debug: status reported, response &{204 No Content 204 HTTP/2.0 2 0 map[Content-Encoding:[gzip] X-Deployments-Version:[unknown] X-Frame-Options:[DENY] X-Content-Type-Options:[nosniff] Cache-Control:[no-cache, no-store] Pragma:[no-cache] Server:[openresty/1.13.6.2] Date:[Tue, 07 Jul 2020 14:32:05 GMT] Vary:[Accept-Encoding] X-Men-Requestid:[9b0cc3bb-5a77-4015-8fb2-37b6363532ae] Strict-Transport-Security:[max-age=63072000; includeSubdomains; preload] X-Xss-Protection:[1; mode=block]] 0x442043c2c0 0 [] false false map[] 0x4420428400 0x44200a2630}
2020-07-07 14:32:05 +0000 UTC debug: attempting to upload deployment logs for failed update