Rollback from time mismatch (no hardware clock) before time sync

Hi,

Our devices lack a hardware clock, so a network-based time sync is required on each boot to bring the system time forward from the Unix Epoch. In some rare cases when rebooting to newly-installed firmware during a deployment, the system clock will not update quickly enough (due to uncontrollable network conditions) - which can result in the maximum number of attempts for status report submission to be hit across the nine minutes following boot and a firmware rollback to be triggered.

We have planned adjustments to our Polling Intervals (previously set to 180 seconds) to help with these situations: InventoryPollingInterval to 3600s, UpdatePollingInterval to 600s, RetryPollingInterval to 600s.

Our logs indicate that the maximum number of retries being attempted are three. Can anyone advise on the procedure to increasing the number of status report retries? Is this a function of the StateScriptRetryInterval and StateScriptRetryTimeout values? Should there be concern of the StateScriptRetryTimeout being triggered when the system clock jumps from Unix Epoch to the actual time?

We’ve made other efforts to await timesync prior to launching the mender service, but I would greatly appreciate any other guidance on best practices for improving mender update behavior in systems lacking a hardware clock.

Here is an example log from a system which rolled back when the system clock did not sync to actual time before hitting the maximum number of status report retries:

2020-07-07 14:06:54 +0000 UTC debug: handle update fetch state
2020-07-07 14:06:55 +0000 UTC debug: status reported, response &{204 No Content 204 HTTP/2.0 2 0 map[Strict-Transport-Security:[max-age=63072000; includeSubdomains; preload] X-Frame-Options:[DENY] X-Content-Type-Options:[nosniff] X-Xss-Protection:[1; mode=block] Pragma:[no-cache] Date:[Tue, 07 Jul 2020 14:06:55 GMT] X-Men-Requestid:[3a51ef80-64e4-43ef-89d5-4b4d7ec44740] Vary:[Accept-Encoding] X-Deployments-Version:[unknown] Cache-Control:[no-cache, no-store] Server:[openresty/1.13.6.2] Content-Encoding:[gzip]] 0x4420106d80 0 [] false false map[] 0x442046c800 0x44200a48f0}
2020-07-07 14:06:55 +0000 UTC debug: Received fetch update response &{200 OK 200 HTTP/1.1 1 1 map[X-Amz-Id-2:[o9anXSxLoS3Mlmz+horocsmkklXufThtJIaKYpUGue4A4HzxtexguUuP/s9OS0cR+QLWLvMj2Fk=] X-Amz-Request-Id:[6AFCBF68B8C4CA3B] Last-Modified:[Thu, 28 May 2020 14:53:30 GMT] Accept-Ranges:[bytes] Date:[Tue, 07 Jul 2020 14:06:56 GMT] Etag:["15ef3ea5de189c45bbfd657bf2584cdb"] Content-Type:[application/vnd.mender-artifact] Content-Length:[714749440] Server:[AmazonS3]] 0x44201703a0 714749440 [] false false map[] 0x4420064b00 0x4420252420}+
2020-07-07 14:06:55 +0000 UTC info: State transition: update-fetch [Download] -> update-store [Download]
2020-07-07 14:06:55 +0000 UTC debug: handle update install state
2020-07-07 14:06:55 +0000 UTC debug: status reported, response &{204 No Content 204 HTTP/2.0 2 0 map[Date:[Tue, 07 Jul 2020 14:06:55 GMT] Strict-Transport-Security:[max-age=63072000; includeSubdomains; preload] X-Frame-Options:[DENY] X-Xss-Protection:[1; mode=block] Pragma:[no-cache] Server:[openresty/1.13.6.2] Vary:[Accept-Encoding] X-Deployments-Version:[unknown] X-Men-Requestid:[3a366de0-50d8-4190-b87d-c1ffc712b239] X-Content-Type-Options:[nosniff] Cache-Control:[no-cache, no-store] Content-Encoding:[gzip]] 0x4420170c40 0 [] false false map[] 0x442000a800 0x44200a48f0}
2020-07-07 14:06:55 +0000 UTC debug: Read data from device manifest file: device_type=dragonboard-410c
2020-07-07 14:06:55 +0000 UTC debug: Found needed line: device_type=dragonboard-410c
2020-07-07 14:06:55 +0000 UTC debug: Current manifest data: dragonboard-410c
2020-07-07 14:06:55 +0000 UTC debug: checking if device [dragonboard-410c] is on compatibile device list: [dragonboard-410c]
2020-07-07 14:06:55 +0000 UTC debug: installing update rpb-console-image-dragonboard-410c-20200528133351-IMM.LE.1.0-8x16_36900.rootfs.ext4 of size 3078619136
2020-07-07 14:06:55 +0000 UTC debug: Trying to install update of size: 3078619136
2020-07-07 14:06:55 +0000 UTC debug: Have U-Boot variable: mender_boot_part=12
2020-07-07 14:06:55 +0000 UTC debug: List of U-Boot variables:map[mender_boot_part:12]
2020-07-07 14:06:55 +0000 UTC debug: Setting active partition from mount candidate: /dev/mmcblk0p12
2020-07-07 14:06:55 +0000 UTC debug: Detected inactive partition /dev/mmcblk0p11, based on active partition /dev/mmcblk0p12
2020-07-07 14:06:55 +0000 UTC info: opening device /dev/mmcblk0p11 for writing
2020-07-07 14:06:55 +0000 UTC info: partition /dev/mmcblk0p11 size: 3221225472
2020-07-07 14:19:18 +0000 UTC info: wrote 3078619136/3078619136 bytes of update to device /dev/mmcblk0p11
2020-07-07 14:19:19 +0000 UTC debug: installer: successfully read artifact [name: Miku_03.04.14; version: 2; compatible devices: [dragonboard-410c]]
2020-07-07 14:19:19 +0000 UTC debug: status reported, response &{204 No Content 204 HTTP/2.0 2 0 map[Content-Encoding:[gzip] Strict-Transport-Security:[max-age=63072000; includeSubdomains; preload] X-Frame-Options:[DENY] X-Xss-Protection:[1; mode=block] Cache-Control:[no-cache, no-store] Server:[openresty/1.13.6.2] Date:[Tue, 07 Jul 2020 14:19:19 GMT] Vary:[Accept-Encoding] X-Deployments-Version:[unknown] X-Men-Requestid:[9ea46fa3-8dfb-40cc-b86a-0334ca047b92] X-Content-Type-Options:[nosniff] Pragma:[no-cache]] 0x442019f940 0 [] false false map[] 0x442046c200 0x44200a4630}
2020-07-07 14:19:19 +0000 UTC info: State transition: update-store [Download] -> update-install [ArtifactInstall]
2020-07-07 14:19:19 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:19 +0000 UTC debug: executing script: Download_Leave_03_pipe-copy
2020-07-07 14:19:19 +0000 UTC debug: executing script: Download_Leave_04_net-copy
2020-07-07 14:19:19 +0000 UTC debug: executing script: Download_Leave_05_update-request
2020-07-07 14:19:25 +0000 UTC debug: executing script: Download_Leave_06_install-uboot-gpio-toggle
2020-07-07 14:19:25 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:25 +0000 UTC debug: status reported, response &{204 No Content 204 HTTP/2.0 2 0 map[Strict-Transport-Security:[max-age=63072000; includeSubdomains; preload] X-Xss-Protection:[1; mode=block] Pragma:[no-cache] Content-Encoding:[gzip] Vary:[Accept-Encoding] X-Deployments-Version:[unknown] X-Frame-Options:[DENY] X-Content-Type-Options:[nosniff] Cache-Control:[no-cache, no-store] Server:[openresty/1.13.6.2] Date:[Tue, 07 Jul 2020 14:19:25 GMT] X-Men-Requestid:[6293daeb-baf4-4723-8abf-d2283faec275]] 0x4420190ea0 0 [] false false map[] 0x4420065800 0x44200a4630}
2020-07-07 14:19:25 +0000 UTC debug: Inactive partition: /dev/mmcblk0p11
2020-07-07 14:19:25 +0000 UTC debug: Marking inactive partition (/dev/mmcblk0p11) as the new boot candidate.
2020-07-07 14:19:25 +0000 UTC info: Enabling partition with new image installed to be a boot candidate: 11
2020-07-07 14:19:25 +0000 UTC debug: Marking inactive partition as a boot candidate successful.
2020-07-07 14:19:25 +0000 UTC info: State transition: update-install [ArtifactInstall] -> reboot [ArtifactReboot_Enter]
2020-07-07 14:19:25 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:25 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:25 +0000 UTC debug: handling reboot state
2020-07-07 14:19:25 +0000 UTC debug: status reported, response &{204 No Content 204 HTTP/2.0 2 0 map[Strict-Transport-Security:[max-age=63072000; includeSubdomains; preload] X-Frame-Options:[DENY] X-Xss-Protection:[1; mode=block] Server:[openresty/1.13.6.2] Vary:[Accept-Encoding] X-Deployments-Version:[unknown] X-Men-Requestid:[d26a63f6-a90b-4cf2-8274-3c9f7417bf39] X-Content-Type-Options:[nosniff] Cache-Control:[no-cache, no-store] Pragma:[no-cache] Date:[Tue, 07 Jul 2020 14:19:25 GMT] Content-Encoding:[gzip]] 0x4420170680 0 [] false false map[] 0x4420065b00 0x44200a4630}
2020-07-07 14:19:25 +0000 UTC info: rebooting device
1970-01-01 00:00:12 +0000 UTC info: Running Mender version 2.1.2
1970-01-01 00:00:12 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
1970-01-01 00:00:12 +0000 UTC info: State transition: init [none] -> after-reboot [ArtifactReboot_Leave]
1970-01-01 00:00:12 +0000 UTC debug: Have U-Boot variable: upgrade_available=1
1970-01-01 00:00:12 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:1]
1970-01-01 00:00:12 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> after-reboot [ArtifactReboot_Leave]
1970-01-01 00:00:12 +0000 UTC debug: handling state after reboot
1970-01-01 00:00:12 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> update-commit [ArtifactCommit_Enter]
1970-01-01 00:00:12 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
1970-01-01 00:00:12 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
1970-01-01 00:00:12 +0000 UTC debug: handle update commit state
1970-01-01 00:00:12 +0000 UTC error: failed to report status: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:00:12 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:00:12 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:00:12 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
1970-01-01 00:03:12 +0000 UTC debug: wait complete
1970-01-01 00:03:12 +0000 UTC info: State transition: update-pre-commit-status-report-retry [ArtifactCommit_Enter] -> update-commit [ArtifactCommit_Enter]
1970-01-01 00:03:12 +0000 UTC debug: handle update commit state
1970-01-01 00:03:12 +0000 UTC error: failed to report status: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:03:12 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:03:12 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:03:12 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
1970-01-01 00:06:12 +0000 UTC debug: wait complete
1970-01-01 00:06:12 +0000 UTC info: State transition: update-pre-commit-status-report-retry [ArtifactCommit_Enter] -> update-commit [ArtifactCommit_Enter]
1970-01-01 00:06:12 +0000 UTC debug: handle update commit state
1970-01-01 00:06:12 +0000 UTC error: failed to report status: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:06:12 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:06:12 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:06:12 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
1970-01-01 00:09:12 +0000 UTC debug: wait complete
1970-01-01 00:09:12 +0000 UTC info: State transition: update-pre-commit-status-report-retry [ArtifactCommit_Enter] -> update-commit [ArtifactCommit_Enter]
1970-01-01 00:09:12 +0000 UTC debug: handle update commit state
1970-01-01 00:09:12 +0000 UTC error: failed to report status: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:09:12 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:09:12 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
1970-01-01 00:09:12 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
1970-01-01 00:09:12 +0000 UTC error: transient error: Tried sending status report maximum number of times.
1970-01-01 00:09:12 +0000 UTC info: State transition: update-pre-commit-status-report-retry [ArtifactCommit_Enter] -> rollback [ArtifactRollback]
1970-01-01 00:09:12 +0000 UTC debug: transitioning to error state
1970-01-01 00:09:12 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
1970-01-01 00:09:12 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
1970-01-01 00:09:12 +0000 UTC info: performing rollback
1970-01-01 00:09:12 +0000 UTC debug: Have U-Boot variable: upgrade_available=1
1970-01-01 00:09:12 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:1]
1970-01-01 00:09:12 +0000 UTC debug: Active partition: /dev/mmcblk0p11
1970-01-01 00:09:12 +0000 UTC debug: Detected inactive partition /dev/mmcblk0p12, based on active partition /dev/mmcblk0p11
1970-01-01 00:09:12 +0000 UTC debug: Marking inactive partition (/dev/mmcblk0p12) as the new boot candidate.
1970-01-01 00:09:12 +0000 UTC info: setting partition for rollback: 12
1970-01-01 00:09:12 +0000 UTC debug: Marking inactive partition as a boot candidate successful.
1970-01-01 00:09:12 +0000 UTC debug: will try to rollback reboot the device
1970-01-01 00:09:12 +0000 UTC info: State transition: rollback [ArtifactRollback] -> rollback-reboot [ArtifactRollbackReboot_Enter]
1970-01-01 00:09:12 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
1970-01-01 00:09:12 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
1970-01-01 00:09:12 +0000 UTC info: rebooting device(s) after rollback
1970-01-01 00:09:12 +0000 UTC info: Mender rebooting from inactive partition: /dev/mmcblk0p11
2020-07-07 14:19:31 +0000 UTC debug: handling state after reboot
2020-07-07 14:19:31 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> update-verify [ArtifactReboot_Leave]
2020-07-07 14:19:31 +0000 UTC debug: handle update verify state
2020-07-07 14:19:31 +0000 UTC debug: Have U-Boot variable: upgrade_available=0
2020-07-07 14:19:31 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:0]
2020-07-07 14:19:31 +0000 UTC error: update info for deployment bc642fc6-86a4-4636-8420-94dea3975cf6 present, but update flag is not set; running rollback image (previous active partition)
2020-07-07 14:19:31 +0000 UTC info: State transition: update-verify [ArtifactReboot_Leave] -> rollback [ArtifactRollback]
2020-07-07 14:19:31 +0000 UTC debug: transitioning to error state
2020-07-07 14:19:31 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:31 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:31 +0000 UTC info: performing rollback
2020-07-07 14:19:31 +0000 UTC info: State transition: rollback [ArtifactRollback] -> update-error [ArtifactFailure]
2020-07-07 14:19:31 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:31 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 60 seconds
2020-07-07 14:19:31 +0000 UTC debug: handle update error state
2020-07-07 14:19:31 +0000 UTC info: State transition: update-error [ArtifactFailure] -> update-status-report [none]
2020-07-07 14:19:31 +0000 UTC debug: handle update status report state
2020-07-07 14:19:31 +0000 UTC error: failed to report status: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
2020-07-07 14:19:31 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
2020-07-07 14:19:31 +0000 UTC error: failed to send status to server: transient error: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/bc642fc6-86a4-4636-8420-94dea3975cf6/status: dial tcp: lookup mender.mikucloud.com on [::1]:53: server misbehaving
2020-07-07 14:19:31 +0000 UTC info: State transition: update-status-report [ArtifactFailure] -> update-retry-report [none]
2020-07-07 14:32:05 +0000 UTC debug: wait complete
2020-07-07 14:32:05 +0000 UTC info: State transition: update-retry-report [ArtifactFailure] -> update-status-report [ArtifactFailure]
2020-07-07 14:32:05 +0000 UTC debug: handle update status report state
2020-07-07 14:32:05 +0000 UTC debug: status reported, response &{204 No Content 204 HTTP/2.0 2 0 map[Content-Encoding:[gzip] X-Deployments-Version:[unknown] X-Frame-Options:[DENY] X-Content-Type-Options:[nosniff] Cache-Control:[no-cache, no-store] Pragma:[no-cache] Server:[openresty/1.13.6.2] Date:[Tue, 07 Jul 2020 14:32:05 GMT] Vary:[Accept-Encoding] X-Men-Requestid:[9b0cc3bb-5a77-4015-8fb2-37b6363532ae] Strict-Transport-Security:[max-age=63072000; includeSubdomains; preload] X-Xss-Protection:[1; mode=block]] 0x442043c2c0 0 [] false false map[] 0x4420428400 0x44200a2630}
2020-07-07 14:32:05 +0000 UTC debug: attempting to upload deployment logs for failed update
1 Like

Hi @Thomas_Love,

I think the best approach here is to use a state script to delay the server connections until the time has synced. You can see an approach for this here: https://github.com/mendersoftware/meta-mender/tree/master/meta-mender-core/recipes-mender/mender-wait-for-timesync

Drew

Thanks Drew, am I correct in my reading that this state script adds a delay of up to 50s for time sync?

Do you know of a way to increase the number of retry attempts, as well?

Yes, that particular script has a timeout at 50s but that can obviously be tweaked for your specific use case.

I’m not sure about the number of attempts, it may be hard coded. @kacf can you confirm?

Drew

I have also checked out our environment, the ArtifactCommit_Enter_10_wait-for-timesync script is already present in our build, as in the source you have linked.

IIRC, I believe that the formula for retry count is: ceil(log₂(RetryPollIntervalSeconds/60)) × 3 + 3, where RetryPollIntervalSeconds comes from mender.conf. If you set it higher, it will try more times.

I should add that it always starts with short intervals, and increases them exponentially up to RetryPollIntervalSeconds, so it will give reasonably quick turnaround even when it’s set high.

I found exactly the same issue on some production devices. In particular on some of these devices this issue happens very frequently and I had to try to release multiple time the same version without success.

Which value of RetryPollIntervalSeconds do you suggest?

Is there a way (for instance with systemd) to delay mender at boot to be sure to have a valid system date?

It may be possible to create another systemd service that monitors the time sync and then touches a stamp file somewhere. You can (I think) have systemd delay the Mender service until that stamp file exists.

Drew

1 Like

So we’ve encountered a related issue, shown in the following log.

I believe that this log shows the mender client failing to report status (due to the system time being at Epoch) one time. Before any other report attempts, a time sync appears to take place. However, the mender client then determines that a rollback is necessary and carries out such a procedure. I suspect/worry that this rollback is being caused due to the delta between system time before and after the time sync - which exceeds the StateScriptRetryTimeout.

Please take a look:

2020-07-14 15:41:06 +0000 UTC info: Running Mender version 2.1.2
2020-07-14 15:41:06 +0000 UTC debug: handle update fetch state
2020-07-14 15:41:06 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:41:06 +0000 UTC debug: Received fetch update response &{200 OK 200 HTTP/1.1 1 1 map[Date:[Tue, 14 Jul 2020 15:41:07 GMT] Last-Modified:[Tue, 07 Jul 2020 15:06:31 GMT] Content-Type:[application/vnd.mender-artifact] Content-Length:[759485440] Server:[AmazonS3] X-Amz-Id-2:[3mTgc1BM+4V/AE/iEg5V9rMq6HbrFkdKEPH9Y1xvRIW7/Oz7mVs/GkmJTEhfS7/QWNaGQcvRQaE=] X-Amz-Request-Id:[1D12A831781A22C2] Etag:["422f835621fea5beb7e03e5f8608a147"] Accept-Ranges:[bytes]] 0x44203f0380 759485440 [] false false map[] 0x442042c300 0x44203fc210}+
2020-07-14 15:41:06 +0000 UTC info: State transition: update-fetch [Download_Enter] -> update-store [Download_Enter]
2020-07-14 15:41:06 +0000 UTC debug: handle update install state
2020-07-14 15:41:07 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:41:07 +0000 UTC debug: Read data from device manifest file: device_type=dragonboard-410c
2020-07-14 15:41:07 +0000 UTC debug: Current manifest data: dragonboard-410c
2020-07-14 15:41:07 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2020-07-14 15:41:07 +0000 UTC info: installer: authenticated digital signature of artifact
2020-07-14 15:41:07 +0000 UTC debug: checking if device [dragonboard-410c] is on compatible device list: [dragonboard-410c]
2020-07-14 15:41:07 +0000 UTC debug: installer: successfully read artifact [name: Miku_03.04.15; version: 2; compatible devices: [dragonboard-410c]]
2020-07-14 15:41:07 +0000 UTC debug: Trying to install update of size: 3078619136
2020-07-14 15:41:07 +0000 UTC debug: Active partition: /dev/mmcblk0p11
2020-07-14 15:41:07 +0000 UTC debug: Detected inactive partition /dev/mmcblk0p12, based on active partition /dev/mmcblk0p11
2020-07-14 15:41:07 +0000 UTC info: native sector size of block device /dev/mmcblk0p12 is 512, we will write in chunks of 1048576
2020-07-14 15:41:07 +0000 UTC info: opening device /dev/mmcblk0p12 for writing
2020-07-14 15:41:07 +0000 UTC info: partition /dev/mmcblk0p12 size: 3221225472
2020-07-14 15:50:51 +0000 UTC info: wrote 3078619136/3078619136 bytes of update to device /dev/mmcblk0p12
2020-07-14 15:50:52 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:50:52 +0000 UTC info: State transition: update-store [Download_Enter] -> update-after-store [Download_Leave]
2020-07-14 15:50:52 +0000 UTC info: State transition: update-after-store [Download_Leave] -> update-install [ArtifactInstall]
2020-07-14 15:50:52 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:50:52 +0000 UTC debug: start executing script: Download_Leave_03_pipe-copy
2020-07-14 15:50:52 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:50:52 +0000 UTC debug: start executing script: Download_Leave_04_net-copy
2020-07-14 15:50:52 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:50:52 +0000 UTC debug: start executing script: Download_Leave_05_update-request
2020-07-14 15:50:52 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:50:58 +0000 UTC debug: start executing script: Download_Leave_06_install-uboot-gpio-toggle
2020-07-14 15:50:58 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:50:58 +0000 UTC debug: start executing script: Download_Leave_07_all-copy
2020-07-14 15:50:58 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:50:58 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:50:58 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:50:59 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:50:59 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:50:59 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:50:59 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:50:59 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:50:59 +0000 UTC debug: Inactive partition: /dev/mmcblk0p12
2020-07-14 15:50:59 +0000 UTC debug: Marking inactive partition (/dev/mmcblk0p12) as the new boot candidate.
2020-07-14 15:50:59 +0000 UTC info: Enabling partition with new image installed to be a boot candidate: 12
2020-07-14 15:50:59 +0000 UTC debug: Marking inactive partition as a boot candidate successful.
2020-07-14 15:50:59 +0000 UTC info: State transition: update-install [ArtifactInstall] -> reboot [ArtifactReboot_Enter]
2020-07-14 15:50:59 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:50:59 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:50:59 +0000 UTC debug: handling reboot state
2020-07-14 15:50:59 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:50:59 +0000 UTC info: rebooting device(s)
2020-07-14 15:50:59 +0000 UTC info: Mender rebooting from active partition: /dev/mmcblk0p11
1970-01-01 00:00:12 +0000 UTC info: Running Mender version 2.1.2
1970-01-01 00:00:12 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
1970-01-01 00:00:12 +0000 UTC info: State transition: init [none] -> after-reboot [ArtifactReboot_Leave]
1970-01-01 00:00:12 +0000 UTC debug: Have U-Boot variable: upgrade_available=1
1970-01-01 00:00:12 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:1]
1970-01-01 00:00:12 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> after-reboot [ArtifactReboot_Leave]
1970-01-01 00:00:12 +0000 UTC debug: handling state after reboot
1970-01-01 00:00:12 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> update-commit [ArtifactCommit_Enter]
1970-01-01 00:00:12 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
1970-01-01 00:00:12 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
1970-01-01 00:00:12 +0000 UTC debug: handle update commit state
1970-01-01 00:00:12 +0000 UTC error: failed to report status: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/a6c69a4a-019f-4cd2-b9f7-3ce2b0edd4a7/status: x509: certificate has expired or is not yet valid
1970-01-01 00:00:12 +0000 UTC error: error reporting update status: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/a6c69a4a-019f-4cd2-b9f7-3ce2b0edd4a7/status: x509: certificate has expired or is not yet valid
1970-01-01 00:00:12 +0000 UTC error: Failed to send status report to server: transient error: reporting status failed: Put https://mender.mikucloud.com/api/devices/v1/deployments/device/deployments/a6c69a4a-019f-4cd2-b9f7-3ce2b0edd4a7/status: x509: certificate has expired or is not yet valid
1970-01-01 00:00:12 +0000 UTC info: State transition: update-commit [ArtifactCommit_Enter] -> update-pre-commit-status-report-retry [ArtifactCommit_Enter]
2020-07-14 15:51:07 +0000 UTC info: Running Mender version 2.1.2
2020-07-14 15:51:07 +0000 UTC error: Update was interrupted in state: update-commit
2020-07-14 15:51:07 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2020-07-14 15:51:07 +0000 UTC info: State transition: init [none] -> rollback [ArtifactRollback]
2020-07-14 15:51:07 +0000 UTC debug: transitioning to error state
2020-07-14 15:51:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:07 +0000 UTC info: performing rollback
2020-07-14 15:51:07 +0000 UTC debug: Have U-Boot variable: upgrade_available=0
2020-07-14 15:51:07 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:0]
2020-07-14 15:51:07 +0000 UTC debug: will try to rollback reboot the device
2020-07-14 15:51:07 +0000 UTC info: State transition: rollback [ArtifactRollback] -> rollback-reboot [ArtifactRollbackReboot_Enter]
2020-07-14 15:51:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:07 +0000 UTC info: rebooting device(s) after rollback
2020-07-14 15:51:07 +0000 UTC info: Mender rebooting from inactive partition: /dev/mmcblk0p11
2020-07-14 15:51:06 +0000 UTC info: Running Mender version 2.1.2
2020-07-14 15:51:06 +0000 UTC info: Update was interrupted in state: rollback-reboot
2020-07-14 15:51:06 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2020-07-14 15:51:06 +0000 UTC info: State transition: init [none] -> verify-rollback-reboot [ArtifactRollbackReboot_Leave]
2020-07-14 15:51:06 +0000 UTC debug: transitioning to error state
2020-07-14 15:51:07 +0000 UTC debug: Have U-Boot variable: upgrade_available=0
2020-07-14 15:51:07 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:0]
2020-07-14 15:51:07 +0000 UTC info: State transition: verify-rollback-reboot [ArtifactRollbackReboot_Leave] -> after-rollback-reboot [ArtifactRollbackReboot_Leave]
2020-07-14 15:51:07 +0000 UTC debug: handling state after rollback reboot
2020-07-14 15:51:07 +0000 UTC info: State transition: after-rollback-reboot [ArtifactRollbackReboot_Leave] -> update-error [ArtifactFailure]
2020-07-14 15:51:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:07 +0000 UTC debug: handle update error state
2020-07-14 15:51:07 +0000 UTC info: State transition: update-error [ArtifactFailure] -> cleanup [Error]
2020-07-14 15:51:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:07 +0000 UTC debug: Handling Cleanup state
2020-07-14 15:51:07 +0000 UTC info: State transition: cleanup [Error] -> update-status-report [none]
2020-07-14 15:51:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:07 +0000 UTC debug: handle update status report state
2020-07-14 15:51:06 +0000 UTC info: Running Mender version 2.1.2
2020-07-14 15:51:06 +0000 UTC error: Update was interrupted in state: update-status-report
2020-07-14 15:51:06 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2020-07-14 15:51:06 +0000 UTC info: State transition: init [none] -> rollback [ArtifactRollback]
2020-07-14 15:51:06 +0000 UTC debug: transitioning to error state
2020-07-14 15:51:06 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:06 +0000 UTC info: performing rollback
2020-07-14 15:51:07 +0000 UTC debug: Have U-Boot variable: upgrade_available=0
2020-07-14 15:51:07 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:0]
2020-07-14 15:51:07 +0000 UTC debug: will try to rollback reboot the device
2020-07-14 15:51:07 +0000 UTC info: State transition: rollback [ArtifactRollback] -> rollback-reboot [ArtifactRollbackReboot_Enter]
2020-07-14 15:51:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:07 +0000 UTC info: rebooting device(s) after rollback
2020-07-14 15:51:07 +0000 UTC info: Mender rebooting from inactive partition: /dev/mmcblk0p11
2020-07-14 15:51:07 +0000 UTC info: Running Mender version 2.1.2
2020-07-14 15:51:07 +0000 UTC info: Update was interrupted in state: rollback-reboot
2020-07-14 15:51:07 +0000 UTC info: Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2020-07-14 15:51:07 +0000 UTC info: State transition: init [none] -> verify-rollback-reboot [ArtifactRollbackReboot_Leave]
2020-07-14 15:51:07 +0000 UTC debug: transitioning to error state
2020-07-14 15:51:07 +0000 UTC debug: Have U-Boot variable: upgrade_available=0
2020-07-14 15:51:07 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:0]
2020-07-14 15:51:07 +0000 UTC info: State transition: verify-rollback-reboot [ArtifactRollbackReboot_Leave] -> after-rollback-reboot [ArtifactRollbackReboot_Leave]
2020-07-14 15:51:07 +0000 UTC debug: handling state after rollback reboot
2020-07-14 15:51:07 +0000 UTC info: State transition: after-rollback-reboot [ArtifactRollbackReboot_Leave] -> update-error [ArtifactFailure]
2020-07-14 15:51:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:07 +0000 UTC debug: handle update error state
2020-07-14 15:51:07 +0000 UTC info: State transition: update-error [ArtifactFailure] -> cleanup [Error]
2020-07-14 15:51:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:07 +0000 UTC debug: Handling Cleanup state
2020-07-14 15:51:07 +0000 UTC info: State transition: cleanup [Error] -> update-status-report [none]
2020-07-14 15:51:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-07-14 15:51:07 +0000 UTC debug: handle update status report state
2020-07-14 15:51:07 +0000 UTC debug: status reported, response 204 No Content
2020-07-14 15:51:07 +0000 UTC debug: attempting to upload deployment logs for failed update

Hi,
what do you think about ‘systemd-time-wait-sync’?
https://manpages.debian.org/testing/systemd/systemd-time-wait-sync.service.8.en.html

Is it good to modify mender systemd service to start only after systemd-timesyncd.service and systemd-time-wait-sync? In this way the system date should be always up to date.

If modifying the mender service to be preempted by a timesync service:

Is there reason to be concerned about firmware-specific non-functional networking preventing the mender service from launching, and thus causing the device to not rollback and become “stuck” on a firmware with non-functional networking?

Exactly, I have the same concern.
But I want to wait @drewmoseley to get more info about this topic.

I do not think so. From Golang documentation:

time.Since(start), time.Until(deadline), and time.Now().Before(deadline), are similarly robust against wall clock resets.

time.Since() is what we use to measure the time for retry attempts.

I do not think so either. Great care has been taken to make the client independent of network services, and never depends on them, exactly so that it can roll back if they are broken.

However, notice the Running Mender version 2.1.2 message. This message is only printed when Mender starts up, indicating that a reboot has likely happened. And it looked like this happened, and killed Mender, while Mender was waiting. Are you able to see the terminal output from the device?

If you do use the mechanism I suggested above (ie not allowing Mender to start until the timesync competes) then you would indeed have a possibility of not rolling back since Mender would never start in the case that networking is dead. So yeah, that’s not a good solution.

The terminal output from the device would be the next best step here.

Drew

Ok, I removed the temporary workaroud to wait timesync.

Basically, do you want the log from a device to understand what it’s happening?