Mender rootfs group update issue

I am performing an OTA rootfs update on the following setup: six Jetson TX2 Mender clients connected through a single network interface, sharing one network connection. The clients run a Yocto-based image.

I’m deploying the updates through the hosted.mender.io UI. Three devices update successfully and the other three fail. I had to create a second deployment for the failed devices before all of my Jetsons were updated. This has happened consistently with every update I have tried.

I have not been able to update the whole group successfully in a single deployment.

Hi @anishmonachan, all device updates are completely independent with Mender. We do support retries at the deployment level, so if a deployment fails it will automatically be retried. We have found that with larger device fleets there are likely to be spurious failures due to power issues and the like, requiring retries to get the entire fleet updated. That said, updating 6 devices should not have these kinds of issues. I suspect there is some other reason that the devices are failing to update. Do you have client logs (journalctl -u mender-client) from the failed deployments?

Drew

The Mender logs don't say much about the cause of the failure. Today all of the updates failed when I reproduced the issue. The Mender logs are below.

-- Logs begin at Fri 2020-10-16 07:39:36 UTC. --
-- Logs begin at Fri 2020-10-16 07:39:36 UTC, end at Fri 2020-10-16 07:44:54 UTC. --
Oct 16 07:39:39 j140-tx2 systemd[1]: Started Mender OTA update service.
Oct 16 07:39:44 j140-tx2 mender[4850]: time="2020-10-16T07:39:44Z" level=info msg="Loaded configuration file: /var/lib/mender/mender.conf" module=config
Oct 16 07:39:44 j140-tx2 mender[4850]: time="2020-10-16T07:39:44Z" level=info msg="Loaded configuration file: /etc/mender/mender.conf" module=config
Oct 16 07:39:44 j140-tx2 mender[4850]: time="2020-10-16T07:39:44Z" level=info msg="Mender running on partition: /dev/mmcblk0p1" module=cli
Oct 16 07:39:44 j140-tx2 mender[4850]: time="2020-10-16T07:39:44Z" level=info msg="State transition: init [none] → init [none]" module=mender
Oct 16 07:39:44 j140-tx2 mender[4850]: time="2020-10-16T07:39:44Z" level=info msg="State transition: init [none] → idle [Idle]" module=mender
Oct 16 07:39:44 j140-tx2 mender[4850]: time="2020-10-16T07:39:44Z" level=info msg="authorization data present and valid" module=mender
Oct 16 07:39:44 j140-tx2 mender[4850]: time="2020-10-16T07:39:44Z" level=info msg="State transition: idle [Idle] → check-wait [Idle]" module=mender
Oct 16 07:39:44 j140-tx2 mender[4850]: time="2020-10-16T07:39:44Z" level=info msg="State transition: check-wait [Idle] → inventory-update [Sync]" module=mender
Oct 16 07:39:45 j140-tx2 mender[4850]: time="2020-10-16T07:39:45Z" level=info msg="State transition: inventory-update [Sync] → check-wait [Idle]" module=mender
Oct 16 07:39:45 j140-tx2 mender[4850]: time="2020-10-16T07:39:45Z" level=info msg="State transition: check-wait [Idle] → update-check [Sync]" module=mender
Oct 16 07:39:45 j140-tx2 mender[4850]: time="2020-10-16T07:39:45Z" level=info msg="Correct request for getting image from: https://s3.amazonaws.com" module=client_update

############## Logs from UI

2020-10-16 07:40:03 +0000 UTC info: Running Mender version 2.2.0b1
2020-10-16 07:40:03 +0000 UTC debug: handle update fetch state
2020-10-16 07:40:03 +0000 UTC debug: status reported, response 204 No Content
2020-10-16 07:40:04 +0000 UTC debug: Received fetch update response &{200 OK 200 HTTP/1.1 1 1 map[Accept-Ranges:[bytes] Content-Length:[806929408] Content-Type:[application/vnd.mender-artifact] Date:[Fri, 16 Oct 2020 07:40:04 GMT] Etag:[“fb0c2697919ac2684b5d6234d87575bd-77”] Expires:[Tue, 13 Oct 2020 13:51:06 GMT] Last-Modified:[Tue, 13 Oct 2020 13:41:07 GMT] Server:[AmazonS3] X-Amz-Id-2:[a4hOUccdvbluWmFqLdQ7gf9eU7vtTA/KsEaDhYRJX23YnTeGQkWq2TUoK5CxSzQ7/f0ru5OVkvg=] X-Amz-Request-Id:[A816970751C77AB3]] 0x40003606a0 806929408 false false map 0x400049a500 0x4000566d10}+
2020-10-16 07:40:04 +0000 UTC info: State transition: update-fetch [Download_Enter] → update-store [Download_Enter]
2020-10-16 07:40:04 +0000 UTC debug: handle update install state
2020-10-16 07:40:04 +0000 UTC debug: status reported, response 204 No Content
2020-10-16 07:40:04 +0000 UTC debug: Read data from device manifest file: device_type=j140-tx2
2020-10-16 07:40:04 +0000 UTC debug: Current manifest data: j140-tx2
2020-10-16 07:40:04 +0000 UTC info: no public key was provided for authenticating the artifact
2020-10-16 07:40:04 +0000 UTC info: Update Module path “/usr/share/mender/modules/v3” could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2020-10-16 07:40:04 +0000 UTC debug: checking if device [j140-tx2] is on compatible device list: [j140-tx2]
2020-10-16 07:40:04 +0000 UTC debug: installer: processing script: ArtifactReboot_Enter_50
2020-10-16 07:40:04 +0000 UTC debug: installer: successfully read artifact [name: v5.2.0; version: 3; compatible devices: [j140-tx2]]
2020-10-16 07:40:04 +0000 UTC debug: Active partition: /dev/mmcblk0p1
2020-10-16 07:40:04 +0000 UTC debug: Detected inactive partition /dev/mmcblk0p30, based on active partition /dev/mmcblk0p1
2020-10-16 07:40:04 +0000 UTC info: opening device /dev/mmcblk0p30 for writing
2020-10-16 07:40:04 +0000 UTC debug: Open block-device for installing update of size: 3078619136
2020-10-16 07:40:04 +0000 UTC debug: Device: /dev/mmcblk0p30 is a ubi device: false
2020-10-16 07:40:04 +0000 UTC info: native sector size of block device /dev/mmcblk0p30 is 512, we will write in chunks of 1048576
2020-10-16 07:40:04 +0000 UTC debug: Opening device: /dev/mmcblk0p30 for writing with flag: 2
2020-10-16 07:41:58 +0000 UTC info: Running Mender version 2.2.0b1
2020-10-16 07:41:58 +0000 UTC error: Update was interrupted in state: update-store
2020-10-16 07:41:58 +0000 UTC info: Update Module path “/usr/share/mender/modules/v3” could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available
2020-10-16 07:41:58 +0000 UTC info: State transition: init [none] → cleanup [Error]
2020-10-16 07:41:58 +0000 UTC debug: transitioning to error state
2020-10-16 07:41:58 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-10-16 07:41:58 +0000 UTC debug: Handling Cleanup state
2020-10-16 07:41:58 +0000 UTC info: State transition: cleanup [Error] → update-status-report [none]
2020-10-16 07:41:58 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2020-10-16 07:41:58 +0000 UTC debug: handle update status report state
2020-10-16 07:41:59 +0000 UTC debug: status reported, response 204 No Content
2020-10-16 07:41:59 +0000 UTC debug: attempting to upload deployment logs for failed update

Update

I think it is due to a time-sync issue. I was restarting all of the Mender clients to force the Mender service to check for an update, so the service was starting before time sync had completed. The update succeeds when I leave the client to its normal update polling interval.
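If the service has to be restarted by hand, it may help to look at what the client was last doing before touching it. Below is a minimal sketch, assuming journal lines in the same "State transition: A [x] → B [y]" format as the logs above. The helper names and the set of states treated as idle are my own assumptions, not something taken from the Mender documentation.

```python
import re

# Matches the "State transition: A [x] -> B [y]" messages the client logs.
# Both "->" and the typographic arrow are accepted because pasted logs vary.
TRANSITION = re.compile(r"State transition: (\S+) \[[^\]]*\] (?:->|→) (\S+) \[")

# Assumption: states in which a restart does not interrupt a deployment.
IDLE_STATES = {"idle", "check-wait"}


def last_state(log_lines):
    """Return the destination of the most recent state transition, or None."""
    state = None
    for line in log_lines:
        match = TRANSITION.search(line)
        if match:
            state = match.group(2)
    return state


def safe_to_restart(log_lines):
    """True only if the last logged state is one of the assumed idle states."""
    return last_state(log_lines) in IDLE_STATES
```

The input would be the lines from journalctl -u mender-client. With no transitions in the input the sketch answers "not safe", which is the conservative choice.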

Doing this during an update cycle will cause the Mender client to think that it was interrupted during the deployment (i.e. the same as a power loss or reset of the device).

This is also the error you see:

2020-10-16 07:41:58 +0000 UTC error: Update was interrupted in state: update-store
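To spot this pattern in other deployment logs, here is a rough sketch that looks for the "Update was interrupted" error and measures how long the log was silent before the client started again. It assumes the line format of the UI logs above (timestamp, "+0000 UTC", level, message). The function name find_interruptions is hypothetical.

```python
import re
from datetime import datetime

# Deployment-log lines as shown in the UI:
# "2020-10-16 07:41:58 +0000 UTC error: <message>"
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \+0000 UTC (\w+): (.*)$")
INTERRUPTED = re.compile(r"Update was interrupted in state: (\S+)")


def find_interruptions(log_text):
    """Return a (gap_seconds, state) tuple for each interrupted-update error.

    gap_seconds is the time between the most recent client start
    ("Running Mender version ...") and the line logged just before it,
    or None if no earlier line was seen.
    """
    results = []
    previous = None      # timestamp of the previous parsed line
    restart_gap = None   # gap measured at the most recent client start
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # skip lines that are not in the assumed format
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        message = m.group(3)
        if message.startswith("Running Mender version") and previous is not None:
            restart_gap = (stamp - previous).total_seconds()
        hit = INTERRUPTED.search(message)
        if hit:
            results.append((restart_gap, hit.group(1)))
        previous = stamp
    return results
```

Run against the UI log above, this reports one interruption in state update-store with a gap of 114 seconds (07:40:04 to 07:41:58), which is consistent with the client being restarted while it was still writing the image.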

Thank you @mirzak

@mirzak just a follow-up question. From your reply I understood that the update fails at boot if we create the deployment while the Mender client is offline. So is it necessary to create the deployment while the Mender client is up? Does Mender have a solution for updating at boot?