Letting devices reboot between update steps with pauses

Hello,

We have devices in the field that are updated with Mender. Full image updates work in general. Lately we tried to use pauses so that devices can download updates ahead of time and we can trigger the installation later.

The devices are not always on. The customer controls when a device is turned on and goes online. Our idea was to download updates in the background while the device is on, then unpause the deployment later when the device is online again.

What’s happening though is that a device will download an update and pause. After rebooting the device the paused deployment will fail with this log:

2022-11-17 14:48:31 +0000 UTC info: Running Mender client version: 3.3.0
2022-11-17 14:48:32 +0000 UTC info: State transition: update-fetch [Download_Enter] -> update-store [Download_Enter]
2022-11-17 14:48:32 +0000 UTC info: No public key was provided for authenticating the artifact
2022-11-17 14:51:24 +0000 UTC info: State transition: update-store [Download_Enter] -> update-after-store [Download_Leave]
2022-11-17 14:51:24 +0000 UTC info: State transition: update-after-store [Download_Leave] -> mender-update-control-refresh-maps [none]
2022-11-17 14:51:24 +0000 UTC info: State transition: mender-update-control-refresh-maps [none] -> mender-update-control [none]
2022-11-17 14:51:24 +0000 UTC info: Update Control: Pausing before entering update-install state
2022-11-17 14:51:24 +0000 UTC info: State transition: mender-update-control [none] -> mender-update-control-pause [none]
2022-11-17 14:51:24 +0000 UTC info: Next update refresh from the server in: 29m59.745082468s
2022-11-17 14:51:24 +0000 UTC info: Forced wake-up from sleep
2022-11-17 14:51:24 +0000 UTC info: State transition: mender-update-control-pause [none] -> mender-update-control [none]
2022-11-17 14:51:24 +0000 UTC info: Update Control: Pausing before entering update-install state
2022-11-17 14:51:24 +0000 UTC info: State transition: mender-update-control [none] -> mender-update-control-pause [none]
2022-11-17 14:51:24 +0000 UTC info: Next update refresh from the server in: 29m59.73520869s
2022-11-17 14:54:42 +0000 UTC error: error forwarding from client to backend: websocket: close 1006 (abnormal closure): unexpected EOF
2022-11-17 14:54:42 +0000 UTC warning: error while sending close message: write tcp 127.0.0.1:45711->127.0.0.1:43968: use of closed network connection
2022-11-17 14:54:42 +0000 UTC info: Daemon terminated with SIGTERM
2022-11-17 14:57:19 +0000 UTC info: Running Mender client version: 3.3.0
2022-11-17 14:57:19 +0000 UTC error: Mender shut down in state: update-after-store
2022-11-17 14:57:19 +0000 UTC info: State transition: init [none] -> cleanup [Error]
2022-11-17 14:57:19 +0000 UTC info: State transition: cleanup [Error] -> update-status-report [none]
2022-11-17 14:57:19 +0000 UTC info: Device unauthorized; attempting reauthorization
2022-11-17 14:57:20 +0000 UTC info: successfully received new authorization data from server https://hosted.mender.io
2022-11-17 14:57:20 +0000 UTC info: Local proxy started
2022-11-17 14:57:20 +0000 UTC info: Reauthorization successful

It seems to me that any error a device encounters while updating causes the update to fail, and that rebooting in between is considered an error. Is what we are trying to do possible with Mender somehow?

Thanks!

Best Regards

Markus

Hi @diekleinekuh,

There is a difference between a coordinated reboot (as initiated by Mender) and a spontaneous reboot (waiting until the user reboots). I know that this is being looked into at the moment. I’ll see if I can find information and get back to you.

Greetz,
Josef

Hi @diekleinekuh,

Finally spotted the problem. Support for spontaneous reboots was added in 3.4/3.3.1, while your client is 3.3.0. So a small version bump should actually fix that nicely!

Please note though that this only applies to the pause-before-reboot stage, when an artifact has already been installed. Reboots during all other pauses will exhibit the described error.

Greetz,
Josef
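
For readers who want to check a device against the versions named above, here is a minimal sketch. It only encodes what the post states (the fix is in 3.3.1 and 3.4). The function names are made up for illustration, the package name mender-client is taken from the apt output in this thread, and the comparison relies on GNU `sort -V`, which may be missing on minimal systems.

```shell
#!/bin/sh
# Sketch: does a mender-client package version include the
# "spontaneous reboot while paused" fix (3.3.1 or later, per the post above)?
# All function names here are hypothetical helpers, not part of Mender.

# Succeeds if version $1 >= version $2. Assumes GNU `sort -V` is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip the Debian revision, e.g. "3.3.0-1+ubuntu+bionic" -> "3.3.0".
upstream_version() {
    printf '%s\n' "${1%%-*}"
}

# Succeeds if the given package version is 3.3.1 or newer.
has_reboot_fix() {
    version_ge "$(upstream_version "$1")" "3.3.1"
}

# Usage on a Debian/Ubuntu device:
#   installed=$(dpkg-query -W -f='${Version}' mender-client)
#   if has_reboot_fix "$installed"; then echo "ok"; else echo "upgrade needed"; fi
```

On systems with dpkg, `dpkg --compare-versions` is probably the more robust way to do the comparison. The `sort -V` variant is used here only to keep the sketch free of dpkg.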

Hi Josef,

Thanks for your quick reply. I’ll update the client and give it a try.

So if I put a pause before reboot and allow a lot of retries for the update, it will start over from the beginning whenever the device is switched off, until it finally reaches the pause. Then I can keep it there, and the device can be switched on and off many times until I finally unpause. The controlled reboot will then cause a transition into the reboot and commit states.

Is that correct?

Also, if the Mender client is restarted after the download has finished but before reaching the reboot state, will it keep the downloaded artifact or start over from the beginning?

What I’m trying to achieve is to limit the time a device is blocked for an OTA update while still making sure the update will eventually succeed. Download and installation might take a while, and if the update started over from the beginning each time, this would lower the probability of success.

Many thanks!

Markus

Also: I’m using the stable apt repository for Ubuntu Bionic. The latest version there is still 3.3.0. I cannot see any 3.3.1. Do I have to install from source?!

Hi @diekleinekuh,

Actually 3.4 should be in the repos. Can you please provide the output of

apt update && apt list --upgradable | grep mender-client

so we can dig a little deeper?

Thanks,
Josef

sudo apt update && apt list --upgradable | grep mender-client
Hit:1 http://ports.ubuntu.com/ubuntu-ports bionic InRelease
Hit:2 http://ports.ubuntu.com/ubuntu-ports bionic-updates InRelease
Hit:3 http://ports.ubuntu.com/ubuntu-ports bionic-backports InRelease

Hit:8 https://downloads.mender.io/repos/debian ubuntu/bionic/stable InRelease

Reading package lists… Done
Building dependency tree
Reading state information… Done
199 packages can be upgraded. Run ‘apt list --upgradable’ to see them.

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Also:

apt-cache policy mender-client
mender-client:
Installed: 3.3.0-1+ubuntu+bionic
Candidate: 3.3.0-1+ubuntu+bionic
Version table:
*** 3.3.0-1+ubuntu+bionic 500
500 https://downloads.mender.io/repos/debian ubuntu/bionic/stable/main arm64 Packages
100 /var/lib/dpkg/status

I omitted some lines related to internal repositories.

When using the experimental repository I get 3.4. But I assumed 3.3.1 to be a bugfix release that should be available in stable.

Hi @diekleinekuh,

You are right! There must have been an error publishing the mender-client 3.4.0 package for Ubuntu Bionic on ARM64, so the latest version there is 3.3.0.

My preliminary research shows that the other Ubuntu distributions (and the other architectures for Bionic) are correctly updated, so something went wrong that affected only this one.

Please give me some time to investigate and fix it.

Hello @diekleinekuh,

It should be fine now. Could you please try again on your end?

The bug was actually pretty simple to identify, but I took my time making some improvements to our automation scripts and writing tests. I hope to be able to catch this kind of issue earlier next time :slight_smile:

Hi @lluiscampos,

I can see mender 3.4.0-1 now in the repository. Thanks for fixing that :slight_smile:

EDIT by @TheYoctoJester: put remark about 3.3.1 into next comment

A post was split to a new topic: Previous client versions missing from apt repositories

To conclude this thread here:

I upgraded to mender-client 3.4. I cannot test pauses anymore, since I upgraded from the free plan to Basic, which doesn’t have this feature anymore :frowning:

I put in a state script for ArtifactReboot_Enter that blocks the reboot indefinitely. I can then reboot the system manually, and this still counts as a successful deployment. :+1:
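
The script itself was not posted, so the following is only a sketch of how such a blocking ArtifactReboot_Enter script might look, not the one used above. It assumes the Mender client's "retry later" convention for state scripts (exit code 21, with the retry window set by StateScriptRetryIntervalSeconds and StateScriptRetryTimeoutSeconds in mender.conf). Please check this against the documentation for your client version. The marker file path and the function name are hypothetical.

```shell
#!/bin/sh
# Hypothetical state script, e.g. named ArtifactReboot_Enter_00_hold-reboot.
# Assumption: exit code 21 asks the Mender client to run the script again
# later instead of failing the deployment.

# Hypothetical marker file that an operator creates to release the reboot.
RELEASE_FLAG="${RELEASE_FLAG:-/data/allow-reboot}"

# Returns 0 if the marker file exists, 21 ("retry later") otherwise.
hold_or_release() {
    if [ -e "$1" ]; then
        return 0
    fi
    echo "Holding reboot: $1 not found" >&2
    return 21
}

# Act only when invoked under a state script name, so the function can be
# sourced elsewhere without side effects. If the file is installed under a
# different name, the script exits 0 and the reboot is not held back.
case "${0##*/}" in
    ArtifactReboot_Enter_*)
        hold_or_release "$RELEASE_FLAG"
        exit $?
        ;;
esac
```

With a script like this, the client would stay ahead of the reboot until the marker file appears or the device is rebooted by hand. The hold only lasts as long as the configured retry timeout, so that value would need to be large enough for the intended waiting period.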