Handling random reboots

dhoomakethu · May 18, 2020, 4:55am

Description

Hi we are using mender update-module to update some of the binaries on our systems. Basically mender downloads the artifact and updates our agent running over DBus of the availability of the update and the update is handled by our custom update agent ,who puts the system in to recovery mode and updates the binary files and reboots. The update module has NeedsArtifactReboot to yes and would verify the content of the file to see if the update is succesfull.

In the normal positive flow all works good but some time when the device is rebooted after Download state, the ArtifactVerifyReboot state fails due to the verification happening . Is there a way to identify the reboots caused by the actual update against the random reboots the user does ?

kacf · May 18, 2020, 6:30am

Let me just see if I have understood the question correctly: Mender finds an update, the update module sends the info to the custom updater, then Mender proceeds to ArtifactReboot, where it reboots and enters recovery mode. In this mode, your custom update agent is working, and while it is working, the user pulls the plug and reboots the device?

If the ArtifactVerifyReboot section is detecting this as an invalid install, isn’t this more or less the same as detecting a random reboot? It should never happen if the update is installed correctly, so it must be because the device rebooted before it was supposed to.

dhoomakethu · May 18, 2020, 9:53am

ok, here’s the update module for reference. As could be seen in the ArtifactReboot state mender makes a blocking call to custom update agent. The custom update agent would show a notification on the UI and wait for user to accept the update and then proceeds with the update and reboot. Note, mender as such does nothing in this stage except making the blocking call to exe. When the system reboots , ArtifactVerifyReboot Is used to verify the update status by comparing the artifact_info file.

#!/system/bin/sh
set -e
STATE="$1"
FILES="$2"
DEST="/var/downloads"
SYS_ARTIFACT_FILE="/system/bin/artifact_info"
case "$STATE" in
    NeedsArtifactReboot)
        echo "Yes"
        ;;
    ArtifactReboot)
        echo "Files: $FILES, Dest: $DEST"
        cp "$FILES"/files/*.zip $DEST/ota.zip
        cp "$FILES"/files/artifact_info $DEST/artifact_info
        ARTIFACT=`basename "$FILES"/files/*.zip` #extract artifact name from file present in artifact
        VERSION=$(cat $DEST/artifact_info | cut -d "=" -f 2)
    
        echo "ARTIFACT: $ARTIFACT, VERSION: $VERSION"
        /system/bin/call-updater -artifactFile $DEST/$ARTIFACT -artifactVersion $VERSION
        ;;
        ArtifactVerifyReboot)
        cmp -s $DEST/artifact_info $SYS_ARTIFACT_FILE || exit 1
        ;;
esac

Now , if a reboot happens when mender is in ArtifactReboot state (not triggered by the custom agent), let’s say user just reboots the system ArtifactVerifyReboot state fails and the deployment is marked failed. What I would want is a way by which unless the user consents and the agent reboots ,all other reboots should be kind of ignored(?) and the mender should resume at the blocking call after reboot.

Here are the possible solutions I can think of

Use exit code 21 in ArtifactVerifyReboot if the comparison fails, this would enable the retry the stage but the notification to custom agent would still be missing(may be I should call the updater here as well ?)
Move the call to custom agent to Download stage and handle validation in later stages(?)

Any other idea’s or even a hint at doing this in a better would be much appreciated.

Sanjay

kacf · May 19, 2020, 6:52am

Now I understand the problem. Unfortunately I don’t think what you’re asking is possible. As can be seen in the state diagram for power loss, a spontaneous reboot always leads to an error state except the very final ArtifactCommit state. It is basically just the simplest handling of spontaneous reboots: Assume it means failure.

It’s something we would like to improve, and we have been discussing some solutions. However, it’s still early stage and I can’t give you any time estimate for when this would be implemented.

A possible solution for you is to use the server API to automatically schedule new deployments if an old one has failed due to a spontaneous reboot. The device will have to re-download the image though.

dhoomakethu · May 19, 2020, 7:12am

Thanks @kacf for the response. Is automatic re-deployment available as part of mender professional service ?

oleorhagen · May 19, 2020, 8:00am

Here is the feature matrix for all Mender plans.
TLDR; Yes

Topic		Replies	Views
Post-update reboot General Discussions	1	537	December 2, 2019
Trigger reboot after artifact install General Discussions	6	58	May 9, 2025
Letting devices reboot between update steps with pauses General Discussions	11	863	December 2, 2022
Device reapproval after update General Discussions	7	540	November 22, 2022
Reboot remotely Update Modules	13	2588	February 17, 2025

Handling random reboots

Description

Related topics