Rpi fails to reboot to update and execute state script

I’ve been working with Mender for a couple of weeks now, and I have been unable to perform a successful update. I’ve used mender-artifact to place a script (ArtifactReboot_Leave_00_waitForNetwork) in /etc/mender/scripts/ in the artifact that waits for a successful ping of our DNS server. However, this script never executes. The scripts I’ve added to the rootfs execute without issue. Any help to get beyond this issue would be appreciated.

Pretty sure this is relevant:

error: transient error: Reboot to new update failed. Expected “upgrade_available” flag to be true but it was false

But even before that, the log shows the state AfterReboot_Leave, but the script didn’t execute.

Here is an excerpt from the deployment log:

2019-12-04 23:33:12 +0000 UTC info: State transition: update-fetch [Download_Enter] → update-store [Download_Enter]
2019-12-04 23:33:12 +0000 UTC debug: handle update install state
2019-12-04 23:33:12 +0000 UTC debug: status reported, response 204 No Content
2019-12-04 23:33:12 +0000 UTC debug: Read data from device manifest file: device_type=raspberrypi3
2019-12-04 23:33:12 +0000 UTC debug: Current manifest data: raspberrypi3
2019-12-04 23:33:12 +0000 UTC info: no public key was provided for authenticating the artifact
2019-12-04 23:33:12 +0000 UTC debug: checking if device [raspberrypi3] is on compatible device list: [raspberrypi3]
2019-12-04 23:33:12 +0000 UTC debug: installer: successfully read artifact [name: gg2; version: 3; compatible devices: [raspberrypi3]]
2019-12-04 23:33:12 +0000 UTC debug: Trying to install update of size: 8984199168
2019-12-04 23:33:12 +0000 UTC debug: Active partition: /dev/mmcblk0p2
2019-12-04 23:33:12 +0000 UTC debug: Detected inactive partition /dev/mmcblk0p3, based on active partition /dev/mmcblk0p2
2019-12-04 23:33:12 +0000 UTC info: native sector size of block device /dev/mmcblk0p3 is 512, we will write in chunks of 1048576
2019-12-04 23:33:13 +0000 UTC info: opening device /dev/mmcblk0p3 for writing
2019-12-04 23:33:13 +0000 UTC info: partition /dev/mmcblk0p3 size: 8984199168
2019-12-05 00:06:37 +0000 UTC info: wrote 8984199168/8984199168 bytes of update to device /dev/mmcblk0p3
2019-12-05 00:06:39 +0000 UTC debug: status reported, response 204 No Content
2019-12-05 00:06:39 +0000 UTC info: State transition: update-store [Download_Enter] → update-after-store [Download_Leave]
2019-12-05 00:06:39 +0000 UTC info: State transition: update-after-store [Download_Leave] → update-install [ArtifactInstall]
2019-12-05 00:06:39 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-12-05 00:06:39 +0000 UTC debug: start executing script: Download_Leave_00_downloadLeave
2019-12-05 00:06:39 +0000 UTC debug: status reported, response 204 No Content
2019-12-05 00:06:40 +0000 UTC debug: status reported, response 204 No Content
2019-12-05 00:06:40 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-12-05 00:06:40 +0000 UTC debug: status reported, response 204 No Content
2019-12-05 00:06:40 +0000 UTC debug: Inactive partition: /dev/mmcblk0p3
2019-12-05 00:06:40 +0000 UTC debug: Marking inactive partition (/dev/mmcblk0p3) as the new boot candidate.
2019-12-05 00:06:40 +0000 UTC info: Enabling partition with new image installed to be a boot candidate: 3
2019-12-05 00:06:40 +0000 UTC debug: Have U-Boot variable: mender_check_saveenv_canary=1
2019-12-05 00:06:40 +0000 UTC debug: List of U-Boot variables:map[mender_check_saveenv_canary:1]
2019-12-05 00:06:40 +0000 UTC debug: Have U-Boot variable: mender_saveenv_canary=1
2019-12-05 00:06:40 +0000 UTC debug: List of U-Boot variables:map[mender_saveenv_canary:1]
2019-12-05 00:06:40 +0000 UTC debug: Marking inactive partition as a boot candidate successful.
2019-12-05 00:06:40 +0000 UTC info: State transition: update-install [ArtifactInstall] → reboot [ArtifactReboot_Enter]
2019-12-05 00:06:40 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-12-05 00:06:40 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-12-05 00:06:40 +0000 UTC debug: handling reboot state
2019-12-05 00:06:40 +0000 UTC debug: status reported, response 204 No Content
2019-12-05 00:06:40 +0000 UTC info: rebooting device(s)
2019-12-05 00:06:40 +0000 UTC info: Mender rebooting from active partition: /dev/mmcblk0p2
2019-12-05 00:06:55 +0000 UTC info: Running Mender version 2.1.1
2019-12-05 00:06:55 +0000 UTC info: State transition: init [none] → after-reboot [ArtifactReboot_Leave]
2019-12-05 00:06:55 +0000 UTC debug: Have U-Boot variable: mender_check_saveenv_canary=1
2019-12-05 00:06:55 +0000 UTC debug: List of U-Boot variables:map[mender_check_saveenv_canary:1]
2019-12-05 00:06:55 +0000 UTC debug: Have U-Boot variable: mender_saveenv_canary=1
2019-12-05 00:06:55 +0000 UTC debug: List of U-Boot variables:map[mender_saveenv_canary:1]
2019-12-05 00:06:55 +0000 UTC debug: Have U-Boot variable: upgrade_available=0
2019-12-05 00:06:55 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:0]
2019-12-05 00:06:55 +0000 UTC error: transient error: Reboot to new update failed. Expected “upgrade_available” flag to be true but it was false
2019-12-05 00:06:55 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] → rollback [ArtifactRollback]
2019-12-05 00:06:55 +0000 UTC debug: transitioning to error state
2019-12-05 00:06:55 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-12-05 00:06:55 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-12-05 00:06:55 +0000 UTC info: performing rollback
2019-12-05 00:06:55 +0000 UTC debug: Have U-Boot variable: mender_check_saveenv_canary=1
2019-12-05 00:06:55 +0000 UTC debug: List of U-Boot variables:map[mender_check_saveenv_canary:1]
2019-12-05 00:06:55 +0000 UTC debug: Have U-Boot variable: mender_saveenv_canary=1
2019-12-05 00:06:55 +0000 UTC debug: List of U-Boot variables:map[mender_saveenv_canary:1]
2019-12-05 00:06:55 +0000 UTC debug: Have U-Boot variable: upgrade_available=0
2019-12-05 00:06:55 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:0]
2019-12-05 00:06:55 +0000 UTC debug: will try to rollback reboot the device
2019-12-05 00:06:55 +0000 UTC info: State transition: rollback [ArtifactRollback] → rollback-reboot [ArtifactRollbackReboot_Enter]
2019-12-05 00:06:55 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-12-05 00:06:55 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-12-05 00:06:55 +0000 UTC info: rebooting device(s) after rollback
2019-12-05 00:06:55 +0000 UTC info: Mender rebooting from inactive partition: /dev/mmcblk0p2
2019-12-05 00:07:07 +0000 UTC info: Running Mender version 2.1.1
2019-12-05 00:07:07 +0000 UTC info: Update was interrupted in state: rollback-reboot
2019-12-05 00:07:07 +0000 UTC info: State transition: init [none] → verify-rollback-reboot [ArtifactRollbackReboot_Leave]
2019-12-05 00:07:07 +0000 UTC debug: transitioning to error state
2019-12-05 00:07:07 +0000 UTC debug: Have U-Boot variable: mender_check_saveenv_canary=1
2019-12-05 00:07:07 +0000 UTC debug: List of U-Boot variables:map[mender_check_saveenv_canary:1]
2019-12-05 00:07:07 +0000 UTC debug: Have U-Boot variable: mender_saveenv_canary=1
2019-12-05 00:07:07 +0000 UTC debug: List of U-Boot variables:map[mender_saveenv_canary:1]
2019-12-05 00:07:07 +0000 UTC debug: Have U-Boot variable: upgrade_available=0
2019-12-05 00:07:07 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:0]
2019-12-05 00:07:07 +0000 UTC info: State transition: verify-rollback-reboot [ArtifactRollbackReboot_Leave] → after-rollback-reboot [ArtifactRollbackReboot_Leave]
2019-12-05 00:07:07 +0000 UTC debug: handling state after rollback reboot
2019-12-05 00:07:07 +0000 UTC info: State transition: after-rollback-reboot [ArtifactRollbackReboot_Leave] → update-error [ArtifactFailure]
2019-12-05 00:07:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-12-05 00:07:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-12-05 00:07:07 +0000 UTC debug: handle update error state
2019-12-05 00:07:07 +0000 UTC info: State transition: update-error [ArtifactFailure] → cleanup [Error]
2019-12-05 00:07:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-12-05 00:07:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-12-05 00:07:07 +0000 UTC debug: Handling Cleanup state
2019-12-05 00:07:07 +0000 UTC info: State transition: cleanup [Error] → update-status-report [none]
2019-12-05 00:07:07 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-12-05 00:07:07 +0000 UTC debug: handle update status report state
2019-12-05 00:07:07 +0000 UTC error: failed to report status: Put https://ec2-***-***-***-***.compute-1.amazonaws.com/api/devices/v1/deployments/device/deployments/0cc289f8-9e09-41cd-b169-5cc0bd36b263/status: dial tcp: lookup ec2-***-***-***-***.compute-1.amazonaws.com on 10.9.8.8:53: dial udp 10.9.8.8:53: connect: network is unreachable
2019-12-05 00:07:07 +0000 UTC error: error reporting update status: reporting status failed: Put https://ec2-***-***-***-***.compute-1.amazonaws.com/api/devices/v1/deployments/device/deployments/0cc289f8-9e09-41cd-b169-5cc0bd36b263/status: dial tcp: lookup ec2-***-***-***-***.compute-1.amazonaws.com on 10.9.8.8:53: dial udp 10.9.8.8:53: connect: network is unreachable
2019-12-05 00:07:07 +0000 UTC error: failed to send status to server: transient error: reporting status failed: Put https://ec2-***-***-***-***.compute-1.amazonaws.com/api/devices/v1/deployments/device/deployments/0cc289f8-9e09-41cd-b169-5cc0bd36b263/status: dial tcp: lookup ec2-***-***-***-***.compute-1.amazonaws.com on 10.9.8.8:53: dial udp 10.9.8.8:53: connect: network is unreachable
2019-12-05 00:07:07 +0000 UTC info: State transition: update-status-report [none] → update-retry-report [none]
2019-12-05 00:10:18 +0000 UTC debug: wait complete
2019-12-05 00:10:18 +0000 UTC info: State transition: update-retry-report [none] → update-status-report [none]
2019-12-05 00:10:18 +0000 UTC debug: handle update status report state
2019-12-05 00:10:19 +0000 UTC debug: status reported, response 204 No Content
2019-12-05 00:10:19 +0000 UTC debug: attempting to upload deployment logs for failed update

Scripts with the Artifact* prefix should not go in /etc/mender/scripts/ and instead they should be embedded in to the Mender Artifact (.mender file).

This is done at time of creation by using the -s/--script option,

$ mender-artifact write rootfs-image --help
NAME:
   mender-artifact write rootfs-image - Writes Mender artifact containing rootfs image

USAGE:
   mender-artifact write rootfs-image [command options] <image path>

OPTIONS:
   --file FILE, -f FILE             Payload FILE.
   --device-type value, -t value    Type of device(s) supported by the Artifact. You can specify multiple compatible devices providing this parameter multiple times.
   --artifact-name value, -n value  Name of the artifact
   --output-path value, -o value    Full path to output artifact file.
   --version value, -v value        Version of the artifact. (default: 3)
   --key value, -k value            Full path to the private key that will be used to verify the artifact signature.
   --script value, -s value         Full path to the state script(s). You can specify multiple scripts providing this parameter multiple times.

There is some additional information about this in the docs,

https://docs.mender.io/2.2/artifacts/state-scripts#root-file-system-and-artifact-scripts

Thanks for the info. I’ve created a new artifact using mender-artifact, and embedded the desired scripts. Now, when I create a deployment with that artifact, the Next button remains disabled. Any suggestions?

Edit: To clarify, I’ve been using mender-convert to create .mender and .sdimg files. When I create an artifact using mender-convert, I can create and start a deployment without an issue. However, I end up with the failure defined in the log I posted. But when I create an artifact with mender-artifact, I can’t even successfully create the deployment on the server. That is where the Next button remains disabled.

Probably a mismatch in device_type information. Can you share the mender-artifact command you used?

This morning, when I was trying to recall the error I received, I tried again and did not experience the same issue. I think my new artifact was too large - I had forgotten to use PiShrink to shrink the source image for the new artifact. When I tried this morning, I received the message that it was too large for the partition. I shrunk the image, reran mender-artifact, and am now able to create and start a deployment with the new artifact. However, despite some of my Artifact state scripts executing, not all of them did, and I ended up with the same failure I reported originally. An excerpt:

2019-12-12 19:06:20 +0000 UTC debug: start executing script: ArtifactReboot_Enter_00_artifactRebootEnter
2019-12-12 19:06:20 +0000 UTC debug: status reported, response 204 No Content
2019-12-12 19:06:20 +0000 UTC debug: status reported, response 204 No Content
2019-12-12 19:06:20 +0000 UTC debug: handling reboot state
2019-12-12 19:06:20 +0000 UTC debug: status reported, response 204 No Content
2019-12-12 19:06:20 +0000 UTC info: rebooting device(s)
2019-12-12 19:06:20 +0000 UTC info: Mender rebooting from active partition: /dev/mmcblk0p2
2019-12-12 19:06:32 +0000 UTC info: Running Mender version 2.1.1
2019-12-12 19:06:32 +0000 UTC info: State transition: init [none] → after-reboot [ArtifactReboot_Leave]
2019-12-12 19:06:32 +0000 UTC debug: Have U-Boot variable: mender_check_saveenv_canary=1
2019-12-12 19:06:32 +0000 UTC debug: List of U-Boot variables:map[mender_check_saveenv_canary:1]
2019-12-12 19:06:32 +0000 UTC debug: Have U-Boot variable: mender_saveenv_canary=1
2019-12-12 19:06:32 +0000 UTC debug: List of U-Boot variables:map[mender_saveenv_canary:1]
2019-12-12 19:06:32 +0000 UTC debug: Have U-Boot variable: upgrade_available=0
2019-12-12 19:06:33 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:0]
2019-12-12 19:06:34 +0000 UTC error: transient error: Reboot to new update failed. Expected “upgrade_available” flag to be true but it was false
2019-12-12 19:06:34 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] → rollback [ArtifactRollback]
2019-12-12 19:06:34 +0000 UTC debug: transitioning to error state
2019-12-12 19:06:34 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-12-12 19:06:34 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1h0m0s seconds
2019-12-12 19:06:34 +0000 UTC info: performing rollback

It’s still reporting that the reboot failed. The ArtifactReboot_Enter script executes, but the ArtifactReboot_Leave script I have also included does not.

What are you using as input image? What exactly do your state-scripts do?

I started with Raspbian, one of the versions mentioned as working here. I loaded it on the Pi so I could change some config and turn on ssh. I used dd to create an image of the SD card; we’ll call this image START.

I loaded up Raspbian again, updated all packages, and added a file called UPDATED to the home directory. Then I used dd again to create an image of this updated instance of Raspbian; let’s call this image END. I used PiShrink to shrink both START and END.

Then I used START as input for mender-convert to create an sdimg, flashed it to an SD card, and loaded it up on the Pi. I used END as input for mender-artifact to create just the artifact with the embedded Artifact state scripts.

All the scripts except for ArtifactReboot_Leave_00 simply concatenate a string to a file in /data. These all execute. ArtifactReboot_Leave_00 loops while attempting to reach 10.9.8.8:53 with netcat (I can’t remember the exact command, and I’m not near my work laptop to check, but the command executes fine while testing manually in Raspbian). When the command returns success, it exits the loop, all while writing status updates to /data/wait.log, but it never starts execution.

Even before I started working with the state scripts, I was getting:

error reporting update status: reporting status failed: Put https://ec2-***-***-***-***.compute-1.amazonaws.com/api/devices/v1/deployments/device/deployments/0cc289f8-9e09-41cd-b169-5cc0bd36b263/status: dial tcp: lookup ec2-***-***-***-***.compute-1.amazonaws.com on 10.9.8.8:53: dial udp 10.9.8.8:53: connect: network is unreachable

The ArtifactReboot_Leave script was an attempt to address that, without realizing that things were failing and rolling back before that point.

2019-12-12 19:06:32 +0000 UTC info: State transition: init [none] → after-reboot [ArtifactReboot_Leave]
2019-12-12 19:06:32 +0000 UTC debug: Have U-Boot variable: mender_check_saveenv_canary=1
2019-12-12 19:06:32 +0000 UTC debug: List of U-Boot variables:map[mender_check_saveenv_canary:1]
2019-12-12 19:06:32 +0000 UTC debug: Have U-Boot variable: mender_saveenv_canary=1
2019-12-12 19:06:32 +0000 UTC debug: List of U-Boot variables:map[mender_saveenv_canary:1]
2019-12-12 19:06:32 +0000 UTC debug: Have U-Boot variable: upgrade_available=0
2019-12-12 19:06:33 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:0]
2019-12-12 19:06:34 +0000 UTC error: transient error: Reboot to new update failed. Expected “upgrade_available” flag to be true but it was false
2019-12-12 19:06:34 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] → rollback [ArtifactRollback]

This seems counter intuitive, since the documentation say Including state scripts in Artifacts and disk images and point to a github recipe. I am trying to get my ArtifactInstall_Leave_01 script to work using this example but looks like the state-scripts.bb is never included by mender.

Here is the example I am using:
https://github.com/mendersoftware/meta-mender/blob/master/meta-mender-demo/recipes-
mender/example-state-scripts/example-state-scripts_1.0.bb

So based on this comment it looks like the ArtifactInstall_Leave_01 should be used with the mender-artifact command, which is not documented here: https://docs.mender.io/2.2/artifacts/state-scripts#root-file-system-and-artifact-scripts

I will see if this method works. Thanks

Update: Looks like the script was installed at /var/lib/mender/scripts/ArtifactInstall_Leave_01 so no need to run the mender-artifact with -s it seems.