Problems booting from version 2 artifact following successful download to device

Hi @mirzak,

Been trying to follow through that mega thread rossbcan created and the subsequent integration page and have made great progress, integrating mender with the nanopineo2 with musl, which I have you both to thank for.
I am now stuck in the final phase - updating my device with an artifact from the server.

This being a first go with anything embedded can you correct my logic/advise?

It seems that the update writes to a new partition, boot candidate is set as the new partition but then the old partition is used for boot. I cant figure if this is because the artifact lacks config for reaching the server, if device goes offline for too long or if there is some problem with the uboot config.
“List of U-Boot variables:map[upgrade_available:1]”

Many thanks in advance

2019-08-23 15:39:05 +0000 UTC debug: handle update fetch state
2019-08-23 15:39:05 +0000 UTC debug: status reported, response 204 No Content
2019-08-23 15:39:05 +0000 UTC debug: Received fetch update response &{200 OK 200 HTTP/1.1 1 1 map[Accept-Ranges:[bytes] Content-Length:[15380992] Content-Type:[application/vnd.mender-artifact] Date:[Fri, 23 Aug 2019 15:39:06 GMT] Etag:["7d89ee0fddd2265417deceed7a01dba7"] Last-Modified:[Thu, 22 Aug 2019 12:13:24 GMT] Server:[AmazonS3] X-Amz-Id-2:[iDYJoZ0klV/sft8D+eVco3r+At5bIDnwGtYG0NxD26E0NSVxSDj2oZ0ZpqCD9CACQEkmzikntHg=] X-Amz-Request-Id:[2D9A659479878D1D]] 0x40003b00a0 15380992 [] false false map[] 0x40004a0400 0x40002ea000}+
2019-08-23 15:39:05 +0000 UTC info: State transition: update-fetch [Download] -> update-store [Download]
2019-08-23 15:39:05 +0000 UTC debug: handle update install state
2019-08-23 15:39:05 +0000 UTC debug: status reported, response 204 No Content
2019-08-23 15:39:05 +0000 UTC debug: Read data from device manifest file: device_type=mender-nanopi-neo2-openwrt
2019-08-23 15:39:05 +0000 UTC debug: Found needed line: device_type=mender-nanopi-neo2-openwrt
2019-08-23 15:39:05 +0000 UTC debug: Current manifest data: mender-nanopi-neo2-openwrt
2019-08-23 15:39:05 +0000 UTC info: no public key was provided for authenticating the artifact
2019-08-23 15:39:05 +0000 UTC debug: checking if device [mender-nanopi-neo2-openwrt] is on compatibile device list: [mender-nanopi-neo2-openwrt]
2019-08-23 15:39:05 +0000 UTC debug: installing update mender-nanopi-neo2-openwrt-ext4-rootfs.img of size 3827302400
2019-08-23 15:39:05 +0000 UTC debug: Trying to install update of size: 3827302400
2019-08-23 15:39:05 +0000 UTC debug: Active partition: /dev/mmcblk0p2
2019-08-23 15:39:05 +0000 UTC debug: Detected inactive partition /dev/mmcblk0p3, based on active partition /dev/mmcblk0p2
2019-08-23 15:39:05 +0000 UTC info: native sector size of block device /dev/mmcblk0p3 is 512, we will write in chunks of 1048576
2019-08-23 15:39:05 +0000 UTC info: opening device /dev/mmcblk0p3 for writing
2019-08-23 15:39:05 +0000 UTC info: partition /dev/mmcblk0p3 size: 3827302400
2019-08-23 15:44:52 +0000 UTC info: wrote 3827302400/3827302400 bytes of update to device /dev/mmcblk0p3
2019-08-23 15:44:53 +0000 UTC debug: installer: successfully read artifact [name: artifactv2; version: 2; compatible devices: [mender-nanopi-neo2-openwrt]]
2019-08-23 15:44:53 +0000 UTC debug: status reported, response 204 No Content
2019-08-23 15:44:53 +0000 UTC info: State transition: update-store [Download] -> update-install [ArtifactInstall]
2019-08-23 15:44:53 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1m0s seconds
2019-08-23 15:44:53 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1m0s seconds
2019-08-23 15:44:53 +0000 UTC debug: status reported, response 204 No Content
2019-08-23 15:44:53 +0000 UTC debug: Inactive partition: /dev/mmcblk0p3
2019-08-23 15:44:53 +0000 UTC debug: Marking inactive partition (/dev/mmcblk0p3) as the new boot candidate.
2019-08-23 15:44:53 +0000 UTC info: Enabling partition with new image installed to be a boot candidate: 3
2019-08-23 15:44:53 +0000 UTC debug: Have U-Boot variable: mender_check_saveenv_canary=1
2019-08-23 15:44:53 +0000 UTC debug: List of U-Boot variables:map[mender_check_saveenv_canary:1]
2019-08-23 15:44:53 +0000 UTC debug: Have U-Boot variable: mender_saveenv_canary=1
2019-08-23 15:44:53 +0000 UTC debug: List of U-Boot variables:map[mender_saveenv_canary:1]
2019-08-23 15:44:53 +0000 UTC debug: Marking inactive partition as a boot candidate successful.
2019-08-23 15:44:53 +0000 UTC info: State transition: update-install [ArtifactInstall] -> reboot [ArtifactReboot_Enter]
2019-08-23 15:44:53 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1m0s seconds
2019-08-23 15:44:53 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1m0s seconds
2019-08-23 15:44:53 +0000 UTC debug: handling reboot state
2019-08-23 15:44:53 +0000 UTC debug: status reported, response 204 No Content
2019-08-23 15:44:53 +0000 UTC info: rebooting device
2019-08-23 15:44:53 +0000 UTC info: Mender rebooting from active partition: /dev/mmcblk0p2
2019-08-22 08:27:44 +0000 UTC debug: handling state after reboot
2019-08-22 08:27:44 +0000 UTC info: State transition: after-reboot [ArtifactReboot_Leave] -> update-verify [ArtifactReboot_Leave]
2019-08-22 08:27:44 +0000 UTC debug: handle update verify state
2019-08-22 08:27:44 +0000 UTC debug: Have U-Boot variable: mender_check_saveenv_canary=1
2019-08-22 08:27:44 +0000 UTC debug: List of U-Boot variables:map[mender_check_saveenv_canary:1]
2019-08-22 08:27:44 +0000 UTC debug: Have U-Boot variable: mender_saveenv_canary=1
2019-08-22 08:27:44 +0000 UTC debug: List of U-Boot variables:map[mender_saveenv_canary:1]
2019-08-22 08:27:44 +0000 UTC debug: Have U-Boot variable: upgrade_available=1
2019-08-22 08:27:44 +0000 UTC debug: List of U-Boot variables:map[upgrade_available:1]
2019-08-22 08:27:44 +0000 UTC info: State transition: update-verify [ArtifactReboot_Leave] -> update-commit [ArtifactCommit]
2019-08-22 08:27:44 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1m0s seconds
2019-08-22 08:27:44 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1m0s seconds
2019-08-22 08:27:44 +0000 UTC debug: handle update commit state
2019-08-22 08:27:44 +0000 UTC debug: Read data from device manifest file: artifact_name=mender-nanopi-neo2-openwrt-20190822
2019-08-22 08:27:44 +0000 UTC debug: Found needed line: artifact_name=mender-nanopi-neo2-openwrt-20190822
2019-08-22 08:27:44 +0000 UTC debug: Current manifest data: mender-nanopi-neo2-openwrt-20190822
2019-08-22 08:27:44 +0000 UTC error: running with image mender-nanopi-neo2-openwrt-20190822, expected updated image artifactv2
2019-08-22 08:27:44 +0000 UTC info: State transition: update-commit [ArtifactCommit] -> rollback [ArtifactRollback]
2019-08-22 08:27:44 +0000 UTC debug: transitioning to error state
2019-08-22 08:27:44 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1m0s seconds
2019-08-22 08:27:44 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1m0s seconds
2019-08-22 08:27:44 +0000 UTC info: performing rollback
2019-08-22 08:27:44 +0000 UTC debug: will try to rollback reboot the device
2019-08-22 08:27:44 +0000 UTC info: State transition: rollback [ArtifactRollback] -> rollback-reboot [ArtifactRollbackReboot_Enter]
2019-08-22 08:27:44 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1m0s seconds
2019-08-22 08:27:44 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1m0s seconds
2019-08-22 08:27:44 +0000 UTC info: rebooting device after rollback
2019-08-22 08:27:44 +0000 UTC info: Mender rebooting from active partition: /dev/mmcblk0p3
2019-08-23 15:38:17 +0000 UTC debug: handling state after rollback reboot
2019-08-23 15:38:17 +0000 UTC info: State transition: after-rollback-reboot [ArtifactRollbackReboot_Leave] -> update-error [ArtifactFailure]
2019-08-23 15:38:17 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1m0s seconds
2019-08-23 15:38:17 +0000 UTC debug: statescript: timeout for executing scripts is not defined; using default of 1m0s seconds
2019-08-23 15:38:17 +0000 UTC debug: handle update error state
2019-08-23 15:38:17 +0000 UTC info: State transition: update-error [ArtifactFailure] -> update-status-report [none]
2019-08-23 15:38:17 +0000 UTC debug: handle update status report state
2019-08-23 15:38:18 +0000 UTC debug: status reported, response 204 No Content
2019-08-23 15:38:18 +0000 UTC debug: attempting to upload deployment logs for failed update

2019-08-22 08:27:44 +0000 UTC error: running with image mender-nanopi-neo2-openwrt-20190822, expected updated image artifactv2

This seems to be the problem, the information in the artifact does not match the information in rootfs (/etc/mender/artifact_info) and a rollback is performed

Hey @mirzak, thanks for the speedy response.

So maybe I misunderstand you as I agree that this is the issue but am not sure how to resolve it. Please can you explain a little more?

From my understanding the inactive partition (/dev/mmcblk0p3) is loading with the expected updated image artifactv2 but despite the log Enabling partition with new image installed to be a boot candidate and Marking inactive partition (/dev/mmcblk0p3) as the new boot candidate it is remaining inactive, not being used on reboot.
Instead (/dev/mmcblk0p2) the active partition is never changed on reboot.
I cant understand why the boot partition is not changing.

Can you share how you created the Mender Artifact?

Based on the logs it does boot /dev/mmcblk0p3, and fails on a sanity check (the thing i mentioned earlier) and then goes back to /dev/mmcblk0p2.

Sure.

from rossbcan’s mender_flashdisk script:

mender-artifact write rootfs-image -n $BASE_NAME-$DATE -t $BASE_NAME -f $ROOTFS_IMAGE -o $BASE_NAME-ext4-rootfs-$DATE.mender
ie
mender-artifact write rootfs-image -v 2 -n mender-nanopi-neo2-openwrt-20190822 -t mender-nanopi-neo2-openwrt -f mender-nanopi-neo2-openwrt-ext4-rootfs.img -o mender-nanopi-neo2-openwrt-ext4-rootfs-20190822_v2.mender

only difference being that I use -v 2 to mark it as a version 2 artifact for compatibility with the mender client. (I verify the version change with mender-artifact)

Can you share the output of:

mender-artifact cat mender-nanopi-neo2-openwrt-ext4-rootfs-20190822_v2.mender:/etc/mender/artifact_info

Hey @mirzak,

So I no longer have that artifact to hand but I did try and recreate the situation.

Strangely - the process has begun to work. I’ve only tried with unsigned artifacts currently but ive managed to update the artifact to another on a mender enabled device twice now - firstly with a significantly different payload, then just with the same payload with a modified artifact name.

So this is great news but leaves us in the dark as to why errors were occurring before.

Something that has happened on each occasion before a successful update is a failure caused by invalid entrypoint into the state machine: state: update-store. do you know why this false start may be occurring?

I’ll keep on in the meantime, my current deployment process certainly isn’t airtight yet.

Thanks again

Not sure about this without the complete deployment log.

I would recommend you to run trough the integration checklist (you might need to tweak it a bit as it is for Yocto, but most steps should apply),

https://docs.mender.io/2.0/devices/yocto-project/bootloader-support/u-boot/integration-checklist

This should catch any integration problems, and isolates where there might problems.